qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 00/67] Hexagon patch series
@ 2020-02-28 16:42 Taylor Simpson
  2020-02-28 16:42 ` [RFC PATCH v2 01/67] Hexagon Maintainers Taylor Simpson
                   ` (67 more replies)
  0 siblings, 68 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

This series adds support for the Hexagon processor with Linux user support

See patch 02/67 Hexagon README for detailed information.

The patches up to and including "Hexagon build infractructure" implement the
base Hexagon core and the remainder add HVX.  Once the build infrastructure
patch is applied, you can build and qemu will execute non-HVX Hexagon programs.

We have a parallel effort to make the Hexagon Linux toolchain publically
available.


*** Testing ***

The port passes the following tests
    Directed unit tests will contributed when the Hexagon toolchain is available
    MUSL libc test suite (good coverage of Linux system calls)
    https://git.musl-libc.org/cgit/libc-testsuite/
    Internal compiler intrinsics test suite (good coverage of instructions)
    Hexagon machine learning library unit tests
    TODO - pull these from the CAF repo
    make check-tcg TIMEOUT=60

*** Known checkpatch issues ***

The following are known checkpatch errors in the series
    include/disas/dis-asm.h             space prohibited
        (Follow convention of other targets on prior lines)
    target/hexagon/reg_fields.h         Complex macro
    target/hexagon/attribs.h            Complex macro
    target/hexagon/decode.c             Complex macro
    target/hexagon/q6v_decode.c         Macro needs do - while
    target/hexagon/printinsn.c          Macro needs do - while
    target/hexagon/gen_semantics.c      Suspicious ; after while (0)
    target/hexagon/gen_dectree_import.c Complex macro
    target/hexagon/gen_dectree_import.c Suspicious ; after while (0)
    target/hexagon/opcodes.c            Complex macro
    target/hexagon/iclass.h             Complex macro
    scripts/qemu-binfmt-conf.sh         Line over 90 characters
    target/hexagon/mmvec/macros.h       Suspicious ; after while (0)

The following are known checkpatch warnings in the series
    target/hexagon/fma_emu.c            Comments inside macro definition
    scripts/qemu-binfmt-conf.sh         Line over 80 characters

*** Changes in v2 ***
- Use scripts/git.orderfile
- Create a README with the code overview in patch 0001
- Change #define's in hex_regs.h to an enum
- Replace hard coded disassembly buffer length (1028) with #define
- Move Hexagon architecture types patch earlier in series
- Replace #include standard header files with #include "qemu/osdep.h"
- Prefix all header file #ifndef's with HEXAGON_
- Update python version to python3
- #include "tcg/tcg.h" in genptr_helpers.h
- Change target/hexagon/Makefile.objs to support out-of-tree build
- Updated copyright to include year 2020
- Bug fixes
    Fix some problems with HEX_DEBUG output
    Fix bug in circular addressing
- Optimizations to reduce the amount of TCG code generated
    Change pred_written from an array to a bit mask
    Optimize readonly vector registers
    Conditionally call gen_helper_commit_hvx_stores

Taylor Simpson (67):
  Hexagon Maintainers
  Hexagon README
  Hexagon ELF Machine Definition
  Hexagon CPU Scalar Core Definition
  Hexagon register names
  Hexagon Disassembler
  Hexagon CPU Scalar Core Helpers
  Hexagon GDB Stub
  Hexagon architecture types
  Hexagon instruction and packet types
  Hexagon register fields
  Hexagon instruction attributes
  Hexagon register map
  Hexagon instruction/packet decode
  Hexagon instruction printing
  Hexagon arch import - instruction semantics definitions
  Hexagon arch import - macro definitions
  Hexagon arch import - instruction encoding
  Hexagon instruction class definitions
  Hexagon instruction utility functions
  Hexagon generator phase 1 - C preprocessor for semantics
  Hexagon generator phase 2 - qemu_def_generated.h
  Hexagon generator phase 2 - qemu_wrap_generated.h
  Hexagon generator phase 2 - opcodes_def_generated.h
  Hexagon generator phase 2 - op_attribs_generated.h
  Hexagon generator phase 2 - op_regs_generated.h
  Hexagon generator phase 2 - printinsn-generated.h
  Hexagon generator phase 3 - C preprocessor for decode tree
  Hexagon generater phase 4 - Decode tree
  Hexagon opcode data structures
  Hexagon macros to interface with the generator
  Hexagon macros referenced in instruction semantics
  Hexagon instruction classes
  Hexagon TCG generation helpers - step 1
  Hexagon TCG generation helpers - step 2
  Hexagon TCG generation helpers - step 3
  Hexagon TCG generation helpers - step 4
  Hexagon TCG generation helpers - step 5
  Hexagon TCG generation - step 01
  Hexagon TCG generation - step 02
  Hexagon TCG generation - step 03
  Hexagon TCG generation - step 04
  Hexagon TCG generation - step 05
  Hexagon TCG generation - step 06
  Hexagon TCG generation - step 07
  Hexagon TCG generation - step 08
  Hexagon TCG generation - step 09
  Hexagon TCG generation - step 10
  Hexagon TCG generation - step 11
  Hexagon TCG generation - step 12
  Hexagon translation
  Hexagon Linux user emulation
  Hexagon build infrastructure
  Hexagon - Add Hexagon Vector eXtensions (HVX) to core definition
  Hexagon HVX support in gdbstub
  Hexagon HVX import instruction encodings
  Hexagon HVX import semantics
  Hexagon HVX import macro definitions
  Hexagon HVX semantics generator
  Hexagon HVX instruction decoding
  Hexagon HVX instruction utility functions
  Hexagon HVX macros to interface with the generator
  Hexagon HVX macros referenced in instruction semantics
  Hexagon HVX helper to commit vector stores (masked and scatter/gather)
  Hexagon HVX TCG generation
  Hexagon HVX translation
  Hexagon HVX build infrastructure

 configure                                    |    9 +
 default-configs/hexagon-linux-user.mak       |    1 +
 include/disas/dis-asm.h                      |    1 +
 include/elf.h                                |    2 +
 linux-user/hexagon/sockbits.h                |   18 +
 linux-user/hexagon/syscall_nr.h              |  346 +++
 linux-user/hexagon/target_cpu.h              |   44 +
 linux-user/hexagon/target_elf.h              |   38 +
 linux-user/hexagon/target_fcntl.h            |   18 +
 linux-user/hexagon/target_signal.h           |   34 +
 linux-user/hexagon/target_structs.h          |   46 +
 linux-user/hexagon/target_syscall.h          |   32 +
 linux-user/hexagon/termbits.h                |   18 +
 linux-user/syscall_defs.h                    |   33 +
 target/hexagon/arch.h                        |   62 +
 target/hexagon/attribs.h                     |   32 +
 target/hexagon/attribs_def.h                 |  404 +++
 target/hexagon/conv_emu.h                    |   50 +
 target/hexagon/cpu-param.h                   |   26 +
 target/hexagon/cpu.h                         |  207 ++
 target/hexagon/cpu_bits.h                    |   37 +
 target/hexagon/decode.h                      |   39 +
 target/hexagon/fma_emu.h                     |   30 +
 target/hexagon/genptr.h                      |   25 +
 target/hexagon/genptr_helpers.h              | 1049 +++++++
 target/hexagon/helper.h                      |   38 +
 target/hexagon/helper_overrides.h            | 1850 ++++++++++++
 target/hexagon/hex_arch_types.h              |   42 +
 target/hexagon/hex_regs.h                    |   99 +
 target/hexagon/iclass.h                      |   46 +
 target/hexagon/insn.h                        |  149 +
 target/hexagon/internal.h                    |   54 +
 target/hexagon/macros.h                      | 1474 ++++++++++
 target/hexagon/mmvec/decode_ext_mmvec.h      |   24 +
 target/hexagon/mmvec/macros.h                |  698 +++++
 target/hexagon/mmvec/mmvec.h                 |   87 +
 target/hexagon/mmvec/system_ext_mmvec.h      |   38 +
 target/hexagon/opcodes.h                     |   67 +
 target/hexagon/printinsn.h                   |   26 +
 target/hexagon/reg_fields.h                  |   40 +
 target/hexagon/reg_fields_def.h              |  109 +
 target/hexagon/regmap.h                      |   38 +
 target/hexagon/translate.h                   |  112 +
 disas/hexagon.c                              |   62 +
 linux-user/elfload.c                         |   16 +
 linux-user/hexagon/cpu_loop.c                |  173 ++
 linux-user/hexagon/signal.c                  |  276 ++
 linux-user/syscall.c                         |    2 +
 target/hexagon/arch.c                        |  663 +++++
 target/hexagon/conv_emu.c                    |  369 +++
 target/hexagon/cpu.c                         |  374 +++
 target/hexagon/decode.c                      |  788 +++++
 target/hexagon/fma_emu.c                     |  916 ++++++
 target/hexagon/gdbstub.c                     |  111 +
 target/hexagon/gen_dectree_import.c          |  205 ++
 target/hexagon/gen_semantics.c               |  101 +
 target/hexagon/genptr.c                      |   61 +
 target/hexagon/iclass.c                      |  107 +
 target/hexagon/mmvec/decode_ext_mmvec.c      |  670 +++++
 target/hexagon/mmvec/system_ext_mmvec.c      |  263 ++
 target/hexagon/op_helper.c                   |  509 ++++
 target/hexagon/opcodes.c                     |  217 ++
 target/hexagon/printinsn.c                   |   91 +
 target/hexagon/q6v_decode.c                  |  416 +++
 target/hexagon/reg_fields.c                  |   28 +
 target/hexagon/translate.c                   |  916 ++++++
 MAINTAINERS                                  |    8 +
 disas/Makefile.objs                          |    1 +
 scripts/qemu-binfmt-conf.sh                  |    6 +-
 target/hexagon/Makefile.objs                 |  127 +
 target/hexagon/README                        |  296 ++
 target/hexagon/dectree.py                    |  353 +++
 target/hexagon/do_qemu.py                    | 1194 ++++++++
 target/hexagon/imported/allext.idef          |   25 +
 target/hexagon/imported/allext_macros.def    |   25 +
 target/hexagon/imported/allextenc.def        |   20 +
 target/hexagon/imported/allidefs.def         |   92 +
 target/hexagon/imported/alu.idef             | 1335 +++++++++
 target/hexagon/imported/branch.idef          |  344 +++
 target/hexagon/imported/compare.idef         |  639 +++++
 target/hexagon/imported/encode.def           |  126 +
 target/hexagon/imported/encode_pp.def        | 2283 +++++++++++++++
 target/hexagon/imported/encode_subinsn.def   |  150 +
 target/hexagon/imported/float.idef           |  498 ++++
 target/hexagon/imported/iclass.def           |   52 +
 target/hexagon/imported/ldst.idef            |  421 +++
 target/hexagon/imported/macros.def           | 3970 ++++++++++++++++++++++++++
 target/hexagon/imported/mmvec/encode_ext.def |  830 ++++++
 target/hexagon/imported/mmvec/ext.idef       | 2780 ++++++++++++++++++
 target/hexagon/imported/mmvec/macros.def     | 1110 +++++++
 target/hexagon/imported/mpy.idef             | 1269 ++++++++
 target/hexagon/imported/shift.idef           | 1211 ++++++++
 target/hexagon/imported/subinsns.idef        |  152 +
 target/hexagon/imported/system.idef          |  302 ++
 tests/tcg/configure.sh                       |    4 +-
 tests/tcg/hexagon/float_convs.ref            |  748 +++++
 tests/tcg/hexagon/float_madds.ref            |  768 +++++
 97 files changed, 36063 insertions(+), 2 deletions(-)
 create mode 100644 default-configs/hexagon-linux-user.mak
 create mode 100644 linux-user/hexagon/sockbits.h
 create mode 100644 linux-user/hexagon/syscall_nr.h
 create mode 100644 linux-user/hexagon/target_cpu.h
 create mode 100644 linux-user/hexagon/target_elf.h
 create mode 100644 linux-user/hexagon/target_fcntl.h
 create mode 100644 linux-user/hexagon/target_signal.h
 create mode 100644 linux-user/hexagon/target_structs.h
 create mode 100644 linux-user/hexagon/target_syscall.h
 create mode 100644 linux-user/hexagon/termbits.h
 create mode 100644 target/hexagon/arch.h
 create mode 100644 target/hexagon/attribs.h
 create mode 100644 target/hexagon/attribs_def.h
 create mode 100644 target/hexagon/conv_emu.h
 create mode 100644 target/hexagon/cpu-param.h
 create mode 100644 target/hexagon/cpu.h
 create mode 100644 target/hexagon/cpu_bits.h
 create mode 100644 target/hexagon/decode.h
 create mode 100644 target/hexagon/fma_emu.h
 create mode 100644 target/hexagon/genptr.h
 create mode 100644 target/hexagon/genptr_helpers.h
 create mode 100644 target/hexagon/helper.h
 create mode 100644 target/hexagon/helper_overrides.h
 create mode 100644 target/hexagon/hex_arch_types.h
 create mode 100644 target/hexagon/hex_regs.h
 create mode 100644 target/hexagon/iclass.h
 create mode 100644 target/hexagon/insn.h
 create mode 100644 target/hexagon/internal.h
 create mode 100644 target/hexagon/macros.h
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/macros.h
 create mode 100644 target/hexagon/mmvec/mmvec.h
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.h
 create mode 100644 target/hexagon/opcodes.h
 create mode 100644 target/hexagon/printinsn.h
 create mode 100644 target/hexagon/reg_fields.h
 create mode 100644 target/hexagon/reg_fields_def.h
 create mode 100644 target/hexagon/regmap.h
 create mode 100644 target/hexagon/translate.h
 create mode 100644 disas/hexagon.c
 create mode 100644 linux-user/hexagon/cpu_loop.c
 create mode 100644 linux-user/hexagon/signal.c
 create mode 100644 target/hexagon/arch.c
 create mode 100644 target/hexagon/conv_emu.c
 create mode 100644 target/hexagon/cpu.c
 create mode 100644 target/hexagon/decode.c
 create mode 100644 target/hexagon/fma_emu.c
 create mode 100644 target/hexagon/gdbstub.c
 create mode 100644 target/hexagon/gen_dectree_import.c
 create mode 100644 target/hexagon/gen_semantics.c
 create mode 100644 target/hexagon/genptr.c
 create mode 100644 target/hexagon/iclass.c
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.c
 create mode 100644 target/hexagon/op_helper.c
 create mode 100644 target/hexagon/opcodes.c
 create mode 100644 target/hexagon/printinsn.c
 create mode 100644 target/hexagon/q6v_decode.c
 create mode 100644 target/hexagon/reg_fields.c
 create mode 100644 target/hexagon/translate.c
 create mode 100644 target/hexagon/Makefile.objs
 create mode 100644 target/hexagon/README
 create mode 100755 target/hexagon/dectree.py
 create mode 100755 target/hexagon/do_qemu.py
 create mode 100644 target/hexagon/imported/allext.idef
 create mode 100644 target/hexagon/imported/allext_macros.def
 create mode 100644 target/hexagon/imported/allextenc.def
 create mode 100644 target/hexagon/imported/allidefs.def
 create mode 100644 target/hexagon/imported/alu.idef
 create mode 100644 target/hexagon/imported/branch.idef
 create mode 100644 target/hexagon/imported/compare.idef
 create mode 100644 target/hexagon/imported/encode.def
 create mode 100644 target/hexagon/imported/encode_pp.def
 create mode 100644 target/hexagon/imported/encode_subinsn.def
 create mode 100644 target/hexagon/imported/float.idef
 create mode 100644 target/hexagon/imported/iclass.def
 create mode 100644 target/hexagon/imported/ldst.idef
 create mode 100755 target/hexagon/imported/macros.def
 create mode 100644 target/hexagon/imported/mmvec/encode_ext.def
 create mode 100644 target/hexagon/imported/mmvec/ext.idef
 create mode 100755 target/hexagon/imported/mmvec/macros.def
 create mode 100644 target/hexagon/imported/mpy.idef
 create mode 100644 target/hexagon/imported/shift.idef
 create mode 100644 target/hexagon/imported/subinsns.idef
 create mode 100644 target/hexagon/imported/system.idef
 create mode 100644 tests/tcg/hexagon/float_convs.ref
 create mode 100644 tests/tcg/hexagon/float_madds.ref

-- 
2.7.4


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 01/67] Hexagon Maintainers
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
@ 2020-02-28 16:42 ` Taylor Simpson
  2020-02-28 16:42 ` [RFC PATCH v2 02/67] Hexagon README Taylor Simpson
                   ` (66 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Add Taylor Simpson as the Hexagon target maintainer

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 36d94c1..85fc0ae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -172,6 +172,14 @@ F: include/hw/cris/
 F: tests/tcg/cris/
 F: disas/cris.c
 
+Hexagon TCG CPUs
+M: Taylor Simpson <tsimpson@quicinc.com>
+S: Supported
+F: target/hexagon/
+F: linux-user/hexagon/
+F: disas/hexagon.c
+F: default-configs/hexagon-linux-user.mak
+
 HPPA (PA-RISC) TCG CPUs
 M: Richard Henderson <rth@twiddle.net>
 S: Maintained
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 02/67] Hexagon README
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
  2020-02-28 16:42 ` [RFC PATCH v2 01/67] Hexagon Maintainers Taylor Simpson
@ 2020-02-28 16:42 ` Taylor Simpson
  2020-02-28 16:42 ` [RFC PATCH v2 03/67] Hexagon ELF Machine Definition Taylor Simpson
                   ` (65 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Gives an introduction and overview to the Hexagon target

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/README | 296 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 296 insertions(+)
 create mode 100644 target/hexagon/README

diff --git a/target/hexagon/README b/target/hexagon/README
new file mode 100644
index 0000000..6de71e2
--- /dev/null
+++ b/target/hexagon/README
@@ -0,0 +1,296 @@
+Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
+processor(DSP).  We also support Hexagon Vector eXtensions (HVX).  HVX
+is a wide vector coprocessor designed for high performance computer vision,
+image processing, machine learning, and other workloads.
+
+The following versions of the Hexagon core are supported
+    Scalar core: v67
+    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual
+    HVX extension: v66
+    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual
+
+We presented an overview of the project at the 2019 KVM Forum.
+    https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center
+
+*** Tour of the code ***
+
+The qemu-hexagon implementation is a combination of qemu and the Hexagon
+architecture library (aka archlib).  The three primary directories with
+Hexagon-specific code are
+
+    qemu/target/hexagon
+        This has all the instruction and packet semantics
+    qemu/target/hexagon/imported
+        These files are imported with very little modification from archlib
+        *.idef                  Instruction semantics definition
+        macros.def              Mapping of macros to instruction attributes
+        encode*.def             Encoding patterns for each instruction
+        iclass.def              Instruction class definitions used to determine
+                                legal VLIW slots for each instruction
+    qemu/linux-user/hexagon
+        Helpers for loading the ELF file and making Linux system calls,
+        signals, etc
+
+We start with a script that generates qemu helper for each instruction.  This
+is a two step process.  The first step is to use the C preprocessor to expand
+macros inside the architecture definition files.  This is done in
+target/hexagon/semantics.c.  This step produces
+    <BUILD_DIR>/hexagon-linux-user/semantics_generated.pyinc.
+That file is consumed by the do_qemu.py script.  This script generates
+several files.  All of the generated files end in "_generated.*".  The
+primary file produced is
+    <BUILD_DIR>/hexagon-linux-user/qemu_def_generated.h
+
+Qemu helper functions have 3 parts
+    DEF_HELPER declaration indicates the signature of the helper
+    gen_helper_<NAME> will generate a TCG call to the helper function
+    The helper implementation
+
+In the qemu_def_generated.h file, there is a DEF_QEMU macro for each user-space
+instruction.  The file is included several times with DEF_QEMU defined
+differently, depending on the context.  The macro has four arguments
+    The instruction tag
+    The semantics_short code
+    DEF_HELPER declaration
+    Call to the helper
+    Helper implementation
+
+Here's an example of the A2_add instruction.
+    Instruction tag        A2_add
+    Assembly syntax        "Rd32=add(Rs32,Rt32)"
+    Instruction semantics  "{ RdV=RsV+RtV;}"
+
+By convention, the operands are identified by letter
+    RdV is the destination register
+    RsV, RtV are source registers
+
+The generator uses the operand naming conventions (see large comment in
+do_qemu.py) to determine the signature of the helper function.  Here is the
+result for A2_add from qemu_def_generated.h
+
+DEF_QEMU(A2_add,{ RdV=RsV+RtV;},
+#ifndef fWRAP_A2_add
+DEF_HELPER_3(A2_add, s32, env, s32, s32)
+#endif
+,
+{
+/* A2_add */
+DECL_RREG_d(RdV, RdN, 0, 0);
+DECL_RREG_s(RsV, RsN, 1, 0);
+DECL_RREG_t(RtV, RtN, 2, 0);
+READ_RREG_s(RsV, RsN);
+READ_RREG_t(RtV, RtN);
+fWRAP_A2_add(
+do {
+gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
+} while (0),
+{ RdV=RsV+RtV;});
+WRITE_RREG_d(RdN, RdV);
+FREE_RREG_d(RdV);
+FREE_RREG_s(RsV);
+FREE_RREG_t(RtV);
+/* A2_add */
+},
+#ifndef fWRAP_A2_add
+int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
+{
+uint32_t slot = 4; slot = slot;
+int32_t RdV = 0;
+{ RdV=RsV+RtV;}
+COUNT_HELPER(A2_add);
+return RdV;
+}
+#endif
+)
+
+For each operand, there are macros for DECL, FREE, READ, WRITE.  These are
+defined in macros.h.  Note that we append the operand type to the macro name,
+which allows us to specialize the TCG code tenerated.  For read-only operands,
+DECL simply declares the TCGv variable (no need for tcg_temp_local_new()),
+and READ will assign from the TCGv corresponding to the GPR, and FREE doesn't
+have to do anything.  Also, note that the WRITE macros update the disassembly
+context to be processed when the packet commits (see "Packet Semantics" below).
+
+Note the fWRAP_A2_add macro around the gen_helper call.  Each instruction has a fWRAP_<tag> macro that takes 2 arguments
+    gen_helper call
+    C semantics (aka short code)
+
+This allows the code generator to override the auto-generated code.  In some
+cases this is necessary for correct execution.  We can also override for
+faster emulation.  For example, calling a helper for add is more expensive
+than generating a TCG add operation.
+
+The qemu_wrap_generated.h file contains a default fWRAP_<tag> for each
+instruction.  The default is to invoke the gen_helper code.
+    #ifndef fWRAP_A2_add
+    #define fWRAP_A2_add(GENHLPR, SHORTCODE) GENHLPR
+    #endif
+
+The helper_overrides.h file has any overrides. For example,
+    #define fWRAP_A2_add(GENHLPR, SHORTCODE) \
+        tcg_gen_add_tl(RdV, RsV, RtV)
+
+This file is included twice
+1) In genptr.c, it overrides the semantics of the desired instructions
+2) In helper.h, it prevents the generation of helpers for overridden
+   instructions.  Notice the #ifndef fWRAP_A2_add above.
+
+The instruction semantics C code heavily on macros.  In cases where the C
+semantics are specified only with macros, we can override the default with
+the short semantics option and #define the macros to generate TCG code.  One
+example is Y2_dczeroa (dc == data cache, zero == zero out the cache line,
+a == address: zero out the data cache line at the given address):
+    Instruction tag        Y2_dczeroa
+    Assembly syntax        "dczeroa(Rs32)"
+    Instruction semantics  "{fEA_REG(RsV); fDCZEROA(EA);}"
+
+In helper_overrides.h, we use the shortcode
+#define fWRAP_Y2_dczeroa(GENHLPR, SHORTCODE) SHORTCODE
+
+In other cases, just a little bit of wrapper code needs to be written.
+    #define fWRAP_tmp(SHORTCODE) \
+    { \
+        TCGv tmp = tcg_temp_new(); \
+        SHORTCODE; \
+        tcg_temp_free(tmp); \
+    }
+
+For example, some load instructions use a temporary for address computation.
+The SL2_loadrd_sp instruction needs a temporary to hold the value of the stack
+pointer (r29)
+    Instruction tag        SL2_loadrd_sp
+    Assembly syntax        "Rdd8=memd(r29+#u5:3)"
+    Instruction semantics  "{fEA_RI(fREAD_SP(),uiV); fLOAD(1,8,u,EA,RddV);}"
+
+In helper_overrides.h you'll see
+    #define fWRAP_SL2_loadrd_sp(GENHLPR, SHORTCODE)      fWRAP_tmp(SHORTCODE)
+
+There are also cases where we brute force the TCG code generation.  The
+allocframe and deallocframe instructions are examples.  Other examples are
+instructions with multiple definitions.  These require special handling
+because qemu helpers can only return a single value.
+
+In addition to instruction semantics, we use a generator to create the decode
+tree.  This generation is also a two step process.  The first step is to run
+target/hexagon/gen_dectree_import.c to produce
+    <BUILD_DIR>/hexagon-linux-user/iset.py
+This file is imported by target/hexagon/dectree.py to produce
+    <BUILD_DIR>/hexagon-linux-user/dectree_generated.h
+
+*** Key Files ***
+
+cpu.h
+
+This file contains the definition of the CPUHexagonState struct.  It is the
+runtime information for each thread and contains stuff like the GPR and
+predicate registers.
+
+macros.h
+mmvec/macros.h
+
+The Hexagon arch lib relies heavily on macros for the instruction semantics.
+This is a great advantage for qemu because we can override them for different
+purposes.  You will also notice there are sometimes two definitions of a macro.
+The QEMU_GENERATE variable determines whether we want the macro to generate TCG
+code.  If QEMU_GENERATE is not defined, we want the macro to generate vanilla
+C code that will work in the helper implementation.
+
+translate.c
+
+The functions in this file generate TCG code for a translation block.  Some
+important functions in this file are
+
+    gen_start_packet - initialize the data structures for packet semantics
+    gen_commit_packet - commit the register writes, stores, etc for a packet
+    decode_packet - disassemble a packet and generate code
+
+genptr.c
+genptr_helpers.h
+helper_overrides.h
+
+These file create a function for each instruction.  It is mostly composed of
+fWRAP_<tag> definitions followed by including qemu_def_generated.h.  The
+genptr_helpers.h file contains helper functions that are invoked by the macros
+in helper_overrides.h and macros.h
+
+op_helper.c
+
+This file contains the implementations of all the helpers.  There are a few
+general purpose helpers, but most of them are generated by including
+qemu_def_generated.h.  There are also several helpers used for debugging.
+
+
+*** Packet Semantics ***
+
+VLIW packet semantics differ from serial semantics in that all input operands
+are read, then the operations are performed, then all the results are written.
+For exmaple, this packet performs a swap of registers r0 and r1
+    { r0 = r1; r1 = r0 }
+Note that the result is different if the instructions are executed serially.
+
+Packet semantics dictate that we defer any changes of state until the entire
+packet is committed.  We record the results of each instruction in a side data
+structure, and update the visible processor state when we commit the packet.
+
+The data structures are divided between the runtime state and the translation
+context.
+
+During the TCG generation (see translate.[ch]), we use the DisasContext to
+track what needs to be done during packet commit.  Here are the relevant
+fields
+
+    ctx_reg_log            list of registers written
+    ctx_reg_log_idx        index into ctx_reg_log
+    ctx_pred_log           list of predicates written
+    ctx_pred_log_idx       index into ctx_pred_log
+    ctx_store_width        width of stores (indexed by slot)
+
+During runtime, the following fields in CPUHexagonState (see cpu.h) are used
+
+    new_value             new value of a given register
+    reg_written           boolean indicating if register was written
+    new_pred_value        new value of a predicate register
+    pred_written          boolean indicating if predicate was written
+    mem_log_stores        record of the stores (indexed by slot)
+
+For Hexagon Vector eXtensions (HVX), the following fields are used
+
+    future_VRegs
+    tmp_VRegs
+    future_ZRegs
+    ZRegs_updated
+    VRegs_updated_tmp
+    VRegs_updated
+    VRegs_select
+
+*** Debugging ***
+
+You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in
+internal.h.  This will stream a lot of information as it generates TCG and
+executes the code.
+
+To track down nasty issues with Hexagon->TCG generation, we compare the
+execution results with actual hardware running on a Hexagon Linux target.
+Run qemu with the "-d cpu" option.  Then, we can diff the results and figure
+out where qemu and hardware behave differently.
+
+The stacks are located at different locations.  We handle this by changing
+env->stack_adjust in translate.c.  First, set this to zero and run qemu.
+Then, change env->stack_adjust to the difference between the two stack
+locations.  Then rebuild qemu and run again. That will produce a very
+clean diff.
+
+Here are some handy places to set breakpoints
+
+    At the call to gen_start_packet for a given PC (note that the line number
+        might change in the future)
+        br translate.c:602 if ctx->base.pc_next == 0xdeadbeef
+    The helper function for each instruction is named helper_<TAG>, so here's
+        an example that will set a breakpoint at the start
+        br helper_V6_vgathermh
+    If you have the HEX_DEBUG macro set, the following will be useful
+        At the start of execution of a packet for a given PC
+            br helper_debug_start_packet if env->gpr[41] == 0xdeadbeef
+        At the end of execution of a packet for a given PC
+            br helper_debug_commit_end if env->this_PC == 0xdeadbeef
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 03/67] Hexagon ELF Machine Definition
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
  2020-02-28 16:42 ` [RFC PATCH v2 01/67] Hexagon Maintainers Taylor Simpson
  2020-02-28 16:42 ` [RFC PATCH v2 02/67] Hexagon README Taylor Simpson
@ 2020-02-28 16:42 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 04/67] Hexagon CPU Scalar Core Definition Taylor Simpson
                   ` (64 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Define EM_HEXAGON 164

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 include/elf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/elf.h b/include/elf.h
index 8fbfe60..d51e7d4 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -170,6 +170,8 @@ typedef struct mips_elf_abiflags_v0 {
 
 #define EM_UNICORE32    110     /* UniCore32 */
 
+#define EM_HEXAGON     164     /* Qualcomm Hexagon */
+
 #define EM_RISCV        243     /* RISC-V */
 
 #define EM_NANOMIPS     249     /* Wave Computing nanoMIPS */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 04/67] Hexagon CPU Scalar Core Definition
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (2 preceding siblings ...)
  2020-02-28 16:42 ` [RFC PATCH v2 03/67] Hexagon ELF Machine Definition Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 05/67] Hexagon register names Taylor Simpson
                   ` (63 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Add CPU state header, CPU definitions and initialization routines

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/cpu-param.h |  26 ++++
 target/hexagon/cpu.h       | 165 +++++++++++++++++++++++
 target/hexagon/cpu_bits.h  |  37 ++++++
 target/hexagon/internal.h  |  52 ++++++++
 target/hexagon/cpu.c       | 322 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 602 insertions(+)
 create mode 100644 target/hexagon/cpu-param.h
 create mode 100644 target/hexagon/cpu.h
 create mode 100644 target/hexagon/cpu_bits.h
 create mode 100644 target/hexagon/internal.h
 create mode 100644 target/hexagon/cpu.c

diff --git a/target/hexagon/cpu-param.h b/target/hexagon/cpu-param.h
new file mode 100644
index 0000000..3a6b727
--- /dev/null
+++ b/target/hexagon/cpu-param.h
@@ -0,0 +1,26 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_CPU_PARAM_H
+#define HEXAGON_CPU_PARAM_H
+
+#define TARGET_PHYS_ADDR_SPACE_BITS 36
+#define TARGET_VIRT_ADDR_SPACE_BITS 32
+
+#define NB_MMU_MODES 1
+
+#endif
diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
new file mode 100644
index 0000000..6146561
--- /dev/null
+++ b/target/hexagon/cpu.h
@@ -0,0 +1,165 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_CPU_H
+#define HEXAGON_CPU_H
+
+/* Forward declaration needed by some of the header files */
+typedef struct CPUHexagonState CPUHexagonState;
+
+#include <fenv.h>
+
+#define TARGET_PAGE_BITS 16     /* 64K pages */
+#define TARGET_LONG_BITS 32
+
+#include "qemu/compiler.h"
+#include "qemu-common.h"
+#include "exec/cpu-defs.h"
+#include "hex_regs.h"
+
+#define NUM_PREGS 4
+#ifdef CONFIG_USER_ONLY
+#define TOTAL_PER_THREAD_REGS 64
+#else
+#error System mode not implemented
+#endif
+
+#define SLOTS_MAX 4
+#define STORES_MAX 2
+#define REG_WRITES_MAX 32
+#define PRED_WRITES_MAX 5                   /* 4 insns + endloop */
+
+#define TYPE_HEXAGON_CPU "hexagon-cpu"
+
+#define HEXAGON_CPU_TYPE_SUFFIX "-" TYPE_HEXAGON_CPU
+#define HEXAGON_CPU_TYPE_NAME(name) (name HEXAGON_CPU_TYPE_SUFFIX)
+#define CPU_RESOLVING_TYPE TYPE_HEXAGON_CPU
+
+#define TYPE_HEXAGON_CPU_V67 HEXAGON_CPU_TYPE_NAME("v67")
+
+#define MMU_USER_IDX 0
+
+struct MemLog {
+    target_ulong va;
+    uint8_t width;
+    uint32_t data32;
+    uint64_t data64;
+};
+
+#define EXEC_STATUS_OK          0x0000
+#define EXEC_STATUS_STOP        0x0002
+#define EXEC_STATUS_REPLAY      0x0010
+#define EXEC_STATUS_LOCKED      0x0020
+#define EXEC_STATUS_EXCEPTION   0x0100
+
+
+#define EXCEPTION_DETECTED      (env->status & EXEC_STATUS_EXCEPTION)
+#define REPLAY_DETECTED         (env->status & EXEC_STATUS_REPLAY)
+#define CLEAR_EXCEPTION         (env->status &= (~EXEC_STATUS_EXCEPTION))
+#define SET_EXCEPTION           (env->status |= EXEC_STATUS_EXCEPTION)
+
+struct CPUHexagonState {
+    target_ulong gpr[TOTAL_PER_THREAD_REGS];
+    target_ulong pred[NUM_PREGS];
+    target_ulong branch_taken;
+    target_ulong next_PC;
+
+    /* For comparing with LLDB on target - see hack_stack_ptrs function */
+    target_ulong stack_start;
+    target_ulong stack_adjust;
+
+    uint8_t slot_cancelled;
+    target_ulong new_value[TOTAL_PER_THREAD_REGS];
+
+    /*
+     * Only used when HEX_DEBUG is on, but unconditionally included
+     * to reduce recompile time when turning HEX_DEBUG on/off.
+     */
+    target_ulong this_PC;
+    target_ulong reg_written[TOTAL_PER_THREAD_REGS];
+
+    target_ulong new_pred_value[NUM_PREGS];
+    target_ulong pred_written;
+
+    struct MemLog mem_log_stores[STORES_MAX];
+
+    target_ulong dczero_addr;
+
+    fenv_t fenv;
+
+    target_ulong llsc_addr;
+    target_ulong llsc_val;
+    uint64_t     llsc_val_i64;
+    target_ulong llsc_newval;
+    uint64_t     llsc_newval_i64;
+    target_ulong llsc_reg;
+
+    target_ulong is_gather_store_insn;
+    target_ulong gather_issued;
+};
+
+#define HEXAGON_CPU_CLASS(klass) \
+    OBJECT_CLASS_CHECK(HexagonCPUClass, (klass), TYPE_HEXAGON_CPU)
+#define HEXAGON_CPU(obj) \
+    OBJECT_CHECK(HexagonCPU, (obj), TYPE_HEXAGON_CPU)
+#define HEXAGON_CPU_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(HexagonCPUClass, (obj), TYPE_HEXAGON_CPU)
+
+typedef struct HexagonCPUClass {
+    /*< private >*/
+    CPUClass parent_class;
+    /*< public >*/
+    DeviceRealize parent_realize;
+    void (*parent_reset)(CPUState *cpu);
+} HexagonCPUClass;
+
+typedef struct HexagonCPU {
+    /*< private >*/
+    CPUState parent_obj;
+    /*< public >*/
+    CPUNegativeOffsetState neg;
+    CPUHexagonState env;
+} HexagonCPU;
+
+static inline HexagonCPU *hexagon_env_get_cpu(CPUHexagonState *env)
+{
+    return container_of(env, HexagonCPU, env);
+}
+
+#include "cpu_bits.h"
+
+#define cpu_signal_handler cpu_hexagon_signal_handler
+extern int cpu_hexagon_signal_handler(int host_signum, void *pinfo, void *puc);
+
+static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, target_ulong *pc,
+                                        target_ulong *cs_base, uint32_t *flags)
+{
+    *pc = env->gpr[HEX_REG_PC];
+    *cs_base = 0;
+#ifdef CONFIG_USER_ONLY
+    *flags = 0;
+#else
+#error System mode not supported on Hexagon yet
+#endif
+}
+
+typedef struct CPUHexagonState CPUArchState;
+typedef HexagonCPU ArchCPU;
+
+#include "exec/cpu-all.h"
+
+#endif /* HEXAGON_CPU_H */
diff --git a/target/hexagon/cpu_bits.h b/target/hexagon/cpu_bits.h
new file mode 100644
index 0000000..3fedde7
--- /dev/null
+++ b/target/hexagon/cpu_bits.h
@@ -0,0 +1,37 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_CPU_BITS_H
+#define HEXAGON_CPU_BITS_H
+
+#define HEX_EXCP_FETCH_NO_UPAGE  0x012
+#define HEX_EXCP_INVALID_PACKET  0x015
+#define HEX_EXCP_INVALID_OPCODE  0x015
+#define HEX_EXCP_PRIV_NO_UREAD   0x024
+#define HEX_EXCP_PRIV_NO_UWRITE  0x025
+
+#define HEX_EXCP_TRAP0           0x172
+#define HEX_EXCP_TRAP1           0x173
+#define HEX_EXCP_SC4             0x100
+#define HEX_EXCP_SC8             0x200
+
+#define PACKET_WORDS_MAX         4
+
+extern int disassemble_hexagon(uint32_t *words, int nwords,
+                               char *buf, int bufsize);
+
+#endif
diff --git a/target/hexagon/internal.h b/target/hexagon/internal.h
new file mode 100644
index 0000000..092dedc
--- /dev/null
+++ b/target/hexagon/internal.h
@@ -0,0 +1,52 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_INTERNAL_H
+#define HEXAGON_INTERNAL_H
+
+/*
+ * Change HEX_DEBUG to 1 to turn on debugging output
+ */
+#define HEX_DEBUG 0
+#define HEX_DEBUG_LOG(...) \
+    do { \
+        if (HEX_DEBUG) { \
+            fprintf(stderr, __VA_ARGS__); \
+        } \
+    } while (0)
+
+/*
+ * Change COUNT_HEX_HELPERS to 1 to count how many times each helper
+ * is called.  This is useful to figure out which helpers would benefit
+ * from writing an fWRAP macro.
+ */
+#define COUNT_HEX_HELPERS 0
+
+extern int hexagon_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg);
+extern int hexagon_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+
+extern void hexagon_debug(CPUHexagonState *env);
+
+#if COUNT_HEX_HELPERS
+extern void print_helper_counts(void);
+#endif
+
+extern const char * const hexagon_regnames[TOTAL_PER_THREAD_REGS];
+
+extern void init_genptr(void);
+
+#endif
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
new file mode 100644
index 0000000..98d6bdc
--- /dev/null
+++ b/target/hexagon/cpu.c
@@ -0,0 +1,322 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "cpu.h"
+#include "internal.h"
+#include "translate.h"
+#include "exec/exec-all.h"
+#include "qapi/error.h"
+#include "migration/vmstate.h"
+
+static void hexagon_v67_cpu_init(Object *obj)
+{
+}
+
+static ObjectClass *hexagon_cpu_class_by_name(const char *cpu_model)
+{
+    ObjectClass *oc;
+    char *typename;
+    char **cpuname;
+
+    cpuname = g_strsplit(cpu_model, ",", 1);
+    typename = g_strdup_printf(HEXAGON_CPU_TYPE_NAME("%s"), cpuname[0]);
+    oc = object_class_by_name(typename);
+    g_strfreev(cpuname);
+    g_free(typename);
+    if (!oc || !object_class_dynamic_cast(oc, TYPE_HEXAGON_CPU) ||
+        object_class_is_abstract(oc)) {
+        return NULL;
+    }
+    return oc;
+}
+
+const char * const hexagon_regnames[TOTAL_PER_THREAD_REGS] = {
+   "r0", "r1",  "r2",  "r3",  "r4",   "r5",  "r6",  "r7",
+   "r8", "r9",  "r10", "r11", "r12",  "r13", "r14", "r15",
+  "r16", "r17", "r18", "r19", "r20",  "r21", "r22", "r23",
+  "r24", "r25", "r26", "r27", "r28",  "r29", "r30", "r31",
+  "sa0", "lc0", "sa1", "lc1", "p3_0", "c5",  "m0",  "m1",
+  "usr", "pc",  "ugp", "gp",  "cs0",  "cs1", "c14", "c15",
+  "c16", "c17", "c18", "c19", "pkt_cnt",  "insn_cnt", "hvx_cnt", "c23",
+  "c24", "c25", "c26", "c27", "c28",  "c29", "c30", "c31",
+};
+
+/*
+ * One of the main debugging techniques is to use "-d cpu" and compare against
+ * LLDB output when single stepping.  However, the target and qemu put the
+ * stacks at different locations.  This is used to compensate so the diff is
+ * cleaner.
+ */
+static inline target_ulong hack_stack_ptrs(CPUHexagonState *env,
+                                           target_ulong addr)
+{
+    static bool first = true;
+    if (first) {
+        first = false;
+        env->stack_start = env->gpr[HEX_REG_SP];
+        env->gpr[HEX_REG_USR] = 0x56000;
+
+#define ADJUST_STACK 0
+#if ADJUST_STACK
+        /*
+         * Change the two numbers below to
+         *     1    qemu stack location
+         *     2    hardware stack location
+         * Or set to zero for normal mode (no stack adjustment)
+         */
+        env->stack_adjust = 0xfffeeb80 - 0xbf89f980;
+#else
+        env->stack_adjust = 0;
+#endif
+    }
+
+    target_ulong stack_start = env->stack_start;
+    target_ulong stack_size = 0x10000;
+    target_ulong stack_adjust = env->stack_adjust;
+
+    if (stack_start + 0x1000 >= addr && addr >= (stack_start - stack_size)) {
+        return addr - stack_adjust;
+    }
+    return addr;
+}
+
+/* HEX_REG_P3_0 (aka C4) is an alias for the predicate registers */
+static inline target_ulong read_p3_0(CPUHexagonState *env)
+{
+    int32_t control_reg = 0;
+    int i;
+    for (i = NUM_PREGS - 1; i >= 0; i--) {
+        control_reg <<= 8;
+        control_reg |= env->pred[i] & 0xff;
+    }
+    return control_reg;
+}
+
+static void print_reg(FILE *f, CPUHexagonState *env, int regnum)
+{
+    target_ulong value;
+
+    if (regnum == HEX_REG_P3_0) {
+        value = read_p3_0(env);
+    } else {
+        value = regnum < 32 ? hack_stack_ptrs(env, env->gpr[regnum])
+                            : env->gpr[regnum];
+    }
+
+    fprintf(f, "  %s = 0x" TARGET_FMT_lx "\n", hexagon_regnames[regnum], value);
+}
+
+static void hexagon_dump(CPUHexagonState *env, FILE *f)
+{
+    static target_ulong last_pc;
+    int i;
+
+    /*
+     * When comparing with LLDB, it doesn't step through single-cycle
+     * hardware loops the same way.  So, we just skip them here
+     */
+    if (env->gpr[HEX_REG_PC] == last_pc) {
+        return;
+    }
+    last_pc = env->gpr[HEX_REG_PC];
+    fprintf(f, "General Purpose Registers = {\n");
+    for (i = 0; i < 32; i++) {
+        print_reg(f, env, i);
+    }
+    print_reg(f, env, HEX_REG_SA0);
+    print_reg(f, env, HEX_REG_LC0);
+    print_reg(f, env, HEX_REG_SA1);
+    print_reg(f, env, HEX_REG_LC1);
+    print_reg(f, env, HEX_REG_M0);
+    print_reg(f, env, HEX_REG_M1);
+    print_reg(f, env, HEX_REG_USR);
+    print_reg(f, env, HEX_REG_P3_0);
+    print_reg(f, env, HEX_REG_GP);
+    print_reg(f, env, HEX_REG_UGP);
+    print_reg(f, env, HEX_REG_PC);
+#ifdef CONFIG_USER_ONLY
+    /*
+     * Not modelled in user mode, print junk to minimize the diff's
+     * with LLDB output
+     */
+    fprintf(f, "  cause = 0x000000db\n");
+    fprintf(f, "  badva = 0x00000000\n");
+    fprintf(f, "  cs0 = 0x00000000\n");
+    fprintf(f, "  cs1 = 0x00000000\n");
+#else
+    print_reg(f, env, HEX_REG_CAUSE);
+    print_reg(f, env, HEX_REG_BADVA);
+    print_reg(f, env, HEX_REG_CS0);
+    print_reg(f, env, HEX_REG_CS1);
+#endif
+    fprintf(f, "}\n");
+}
+
+static void hexagon_dump_state(CPUState *cs, FILE *f, int flags)
+{
+    HexagonCPU *cpu = HEXAGON_CPU(cs);
+    CPUHexagonState *env = &cpu->env;
+
+    hexagon_dump(env, f);
+}
+
+void hexagon_debug(CPUHexagonState *env)
+{
+    hexagon_dump(env, stdout);
+}
+
+static void hexagon_cpu_set_pc(CPUState *cs, vaddr value)
+{
+    HexagonCPU *cpu = HEXAGON_CPU(cs);
+    CPUHexagonState *env = &cpu->env;
+    env->gpr[HEX_REG_PC] = value;
+}
+
+static void hexagon_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+{
+    HexagonCPU *cpu = HEXAGON_CPU(cs);
+    CPUHexagonState *env = &cpu->env;
+    env->gpr[HEX_REG_PC] = tb->pc;
+}
+
+static bool hexagon_cpu_has_work(CPUState *cs)
+{
+    return true;
+}
+
+void restore_state_to_opc(CPUHexagonState *env, TranslationBlock *tb,
+                          target_ulong *data)
+{
+    env->gpr[HEX_REG_PC] = data[0];
+}
+
+static void hexagon_cpu_reset(CPUState *cs)
+{
+    HexagonCPU *cpu = HEXAGON_CPU(cs);
+    HexagonCPUClass *mcc = HEXAGON_CPU_GET_CLASS(cpu);
+
+    mcc->parent_reset(cs);
+}
+
+static void hexagon_cpu_disas_set_info(CPUState *s, disassemble_info *info)
+{
+    info->print_insn = print_insn_hexagon;
+}
+
+static void hexagon_cpu_realize(DeviceState *dev, Error **errp)
+{
+    CPUState *cs = CPU(dev);
+    HexagonCPUClass *mcc = HEXAGON_CPU_GET_CLASS(dev);
+    Error *local_err = NULL;
+
+    cpu_exec_realizefn(cs, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    qemu_init_vcpu(cs);
+    cpu_reset(cs);
+
+    mcc->parent_realize(dev, errp);
+}
+
+static void hexagon_cpu_init(Object *obj)
+{
+    HexagonCPU *cpu = HEXAGON_CPU(obj);
+
+    cpu_set_cpustate_pointers(cpu);
+}
+
+static bool hexagon_tlb_fill(CPUState *cs, vaddr address, int size,
+                             MMUAccessType access_type, int mmu_idx,
+                             bool probe, uintptr_t retaddr)
+{
+#ifdef CONFIG_USER_ONLY
+    switch (access_type) {
+    case MMU_INST_FETCH:
+        cs->exception_index = HEX_EXCP_FETCH_NO_UPAGE;
+        break;
+    case MMU_DATA_LOAD:
+        cs->exception_index = HEX_EXCP_PRIV_NO_UREAD;
+        break;
+    case MMU_DATA_STORE:
+        cs->exception_index = HEX_EXCP_PRIV_NO_UWRITE;
+        break;
+    }
+    cpu_loop_exit_restore(cs, retaddr);
+#else
+#error System mode not implemented for Hexagon
+#endif
+}
+
+static const VMStateDescription vmstate_hexagon_cpu = {
+    .name = "cpu",
+    .unmigratable = 1,
+};
+
+static void hexagon_cpu_class_init(ObjectClass *c, void *data)
+{
+    HexagonCPUClass *mcc = HEXAGON_CPU_CLASS(c);
+    CPUClass *cc = CPU_CLASS(c);
+    DeviceClass *dc = DEVICE_CLASS(c);
+
+    mcc->parent_realize = dc->realize;
+    dc->realize = hexagon_cpu_realize;
+
+    mcc->parent_reset = cc->reset;
+    cc->reset = hexagon_cpu_reset;
+
+    cc->class_by_name = hexagon_cpu_class_by_name;
+    cc->has_work = hexagon_cpu_has_work;
+    cc->dump_state = hexagon_dump_state;
+    cc->set_pc = hexagon_cpu_set_pc;
+    cc->synchronize_from_tb = hexagon_cpu_synchronize_from_tb;
+    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
+    cc->gdb_stop_before_watchpoint = true;
+    cc->disas_set_info = hexagon_cpu_disas_set_info;
+#ifdef CONFIG_TCG
+    cc->tcg_initialize = hexagon_translate_init;
+    cc->tlb_fill = hexagon_tlb_fill;
+#endif
+    /* For now, mark unmigratable: */
+    cc->vmsd = &vmstate_hexagon_cpu;
+}
+
+#define DEFINE_CPU(type_name, initfn)      \
+    {                                      \
+        .name = type_name,                 \
+        .parent = TYPE_HEXAGON_CPU,        \
+        .instance_init = initfn            \
+    }
+
+static const TypeInfo hexagon_cpu_type_infos[] = {
+    {
+        .name = TYPE_HEXAGON_CPU,
+        .parent = TYPE_CPU,
+        .instance_size = sizeof(HexagonCPU),
+        .instance_init = hexagon_cpu_init,
+        .abstract = true,
+        .class_size = sizeof(HexagonCPUClass),
+        .class_init = hexagon_cpu_class_init,
+    },
+    DEFINE_CPU(TYPE_HEXAGON_CPU_V67,              hexagon_v67_cpu_init),
+};
+
+DEFINE_TYPES(hexagon_cpu_type_infos)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 05/67] Hexagon register names
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (3 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 04/67] Hexagon CPU Scalar Core Definition Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 06/67] Hexagon Disassembler Taylor Simpson
                   ` (62 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/hex_regs.h | 99 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)
 create mode 100644 target/hexagon/hex_regs.h

diff --git a/target/hexagon/hex_regs.h b/target/hexagon/hex_regs.h
new file mode 100644
index 0000000..670da1a
--- /dev/null
+++ b/target/hexagon/hex_regs.h
@@ -0,0 +1,99 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_REGS_H
+#define HEXAGON_REGS_H
+
+enum {
+    HEX_REG_R00              = 0,
+    HEX_REG_R01              = 1,
+    HEX_REG_R02              = 2,
+    HEX_REG_R03              = 3,
+    HEX_REG_R04              = 4,
+    HEX_REG_R05              = 5,
+    HEX_REG_R06              = 6,
+    HEX_REG_R07              = 7,
+    HEX_REG_R08              = 8,
+    HEX_REG_R09              = 9,
+    HEX_REG_R10              = 10,
+    HEX_REG_R11              = 11,
+    HEX_REG_R12              = 12,
+    HEX_REG_R13              = 13,
+    HEX_REG_R14              = 14,
+    HEX_REG_R15              = 15,
+    HEX_REG_R16              = 16,
+    HEX_REG_R17              = 17,
+    HEX_REG_R18              = 18,
+    HEX_REG_R19              = 19,
+    HEX_REG_R20              = 20,
+    HEX_REG_R21              = 21,
+    HEX_REG_R22              = 22,
+    HEX_REG_R23              = 23,
+    HEX_REG_R24              = 24,
+    HEX_REG_R25              = 25,
+    HEX_REG_R26              = 26,
+    HEX_REG_R27              = 27,
+    HEX_REG_R28              = 28,
+    HEX_REG_R29              = 29,
+    HEX_REG_SP               = 29,
+    HEX_REG_FP               = 30,
+    HEX_REG_R30              = 30,
+    HEX_REG_LR               = 31,
+    HEX_REG_R31              = 31,
+    HEX_REG_SA0              = 32,
+    HEX_REG_LC0              = 33,
+    HEX_REG_SA1              = 34,
+    HEX_REG_LC1              = 35,
+    HEX_REG_P3_0             = 36,
+    HEX_REG_M0               = 38,
+    HEX_REG_M1               = 39,
+    HEX_REG_USR              = 40,
+    HEX_REG_PC               = 41,
+    HEX_REG_UGP              = 42,
+    HEX_REG_GP               = 43,
+    HEX_REG_CS0              = 44,
+    HEX_REG_CS1              = 45,
+    HEX_REG_UPCYCLELO        = 46,
+    HEX_REG_UPCYCLEHI        = 47,
+    HEX_REG_FRAMELIMIT       = 48,
+    HEX_REG_FRAMEKEY         = 49,
+    HEX_REG_PKTCNTLO         = 50,
+    HEX_REG_PKTCNTHI         = 51,
+    /* Use reserved control registers for qemu execution counts */
+    HEX_REG_QEMU_PKT_CNT      = 52,
+    HEX_REG_QEMU_INSN_CNT     = 53,
+    HEX_REG_QEMU_HVX_CNT      = 54,
+    HEX_REG_UTIMERLO          = 62,
+    HEX_REG_UTIMERHI          = 63,
+
+#ifndef CONFIG_USER_ONLY
+    HEX_REG_SGP0              = 64,
+    HEX_REG_SGP1              = 65,
+    HEX_REG_STID              = 66,
+    HEX_REG_ELR               = 67,
+    HEX_REG_BADVA0            = 68,
+    HEX_REG_BADVA1            = 69,
+    HEX_REG_SSR               = 70,
+    HEX_REG_CCR               = 71,
+    HEX_REG_HTID              = 72,
+    HEX_REG_BADVA             = 73,
+    HEX_REG_IMASK             = 74,
+    HEX_REG_GEVB              = 75,
+#endif
+};
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 06/67] Hexagon Disassembler
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (4 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 05/67] Hexagon register names Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 07/67] Hexagon CPU Scalar Core Helpers Taylor Simpson
                   ` (61 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

The Hexagon disassembler calls disassemble_hexagon to decode a packet
and format it for printing

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 include/disas/dis-asm.h |  1 +
 disas/hexagon.c         | 62 +++++++++++++++++++++++++++++++++++++++++++++++++
 disas/Makefile.objs     |  1 +
 3 files changed, 64 insertions(+)
 create mode 100644 disas/hexagon.c

diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index f87f468..a7fa82d 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -436,6 +436,7 @@ int print_insn_little_nios2     (bfd_vma, disassemble_info*);
 int print_insn_xtensa           (bfd_vma, disassemble_info*);
 int print_insn_riscv32          (bfd_vma, disassemble_info*);
 int print_insn_riscv64          (bfd_vma, disassemble_info*);
+int print_insn_hexagon          (bfd_vma, disassemble_info*);
 
 #if 0
 /* Fetch the disassembler for a given BFD, if that support is available.  */
diff --git a/disas/hexagon.c b/disas/hexagon.c
new file mode 100644
index 0000000..6ee8653
--- /dev/null
+++ b/disas/hexagon.c
@@ -0,0 +1,62 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * QEMU Hexagon Disassembler
+ */
+
+#include "qemu/osdep.h"
+#include "disas/dis-asm.h"
+#include "target/hexagon/cpu_bits.h"
+
+/*
+ * We will disassemble a packet with up to 4 instructions, so we need
+ * a hefty size buffer.
+ */
+#define PACKET_BUFFER_LEN                   1028
+
+int print_insn_hexagon(bfd_vma memaddr, struct disassemble_info *info)
+{
+    uint32_t words[PACKET_WORDS_MAX];
+    int len, slen;
+    char buf[PACKET_BUFFER_LEN];
+    int status;
+    int i;
+
+    for (i = 0; i < PACKET_WORDS_MAX; i++) {
+        status = (*info->read_memory_func)(memaddr + i * sizeof(uint32_t),
+                                           (bfd_byte *)&words[i],
+                                           sizeof(uint32_t), info);
+        if (status) {
+            if (i > 0) {
+                break;
+            }
+            (*info->memory_error_func)(status, memaddr, info);
+            return status;
+        }
+    }
+
+    len = disassemble_hexagon(words, i, buf, PACKET_BUFFER_LEN);
+    slen = strlen(buf);
+    if (buf[slen - 1] == '\n') {
+        buf[slen - 1] = '\0';
+    }
+    (*info->fprintf_func)(info->stream, "%s", buf);
+
+    return len;
+}
+
diff --git a/disas/Makefile.objs b/disas/Makefile.objs
index 3c1cdce..0e86964 100644
--- a/disas/Makefile.objs
+++ b/disas/Makefile.objs
@@ -24,6 +24,7 @@ common-obj-$(CONFIG_SH4_DIS) += sh4.o
 common-obj-$(CONFIG_SPARC_DIS) += sparc.o
 common-obj-$(CONFIG_LM32_DIS) += lm32.o
 common-obj-$(CONFIG_XTENSA_DIS) += xtensa.o
+common-obj-$(CONFIG_HEXAGON_DIS) += hexagon.o
 
 # TODO: As long as the TCG interpreter and its generated code depend
 # on the QEMU target, we cannot compile the disassembler here.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 07/67] Hexagon CPU Scalar Core Helpers
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (5 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 06/67] Hexagon Disassembler Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 08/67] Hexagon GDB Stub Taylor Simpson
                   ` (60 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

The majority of helpers are generated.  Define the helper functions needed
then include the generated file

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper.h    |  37 ++++
 target/hexagon/op_helper.c | 434 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 471 insertions(+)
 create mode 100644 target/hexagon/helper.h
 create mode 100644 target/hexagon/op_helper.c

diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
new file mode 100644
index 0000000..8558da8
--- /dev/null
+++ b/target/hexagon/helper.h
@@ -0,0 +1,37 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "helper_overrides.h"
+
+DEF_HELPER_2(raise_exception, noreturn, env, i32)
+DEF_HELPER_1(debug_start_packet, void, env)
+DEF_HELPER_2(new_value, s32, env, int)
+DEF_HELPER_3(debug_check_store_width, void, env, int, int)
+DEF_HELPER_3(debug_commit_end, void, env, int, int)
+DEF_HELPER_3(sfrecipa_val, s32, env, s32, s32)
+DEF_HELPER_3(sfrecipa_pred, s32, env, s32, s32)
+DEF_HELPER_2(sfinvsqrta_val, s32, env, s32)
+DEF_HELPER_2(sfinvsqrta_pred, s32, env, s32)
+DEF_HELPER_4(vacsh_val, s64, env, s64, s64, s64)
+DEF_HELPER_4(vacsh_pred, s32, env, s64, s64, s64)
+
+#define DEF_QEMU(TAG, SHORTCODE, HELPER, GENFN, HELPFN) HELPER
+#include "qemu_def_generated.h"
+#undef DEF_QEMU
+
+DEF_HELPER_2(debug_value, void, env, s32)
+DEF_HELPER_2(debug_value_i64, void, env, s64)
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
new file mode 100644
index 0000000..d944d03
--- /dev/null
+++ b/target/hexagon/op_helper.c
@@ -0,0 +1,434 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <math.h>
+#include "qemu/osdep.h"
+#include "qemu.h"
+#include "exec/helper-proto.h"
+#include "tcg/tcg-op.h"
+#include "cpu.h"
+#include "internal.h"
+#include "macros.h"
+#include "arch.h"
+#include "fma_emu.h"
+#include "conv_emu.h"
+
+#if COUNT_HEX_HELPERS
+#include "opcodes.h"
+
+typedef struct {
+    int count;
+    const char *tag;
+} helper_count_t;
+
+helper_count_t helper_counts[] = {
+#define OPCODE(TAG)    { 0, #TAG },
+#include "opcodes_def_generated.h"
+#undef OPCODE
+    { 0, NULL }
+};
+
+#define COUNT_HELPER(TAG) \
+    do { \
+        helper_counts[(TAG)].count++; \
+    } while (0)
+
+void print_helper_counts(void)
+{
+    helper_count_t *p;
+
+    printf("HELPER COUNTS\n");
+    for (p = helper_counts; p->tag; p++) {
+        if (p->count) {
+            printf("\t%d\t\t%s\n", p->count, p->tag);
+        }
+    }
+}
+#else
+#define COUNT_HELPER(TAG)              /* Nothing */
+#endif
+
+/* Exceptions processing helpers */
+static void QEMU_NORETURN do_raise_exception_err(CPUHexagonState *env,
+                                                 uint32_t exception,
+                                                 uintptr_t pc)
+{
+    CPUState *cs = CPU(hexagon_env_get_cpu(env));
+    qemu_log_mask(CPU_LOG_INT, "%s: %d\n", __func__, exception);
+    cs->exception_index = exception;
+    cpu_loop_exit_restore(cs, pc);
+}
+
+void HELPER(raise_exception)(CPUHexagonState *env, uint32_t exception)
+{
+    do_raise_exception_err(env, exception, 0);
+}
+
+static inline void log_reg_write(CPUHexagonState *env, int rnum,
+                                 target_ulong val, uint32_t slot)
+{
+    HEX_DEBUG_LOG("log_reg_write[%d] = " TARGET_FMT_ld " (0x" TARGET_FMT_lx ")",
+                  rnum, val, val);
+    if (env->slot_cancelled & (1 << slot)) {
+        HEX_DEBUG_LOG(" CANCELLED");
+    }
+    if (val == env->gpr[rnum]) {
+        HEX_DEBUG_LOG(" NO CHANGE");
+    }
+    HEX_DEBUG_LOG("\n");
+    if (!(env->slot_cancelled & (1 << slot))) {
+        env->new_value[rnum] = val;
+#if HEX_DEBUG
+        /* Do this so HELPER(debug_commit_end) will know */
+        env->reg_written[rnum] = 1;
+#endif
+    }
+}
+
+static __attribute__((unused))
+inline void log_reg_write_pair(CPUHexagonState *env, int rnum,
+                                      int64_t val, uint32_t slot)
+{
+    HEX_DEBUG_LOG("log_reg_write_pair[%d:%d] = %ld\n", rnum + 1, rnum, val);
+    log_reg_write(env, rnum, val & 0xFFFFFFFF, slot);
+    log_reg_write(env, rnum + 1, (val >> 32) & 0xFFFFFFFF, slot);
+}
+
+static inline void log_pred_write(CPUHexagonState *env, int pnum,
+                                  target_ulong val)
+{
+    HEX_DEBUG_LOG("log_pred_write[%d] = " TARGET_FMT_ld
+                  " (0x" TARGET_FMT_lx ")\n",
+                  pnum, val, val);
+
+    /* Multiple writes to the same preg are and'ed together */
+    if (env->pred_written & (1 << pnum)) {
+        env->new_pred_value[pnum] &= val & 0xff;
+    } else {
+        env->new_pred_value[pnum] = val & 0xff;
+        env->pred_written |= 1 << pnum;
+    }
+}
+
+static inline void log_store32(CPUHexagonState *env, target_ulong addr,
+                               target_ulong val, int width, int slot)
+{
+    HEX_DEBUG_LOG("log_store%d(0x" TARGET_FMT_lx ", " TARGET_FMT_ld
+                  " [0x" TARGET_FMT_lx "])\n",
+                  width, addr, val, val);
+    env->mem_log_stores[slot].va = addr;
+    env->mem_log_stores[slot].width = width;
+    env->mem_log_stores[slot].data32 = val;
+}
+
+static inline void log_store64(CPUHexagonState *env, target_ulong addr,
+                               int64_t val, int width, int slot)
+{
+    HEX_DEBUG_LOG("log_store%d(0x" TARGET_FMT_lx ", %ld [0x%lx])\n",
+                   width, addr, val, val);
+    env->mem_log_stores[slot].va = addr;
+    env->mem_log_stores[slot].width = width;
+    env->mem_log_stores[slot].data64 = val;
+}
+
+static inline void write_new_pc(CPUHexagonState *env, target_ulong addr)
+{
+    HEX_DEBUG_LOG("write_new_pc(0x" TARGET_FMT_lx ")\n", addr);
+
+    /*
+     * If more than one branch it taken in a packet, only the first one
+     * is actually done.
+     */
+    if (env->branch_taken) {
+        HEX_DEBUG_LOG("INFO: multiple branches taken in same packet, "
+                      "ignoring the second one\n");
+    } else {
+        fCHECK_PCALIGN(addr);
+        env->branch_taken = 1;
+        env->next_PC = addr;
+    }
+}
+
+/* Handy place to set a breakpoint */
+void HELPER(debug_start_packet)(CPUHexagonState *env)
+{
+    HEX_DEBUG_LOG("Start packet: pc = 0x" TARGET_FMT_lx "\n",
+                  env->gpr[HEX_REG_PC]);
+
+    int i;
+    for (i = 0; i < TOTAL_PER_THREAD_REGS; i++) {
+        env->reg_written[i] = 0;
+    }
+}
+
+/*
+ * This helper is needed when the rnum has already been turned into a TCGv,
+ * so we can't just do tcg_gen_mov_tl(result, hex_new_value[rnum]);
+ */
+int32_t HELPER(new_value)(CPUHexagonState *env, int rnum)
+{
+    return env->new_value[rnum];
+}
+
+static inline int32_t new_pred_value(CPUHexagonState *env, int pnum)
+{
+    return env->new_pred_value[pnum];
+}
+
+/* Checks for bookkeeping errors between disassembly context and runtime */
+void HELPER(debug_check_store_width)(CPUHexagonState *env, int slot, int check)
+{
+    if (env->mem_log_stores[slot].width != check) {
+        HEX_DEBUG_LOG("ERROR: %d != %d\n",
+                      env->mem_log_stores[slot].width, check);
+        g_assert_not_reached();
+    }
+}
+
+static void print_store(CPUHexagonState *env, int slot)
+{
+    if (!(env->slot_cancelled & (1 << slot))) {
+        size1u_t width = env->mem_log_stores[slot].width;
+        if (width == 1) {
+            size4u_t data = env->mem_log_stores[slot].data32 & 0xff;
+            HEX_DEBUG_LOG("\tmemb[0x" TARGET_FMT_lx "] = %d (0x%02x)\n",
+                          env->mem_log_stores[slot].va, data, data);
+        } else if (width == 2) {
+            size4u_t data = env->mem_log_stores[slot].data32 & 0xffff;
+            HEX_DEBUG_LOG("\tmemh[0x" TARGET_FMT_lx "] = %d (0x%04x)\n",
+                          env->mem_log_stores[slot].va, data, data);
+        } else if (width == 4) {
+            size4u_t data = env->mem_log_stores[slot].data32;
+            HEX_DEBUG_LOG("\tmemw[0x" TARGET_FMT_lx "] = %d (0x%08x)\n",
+                          env->mem_log_stores[slot].va, data, data);
+        } else if (width == 8) {
+            HEX_DEBUG_LOG("\tmemd[0x" TARGET_FMT_lx "] = %lu (0x%016lx)\n",
+                          env->mem_log_stores[slot].va,
+                          env->mem_log_stores[slot].data64,
+                          env->mem_log_stores[slot].data64);
+        } else {
+            HEX_DEBUG_LOG("\tBad store width %d\n", width);
+            g_assert_not_reached();
+        }
+    }
+}
+
+/* This function is a handy place to set a breakpoint */
+void HELPER(debug_commit_end)(CPUHexagonState *env, int has_st0, int has_st1)
+{
+    bool reg_printed = false;
+    bool pred_printed = false;
+    int i;
+
+    HEX_DEBUG_LOG("Packet committed: pc = 0x" TARGET_FMT_lx "\n",
+                  env->this_PC);
+    HEX_DEBUG_LOG("slot_cancelled = %d\n", env->slot_cancelled);
+
+    for (i = 0; i < TOTAL_PER_THREAD_REGS; i++) {
+        if (env->reg_written[i]) {
+            if (!reg_printed) {
+                HEX_DEBUG_LOG("Regs written\n");
+                reg_printed = true;
+            }
+            HEX_DEBUG_LOG("\tr%d = " TARGET_FMT_ld " (0x" TARGET_FMT_lx " )\n",
+                          i, env->new_value[i], env->new_value[i]);
+        }
+    }
+
+    for (i = 0; i < NUM_PREGS; i++) {
+        if (env->pred_written & (1 << i)) {
+            if (!pred_printed) {
+                HEX_DEBUG_LOG("Predicates written\n");
+                pred_printed = true;
+            }
+            HEX_DEBUG_LOG("\tp%d = 0x" TARGET_FMT_lx "\n",
+                          i, env->new_pred_value[i]);
+        }
+    }
+
+    if (has_st0 || has_st1) {
+        HEX_DEBUG_LOG("Stores\n");
+        if (has_st0) {
+            print_store(env, 0);
+        }
+        if (has_st1) {
+            print_store(env, 1);
+        }
+    }
+
+    HEX_DEBUG_LOG("Next PC = 0x%x\n", env->next_PC);
+    HEX_DEBUG_LOG("Exec counters: pkt = " TARGET_FMT_lx
+                  ", insn = " TARGET_FMT_lx
+                  ", hvx = " TARGET_FMT_lx "\n",
+                  env->gpr[HEX_REG_QEMU_PKT_CNT],
+                  env->gpr[HEX_REG_QEMU_INSN_CNT],
+                  env->gpr[HEX_REG_QEMU_HVX_CNT]);
+
+}
+
+/*
+ * sfrecipa, sfinvsqrta, vacsh have two results
+ *     r0,p0=sfrecipa(r1,r2)
+ *     r0,p0=sfinvsqrta(r1)
+ *     r1:0,p0=vacsh(r3:2,r5:4)
+ * Since helpers can only return a single value, we have two helpers
+ * for each of these. They each contain basically the same code (copy/pasted
+ * from the arch library), but one returns the register and the other
+ * returns the predicate.
+ */
+int32_t HELPER(sfrecipa_val)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
+{
+    /* int32_t PeV; Not needed to compute value */
+    int32_t RdV;
+    fHIDE(int idx;)
+    fHIDE(int adjust;)
+    fHIDE(int mant;)
+    fHIDE(int exp;)
+    if (fSF_RECIP_COMMON(RsV, RtV, RdV, adjust)) {
+        /* PeV = adjust; Not needed to compute value */
+        idx = (RtV >> 16) & 0x7f;
+        mant = (fSF_RECIP_LOOKUP(idx) << 15) | 1;
+        exp = fSF_BIAS() - (fSF_GETEXP(RtV) - fSF_BIAS()) - 1;
+        RdV = fMAKESF(fGETBIT(31, RtV), exp, mant);
+    }
+    return RdV;
+}
+
+int32_t HELPER(sfrecipa_pred)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
+{
+    int32_t PeV = 0;
+    int32_t RdV;
+    fHIDE(int idx;)
+    fHIDE(int adjust;)
+    fHIDE(int mant;)
+    fHIDE(int exp;)
+    if (fSF_RECIP_COMMON(RsV, RtV, RdV, adjust)) {
+        PeV = adjust;
+        idx = (RtV >> 16) & 0x7f;
+        mant = (fSF_RECIP_LOOKUP(idx) << 15) | 1;
+        exp = fSF_BIAS() - (fSF_GETEXP(RtV) - fSF_BIAS()) - 1;
+        RdV = fMAKESF(fGETBIT(31, RtV), exp, mant);
+    }
+    return PeV;
+}
+
+int32_t HELPER(sfinvsqrta_val)(CPUHexagonState *env, int32_t RsV)
+{
+    /* int32_t PeV; Not needed for val version */
+    int32_t RdV;
+    fHIDE(int idx;)
+    fHIDE(int adjust;)
+    fHIDE(int mant;)
+    fHIDE(int exp;)
+    if (fSF_INVSQRT_COMMON(RsV, RdV, adjust)) {
+        /* PeV = adjust; Not needed for val version */
+        idx = (RsV >> 17) & 0x7f;
+        mant = (fSF_INVSQRT_LOOKUP(idx) << 15);
+        exp = fSF_BIAS() - ((fSF_GETEXP(RsV) - fSF_BIAS()) >> 1) - 1;
+        RdV = fMAKESF(fGETBIT(31, RsV), exp, mant);
+    }
+    return RdV;
+}
+
+int32_t HELPER(sfinvsqrta_pred)(CPUHexagonState *env, int32_t RsV)
+{
+    int32_t PeV = 0;
+    int32_t RdV;
+    fHIDE(int idx;)
+    fHIDE(int adjust;)
+    fHIDE(int mant;)
+    fHIDE(int exp;)
+    if (fSF_INVSQRT_COMMON(RsV, RdV, adjust)) {
+        PeV = adjust;
+        idx = (RsV >> 17) & 0x7f;
+        mant = (fSF_INVSQRT_LOOKUP(idx) << 15);
+        exp = fSF_BIAS() - ((fSF_GETEXP(RsV) - fSF_BIAS()) >> 1) - 1;
+        RdV = fMAKESF(fGETBIT(31, RsV), exp, mant);
+    }
+    return PeV;
+}
+
+int64_t HELPER(vacsh_val)(CPUHexagonState *env,
+                           int64_t RxxV, int64_t RssV, int64_t RttV)
+{
+    int32_t PeV = 0;
+    fHIDE(int i;)
+    fHIDE(int xv;)
+    fHIDE(int sv;)
+    fHIDE(int tv;)
+    for (i = 0; i < 4; i++) {
+        xv = (int)fGETHALF(i, RxxV);
+        sv = (int)fGETHALF(i, RssV);
+        tv = (int)fGETHALF(i, RttV);
+        xv = xv + tv;
+        sv = sv - tv;
+        fSETBIT(i * 2, PeV, (xv > sv));
+        fSETBIT(i * 2 + 1, PeV, (xv > sv));
+        fSETHALF(i, RxxV, fSATH(fMAX(xv, sv)));
+    }
+    return RxxV;
+}
+
+int32_t HELPER(vacsh_pred)(CPUHexagonState *env,
+                           int64_t RxxV, int64_t RssV, int64_t RttV)
+{
+    int32_t PeV = 0;
+    fHIDE(int i;)
+    fHIDE(int xv;)
+    fHIDE(int sv;)
+    fHIDE(int tv;)
+    for (i = 0; i < 4; i++) {
+        xv = (int)fGETHALF(i, RxxV);
+        sv = (int)fGETHALF(i, RssV);
+        tv = (int)fGETHALF(i, RttV);
+        xv = xv + tv;
+        sv = sv - tv;
+        fSETBIT(i * 2, PeV, (xv > sv));
+        fSETBIT(i * 2 + 1, PeV, (xv > sv));
+        fSETHALF(i, RxxV, fSATH(fMAX(xv, sv)));
+    }
+    return PeV;
+}
+
+/* Helpful for printing intermediate values within instructions */
+void HELPER(debug_value)(CPUHexagonState *env, int32_t value)
+{
+    HEX_DEBUG_LOG("value = 0x%x\n", value);
+}
+
+void HELPER(debug_value_i64)(CPUHexagonState *env, int64_t value)
+{
+    HEX_DEBUG_LOG("value = 0x%lx\n", value);
+}
+
+static void cancel_slot(CPUHexagonState *env, uint32_t slot)
+{
+    HEX_DEBUG_LOG("Slot %d cancelled\n", slot);
+    env->slot_cancelled |= (1 << slot);
+}
+
+/* These macros can be referenced in the generated helper functions */
+#define warn(...) /* Nothing */
+#define fatal(...) g_assert_not_reached();
+
+#define BOGUS_HELPER(tag) \
+    printf("ERROR: bogus helper: " #tag "\n")
+
+#define DEF_QEMU(TAG, SHORTCODE, HELPER, GENFN, HELPFN) HELPFN
+#include "qemu_def_generated.h"
+#undef DEF_QEMU
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 08/67] Hexagon GDB Stub
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (6 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 07/67] Hexagon CPU Scalar Core Helpers Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 09/67] Hexagon architecture types Taylor Simpson
                   ` (59 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

GDB register read and write routines

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/cpu.c     |  3 +++
 target/hexagon/gdbstub.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)
 create mode 100644 target/hexagon/gdbstub.c

diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index 98d6bdc..576c566 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -288,6 +288,9 @@ static void hexagon_cpu_class_init(ObjectClass *c, void *data)
     cc->dump_state = hexagon_dump_state;
     cc->set_pc = hexagon_cpu_set_pc;
     cc->synchronize_from_tb = hexagon_cpu_synchronize_from_tb;
+    cc->gdb_core_xml_file = "hexagon-core.xml";
+    cc->gdb_read_register = hexagon_gdb_read_register;
+    cc->gdb_write_register = hexagon_gdb_write_register;
     cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
     cc->gdb_stop_before_watchpoint = true;
     cc->disas_set_info = hexagon_cpu_disas_set_info;
diff --git a/target/hexagon/gdbstub.c b/target/hexagon/gdbstub.c
new file mode 100644
index 0000000..e678aea
--- /dev/null
+++ b/target/hexagon/gdbstub.c
@@ -0,0 +1,49 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "exec/gdbstub.h"
+#include "cpu.h"
+#include "internal.h"
+
+int hexagon_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
+{
+    HexagonCPU *cpu = HEXAGON_CPU(cs);
+    CPUHexagonState *env = &cpu->env;
+
+    if (n < TOTAL_PER_THREAD_REGS) {
+        return gdb_get_regl(mem_buf, env->gpr[n]);
+    }
+
+    g_assert_not_reached();
+    return 0;
+}
+
+int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
+{
+    HexagonCPU *cpu = HEXAGON_CPU(cs);
+    CPUHexagonState *env = &cpu->env;
+
+    if (n < TOTAL_PER_THREAD_REGS) {
+        env->gpr[n] = ldtul_p(mem_buf);
+        return sizeof(target_ulong);
+    }
+
+    g_assert_not_reached();
+    return 0;
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 09/67] Hexagon architecture types
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (7 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 08/67] Hexagon GDB Stub Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 10/67] Hexagon instruction and packet types Taylor Simpson
                   ` (58 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Define types used in files imported from the Hexagon architecture library

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/hex_arch_types.h | 42 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 target/hexagon/hex_arch_types.h

diff --git a/target/hexagon/hex_arch_types.h b/target/hexagon/hex_arch_types.h
new file mode 100644
index 0000000..bb58f49
--- /dev/null
+++ b/target/hexagon/hex_arch_types.h
@@ -0,0 +1,42 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_ARCH_TYPES_H
+#define HEXAGON_ARCH_TYPES_H
+
+/*
+ * These types are used by the code generated from the Hexagon
+ * architecture library.
+ */
+typedef unsigned char size1u_t;
+typedef char size1s_t;
+typedef unsigned short int size2u_t;
+typedef short size2s_t;
+typedef unsigned int size4u_t;
+typedef int size4s_t;
+typedef unsigned long long int size8u_t;
+typedef long long int size8s_t;
+typedef size8u_t paddr_t;
+typedef size4u_t vaddr_t;
+typedef size8u_t pcycles_t;
+
+typedef struct size16s {
+    size8s_t hi;
+    size8u_t lo;
+} size16s_t;
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 10/67] Hexagon instruction and packet types
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (8 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 09/67] Hexagon architecture types Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 11/67] Hexagon register fields Taylor Simpson
                   ` (57 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

The insn_t and packet_t are the interface between instruction decoding and
TCG code generation

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/insn.h | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 133 insertions(+)
 create mode 100644 target/hexagon/insn.h

diff --git a/target/hexagon/insn.h b/target/hexagon/insn.h
new file mode 100644
index 0000000..a80bcb9
--- /dev/null
+++ b/target/hexagon/insn.h
@@ -0,0 +1,133 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_INSN_H
+#define HEXAGON_INSN_H
+
+#include "cpu.h"
+#include "hex_arch_types.h"
+#include "translate.h"
+
+#define INSTRUCTIONS_MAX 7    /* 2 pairs + loopend */
+#define REG_OPERANDS_MAX 5
+#define IMMEDS_MAX 2
+
+struct Instruction;
+
+typedef void (*semantic_insn_t)(CPUHexagonState *env,
+                                DisasContext *ctx,
+                                struct Instruction *insn);
+
+struct Instruction {
+    semantic_insn_t generate;            /* pointer to genptr routine */
+    size1u_t regno[REG_OPERANDS_MAX];    /* reg operands including predicates */
+    size2u_t opcode;
+
+    size4u_t iclass:6;
+    size4u_t slot:3;
+    size4u_t part1:1;        /*
+                              * cmp-jumps are split into two insns.
+                              * set for the compare and clear for the jump
+                              */
+    size4u_t extension_valid:1;   /* Has a constant extender attached */
+    size4u_t which_extended:1;    /* If has an extender, which immediate */
+    size4u_t is_dcop:1;      /* Is a dcacheop */
+    size4u_t is_dcfetch:1;   /* Has an A_DCFETCH attribute */
+    size4u_t is_load:1;      /* Has A_LOAD attribute */
+    size4u_t is_store:1;     /* Has A_STORE attribute */
+    size4u_t is_memop:1;     /* Has A_MEMOP attribute */
+    size4u_t is_dealloc:1;   /* Is a dealloc return or dealloc frame */
+    size4u_t is_aia:1;       /* Is a post increment */
+    size4u_t is_endloop:1;   /* This is an end of loop */
+    size4u_t is_2nd_jump:1;  /* This is the second jump of a dual-jump packet */
+    size4u_t new_value_producer_slot:4;
+    size4s_t immed[IMMEDS_MAX];    /* immediate field */
+};
+
+typedef struct Instruction insn_t;
+
+struct Packet {
+    size2u_t num_insns;
+    size2u_t encod_pkt_size_in_bytes;
+
+    /* Pre-decodes about LD/ST */
+    size8u_t single_load:1;
+    size8u_t dual_load:1;
+    size8u_t single_store:1;
+    size8u_t dual_store:1;
+    size8u_t load_and_store:1;
+    size8u_t memop_or_nvstore:1;
+
+    /* Pre-decodes about COF */
+    size8u_t pkt_has_cof:1;          /* Has any change-of-flow */
+    size8u_t pkt_has_dual_jump:1;
+    size8u_t pkt_has_initloop:1;
+    size8u_t pkt_has_initloop0:1;
+    size8u_t pkt_has_initloop1:1;
+    size8u_t pkt_has_endloop:1;
+    size8u_t pkt_has_endloop0:1;
+    size8u_t pkt_has_endloop1:1;
+    size8u_t pkt_has_endloop01:1;
+    size8u_t pkt_has_call:1;
+    size8u_t pkt_has_jumpr:1;
+    size8u_t pkt_has_cjump:1;
+    size8u_t pkt_has_cjump_dotnew:1;
+    size8u_t pkt_has_cjump_dotold:1;
+    size8u_t pkt_has_cjump_newval:1;
+    size8u_t pkt_has_duplex:1;
+    size8u_t pkt_has_payload:1;      /* Contains a constant extender */
+    size8u_t pkt_has_dealloc_return:1;
+
+    /* Pre-decodes about SLOTS */
+    size8u_t slot0_valid:1;
+    size8u_t slot1_valid:1;
+    size8u_t slot2_valid:1;
+    size8u_t slot3_valid:1;
+
+    /* When a predicate cancels something, track that */
+    size8u_t pkt_has_fp_op:1;
+    size8u_t pkt_has_fpsp_op:1;
+    size8u_t pkt_has_fpdp_op:1;
+
+    /* Contains a cacheop */
+    size8u_t pkt_has_cacheop:1;
+    size8u_t pkt_has_dczeroa:1;
+    size8u_t pkt_has_ictagop:1;
+    size8u_t pkt_has_icflushop:1;
+    size8u_t pkt_has_dcflushop:1;
+    size8u_t pkt_has_dctagop:1;
+    size8u_t pkt_has_l2flushop:1;
+    size8u_t pkt_has_l2tagop:1;
+
+    /* load store for slots */
+    size8u_t pkt_has_load_s0:1;
+    size8u_t pkt_has_load_s1:1;
+    size8u_t pkt_has_store_s0:1;
+    size8u_t pkt_has_store_s1:1;
+
+    /* Misc */
+    size8u_t num_rops:4;            /* Num risc ops in the packet */
+    size8u_t pkt_access_count:2;    /* Is a vmem access going to VTCM */
+    size8u_t pkt_ldaccess_l2:2;     /* vmem ld access to l2 */
+    size8u_t pkt_ldaccess_vtcm:2;   /* vmem ld access to vtcm */
+
+    insn_t insn[INSTRUCTIONS_MAX];
+};
+
+typedef struct Packet packet_t;
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 11/67] Hexagon register fields
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (9 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 10/67] Hexagon instruction and packet types Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 12/67] Hexagon instruction attributes Taylor Simpson
                   ` (56 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Declare bitfields within registers such as user status register (USR)

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/reg_fields.h     |  40 +++++++++++++++
 target/hexagon/reg_fields_def.h | 109 ++++++++++++++++++++++++++++++++++++++++
 target/hexagon/reg_fields.c     |  28 +++++++++++
 3 files changed, 177 insertions(+)
 create mode 100644 target/hexagon/reg_fields.h
 create mode 100644 target/hexagon/reg_fields_def.h
 create mode 100644 target/hexagon/reg_fields.c

diff --git a/target/hexagon/reg_fields.h b/target/hexagon/reg_fields.h
new file mode 100644
index 0000000..cf168f0
--- /dev/null
+++ b/target/hexagon/reg_fields.h
@@ -0,0 +1,40 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_REG_FIELDS_H
+#define HEXAGON_REG_FIELDS_H
+
+#define NUM_GEN_REGS 32
+
+typedef struct {
+    const char *name;
+    int offset;
+    int width;
+    const char *description;
+} reg_field_t;
+
+extern reg_field_t reg_field_info[];
+
+enum reg_fields_enum {
+#define DEF_REG_FIELD(TAG, NAME, START, WIDTH, DESCRIPTION) \
+    TAG,
+#include "reg_fields_def.h"
+    NUM_REG_FIELDS
+#undef DEF_REG_FIELD
+};
+
+#endif
diff --git a/target/hexagon/reg_fields_def.h b/target/hexagon/reg_fields_def.h
new file mode 100644
index 0000000..20ccc3e
--- /dev/null
+++ b/target/hexagon/reg_fields_def.h
@@ -0,0 +1,109 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * For registers that have individual fields, explain them here
+ *   DEF_REG_FIELD(tag,
+ *                 name,
+ *                 bit start offset,
+ *                 width,
+ *                 description
+ */
+
+/* USR fields */
+DEF_REG_FIELD(USR_OVF,
+    "ovf", 0, 1,
+    "Sticky Saturation Overflow - "
+    "Set when saturation occurs while executing instruction that specifies "
+    "optional saturation, remains set until explicitly cleared by a USR=Rs "
+    "instruction.")
+DEF_REG_FIELD(USR_FPINVF,
+    "fpinvf", 1, 1,
+    "Floating-point IEEE Invalid Sticky Flag.")
+DEF_REG_FIELD(USR_FPDBZF,
+    "fpdbzf", 2, 1,
+    "Floating-point IEEE Divide-By-Zero Sticky Flag.")
+DEF_REG_FIELD(USR_FPOVFF,
+    "fpovff", 3, 1,
+    "Floating-point IEEE Overflow Sticky Flag.")
+DEF_REG_FIELD(USR_FPUNFF,
+    "fpunff", 4, 1,
+    "Floating-point IEEE Underflow Sticky Flag.")
+DEF_REG_FIELD(USR_FPINPF,
+    "fpinpf", 5, 1,
+    "Floating-point IEEE Inexact Sticky Flag.")
+
+DEF_REG_FIELD(USR_LPCFG,
+    "lpcfg", 8, 2,
+    "Hardware Loop Configuration: "
+    "Number of loop iterations (0-3) remaining before pipeline predicate "
+    "should be set.")
+DEF_REG_FIELD(USR_PKTCNT_U,
+    "pktcnt_u", 10, 1,
+    "Enable packet counting in User mode.")
+DEF_REG_FIELD(USR_PKTCNT_G,
+    "pktcnt_g", 11, 1,
+    "Enable packet counting in Guest mode.")
+DEF_REG_FIELD(USR_PKTCNT_M,
+    "pktcnt_m", 12, 1,
+    "Enable packet counting in Monitor mode.")
+DEF_REG_FIELD(USR_HFD,
+    "hfd", 13, 2,
+    "Two bits that let the user control the amount of L1 hardware data cache "
+    "prefetching (up to 4 cache lines): "
+    "00: No prefetching, "
+    "01: Prefetch Loads with post-updating address mode when execution is "
+        "within a hardware loop, "
+    "10: Prefetch any hardware-detected striding Load when execution is within "
+        "a hardware loop, "
+    "11: Prefetch any hardware-detected striding Load.")
+DEF_REG_FIELD(USR_HFI,
+    "hfi", 15, 2,
+    "Two bits that let the user control the amount of L1 instruction cache "
+    "prefetching. "
+    "00: No prefetching, "
+    "01: Allow prefetching of at most 1 additional cache line, "
+    "10: Allow prefetching of at most 2 additional cache lines.")
+
+DEF_REG_FIELD(USR_FPRND,
+    "fprnd", 22, 2,
+    "Rounding Mode for Floating-Point Instructions: "
+    "00: Round to nearest, ties to even (default), "
+    "01: Toward zero, "
+    "10: Downward (toward negative infinity), "
+    "11: Upward (toward positive infinity).")
+
+DEF_REG_FIELD(USR_FPINVE,
+    "fpinve", 25, 1,
+    "Enable trap on IEEE Invalid.")
+DEF_REG_FIELD(USR_FPDBZE,
+    "fpdbze", 26, 1, "Enable trap on IEEE Divide-By-Zero.")
+DEF_REG_FIELD(USR_FPOVFE,
+    "fpovfe", 27, 1,
+    "Enable trap on IEEE Overflow.")
+DEF_REG_FIELD(USR_FPUNFE,
+    "fpunfe", 28, 1,
+    "Enable trap on IEEE Underflow.")
+DEF_REG_FIELD(USR_FPINPE,
+    "fpinpe", 29, 1,
+    "Enable trap on IEEE Inexact.")
+DEF_REG_FIELD(USR_PFA,
+    "pfa", 31, 1,
+    "L2 Prefetch Active: Set when non-blocking l2fetch instruction is "
+    "prefetching requested data, remains set until l2fetch prefetch operation "
+    "is completed (or not active).") /* read-only */
+
diff --git a/target/hexagon/reg_fields.c b/target/hexagon/reg_fields.c
new file mode 100644
index 0000000..2a3e4f5a
--- /dev/null
+++ b/target/hexagon/reg_fields.c
@@ -0,0 +1,28 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "reg_fields.h"
+
+reg_field_t reg_field_info[] = {
+#define DEF_REG_FIELD(TAG, NAME, START, WIDTH, DESCRIPTION)    \
+      {NAME, START, WIDTH, DESCRIPTION},
+#include "reg_fields_def.h"
+      {NULL, 0, 0}
+#undef DEF_REG_FIELD
+};
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 12/67] Hexagon instruction attributes
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (10 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 11/67] Hexagon register fields Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 13/67] Hexagon register map Taylor Simpson
                   ` (55 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/attribs.h     |  32 ++++
 target/hexagon/attribs_def.h | 404 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 436 insertions(+)
 create mode 100644 target/hexagon/attribs.h
 create mode 100644 target/hexagon/attribs_def.h

diff --git a/target/hexagon/attribs.h b/target/hexagon/attribs.h
new file mode 100644
index 0000000..d35af0c
--- /dev/null
+++ b/target/hexagon/attribs.h
@@ -0,0 +1,32 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_ATTRIBS_H
+#define HEXAGON_ATTRIBS_H
+
+enum {
+#define DEF_ATTRIB(NAME, ...) A_##NAME,
+#include "attribs_def.h"
+#undef DEF_ATTRIB
+};
+
+#define ATTRIB_WIDTH 32
+#define GET_ATTRIB(opcode, attrib) \
+    (((opcode_attribs[opcode][attrib / ATTRIB_WIDTH])\
+    >> (attrib % ATTRIB_WIDTH)) & 0x1)
+
+#endif /* ATTRIBS_H */
diff --git a/target/hexagon/attribs_def.h b/target/hexagon/attribs_def.h
new file mode 100644
index 0000000..6938820
--- /dev/null
+++ b/target/hexagon/attribs_def.h
@@ -0,0 +1,404 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* Keep this as the first attribute: */
+DEF_ATTRIB(AA_DUMMY, "Dummy Zeroth Attribute", "", "")
+
+/* Misc */
+DEF_ATTRIB(FAKEINSN, "Not a real instruction", "", "")
+DEF_ATTRIB(MAPPING, "Not real -- asm mapped", "", "")
+DEF_ATTRIB(CONDMAPPING, "Not real -- mapped based on values", "", "")
+DEF_ATTRIB(EXTENSION, "Extension instruction", "", "")
+DEF_ATTRIB(SHARED_EXTENSION, "Shared extension instruction", "", "")
+
+DEF_ATTRIB(PRIV, "Not available in user or guest mode", "", "")
+DEF_ATTRIB(GUEST, "Not available in user mode", "", "")
+
+DEF_ATTRIB(FPOP, "Floating Point Operation", "", "")
+DEF_ATTRIB(FPDOUBLE, "Double-precision Floating Point Operation", "", "")
+DEF_ATTRIB(FPSINGLE, "Single-precision Floating Point Operation", "", "")
+DEF_ATTRIB(SFMAKE, "Single Float Make", "", "")
+DEF_ATTRIB(DFMAKE, "Single Float Make", "", "")
+
+DEF_ATTRIB(NO_TIMING_LOG, "Does not get logged to the timing model", "", "")
+
+DEF_ATTRIB(EXTENDABLE, "Immediate may be extended", "", "")
+DEF_ATTRIB(EXT_UPPER_IMMED, "Extend upper case immediate", "", "")
+DEF_ATTRIB(EXT_LOWER_IMMED, "Extend lower case immediate", "", "")
+DEF_ATTRIB(MUST_EXTEND, "Immediate must be extended", "", "")
+DEF_ATTRIB(INVPRED, "The predicate is inverted for true/false sense", "", "")
+
+DEF_ATTRIB(ARCHV2, "V2 architecture", "", "")
+DEF_ATTRIB(ARCHV3, "V3 architecture", "", "")
+DEF_ATTRIB(ARCHV4, "V4 architecture", "", "")
+DEF_ATTRIB(ARCHV5, "V5 architecture", "", "")
+
+DEF_ATTRIB(PACKED, "Packable instruction", "", "")
+DEF_ATTRIB(SUBINSN, "sub-instruction", "", "")
+
+/* Load and Store attributes */
+DEF_ATTRIB(LOAD, "Loads from memory", "", "")
+DEF_ATTRIB(STORE, "Stores to memory", "", "")
+DEF_ATTRIB(STOREIMMED, "Stores immed to memory", "", "")
+DEF_ATTRIB(MEMSIZE_1B, "Memory width is 1 byte", "", "")
+DEF_ATTRIB(MEMSIZE_2B, "Memory width is 2 bytes", "", "")
+DEF_ATTRIB(MEMSIZE_4B, "Memory width is 4 bytes", "", "")
+DEF_ATTRIB(MEMSIZE_8B, "Memory width is 8 bytes", "", "")
+DEF_ATTRIB(MEMLIKE, "Memory-like instruction", "", "")
+DEF_ATTRIB(MEMLIKE_PACKET_RULES, "follows Memory-like packet rules", "", "")
+DEF_ATTRIB(CACHEOP, "Cache operation", "", "")
+DEF_ATTRIB(COPBYADDRESS, "Cache operation by address", "", "")
+DEF_ATTRIB(COPBYIDX, "Cache operation by index", "", "")
+DEF_ATTRIB(RELEASE, "Releases a lock", "", "")
+DEF_ATTRIB(ACQUIRE, "Acquires a lock", "", "")
+
+
+/* Load and Store Addressing Mode Attributes */
+DEF_ATTRIB(EA_REG_ONLY, "EA = input register only", "", "")
+DEF_ATTRIB(EA_IMM_ONLY, "EA = immediate only", "", "")
+DEF_ATTRIB(EA_REG_PLUS_IMM, "EA = register plus immediate", "", "")
+DEF_ATTRIB(EA_REG_PLUS_REGSCALED, "EA = register plus scaled register", "", "")
+DEF_ATTRIB(EA_IMM_PLUS_REGSCALED, "EA = immediate plus scaled register", "", "")
+DEF_ATTRIB(EA_BREV_REG, "EA = bit-reversed input register", "", "")
+DEF_ATTRIB(EA_GP_IMM, "EA = GP plus immediate (unless extended)", "", "")
+DEF_ATTRIB(EA_PAGECROSS, "EA calculation can have a Page Cross Stall", "", "")
+
+DEF_ATTRIB(PM_ANY, "Post Modify", "", "")
+DEF_ATTRIB(PM_I, "Post Modify by Immediate", "", "")
+DEF_ATTRIB(PM_M, "Post Modify by M register", "", "")
+DEF_ATTRIB(PM_CIRI, "Post Modify with Circular Addressing by immediate", "", "")
+DEF_ATTRIB(PM_CIRR, "Post Modify with Circular Addressing by I field", "", "")
+
+DEF_ATTRIB(VMEM, "VMEM-type", "", "")
+DEF_ATTRIB(VBUF, "Touches the VBUF", "", "")
+DEF_ATTRIB(VDBG, "Vector debugging instruction", "", "")
+
+/* V6 Vector attributes */
+DEF_ATTRIB(CVI, "Executes on the HVX extension", "", "")
+DEF_ATTRIB(NT_VMEM, "Non-temporal memory access", "", "")
+DEF_ATTRIB(VMEMU, "Unaligned memory access", "", "")
+
+DEF_ATTRIB(CVI_NEW, "New value memory instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VM, "Memory instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VP, "Permute instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VP_VS, "Double vector permute/shft insn executes on HVX", "", "")
+DEF_ATTRIB(CVI_VX, "Multiply instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VX_DV, "Double vector multiply insn executes on HVX", "", "")
+DEF_ATTRIB(CVI_VS, "Shift instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VS_VX, "Permute/shift and multiply insn executes on HVX", "", "")
+DEF_ATTRIB(CVI_VA, "ALU instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VA_DV, "Double vector alu instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_4SLOT, "Consumes all the vector execution resources", "", "")
+DEF_ATTRIB(CVI_TMP, "Transient Memory Load not written to register", "", "")
+DEF_ATTRIB(CVI_TMP_SRC, "Transient reassign", "", "")
+DEF_ATTRIB(CVI_EXTRACT, "HVX Extract Instruction that goes through L2", "", "")
+DEF_ATTRIB(CVI_EARLY, "HVX instructions that require early sources", "", "")
+DEF_ATTRIB(CVI_LATE, "HVX insn that always require late sources", "", "")
+DEF_ATTRIB(CVI_VV_LATE, "HVX insn that always require late Vv source", "", "")
+DEF_ATTRIB(CVI_REQUIRES_TMPLOAD, ".tmp load must be included in packet", "", "")
+DEF_ATTRIB(CVI_PUMP_2X, "Goes through the pipeline twice", "", "")
+DEF_ATTRIB(CVI_PUMP_4X, "Goes through the pipeline four times", "", "")
+DEF_ATTRIB(CVI_GATHER, "CVI Gather operation", "", "")
+DEF_ATTRIB(CVI_SCATTER, "CVI Scatter operation", "", "")
+DEF_ATTRIB(CVI_SCATTER_RELEASE, "CVI Store Release for scatter", "", "")
+DEF_ATTRIB(CVI_GATHER_RELEASE, "CVI Store Release for gather", "", "")
+DEF_ATTRIB(CVI_TMP_DST, "CVI instruction that doesn't write a register", "", "")
+DEF_ATTRIB(CVI_SCATTER_WORD_ACC, "CVI Scatter Word Accum (second pass)", "", "")
+DEF_ATTRIB(CVI_SCATTER_ACC, "CVI Scatter Accumulate", "", "")
+
+DEF_ATTRIB(CVI_GATHER_ADDR_2B, "CVI Scatter/Gather address is halfword", "", "")
+DEF_ATTRIB(CVI_GATHER_ADDR_4B, "CVI Scatter/Gather address is word", "", "")
+
+DEF_ATTRIB(VFETCH, "memory fetch op to L2 for a single vector", "", "")
+
+DEF_ATTRIB(CVI_SLOT23, "Can execute in slot 2 or slot 3 (HVX)", "", "")
+
+DEF_ATTRIB(VTCM_ALLBANK_ACCESS, "Allocates in all VTCM schedulers", "", "")
+
+/* Change-of-flow attributes */
+DEF_ATTRIB(JUMP, "Jump-type instruction", "", "")
+DEF_ATTRIB(DIRECT, "Uses an PC-relative immediate field", "", "")
+DEF_ATTRIB(INDIRECT, "Absolute register jump", "", "")
+DEF_ATTRIB(CJUMP, "Conditional jump", "", "")
+DEF_ATTRIB(CALL, "Function call instruction", "", "")
+DEF_ATTRIB(RET, "Function return instruction", "", "")
+DEF_ATTRIB(PERM, "Permute instruction", "", "")
+DEF_ATTRIB(COF, "Change-of-flow instruction", "", "")
+DEF_ATTRIB(CONDEXEC, "May be cancelled by a predicate", "", "")
+DEF_ATTRIB(DOTOLD, "Uses a predicate generated in a previous packet", "", "")
+DEF_ATTRIB(DOTNEW, "Uses a predicate generated in the same packet", "", "")
+DEF_ATTRIB(DOTNEWVALUE, "Uses a register value generated in this pkt", "", "")
+DEF_ATTRIB(NEWCMPJUMP, "Compound compare and jump", "", "")
+DEF_ATTRIB(NVSTORE, "New-value store", "", "")
+DEF_ATTRIB(MEMOP, "memop", "", "")
+
+DEF_ATTRIB(ROPS_2, "Compound instruction worth 2 wimpy RISC-ops", "", "")
+DEF_ATTRIB(ROPS_3, "Compound instruction worth 3 wimpy RISC-ops", "", "")
+
+
+/* access to implicit registers */
+DEF_ATTRIB(IMPLICIT_WRITES_LR, "Writes the link register", "", "UREG.LR")
+DEF_ATTRIB(IMPLICIT_READS_LR, "Reads the link register", "UREG.LR", "")
+DEF_ATTRIB(IMPLICIT_READS_LC0, "Reads loop count for loop 0", "UREG.LC0", "")
+DEF_ATTRIB(IMPLICIT_READS_LC1, "Reads loop count for loop 1", "UREG.LC1", "")
+DEF_ATTRIB(IMPLICIT_READS_SA0, "Reads start address for loop 0", "UREG.SA0", "")
+DEF_ATTRIB(IMPLICIT_READS_SA1, "Reads start address for loop 1", "UREG.SA1", "")
+DEF_ATTRIB(IMPLICIT_WRITES_PC, "Writes the program counter", "", "UREG.PC")
+DEF_ATTRIB(IMPLICIT_READS_PC, "Reads the program counter", "UREG.PC", "")
+DEF_ATTRIB(IMPLICIT_WRITES_SP, "Writes the stack pointer", "", "UREG.SP")
+DEF_ATTRIB(IMPLICIT_READS_SP, "Reads the stack pointer", "UREG.SP", "")
+DEF_ATTRIB(IMPLICIT_WRITES_FP, "Writes the frame pointer", "", "UREG.FP")
+DEF_ATTRIB(IMPLICIT_READS_FP, "Reads the frame pointer", "UREG.FP", "")
+DEF_ATTRIB(IMPLICIT_WRITES_GP, "Writes the GP register", "", "UREG.GP")
+DEF_ATTRIB(IMPLICIT_READS_GP, "Reads the GP register", "UREG.GP", "")
+DEF_ATTRIB(IMPLICIT_WRITES_LC0, "Writes loop count for loop 0", "", "UREG.LC0")
+DEF_ATTRIB(IMPLICIT_WRITES_LC1, "Writes loop count for loop 1", "", "UREG.LC1")
+DEF_ATTRIB(IMPLICIT_WRITES_SA0, "Writes start addr for loop 0", "", "UREG.SA0")
+DEF_ATTRIB(IMPLICIT_WRITES_SA1, "Writes start addr for loop 1", "", "UREG.SA1")
+DEF_ATTRIB(IMPLICIT_WRITES_R00, "Writes Register 0", "", "UREG.R00")
+DEF_ATTRIB(IMPLICIT_WRITES_P0, "Writes Predicate 0", "", "UREG.P0")
+DEF_ATTRIB(IMPLICIT_WRITES_P1, "Writes Predicate 1", "", "UREG.P1")
+DEF_ATTRIB(IMPLICIT_WRITES_P2, "Writes Predicate 1", "", "UREG.P2")
+DEF_ATTRIB(IMPLICIT_WRITES_P3, "May write Predicate 3", "", "UREG.P3")
+DEF_ATTRIB(IMPLICIT_READS_R00, "Reads Register 0", "UREG.R00", "")
+DEF_ATTRIB(IMPLICIT_READS_P0, "Reads Predicate 0", "UREG.P0", "")
+DEF_ATTRIB(IMPLICIT_READS_P1, "Reads Predicate 1", "UREG.P1", "")
+DEF_ATTRIB(IMPLICIT_READS_P3, "Reads Predicate 3", "UREG.P3", "")
+DEF_ATTRIB(IMPLICIT_READS_Q3, "Reads Vector Predicate 3", "UREG.Q3", "")
+DEF_ATTRIB(IMPLICIT_READS_CS, "Reads the CS/M register", "UREG.CS", "")
+DEF_ATTRIB(IMPLICIT_READS_FRAMEKEY, "Reads FRAMEKEY", "UREG.FRAMEKEY", "")
+DEF_ATTRIB(IMPLICIT_READS_FRAMELIMIT, "Reads FRAMELIMIT", "UREG.FRAMELIMIT", "")
+DEF_ATTRIB(IMPLICIT_READS_ELR, "Reads the ELR register", "MREG.ELR", "")
+DEF_ATTRIB(IMPLICIT_READS_SGP0, "Reads the SGP0 register", "MREG.SGP0", "")
+DEF_ATTRIB(IMPLICIT_READS_SGP1, "Reads the SGP1 register", "MREG.SGP1", "")
+DEF_ATTRIB(IMPLICIT_WRITES_SGP0, "Reads the SGP0 register", "", "MREG.SGP0")
+DEF_ATTRIB(IMPLICIT_WRITES_SGP1, "Reads the SGP1 register", "", "MREG.SGP1")
+DEF_ATTRIB(IMPLICIT_WRITES_STID_PRIO_ANYTHREAD, "Reads", "", "MREG.STID.PRIO")
+DEF_ATTRIB(IMPLICIT_WRITES_SRBIT, "Writes the OVF bit", "", "UREG.SR.OVF")
+DEF_ATTRIB(IMPLICIT_WRITES_FPFLAGS, "May write FP flags", "", "UREG.SR.FPFLAGS")
+DEF_ATTRIB(IMPLICIT_WRITES_LPCFG, "Writes the loop config", "", "UREG.SR.LPCFG")
+DEF_ATTRIB(IMPLICIT_WRITES_CVBITS, "Writes the CV flags", "", "UREG.SR.CV")
+DEF_ATTRIB(IMPLICIT_READS_FPRND, "May read FP rnd mode", "UREG.SR.FPRND", "")
+DEF_ATTRIB(IMPLICIT_READS_SSR, "May read SSR values", "MREG.SSR", "")
+DEF_ATTRIB(IMPLICIT_READS_CCR, "May read CCR values", "MREG.CCR", "")
+DEF_ATTRIB(IMPLICIT_WRITES_CCR, "May write CCR values", "", "MREG.CCR")
+DEF_ATTRIB(IMPLICIT_WRITES_SSR, "May write SSR values", "", "MREG.SSR")
+DEF_ATTRIB(IMPLICIT_READS_GELR, "May read GELR values", "GREG.GELR", "")
+DEF_ATTRIB(IMPLICIT_READS_GEVB, "May read GEVB values", "MREG.GEVB", "")
+DEF_ATTRIB(IMPLICIT_READS_GSR, "May read GSR values", "GREG.GSR", "")
+DEF_ATTRIB(IMPLICIT_READS_GOSP, "May read GOSP values", "GREG.GOSP", "")
+DEF_ATTRIB(IMPLICIT_WRITES_GELR, "May write GELR values", "", "GREG.GELR")
+DEF_ATTRIB(IMPLICIT_WRITES_GSR, "May write GSR values", "", "GREG.GSR")
+DEF_ATTRIB(IMPLICIT_WRITES_GOSP, "May write GOSP values", "", "GREG.GOSP")
+DEF_ATTRIB(IMPLICIT_READS_IPENDAD_IPEND, "May read", "MREG.IPENDAD.IPEND", "")
+DEF_ATTRIB(IMPLICIT_WRITES_IPENDAD_IPEND, "May write", "", "MREG.IPENDAD.IPEND")
+DEF_ATTRIB(IMPLICIT_READS_IPENDAD_IAD, "May read", "MREG.IPENDAD.IAD", "")
+DEF_ATTRIB(IMPLICIT_WRITES_IPENDAD_IAD, "May write", "", "MREG.IPENDAD.IAD")
+DEF_ATTRIB(IMPLICIT_WRITES_IMASK_ANYTHREAD, "May write", "", "MREG.IMASK")
+DEF_ATTRIB(IMPLICIT_READS_IMASK_ANYTHREAD, "May read", "MREG.IMASK", "")
+DEF_ATTRIB(IMPLICIT_READS_SYSCFG_K0LOCK, "May read", "MREG.SYSCFG.K0LOCK", "")
+DEF_ATTRIB(IMPLICIT_WRITES_SYSCFG_K0LOCK, "May write", "", "MREG.SYSCFG.K0LOCK")
+DEF_ATTRIB(IMPLICIT_READS_SYSCFG_TLBLOCK, "May read", "MREG.SYSCFG.TLBLOCK", "")
+DEF_ATTRIB(IMPLICIT_WRITES_SYSCFG_TLBLOCK, "May wr", "", "MREG.SYSCFG.TLBLOCK")
+DEF_ATTRIB(IMPLICIT_WRITES_SYSCFG_GCA, "May write", "", "MREG.SYSCFG.GCA")
+DEF_ATTRIB(IMPLICIT_READS_SYSCFG_GCA, "May read", "MREG.SYSCFG.GCA", "")
+DEF_ATTRIB(IMPLICIT_WRITES_USR_PFA, "May write USR_PFA", "", "UREG.SR.PFA")
+
+/* Other things the instruction does */
+DEF_ATTRIB(ACC, "Has a multiply", "", "")
+DEF_ATTRIB(MPY, "Has a multiply", "", "")
+DEF_ATTRIB(SATURATE, "Does signed saturation", "", "")
+DEF_ATTRIB(USATURATE, "Does unsigned saturation", "", "")
+DEF_ATTRIB(CIRCADDR, "Uses circular addressing mode", "", "")
+DEF_ATTRIB(BREVADDR, "Uses bit reverse addressing mode", "", "")
+DEF_ATTRIB(BIDIRSHIFTL, "Uses a bidirectional shift left", "", "")
+DEF_ATTRIB(BIDIRSHIFTR, "Uses a bidirectional shift right", "", "")
+DEF_ATTRIB(BRANCHADDER, "Contains a PC-plus-immediate operation.", "", "")
+DEF_ATTRIB(CRSLOT23, "Can execute in slot 2 or slot 3 (CR)", "", "")
+DEF_ATTRIB(COMMUTES, "The operation is communitive", "", "")
+DEF_ATTRIB(DEALLOCRET, "dealloc_return", "", "")
+DEF_ATTRIB(DEALLOCFRAME, "deallocframe", "", "")
+
+/* Instruction Types */
+
+DEF_ATTRIB(IT_ALU, "ALU type", "", "")
+DEF_ATTRIB(IT_ALU_ADDSUB, "ALU add or subtract type", "", "")
+DEF_ATTRIB(IT_ALU_MINMAX, "ALU MIN or MAX type", "", "")
+DEF_ATTRIB(IT_ALU_MOVE, "ALU data movement type", "", "")
+DEF_ATTRIB(IT_ALU_LOGICAL, "ALU logical operation type", "", "")
+DEF_ATTRIB(IT_ALU_SHIFT, "ALU shift operation type", "", "")
+DEF_ATTRIB(IT_ALU_SHIFT_AND_OP, "ALU shift and additional op type", "", "")
+DEF_ATTRIB(IT_ALU_CMP, "ALU compare operation type", "", "")
+
+DEF_ATTRIB(IT_LOAD, "Loads from memory", "", "")
+DEF_ATTRIB(IT_STORE, "Stores to memory", "", "")
+
+DEF_ATTRIB(IT_MPY, "Multiply type", "", "")
+DEF_ATTRIB(IT_MPY_32, "32-bit Multiply type", "", "")
+
+DEF_ATTRIB(IT_COF, "Change-of-flow type", "", "")
+DEF_ATTRIB(IT_HWLOOP, "Sets up hardware loop registers", "", "")
+
+DEF_ATTRIB(IT_MISC, "misc instruction type", "", "")
+
+DEF_ATTRIB(IT_NOP, "nop instruction", "", "")
+DEF_ATTRIB(IT_EXTENDER, "constant extender instruction", "", "")
+
+
+/* Exceptions the instruction can generate */
+
+DEF_ATTRIB(EXCEPTION_TLB, "Can generate a TLB Miss Exception", "", "")
+DEF_ATTRIB(EXCEPTION_ACCESS, "Can generate Access Violation Exception", "", "")
+DEF_ATTRIB(EXCEPTION_SWI, "Software Interrupt (trap) exception", "", "")
+
+
+/* Documentation Notes */
+DEF_ATTRIB(NOTE_ARCHV2, "Only available in the V2 architecture", "", "")
+
+DEF_ATTRIB(NOTE_PACKET_PC, "The PC is the addr of the start of the pkt", "", "")
+
+DEF_ATTRIB(NOTE_PACKET_NPC, "Next PC is the address following pkt", "", "")
+
+DEF_ATTRIB(NOTE_CONDITIONAL, "can be conditionally executed", "", "")
+
+DEF_ATTRIB(NOTE_NEWVAL_SLOT0, "New-value oprnd must execute on slot 0", "", "")
+
+DEF_ATTRIB(NOTE_RELATIVE_ADDRESS, "A PC-relative address is formed", "", "")
+
+DEF_ATTRIB(NOTE_LA_RESTRICT, "Cannot be in the last pkt of a HW loop", "", "")
+
+DEF_ATTRIB(NOTE_OOBVSHIFT, "Possible shift overflow", "", "")
+DEF_ATTRIB(NOTE_BIDIRSHIFT, "Bidirectional shift", "", "")
+
+DEF_ATTRIB(NOTE_CVFLAGS, "Sets the Carry and Overflow flags in USR.", "", "")
+DEF_ATTRIB(NOTE_SR_OVF_WHEN_SATURATING, "Might set OVF bit", "", "")
+DEF_ATTRIB(NOTE_PRIV, "Monitor-level feature", "", "")
+DEF_ATTRIB(NOTE_GUEST, "Guest-level feature", "", "")
+DEF_ATTRIB(NOTE_NOPACKET, "solo instruction", "", "")
+DEF_ATTRIB(NOTE_AXOK, "May only be grouped with ALU32 or non-FP XTYPE.", "", "")
+DEF_ATTRIB(NOTE_NOSLOT1, "Packet with this insn must have slot 1 empty", "", "")
+DEF_ATTRIB(NOTE_SLOT1_AOK, "Packet must have slot 1 empty or ALU32", "", "")
+DEF_ATTRIB(NOTE_NOSLOT01, "Packet must have both slot 1 and 2 empty", "", "")
+DEF_ATTRIB(NOTE_NEEDS_MEMLD, "Must be grouped with a memory load", "", "")
+DEF_ATTRIB(NOTE_LATEPRED, "The predicate can not be used as a .new", "", "")
+DEF_ATTRIB(NOTE_COMPAT_ACCURACY, "In the future accuracy may increase", "", "")
+DEF_ATTRIB(NOTE_NVSLOT0, "Can execute only in slot 0 (ST)", "", "")
+DEF_ATTRIB(NOTE_DEPRECATED, "Will be deprecated in a future version.", "", "")
+DEF_ATTRIB(NOTE_NONAPALIV1, "may not work correctly in Napali V1.", "", "")
+DEF_ATTRIB(NOTE_BADTAG_UNDEF, "Undefined if a tag is non-present", "", "")
+DEF_ATTRIB(NOTE_NOSLOT2_MPY, "Packet cannot have a slot 2 multiply", "", "")
+DEF_ATTRIB(NOTE_HVX_ONLY, "Only available on a core with HVX.", "", "")
+
+DEF_ATTRIB(NOTE_NOCOF_RESTRICT, "Cannot be grouped with any COF", "", "")
+DEF_ATTRIB(NOTE_BRANCHADDER_MAX1, "One PC-plus-offset calculation", "", "")
+
+DEF_ATTRIB(NOTE_CRSLOT23, "Execute on either slot2 or slot3 (CR)", "", "")
+DEF_ATTRIB(NOTE_EXTENSION_AUDIO, "Hexagon audio extensions", "", "")
+
+
+/* V6 MMVector Notes for Documentation */
+DEF_ATTRIB(NOTE_ANY_RESOURCE, "Can use any HVX resource.", "", "")
+DEF_ATTRIB(NOTE_ANY2_RESOURCE, "Uses any pair of the HVX resources", "", "")
+DEF_ATTRIB(NOTE_PERMUTE_RESOURCE, "Uses the HVX permute resource.", "", "")
+DEF_ATTRIB(NOTE_SHIFT_RESOURCE, "Uses the HVX shift resource.", "", "")
+DEF_ATTRIB(NOTE_MPY_RESOURCE, "Uses a HVX multiply resource.", "", "")
+DEF_ATTRIB(NOTE_MPYDV_RESOURCE, "Uses both HVX multiply resources.", "", "")
+DEF_ATTRIB(NOTE_NT_VMEM, "Non-temporal hint to the micro-architecture", "", "")
+DEF_ATTRIB(NOTE_ALL_RESOURCE, "Uses all HVX resources.", "", "")
+DEF_ATTRIB(NOTE_VMEM, "Immediates are in multiples of vector length.", "", "")
+DEF_ATTRIB(NOTE_ANY_VS_VX_RESOURCE, "Consumes two resources", "", "")
+
+DEF_ATTRIB(NOTE_RT8, "Input scalar register Rt is limited to R0-R7", "", "")
+
+/* Restrictions to make note of */
+DEF_ATTRIB(RESTRICT_LOOP_LA, "Cannot be in the last packet of a loop", "", "")
+DEF_ATTRIB(RESTRICT_NEEDS_MEMLD, "Must be grouped with a load", "", "")
+DEF_ATTRIB(RESTRICT_COF_MAX1, "One change-of-flow per packet", "", "")
+DEF_ATTRIB(RESTRICT_NOPACKET, "Not allowed in a packet", "", "")
+DEF_ATTRIB(RESTRICT_NOSRMOVE, "Do not write SR in the same packet", "", "")
+DEF_ATTRIB(RESTRICT_SLOT0ONLY, "Must execute on slot0", "", "")
+DEF_ATTRIB(RESTRICT_SLOT1ONLY, "Must execute on slot1", "", "")
+DEF_ATTRIB(RESTRICT_SLOT2ONLY, "Must execute on slot2", "", "")
+DEF_ATTRIB(RESTRICT_SLOT3ONLY, "Must execute on slot3", "", "")
+DEF_ATTRIB(RESTRICT_NOSLOT2_MPY, "A packet cannot have a slot 2 mpy", "", "")
+DEF_ATTRIB(RESTRICT_NOSLOT1, "No slot 1 instruction in parallel", "", "")
+DEF_ATTRIB(RESTRICT_SLOT1_AOK, "Slot 1 insn must be empty or A-type", "", "")
+DEF_ATTRIB(RESTRICT_NOSLOT01, "No slot 0 or 1 instructions in parallel", "", "")
+DEF_ATTRIB(RESTRICT_NOSLOT1_STORE, "Packet must not have slot 1 store", "", "")
+DEF_ATTRIB(RESTRICT_NOSLOT0_LOAD, "Packet must not have a slot 1 load", "", "")
+DEF_ATTRIB(RESTRICT_NOCOF, "Cannot be grouped with any COF", "", "")
+DEF_ATTRIB(RESTRICT_BRANCHADDER_MAX1, "One PC-plus-offset calculation", "", "")
+DEF_ATTRIB(RESTRICT_PREFERSLOT0, "Try to encode into slot 0", "", "")
+DEF_ATTRIB(RESTRICT_SINGLE_MEM_FIRST, "Single memory op must be last", "", "")
+DEF_ATTRIB(RESTRICT_PACKET_AXOK, "May exist with A-type or X-type", "", "")
+DEF_ATTRIB(RESTRICT_PACKET_SOMEREGS_OK, "Relaxed grouping rules", "", "")
+DEF_ATTRIB(RESTRICT_LATEPRED, "Predicate can not be used as a .new.", "", "")
+
+DEF_ATTRIB(PAIR_1OF2, "For assembler", "", "")
+DEF_ATTRIB(PAIR_2OF2, "For assembler", "", "")
+
+/* Performance based preferences */
+DEF_ATTRIB(PREFER_SLOT3, "Complex XU prefering slot3", "", "")
+
+DEF_ATTRIB(RELAX_COF_1ST, "COF can be fisrt in assembly order", "", "")
+DEF_ATTRIB(RELAX_COF_2ND, "COF can be second in assembly order", "", "")
+
+DEF_ATTRIB(ICOP, "Instruction cache op", "", "")
+
+DEF_ATTRIB(INTRINSIC_RETURNS_UNSIGNED, "Intrinsic returns an unsigned", "", "")
+
+DEF_ATTRIB(PRED_BIT_1, "The branch uses bit 1 as the prediction bit", "", "")
+DEF_ATTRIB(PRED_BIT_4, "The branch uses bit 4 as the prediction bit", "", "")
+DEF_ATTRIB(PRED_BIT_8, "The branch uses bit 8 as the prediction bit", "", "")
+DEF_ATTRIB(PRED_BIT_12, "The branch uses bit 12 as the prediction bit", "", "")
+DEF_ATTRIB(PRED_BIT_13, "The branch uses bit 13 as the prediction bit", "", "")
+DEF_ATTRIB(PRED_BIT_7, "The branch uses bit 7 as the prediction bit", "", "")
+DEF_ATTRIB(HWLOOP0_SETUP, "Sets up HW loop0", "", "")
+DEF_ATTRIB(HWLOOP1_SETUP, "Sets up HW loop1", "", "")
+DEF_ATTRIB(HWLOOP0_END, "Ends HW loop0", "", "")
+DEF_ATTRIB(HWLOOP1_END, "Ends HW loop1", "", "")
+DEF_ATTRIB(RET_TYPE, "return type", "", "")
+DEF_ATTRIB(HINTJR, "hintjr type", "", "")
+DEF_ATTRIB(DCZEROA, "dczeroa type", "", "")
+DEF_ATTRIB(ICTAGOP, "ictag op type", "", "")
+DEF_ATTRIB(ICFLUSHOP, "icflush op type", "", "")
+DEF_ATTRIB(DCFLUSHOP, "dcflush op type", "", "")
+DEF_ATTRIB(DCTAGOP, "dctag op type", "", "")
+DEF_ATTRIB(L2FLUSHOP, "l2flush op type", "", "")
+DEF_ATTRIB(L2TAGOP, "l2tag op type", "", "")
+DEF_ATTRIB(DCFETCH, "dcfetch type", "", "")
+DEF_ATTRIB(BIMODAL_BRANCH, "Updates the bimodal branch predictor", "", "")
+
+DEF_ATTRIB(VECINSN, "Long Vector Instruction", "", "")
+DEF_ATTRIB(MEMSIZE_32B, "Memory width is 32 bytes", "", "")
+DEF_ATTRIB(FOUR_PHASE, "Four Phase Instruction", "", "")
+DEF_ATTRIB(L2FETCH, "Instruction is l2fetch type", "", "")
+
+DEF_ATTRIB(PREDUSE_BSB, "Instructions need back-skip-back scheduling", "", "")
+DEF_ATTRIB(ICINVA, "icinva", "", "")
+DEF_ATTRIB(DCCLEANINVA, "dccleaninva", "", "")
+
+DEF_ATTRIB(EXTENSION_AUDIO, "audio extension", "", "")
+
+DEF_ATTRIB(MEMCPY, "memcpy or dma-type instruction", "", "")
+DEF_ATTRIB(NO_INTRINSIC, "Don't generate an intrisic", "", "")
+
+DEF_ATTRIB(NO_XML, "Don't generate a XML docs for this instruction", "", "")
+
+/* Keep this as the last attribute: */
+DEF_ATTRIB(ZZ_LASTATTRIB, "Last attribute in the file", "", "")
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 13/67] Hexagon register map
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (11 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 12/67] Hexagon instruction attributes Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 14/67] Hexagon instruction/packet decode Taylor Simpson
                   ` (54 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Certain operand types represent a non-contiguous set of values.
For example, the compound compare-and-jump instruction can only access
registers R0-R7 and R16-23.
This table represents the mapping from the encoding to the actual values.

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/regmap.h | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 target/hexagon/regmap.h

diff --git a/target/hexagon/regmap.h b/target/hexagon/regmap.h
new file mode 100644
index 0000000..2bcc0de
--- /dev/null
+++ b/target/hexagon/regmap.h
@@ -0,0 +1,38 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ *  Certain operand types represent a non-contiguous set of values.
+ *  For example, the compound compare-and-jump instruction can only access
+ *  registers R0-R7 and R16-23.
+ *  This table represents the mapping from the encoding to the actual values.
+ */
+
+#ifndef HEXAGON_REGMAP_H
+#define HEXAGON_REGMAP_H
+
+        /* Name   Num Table */
+DEF_REGMAP(R_16,  16, 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23)
+DEF_REGMAP(R__8,  8,  0, 2, 4, 6, 16, 18, 20, 22)
+DEF_REGMAP(R__4,  4,  0, 2, 4, 6)
+DEF_REGMAP(R_4,   4,  0, 1, 2, 3)
+DEF_REGMAP(R_8S,  8,  0, 1, 2, 3, 16, 17, 18, 19)
+DEF_REGMAP(R_8,   8,  0, 1, 2, 3, 4, 5, 6, 7)
+DEF_REGMAP(V__8,  8,  0, 4, 8, 12, 16, 20, 24, 28)
+DEF_REGMAP(V__16, 16, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30)
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 14/67] Hexagon instruction/packet decode
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (12 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 13/67] Hexagon register map Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 15/67] Hexagon instruction printing Taylor Simpson
                   ` (53 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Take the words from instruction memory and build a packet_t for TCG code
generation

The following operations are performed
    Convert the .new encoded offset to the register number of the producer
    Reorder the instructions in the packet so .new producer is before consumer
    Apply constant extenders
    Separate subinsn's into two instructions
    Break compare-jumps into two instructions
    Create instructions for :endloop

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/decode.h     |  39 +++
 target/hexagon/decode.c     | 769 ++++++++++++++++++++++++++++++++++++++++++++
 target/hexagon/q6v_decode.c | 402 +++++++++++++++++++++++
 3 files changed, 1210 insertions(+)
 create mode 100644 target/hexagon/decode.h
 create mode 100644 target/hexagon/decode.c
 create mode 100644 target/hexagon/q6v_decode.c

diff --git a/target/hexagon/decode.h b/target/hexagon/decode.h
new file mode 100644
index 0000000..7f63b1c
--- /dev/null
+++ b/target/hexagon/decode.h
@@ -0,0 +1,39 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_DECODE_H
+#define HEXAGON_DECODE_H
+
+#include "cpu.h"
+#include "opcodes.h"
+#include "hex_arch_types.h"
+#include "insn.h"
+
+extern void decode_init(void);
+
+static inline int is_packet_end(uint32_t word)
+{
+    uint32_t bits = (word >> 14) & 0x3;
+    return ((bits == 0x3) || (bits == 0x0));
+}
+
+extern void decode_send_insn_to(packet_t *packet, int start, int newloc);
+
+extern packet_t *decode_this(int max_words, size4u_t *words,
+                             packet_t *decode_pkt);
+
+#endif
diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
new file mode 100644
index 0000000..2201c23
--- /dev/null
+++ b/target/hexagon/decode.c
@@ -0,0 +1,769 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "iclass.h"
+#include "opcodes.h"
+#include "genptr.h"
+#include "decode.h"
+#include "insn.h"
+#include "macros.h"
+#include "printinsn.h"
+
+enum {
+    EXT_IDX_noext = 0,
+    EXT_IDX_noext_AFTER = 4,
+    EXT_IDX_mmvec = 4,
+    EXT_IDX_mmvec_AFTER = 8,
+    XX_LAST_EXT_IDX
+};
+
+#define DEF_REGMAP(NAME, ELEMENTS, ...) \
+    static const unsigned int DECODE_REGISTER_##NAME[ELEMENTS] = \
+    { __VA_ARGS__ };
+#include "regmap.h"
+
+#define DECODE_MAPPED_REG(REGNO, NAME) \
+    insn->regno[REGNO] = DECODE_REGISTER_##NAME[insn->regno[REGNO]];
+
+typedef struct {
+    struct _dectree_table_struct *table_link;
+    struct _dectree_table_struct *table_link_b;
+    opcode_t opcode;
+    enum {
+        DECTREE_ENTRY_INVALID,
+        DECTREE_TABLE_LINK,
+        DECTREE_SUBINSNS,
+        DECTREE_EXTSPACE,
+        DECTREE_TERMINAL
+    } type;
+} dectree_entry_t;
+
+typedef struct _dectree_table_struct {
+    unsigned int (*lookup_function)(int startbit, int width, size4u_t opcode);
+    unsigned int size;
+    unsigned int startbit;
+    unsigned int width;
+    dectree_entry_t table[];
+} dectree_table_t;
+
+#define DECODE_NEW_TABLE(TAG, SIZE, WHATNOT) \
+    static struct _dectree_table_struct dectree_table_##TAG;
+#define TABLE_LINK(TABLE)                     /* NOTHING */
+#define TERMINAL(TAG, ENC)                    /* NOTHING */
+#define SUBINSNS(TAG, CLASSA, CLASSB, ENC)    /* NOTHING */
+#define EXTSPACE(TAG, ENC)                    /* NOTHING */
+#define INVALID()                             /* NOTHING */
+#define DECODE_END_TABLE(...)                 /* NOTHING */
+#define DECODE_MATCH_INFO(...)                /* NOTHING */
+#define DECODE_LEGACY_MATCH_INFO(...)         /* NOTHING */
+#define DECODE_OPINFO(...)                    /* NOTHING */
+
+#include "dectree_generated.h"
+
+#undef DECODE_OPINFO
+#undef DECODE_MATCH_INFO
+#undef DECODE_LEGACY_MATCH_INFO
+#undef DECODE_END_TABLE
+#undef INVALID
+#undef TERMINAL
+#undef SUBINSNS
+#undef EXTSPACE
+#undef TABLE_LINK
+#undef DECODE_NEW_TABLE
+#undef DECODE_SEPARATOR_BITS
+
+#define DECODE_SEPARATOR_BITS(START, WIDTH) NULL, START, WIDTH
+#define DECODE_NEW_TABLE_HELPER(TAG, SIZE, FN, START, WIDTH) \
+    static dectree_table_t dectree_table_##TAG = { \
+        .size = SIZE, \
+        .lookup_function = FN, \
+        .startbit = START, \
+        .width = WIDTH, \
+        .table = {
+#define DECODE_NEW_TABLE(TAG, SIZE, WHATNOT) \
+    DECODE_NEW_TABLE_HELPER(TAG, SIZE, WHATNOT)
+
+#define TABLE_LINK(TABLE) \
+    { .type = DECTREE_TABLE_LINK, .table_link = &dectree_table_##TABLE },
+#define TERMINAL(TAG, ENC) \
+    { .type = DECTREE_TERMINAL, .opcode = TAG  },
+#define SUBINSNS(TAG, CLASSA, CLASSB, ENC) \
+    { \
+        .type = DECTREE_SUBINSNS, \
+        .table_link = &dectree_table_DECODE_SUBINSN_##CLASSA, \
+        .table_link_b = &dectree_table_DECODE_SUBINSN_##CLASSB \
+    },
+#define EXTSPACE(TAG, ENC) { .type = DECTREE_EXTSPACE },
+#define INVALID() { .type = DECTREE_ENTRY_INVALID, .opcode = XX_LAST_OPCODE },
+
+#define DECODE_END_TABLE(...) } };
+
+#define DECODE_MATCH_INFO(...)                /* NOTHING */
+#define DECODE_LEGACY_MATCH_INFO(...)         /* NOTHING */
+#define DECODE_OPINFO(...)                    /* NOTHING */
+
+#include "dectree_generated.h"
+
+#undef DECODE_OPINFO
+#undef DECODE_MATCH_INFO
+#undef DECODE_LEGACY_MATCH_INFO
+#undef DECODE_END_TABLE
+#undef INVALID
+#undef TERMINAL
+#undef SUBINSNS
+#undef EXTSPACE
+#undef TABLE_LINK
+#undef DECODE_NEW_TABLE
+#undef DECODE_NEW_TABLE_HELPER
+#undef DECODE_SEPARATOR_BITS
+
+static dectree_table_t dectree_table_DECODE_EXT_EXT_noext = {
+    .size = 1, .lookup_function = NULL, .startbit = 0, .width = 0,
+    .table = {
+        { .type = DECTREE_ENTRY_INVALID, .opcode = XX_LAST_OPCODE },
+    }
+};
+
+static dectree_table_t *ext_trees[XX_LAST_EXT_IDX];
+
+static void decode_ext_init(void)
+{
+    int i;
+    for (i = EXT_IDX_noext; i < EXT_IDX_noext_AFTER; i++) {
+        ext_trees[i] = &dectree_table_DECODE_EXT_EXT_noext;
+    }
+}
+
+typedef struct {
+    size4u_t mask;
+    size4u_t match;
+} decode_itable_entry_t;
+
+#define DECODE_NEW_TABLE(TAG, SIZE, WHATNOT)  /* NOTHING */
+#define TABLE_LINK(TABLE)                     /* NOTHING */
+#define TERMINAL(TAG, ENC)                    /* NOTHING */
+#define SUBINSNS(TAG, CLASSA, CLASSB, ENC)    /* NOTHING */
+#define EXTSPACE(TAG, ENC)                    /* NOTHING */
+#define INVALID()                             /* NOTHING */
+#define DECODE_END_TABLE(...)                 /* NOTHING */
+#define DECODE_OPINFO(...)                    /* NOTHING */
+
+#define DECODE_MATCH_INFO_NORMAL(TAG, MASK, MATCH) \
+    [TAG] = { \
+        .mask = MASK, \
+        .match = MATCH, \
+    },
+
+#define DECODE_MATCH_INFO_NULL(TAG, MASK, MATCH) \
+    [TAG] = { .match = ~0 },
+
+#define DECODE_MATCH_INFO(...) DECODE_MATCH_INFO_NORMAL(__VA_ARGS__)
+#define DECODE_LEGACY_MATCH_INFO(...) /* NOTHING */
+
+static const decode_itable_entry_t decode_itable[XX_LAST_OPCODE] = {
+#include "dectree_generated.h"
+};
+
+#undef DECODE_MATCH_INFO
+#define DECODE_MATCH_INFO(...) DECODE_MATCH_INFO_NULL(__VA_ARGS__)
+
+#undef DECODE_LEGACY_MATCH_INFO
+#define DECODE_LEGACY_MATCH_INFO(...) DECODE_MATCH_INFO_NORMAL(__VA_ARGS__)
+
+static const decode_itable_entry_t decode_legacy_itable[XX_LAST_OPCODE] = {
+#include "dectree_generated.h"
+};
+
+#undef DECODE_OPINFO
+#undef DECODE_MATCH_INFO
+#undef DECODE_LEGACY_MATCH_INFO
+#undef DECODE_END_TABLE
+#undef INVALID
+#undef TERMINAL
+#undef SUBINSNS
+#undef EXTSPACE
+#undef TABLE_LINK
+#undef DECODE_NEW_TABLE
+#undef DECODE_SEPARATOR_BITS
+
+void decode_init(void)
+{
+    decode_ext_init();
+}
+
+void decode_send_insn_to(packet_t *packet, int start, int newloc)
+{
+    insn_t tmpinsn;
+    int direction;
+    int i;
+    if (start == newloc) {
+        return;
+    }
+    if (start < newloc) {
+        /* Move towards end */
+        direction = 1;
+    } else {
+        /* move towards beginning */
+        direction = -1;
+    }
+    for (i = start; i != newloc; i += direction) {
+        tmpinsn = packet->insn[i];
+        packet->insn[i] = packet->insn[i + direction];
+        packet->insn[i + direction] = tmpinsn;
+    }
+}
+
+/* Fill newvalue registers with the correct regno */
+static int
+decode_fill_newvalue_regno(packet_t *packet)
+{
+    int i, def_regnum, use_regidx, def_idx;
+    size2u_t def_opcode, use_opcode;
+    char *dststr;
+
+    for (i = 1; i < packet->num_insns; i++) {
+        if (GET_ATTRIB(packet->insn[i].opcode, A_DOTNEWVALUE) &&
+            !GET_ATTRIB(packet->insn[i].opcode, A_EXTENSION)) {
+            use_opcode = packet->insn[i].opcode;
+
+            /* It's a store, so we're adjusting the Nt field */
+            if (GET_ATTRIB(use_opcode, A_STORE)) {
+                use_regidx = strchr(opcode_reginfo[use_opcode], 't') -
+                    opcode_reginfo[use_opcode];
+            } else {    /* It's a Jump, so we're adjusting the Ns field */
+                use_regidx = strchr(opcode_reginfo[use_opcode], 's') -
+                    opcode_reginfo[use_opcode];
+            }
+
+            /*
+             * What's encoded at the N-field is the offset to who's producing
+             * the value.  Shift off the LSB which indicates odd/even register.
+             */
+            def_idx = i - ((packet->insn[i].regno[use_regidx]) >> 1);
+
+            /*
+             * Check for a badly encoded N-field which points to an instruction
+             * out-of-range
+             */
+            if ((def_idx < 0) || (def_idx > (packet->num_insns - 1))) {
+                g_assert_not_reached();
+                return 1;
+            }
+
+            /* previous insn is the producer */
+            def_opcode = packet->insn[def_idx].opcode;
+            dststr = strstr(opcode_wregs[def_opcode], "Rd");
+            if (dststr) {
+                dststr = strchr(opcode_reginfo[def_opcode], 'd');
+            } else {
+                dststr = strstr(opcode_wregs[def_opcode], "Rx");
+                if (dststr) {
+                    dststr = strchr(opcode_reginfo[def_opcode], 'x');
+                } else {
+                    dststr = strstr(opcode_wregs[def_opcode], "Re");
+                    if (dststr) {
+                        dststr = strchr(opcode_reginfo[def_opcode], 'e');
+                    } else {
+                        dststr = strstr(opcode_wregs[def_opcode], "Ry");
+                        if (dststr) {
+                            dststr = strchr(opcode_reginfo[def_opcode], 'y');
+                        } else {
+                            g_assert_not_reached();
+                            return 1;
+                        }
+                    }
+                }
+            }
+            g_assert(dststr != NULL);
+            def_regnum =
+                packet->insn[def_idx].regno[dststr -
+                    opcode_reginfo[def_opcode]];
+
+            /* Now patch up the consumer with the register number */
+            packet->insn[i].regno[use_regidx] = def_regnum;
+            /*
+             * We need to remember who produces this value to later
+             * check if it was dynamically cancelled
+             */
+            packet->insn[i].new_value_producer_slot =
+                packet->insn[def_idx].slot;
+        }
+    }
+    return 0;
+}
+
+/* Split CJ into a compare and a jump */
+static int decode_split_cmpjump(packet_t *pkt)
+{
+    int last, i;
+    int numinsns = pkt->num_insns;
+
+    /*
+     * First, split all compare-jumps.
+     * The compare is sent to the end as a new instruction.
+     * Do it this way so we don't reorder dual jumps. Those need to stay in
+     * original order.
+     */
+    for (i = 0; i < numinsns; i++) {
+        /* It's a cmp-jump */
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_NEWCMPJUMP)) {
+            last = pkt->num_insns;
+            pkt->insn[last] = pkt->insn[i];    /* copy the instruction */
+            pkt->insn[last].part1 = 1;    /* last instruction does the CMP */
+            pkt->insn[i].part1 = 0;    /* existing instruction does the JUMP */
+        pkt->num_insns++;
+        }
+    }
+
+    /* Now re-shuffle all the compares back to the beginning */
+    for (i = 0; i < pkt->num_insns; i++) {
+        if (pkt->insn[i].part1) {
+            decode_send_insn_to(pkt, i, 0);
+        }
+    }
+    return 0;
+}
+
+static inline int decode_opcode_can_jump(int opcode)
+{
+    if ((GET_ATTRIB(opcode, A_JUMP)) ||
+        (GET_ATTRIB(opcode, A_CALL)) ||
+        (opcode == J2_trap0) ||
+        (opcode == J2_trap1) ||
+        (opcode == J2_rte) ||
+        (opcode == J2_pause)) {
+        /* Exception to A_JUMP attribute */
+        if (opcode == J4_hintjumpr) {
+            return 0;
+        }
+        return 1;
+    }
+
+    return 0;
+}
+
+static inline int decode_opcode_ends_loop(int opcode)
+{
+    return GET_ATTRIB(opcode, A_HWLOOP0_END) ||
+           GET_ATTRIB(opcode, A_HWLOOP1_END);
+}
+
+/* Set the is_* fields in each instruction */
+static int decode_set_insn_attr_fields(packet_t *pkt)
+{
+    int i;
+    int numinsns = pkt->num_insns;
+    size2u_t opcode;
+    int loads = 0;
+    int stores = 0;
+    int canjump;
+    int total_slots_valid = 0;
+
+    pkt->num_rops = 0;
+    pkt->pkt_has_cof = 0;
+    pkt->pkt_has_call = 0;
+    pkt->pkt_has_jumpr = 0;
+    pkt->pkt_has_cjump = 0;
+    pkt->pkt_has_cjump_dotnew = 0;
+    pkt->pkt_has_cjump_dotold = 0;
+    pkt->pkt_has_cjump_newval = 0;
+    pkt->pkt_has_endloop = 0;
+    pkt->pkt_has_endloop0 = 0;
+    pkt->pkt_has_endloop01 = 0;
+    pkt->pkt_has_endloop1 = 0;
+    pkt->pkt_has_cacheop = 0;
+    pkt->memop_or_nvstore = 0;
+    pkt->pkt_has_dczeroa = 0;
+    pkt->pkt_has_dealloc_return = 0;
+
+    for (i = 0; i < numinsns; i++) {
+        opcode = pkt->insn[i].opcode;
+        if (pkt->insn[i].part1) {
+            continue;    /* Skip compare of cmp-jumps */
+        }
+
+        if (GET_ATTRIB(opcode, A_ROPS_3)) {
+            pkt->num_rops += 3;
+        } else if (GET_ATTRIB(opcode, A_ROPS_2)) {
+            pkt->num_rops += 2;
+        } else {
+            pkt->num_rops++;
+        }
+        if (pkt->insn[i].extension_valid) {
+            pkt->num_rops += 2;
+        }
+
+        if (GET_ATTRIB(opcode, A_MEMOP) ||
+            GET_ATTRIB(opcode, A_NVSTORE)) {
+            pkt->memop_or_nvstore = 1;
+        }
+
+        if (GET_ATTRIB(opcode, A_CACHEOP)) {
+            pkt->pkt_has_cacheop = 1;
+            if (GET_ATTRIB(opcode, A_DCZEROA)) {
+                pkt->pkt_has_dczeroa = 1;
+            }
+            if (GET_ATTRIB(opcode, A_ICTAGOP)) {
+                pkt->pkt_has_ictagop = 1;
+            }
+            if (GET_ATTRIB(opcode, A_ICFLUSHOP)) {
+                pkt->pkt_has_icflushop = 1;
+            }
+            if (GET_ATTRIB(opcode, A_DCTAGOP)) {
+                pkt->pkt_has_dctagop = 1;
+            }
+            if (GET_ATTRIB(opcode, A_DCFLUSHOP)) {
+                pkt->pkt_has_dcflushop = 1;
+            }
+            if (GET_ATTRIB(opcode, A_L2TAGOP)) {
+                pkt->pkt_has_l2tagop = 1;
+            }
+            if (GET_ATTRIB(opcode, A_L2FLUSHOP)) {
+                pkt->pkt_has_l2flushop = 1;
+            }
+        }
+
+        if (GET_ATTRIB(opcode, A_DEALLOCRET)) {
+            pkt->pkt_has_dealloc_return = 1;
+        }
+
+        if (GET_ATTRIB(opcode, A_STORE)) {
+            pkt->insn[i].is_store = 1;
+
+            if (pkt->insn[i].slot == 0) {
+                pkt->pkt_has_store_s0 = 1;
+            } else {
+                pkt->pkt_has_store_s1 = 1;
+            }
+        }
+        if (GET_ATTRIB(opcode, A_DCFETCH)) {
+            pkt->insn[i].is_dcfetch = 1;
+        }
+        if (GET_ATTRIB(opcode, A_LOAD)) {
+            pkt->insn[i].is_load = 1;
+
+            if (pkt->insn[i].slot == 0) {
+                pkt->pkt_has_load_s0 = 1;
+            } else {
+                pkt->pkt_has_load_s1 = 1;
+            }
+        }
+        if (GET_ATTRIB(opcode, A_MEMOP)) {
+            pkt->insn[i].is_memop = 1;
+        }
+        if (GET_ATTRIB(opcode, A_DEALLOCRET) ||
+            GET_ATTRIB(opcode, A_DEALLOCFRAME)) {
+            pkt->insn[i].is_dealloc = 1;
+        }
+        if (GET_ATTRIB(opcode, A_DCFLUSHOP) ||
+            GET_ATTRIB(opcode, A_DCTAGOP)) {
+            pkt->insn[i].is_dcop = 1;
+        }
+
+        pkt->pkt_has_call |= GET_ATTRIB(opcode, A_CALL);
+        pkt->pkt_has_jumpr |= GET_ATTRIB(opcode, A_INDIRECT) &&
+                              !GET_ATTRIB(opcode, A_HINTJR);
+        pkt->pkt_has_cjump |= GET_ATTRIB(opcode, A_CJUMP);
+        pkt->pkt_has_cjump_dotnew |= GET_ATTRIB(opcode, A_DOTNEW) &&
+                                     GET_ATTRIB(opcode, A_CJUMP);
+        pkt->pkt_has_cjump_dotold |= GET_ATTRIB(opcode, A_DOTOLD) &&
+                                     GET_ATTRIB(opcode, A_CJUMP);
+        pkt->pkt_has_cjump_newval |= GET_ATTRIB(opcode, A_DOTNEWVALUE) &&
+                                     GET_ATTRIB(opcode, A_CJUMP);
+
+        canjump = decode_opcode_can_jump(opcode);
+
+        if (pkt->pkt_has_cof) {
+            if (canjump) {
+                pkt->pkt_has_dual_jump = 1;
+                pkt->insn[i].is_2nd_jump = 1;
+            }
+        } else {
+            pkt->pkt_has_cof |= canjump;
+        }
+
+        pkt->insn[i].is_endloop = decode_opcode_ends_loop(opcode);
+
+        pkt->pkt_has_endloop |= pkt->insn[i].is_endloop;
+        pkt->pkt_has_endloop0 |= GET_ATTRIB(opcode, A_HWLOOP0_END) &&
+                                 !GET_ATTRIB(opcode, A_HWLOOP1_END);
+        pkt->pkt_has_endloop01 |= GET_ATTRIB(opcode, A_HWLOOP0_END) &&
+                                  GET_ATTRIB(opcode, A_HWLOOP1_END);
+        pkt->pkt_has_endloop1 |= GET_ATTRIB(opcode, A_HWLOOP1_END) &&
+                                 !GET_ATTRIB(opcode, A_HWLOOP0_END);
+
+        pkt->pkt_has_cof |= pkt->pkt_has_endloop;
+
+        /* Now create slot valids */
+        if (pkt->insn[i].is_endloop)    /* Don't count endloops */
+            continue;
+
+        switch (pkt->insn[i].slot) {
+        case 0:
+            pkt->slot0_valid = 1;
+            break;
+        case 1:
+            pkt->slot1_valid = 1;
+            break;
+        case 2:
+            pkt->slot2_valid = 1;
+            break;
+        case 3:
+            pkt->slot3_valid = 1;
+        break;
+        }
+        total_slots_valid++;
+
+        /* And track #loads/stores */
+        if (pkt->insn[i].is_store) {
+            stores++;
+        } else if (pkt->insn[i].is_load) {
+            loads++;
+        }
+    }
+
+    if (stores == 2) {
+        pkt->dual_store = 1;
+    } else if (loads == 2) {
+        pkt->dual_load = 1;
+    } else if ((loads == 1) && (stores == 1)) {
+        pkt->load_and_store = 1;
+    } else if (loads == 1) {
+        pkt->single_load = 1;
+    } else if (stores == 1) {
+        pkt->single_store = 1;
+    }
+
+    return 0;
+}
+
+/*
+ * Shuffle for execution
+ * Move stores to end (in same order as encoding)
+ * Move compares to beginning (for use by .new insns)
+ */
+static int decode_shuffle_for_execution(packet_t *packet)
+{
+    int changed = 0;
+    int i;
+    int flag;    /* flag means we've seen a non-memory instruction */
+    int n_mems;
+    int last_insn = packet->num_insns - 1;
+
+    /*
+     * Skip end loops, somehow an end loop is getting in and messing
+     * up the order
+     */
+    if (decode_opcode_ends_loop(packet->insn[last_insn].opcode)) {
+        last_insn--;
+    }
+
+    do {
+        changed = 0;
+        /*
+         * Stores go last, must not reorder.
+         * Cannot shuffle stores past loads, either.
+         * Iterate backwards.  If we see a non-memory instruction,
+         * then a store, shuffle the store to the front.  Don't shuffle
+         *  stores wrt each other or a load.
+         */
+        for (flag = n_mems = 0, i = last_insn; i >= 0; i--) {
+            int opcode = packet->insn[i].opcode;
+
+            if (flag && GET_ATTRIB(opcode, A_STORE)) {
+                decode_send_insn_to(packet, i, last_insn - n_mems);
+                n_mems++;
+                changed = 1;
+            } else if (GET_ATTRIB(opcode, A_STORE)) {
+                n_mems++;
+            } else if (GET_ATTRIB(opcode, A_LOAD)) {
+                /*
+                 * Don't set flag, since we don't want to shuffle a
+                 * store pasta load
+                 */
+                n_mems++;
+            } else if (GET_ATTRIB(opcode, A_DOTNEWVALUE)) {
+                /*
+                 * Don't set flag, since we don't want to shuffle past
+                 * a .new value
+                 */
+            } else {
+                flag = 1;
+            }
+        }
+
+        if (changed) {
+            continue;
+        }
+        /* Compares go first, may be reordered wrt each other */
+        for (flag = 0, i = 0; i < last_insn + 1; i++) {
+            int opcode = packet->insn[i].opcode;
+
+            if ((strstr(opcode_wregs[opcode], "Pd4") ||
+                 strstr(opcode_wregs[opcode], "Pe4")) &&
+                GET_ATTRIB(opcode, A_STORE) == 0) {
+                /* This should be a compare (not a store conditional) */
+                if (flag) {
+                    decode_send_insn_to(packet, i, 0);
+                    changed = 1;
+                    continue;
+                }
+            } else if (GET_ATTRIB(opcode, A_IMPLICIT_WRITES_P3) &&
+                       !decode_opcode_ends_loop(packet->insn[i].opcode)) {
+                /*
+                 * spNloop instruction
+                 * Don't reorder endloops; they are not valid for .new uses,
+                 * and we want to match HW
+                 */
+                if (flag) {
+                    decode_send_insn_to(packet, i, 0);
+                    changed = 1;
+                    continue;
+                }
+            } else if (GET_ATTRIB(opcode, A_IMPLICIT_WRITES_P0) &&
+                       !GET_ATTRIB(opcode, A_NEWCMPJUMP)) {
+                /* CABAC instruction */
+                if (flag) {
+                    decode_send_insn_to(packet, i, 0);
+                    changed = 1;
+                    continue;
+                }
+            } else {
+                flag = 1;
+            }
+        }
+        if (changed) {
+            continue;
+        }
+    } while (changed);
+
+    /*
+     * If we have a .new register compare/branch, move that to the very
+     * very end, past stores
+     */
+    for (i = 0; i < last_insn; i++) {
+        if (GET_ATTRIB(packet->insn[i].opcode, A_DOTNEWVALUE)) {
+            decode_send_insn_to(packet, i, last_insn);
+            break;
+        }
+    }
+
+    /*
+     * And at the very very very end, move any RTE's, since they update
+     * user/supervisor mode.
+     */
+    for (i = 0; i < last_insn; i++) {
+        if ((packet->insn[i].opcode == J2_rte)) {
+            decode_send_insn_to(packet, i, last_insn);
+            break;
+        }
+    }
+    return 0;
+}
+
+static void decode_assembler_count_fpops(packet_t *pkt)
+{
+    int i;
+    for (i = 0; i < pkt->num_insns; i++) {
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_FPOP)) {
+            pkt->pkt_has_fp_op = 1;
+        }
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_FPDOUBLE)) {
+            pkt->pkt_has_fpdp_op = 1;
+        } else if (GET_ATTRIB(pkt->insn[i].opcode, A_FPSINGLE)) {
+            pkt->pkt_has_fpsp_op = 1;
+        }
+    }
+}
+
+static int
+apply_extender(packet_t *pkt, int i, size4u_t extender)
+{
+    int immed_num;
+    size4u_t base_immed;
+
+    immed_num = opcode_which_immediate_is_extended(pkt->insn[i].opcode);
+    base_immed = pkt->insn[i].immed[immed_num];
+
+    pkt->insn[i].immed[immed_num] = extender | fZXTN(6, 32, base_immed);
+    return 0;
+}
+
+static int decode_apply_extenders(packet_t *packet)
+{
+    int i;
+    for (i = 0; i < packet->num_insns; i++) {
+        if (GET_ATTRIB(packet->insn[i].opcode, A_IT_EXTENDER)) {
+            packet->insn[i + 1].extension_valid = 1;
+            packet->pkt_has_payload = 1;
+            apply_extender(packet, i + 1, packet->insn[i].immed[0]);
+        }
+    }
+    return 0;
+}
+
+static int decode_remove_extenders(packet_t *packet)
+{
+    int i, j;
+    for (i = 0; i < packet->num_insns; i++) {
+        if (GET_ATTRIB(packet->insn[i].opcode, A_IT_EXTENDER)) {
+            for (j = i;
+                (j < packet->num_insns - 1) && (j < INSTRUCTIONS_MAX - 1);
+                j++) {
+                packet->insn[j] = packet->insn[j + 1];
+            }
+            packet->num_insns--;
+        }
+    }
+    return 0;
+}
+
+static const char *
+get_valid_slot_str(const packet_t *pkt, unsigned int slot)
+{
+    return find_iclass_slots(pkt->insn[slot].opcode,
+                             pkt->insn[slot].iclass);
+}
+
+#include "q6v_decode.c"
+
+packet_t *decode_this(int max_words, size4u_t *words, packet_t *decode_pkt)
+{
+    int ret;
+    ret = do_decode_packet(max_words, words, decode_pkt);
+    if (ret <= 0) {
+        /* ERROR or BAD PARSE */
+        return NULL;
+    }
+    return decode_pkt;
+}
+
+/* Used for "-d in_asm" logging */
+int disassemble_hexagon(uint32_t *words, int nwords, char *buf, int bufsize)
+{
+    packet_t pkt;
+
+    if (decode_this(nwords, words, &pkt)) {
+        snprint_a_pkt(buf, bufsize, &pkt);
+        return pkt.encod_pkt_size_in_bytes;
+    } else {
+        snprintf(buf, bufsize, "<invalid>");
+        return 0;
+    }
+}
diff --git a/target/hexagon/q6v_decode.c b/target/hexagon/q6v_decode.c
new file mode 100644
index 0000000..f2b1548
--- /dev/null
+++ b/target/hexagon/q6v_decode.c
@@ -0,0 +1,402 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define DECODE_NEW_TABLE(TAG, SIZE, WHATNOT)     /* NOTHING */
+#define TABLE_LINK(TABLE)                        /* NOTHING */
+#define TERMINAL(TAG, ENC)                       /* NOTHING */
+#define SUBINSNS(TAG, CLASSA, CLASSB, ENC)       /* NOTHING */
+#define EXTSPACE(TAG, ENC)                       /* NOTHING */
+#define INVALID()                                /* NOTHING */
+#define DECODE_END_TABLE(...)                    /* NOTHING */
+#define DECODE_MATCH_INFO(...)                   /* NOTHING */
+#define DECODE_LEGACY_MATCH_INFO(...)            /* NOTHING */
+
+#define DECODE_REG(REGNO, WIDTH, STARTBIT) \
+    insn->regno[REGNO] = ((encoding >> STARTBIT) & ((1 << WIDTH) - 1));
+
+#define DECODE_IMPL_REG(REGNO, VAL) \
+    insn->regno[REGNO] = VAL;
+
+#define DECODE_IMM(IMMNO, WIDTH, STARTBIT, VALSTART) \
+    insn->immed[IMMNO] |= (((encoding >> STARTBIT) & ((1 << WIDTH) - 1))) << \
+                          (VALSTART);
+
+#define DECODE_IMM_SXT(IMMNO, WIDTH) \
+    insn->immed[IMMNO] = ((((size4s_t)insn->immed[IMMNO]) << (32 - WIDTH)) >> \
+                          (32 - WIDTH));
+
+#define DECODE_IMM_NEG(IMMNO, WIDTH) \
+    insn->immed[IMMNO] = -insn->immed[IMMNO];
+
+#define DECODE_IMM_SHIFT(IMMNO, SHAMT)                                 \
+    if ((!insn->extension_valid) || \
+        (insn->which_extended != IMMNO)) { \
+        insn->immed[IMMNO] <<= SHAMT; \
+    }
+
+#define DECODE_OPINFO(TAG, BEH) \
+    case TAG: \
+        { BEH  } \
+        break; \
+
+static void
+decode_op(insn_t *insn, opcode_t tag, size4u_t encoding)
+{
+    insn->immed[0] = 0;
+    insn->immed[1] = 0;
+    if (insn->extension_valid) {
+        insn->which_extended = opcode_which_immediate_is_extended(tag);
+    }
+    insn->opcode = tag;
+
+    switch (tag) {
+#include "dectree_generated.h"
+    default:
+        break;
+    }
+
+    insn->generate = opcode_genptr[tag];
+    insn->iclass = (encoding >> 28) & 0xf;
+    if (((encoding >> 14) & 3) == 0) {
+        insn->iclass += 16;
+    }
+}
+
+#undef DECODE_REG
+#undef DECODE_IMPL_REG
+#undef DECODE_IMM
+#undef DECODE_IMM_SHIFT
+#undef DECODE_OPINFO
+#undef DECODE_MATCH_INFO
+#undef DECODE_LEGACY_MATCH_INFO
+#undef DECODE_END_TABLE
+#undef INVALID
+#undef TERMINAL
+#undef SUBINSNS
+#undef EXTSPACE
+#undef TABLE_LINK
+#undef DECODE_NEW_TABLE
+#undef DECODE_SEPARATOR_BITS
+
+static unsigned int
+decode_subinsn_tablewalk(insn_t *insn, dectree_table_t *table,
+                         size4u_t encoding)
+{
+    unsigned int i;
+    opcode_t opc;
+    if (table->lookup_function) {
+        i = table->lookup_function(table->startbit, table->width, encoding);
+    } else {
+        i = ((encoding >> table->startbit) & ((1 << table->width) - 1));
+    }
+    if (table->table[i].type == DECTREE_TABLE_LINK) {
+        return decode_subinsn_tablewalk(insn, table->table[i].table_link,
+                                        encoding);
+    } else if (table->table[i].type == DECTREE_TERMINAL) {
+        opc = table->table[i].opcode;
+        if ((encoding & decode_itable[opc].mask) != decode_itable[opc].match) {
+            return 0;
+        }
+        decode_op(insn, opc, encoding);
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+static unsigned int get_insn_a(size4u_t encoding)
+{
+    return encoding & 0x00001fff;
+}
+
+static unsigned int get_insn_b(size4u_t encoding)
+{
+    return (encoding >> 16) & 0x00001fff;
+}
+
+static unsigned int
+decode_insns_tablewalk(insn_t *insn, dectree_table_t *table, size4u_t encoding)
+{
+    unsigned int i;
+    unsigned int a, b;
+    opcode_t opc;
+    if (table->lookup_function) {
+        i = table->lookup_function(table->startbit, table->width, encoding);
+    } else {
+        i = ((encoding >> table->startbit) & ((1 << table->width) - 1));
+    }
+    if (table->table[i].type == DECTREE_TABLE_LINK) {
+        return decode_insns_tablewalk(insn, table->table[i].table_link,
+                                      encoding);
+    } else if (table->table[i].type == DECTREE_SUBINSNS) {
+        a = get_insn_a(encoding);
+        b = get_insn_b(encoding);
+        b = decode_subinsn_tablewalk(insn, table->table[i].table_link_b, b);
+        a = decode_subinsn_tablewalk(insn + 1, table->table[i].table_link, a);
+        if ((a == 0) || (b == 0)) {
+            return 0;
+        }
+        return 2;
+    } else if (table->table[i].type == DECTREE_TERMINAL) {
+        opc = table->table[i].opcode;
+        if ((encoding & decode_itable[opc].mask) != decode_itable[opc].match) {
+            if ((encoding & decode_legacy_itable[opc].mask) !=
+                decode_legacy_itable[opc].match) {
+                return 0;
+            }
+        }
+        decode_op(insn, opc, encoding);
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+static unsigned int
+decode_insns(insn_t *insn, size4u_t encoding)
+{
+    dectree_table_t *table;
+    if ((encoding & 0x0000c000) != 0) {
+        /* Start with PP table */
+        table = &dectree_table_DECODE_ROOT_32;
+    } else {
+        /* start with EE table */
+        table = &dectree_table_DECODE_ROOT_EE;
+    }
+    return decode_insns_tablewalk(insn, table, encoding);
+}
+
+static void decode_add_endloop_insn(insn_t *insn, int loopnum)
+{
+    if (loopnum == 10) {
+        insn->opcode = J2_endloop01;
+        insn->generate = opcode_genptr[J2_endloop01];
+    } else if (loopnum == 1) {
+        insn->opcode = J2_endloop1;
+        insn->generate = opcode_genptr[J2_endloop1];
+    } else {
+        insn->opcode = J2_endloop0;
+        insn->generate = opcode_genptr[J2_endloop0];
+    }
+}
+
+static inline int decode_parsebits_is_end(size4u_t encoding32)
+{
+    size4u_t bits = (encoding32 >> 14) & 0x3;
+    return ((bits == 0x3) || (bits == 0x0));
+}
+
+static inline int decode_parsebits_is_loopend(size4u_t encoding32)
+{
+    size4u_t bits = (encoding32 >> 14) & 0x3;
+    return ((bits == 0x2));
+}
+
+static int
+decode_set_slot_number(packet_t *pkt)
+{
+    int slot;
+    int i;
+    int hit_mem_insn = 0;
+    int hit_duplex = 0;
+    const char *valid_slot_str;
+
+    for (i = 0, slot = 3; i < pkt->num_insns; i++) {
+        valid_slot_str = get_valid_slot_str(pkt, i);
+
+        while (strchr(valid_slot_str, '0' + slot) == NULL) {
+            slot--;
+        }
+        pkt->insn[i].slot = slot;
+        if (slot) {
+            /* I've assigned the slot, now decrement it for the next insn */
+            slot--;
+        }
+    }
+
+    /* Fix the exceptions - mem insns to slot 0,1 */
+    for (i = pkt->num_insns - 1; i >= 0; i--) {
+
+        /* First memory instruction always goes to slot 0 */
+        if ((GET_ATTRIB(pkt->insn[i].opcode, A_MEMLIKE) ||
+             GET_ATTRIB(pkt->insn[i].opcode, A_MEMLIKE_PACKET_RULES)) &&
+            !hit_mem_insn) {
+            hit_mem_insn = 1;
+            pkt->insn[i].slot = 0;
+            continue;
+        }
+
+        /* Next memory instruction always goes to slot 1 */
+        if ((GET_ATTRIB(pkt->insn[i].opcode, A_MEMLIKE) ||
+             GET_ATTRIB(pkt->insn[i].opcode, A_MEMLIKE_PACKET_RULES)) &&
+            hit_mem_insn) {
+            pkt->insn[i].slot = 1;
+        }
+    }
+
+    /* Fix the exceptions - duplex always slot 0,1 */
+    for (i = pkt->num_insns - 1; i >= 0; i--) {
+
+        /* First subinsn always goes to slot 0 */
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_SUBINSN) && !hit_duplex) {
+            pkt->pkt_has_duplex = 1;
+            hit_duplex = 1;
+            pkt->insn[i].slot = 0;
+            continue;
+        }
+
+        /* Next subinsn always goes to slot 1 */
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_SUBINSN) && hit_duplex) {
+            pkt->insn[i].slot = 1;
+        }
+    }
+
+    /* Fix the exceptions - slot 1 is never empty, always aligns to slot 0 */
+    {
+        int slot0_found = 0;
+        int slot1_found = 0;
+        int slot1_iidx = 0;
+        for (i = pkt->num_insns - 1; i >= 0; i--) {
+            /* Is slot0 used? */
+            if (pkt->insn[i].slot == 0) {
+                int is_endloop = (pkt->insn[i].opcode == J2_endloop01);
+                is_endloop |= (pkt->insn[i].opcode == J2_endloop0);
+                is_endloop |= (pkt->insn[i].opcode == J2_endloop1);
+
+                /*
+                 * Make sure it's not endloop since, we're overloading
+                 * slot0 for endloop
+                 */
+                if (!is_endloop) {
+                    slot0_found = 1;
+                }
+            }
+            /* Is slot1 used? */
+            if (pkt->insn[i].slot == 1) {
+                slot1_found = 1;
+                slot1_iidx = i;
+            }
+        }
+        /* Is slot0 empty and slot1 used? */
+        if ((slot0_found == 0) && (slot1_found == 1)) {
+            /* Then push it to slot0 */
+            pkt->insn[slot1_iidx].slot = 0;
+        }
+    }
+    return 0;
+}
+
+/*
+ * do_decode_packet
+ * Decodes packet with given words
+ * Returns negative on error, 0 on insufficient words,
+ * and number of words used on success
+ */
+
+static int do_decode_packet(int max_words, const size4u_t *words, packet_t *pkt)
+{
+    int num_insns = 0;
+    int words_read = 0;
+    int end_of_packet = 0;
+    int new_insns = 0;
+    int num_mems = 0;
+    int errors = 0;
+    int i;
+    size4u_t encoding32;
+
+    /* Initialize */
+    memset(pkt, 0, sizeof(*pkt));
+    /* Try to build packet */
+    while (!end_of_packet && (words_read < max_words)) {
+        encoding32 = words[words_read];
+        end_of_packet = decode_parsebits_is_end(encoding32);
+        new_insns = decode_insns(&pkt->insn[num_insns], encoding32);
+        /*
+         * If we saw an extender, mark next word extended so immediate
+         * decode works
+         */
+        if (pkt->insn[num_insns].opcode == A4_ext) {
+            pkt->insn[num_insns + 1].extension_valid = 1;
+            pkt->pkt_has_payload = 1;
+        }
+        num_insns += new_insns;
+        words_read++;
+    }
+
+    pkt->num_insns = num_insns;
+    if (!end_of_packet) {
+        /* Ran out of words! */
+        return 0;
+    }
+    pkt->encod_pkt_size_in_bytes = words_read * 4;
+    /* Check packet / aux info */
+    for (i = 0; i < num_insns; i++) {
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_MEMCPY)) {
+            num_mems += 2;
+        } else if (GET_ATTRIB(pkt->insn[i].opcode, A_LOAD) ||
+                   GET_ATTRIB(pkt->insn[i].opcode, A_STORE)) {
+            num_mems++;
+        }
+        if (pkt->insn[i].opcode == A4_ext) {
+            pkt->insn[i + 1].extension_valid = 1;
+            pkt->pkt_has_payload = 1;
+        }
+    }
+    pkt->pkt_has_initloop = 0;
+    pkt->pkt_has_initloop0 = 0;
+    pkt->pkt_has_initloop1 = 0;
+    for (i = 0; i < num_insns; i++) {
+        pkt->pkt_has_initloop0 |=
+            GET_ATTRIB(pkt->insn[i].opcode, A_HWLOOP0_SETUP);
+        pkt->pkt_has_initloop1 |=
+            GET_ATTRIB(pkt->insn[i].opcode, A_HWLOOP1_SETUP);
+    }
+    pkt->pkt_has_initloop |= pkt->pkt_has_initloop0 | pkt->pkt_has_initloop1;
+
+    /* Shuffle / split / reorder for execution */
+    if ((words_read == 2) && (decode_parsebits_is_loopend(words[0]))) {
+        decode_add_endloop_insn(&pkt->insn[pkt->num_insns++], 0);
+    }
+    if (words_read >= 3) {
+        size4u_t has_loop0, has_loop1;
+        has_loop0 = decode_parsebits_is_loopend(words[0]);
+        has_loop1 = decode_parsebits_is_loopend(words[1]);
+        if (has_loop0 && has_loop1) {
+            decode_add_endloop_insn(&pkt->insn[pkt->num_insns++], 10);
+        } else if (has_loop1) {
+            decode_add_endloop_insn(&pkt->insn[pkt->num_insns++], 1);
+        } else if (has_loop0) {
+            decode_add_endloop_insn(&pkt->insn[pkt->num_insns++], 0);
+        }
+    }
+
+    decode_assembler_count_fpops(pkt);
+
+    errors += decode_apply_extenders(pkt);
+    errors += decode_remove_extenders(pkt);
+    errors += decode_set_slot_number(pkt);
+    errors += decode_fill_newvalue_regno(pkt);
+
+    errors += decode_shuffle_for_execution(pkt);
+    errors += decode_split_cmpjump(pkt);
+    errors += decode_set_insn_attr_fields(pkt);
+    if (errors) {
+        return -1;
+    }
+
+    return words_read;
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 15/67] Hexagon instruction printing
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (13 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 14/67] Hexagon instruction/packet decode Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 16/67] Hexagon arch import - instruction semantics definitions Taylor Simpson
                   ` (52 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/printinsn.h | 26 +++++++++++++
 target/hexagon/printinsn.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)
 create mode 100644 target/hexagon/printinsn.h
 create mode 100644 target/hexagon/printinsn.c

diff --git a/target/hexagon/printinsn.h b/target/hexagon/printinsn.h
new file mode 100644
index 0000000..264b63c
--- /dev/null
+++ b/target/hexagon/printinsn.h
@@ -0,0 +1,26 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_PRINTINSN_H
+#define HEXAGON_PRINTINSN_H
+
+#include "qemu/osdep.h"
+#include "insn.h"
+
+extern void snprint_a_pkt(char *buf, int n, packet_t *pkt);
+
+#endif
diff --git a/target/hexagon/printinsn.c b/target/hexagon/printinsn.c
new file mode 100644
index 0000000..f59383e
--- /dev/null
+++ b/target/hexagon/printinsn.c
@@ -0,0 +1,91 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "opcodes.h"
+#include "printinsn.h"
+#include "insn.h"
+#include "macros.h"
+#include "internal.h"
+
+static const char *sreg2str(unsigned int reg)
+{
+    if (reg < TOTAL_PER_THREAD_REGS) {
+        return hexagon_regnames[reg];
+    } else {
+        return "???";
+    }
+}
+
+static const char *creg2str(unsigned int reg)
+{
+    return sreg2str(reg + NUM_GEN_REGS);
+}
+
+static void snprintinsn(char *buf, int n, insn_t * insn)
+{
+    switch (insn->opcode) {
+#define DEF_VECX_PRINTINFO(TAG, FMT, ...) DEF_PRINTINFO(TAG, FMT, __VA_ARGS__)
+#define DEF_PRINTINFO(TAG, FMT, ...) \
+    case TAG: \
+        snprintf(buf, n, FMT, __VA_ARGS__);\
+        break;
+#include "printinsn_generated.h"
+#undef DEF_VECX_PRINTINFO
+#undef DEF_PRINTINFO
+    }
+}
+
+void snprint_a_pkt(char *buf, int n, packet_t * pkt)
+{
+    char tmpbuf[128];
+    buf[0] = '\0';
+    int i, slot, opcode;
+
+    if (pkt == NULL) {
+        snprintf(buf, n, "<printpkt: NULL ptr>");
+        return;
+    }
+
+    if (pkt->num_insns > 1) {
+        strncat(buf, "\n{\n", n);
+    }
+    for (i = 0; i < pkt->num_insns; i++) {
+        if (pkt->insn[i].part1) {
+            continue;
+        }
+        snprintinsn(tmpbuf, 127, &(pkt->insn[i]));
+        strncat(buf, "\t", n);
+        strncat(buf, tmpbuf, n);
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_SUBINSN)) {
+            strncat(buf, " //subinsn", n);
+        }
+        if (pkt->insn[i].extension_valid) {
+            strncat(buf, " //constant extended", n);
+        }
+        slot = pkt->insn[i].slot;
+        opcode = pkt->insn[i].opcode;
+        snprintf(tmpbuf, 127, " //slot=%d:tag=%s", slot, opcode_names[opcode]);
+        strncat(buf, tmpbuf, n);
+
+        strncat(buf, "\n", n);
+    }
+    if (pkt->num_insns > 1) {
+        strncat(buf, "}\n", n);
+    }
+}
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 16/67] Hexagon arch import - instruction semantics definitions
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (14 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 15/67] Hexagon instruction printing Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 17/67] Hexagon arch import - macro definitions Taylor Simpson
                   ` (51 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Imported from the Hexagon architecture library
    imported/allidefs.def          Top level instruction definition file
    imported/*.idef                Instruction definition files
These files are input to the first phase of the generator (gen_semantics.c)
to create a python include file with the instruction semantics and attributes.
The python include file is fed to the second phase (do_qemu.py).

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/imported/allidefs.def  |   91 +++
 target/hexagon/imported/alu.idef      | 1335 +++++++++++++++++++++++++++++++++
 target/hexagon/imported/branch.idef   |  344 +++++++++
 target/hexagon/imported/compare.idef  |  639 ++++++++++++++++
 target/hexagon/imported/float.idef    |  498 ++++++++++++
 target/hexagon/imported/ldst.idef     |  421 +++++++++++
 target/hexagon/imported/mpy.idef      | 1269 +++++++++++++++++++++++++++++++
 target/hexagon/imported/shift.idef    | 1211 ++++++++++++++++++++++++++++++
 target/hexagon/imported/subinsns.idef |  152 ++++
 target/hexagon/imported/system.idef   |  302 ++++++++
 10 files changed, 6262 insertions(+)
 create mode 100644 target/hexagon/imported/allidefs.def
 create mode 100644 target/hexagon/imported/alu.idef
 create mode 100644 target/hexagon/imported/branch.idef
 create mode 100644 target/hexagon/imported/compare.idef
 create mode 100644 target/hexagon/imported/float.idef
 create mode 100644 target/hexagon/imported/ldst.idef
 create mode 100644 target/hexagon/imported/mpy.idef
 create mode 100644 target/hexagon/imported/shift.idef
 create mode 100644 target/hexagon/imported/subinsns.idef
 create mode 100644 target/hexagon/imported/system.idef

diff --git a/target/hexagon/imported/allidefs.def b/target/hexagon/imported/allidefs.def
new file mode 100644
index 0000000..d2f13a2
--- /dev/null
+++ b/target/hexagon/imported/allidefs.def
@@ -0,0 +1,91 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Top level instruction definition file
+ */
+
+#define DEF_MAPPING(TAG,FROMSYN,TOSYN) \
+	Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING), \
+		"Mapping from " FROMSYN " to " TOSYN, \
+		fASM_MAP(FROMSYN,TOSYN))
+
+#define DEF_COND_MAPPING(TAG,FROMSYN,COND,TOSYNA,TOSYNB) \
+        Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_CONDMAPPING), \
+		"Conditional Mapping from " FROMSYN " to either " TOSYNA " or " TOSYNB, \
+		fCOND_ASM_MAP(FROMSYN,COND,TOSYNA,TOSYNB))
+
+#define DEF_V2_MAPPING(TAG,FROMSYN,TOSYN) \
+	Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_ARCHV2), \
+		"Mapping from " FROMSYN " to " TOSYN, \
+		fASM_MAP(FROMSYN,TOSYN))
+
+#define DEF_V2_COND_MAPPING(TAG,FROMSYN,COND,TOSYNA,TOSYNB) \
+        Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_CONDMAPPING,A_ARCHV2), \
+		"Conditional Mapping from " FROMSYN " to either " TOSYNA " or " TOSYNB, \
+		fCOND_ASM_MAP(FROMSYN,COND,TOSYNA,TOSYNB))
+
+#define DEF_V3_MAPPING(TAG,FROMSYN,TOSYN) \
+	Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_ARCHV3), \
+		"Mapping from " FROMSYN " to " TOSYN, \
+		fASM_MAP(FROMSYN,TOSYN))
+
+#define DEF_V3_COND_MAPPING(TAG,FROMSYN,COND,TOSYNA,TOSYNB) \
+        Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_CONDMAPPING,A_ARCHV3), \
+		"Conditional Mapping from " FROMSYN " to either " TOSYNA " or " TOSYNB, \
+		fCOND_ASM_MAP(FROMSYN,COND,TOSYNA,TOSYNB))
+
+#define DEF_V4_MAPPING(TAG,FROMSYN,TOSYN) \
+	Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_ARCHV4), \
+		"Mapping from " FROMSYN " to " TOSYN, \
+		fASM_MAP(FROMSYN,TOSYN))
+
+#define DEF_V4_COND_MAPPING(TAG,FROMSYN,COND,TOSYNA,TOSYNB) \
+        Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_CONDMAPPING,A_ARCHV4), \
+		"Conditional Mapping from " FROMSYN " to either " TOSYNA " or " TOSYNB, \
+		fCOND_ASM_MAP(FROMSYN,COND,TOSYNA,TOSYNB))
+
+#define DEF_V5_MAPPING(TAG,FROMSYN,TOSYN) \
+	Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_ARCHV5), \
+		"Mapping from " FROMSYN " to " TOSYN, \
+		fASM_MAP(FROMSYN,TOSYN))
+
+#define DEF_V5_COND_MAPPING(TAG,FROMSYN,COND,TOSYNA,TOSYNB) \
+        Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_CONDMAPPING,A_ARCHV5), \
+		"Conditional Mapping from " FROMSYN " to either " TOSYNA " or " TOSYNB, \
+		fCOND_ASM_MAP(FROMSYN,COND,TOSYNA,TOSYNB))
+
+#define DEF_VECX_MAPPING(TAG,FROMSYN,TOSYN) \
+	EXTINSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_VECX), \
+		"VECX Mapping from " FROMSYN " to " TOSYN, \
+		fASM_MAP(FROMSYN,TOSYN))
+
+#define DEF_CVI_MAPPING(TAG,FROMSYN,TOSYN) \
+	Q6INSN(TAG,FROMSYN,ATTRIBS(A_MAPPING,A_CVI), \
+		"CVI Mapping from " FROMSYN " to " TOSYN, \
+		fASM_MAP(FROMSYN,TOSYN))
+
+
+#include "branch.idef"
+#include "ldst.idef"
+#include "compare.idef"
+#include "mpy.idef"
+#include "alu.idef"
+#include "float.idef"
+#include "shift.idef"
+#include "system.idef"
+#include "subinsns.idef"
diff --git a/target/hexagon/imported/alu.idef b/target/hexagon/imported/alu.idef
new file mode 100644
index 0000000..c17d52f
--- /dev/null
+++ b/target/hexagon/imported/alu.idef
@@ -0,0 +1,1335 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * ALU Instructions
+ */
+
+
+/**********************************************/
+/* Add/Sub instructions		       */
+/**********************************************/
+
+Q6INSN(A2_add,"Rd32=add(Rs32,Rt32)",ATTRIBS(),
+"Add 32-bit registers",
+{ RdV=RsV+RtV;})
+
+Q6INSN(A2_sub,"Rd32=sub(Rt32,Rs32)",ATTRIBS(),
+"Subtract 32-bit registers",
+{ RdV=RtV-RsV;})
+
+#define COND_ALU(TAG,OPER,DESCR,SEMANTICS)\
+Q6INSN(TAG##t,"if (Pu4) "OPER,ATTRIBS(A_ARCHV2),DESCR,{if(fLSBOLD(PuV)){SEMANTICS;} else {CANCEL;}})\
+Q6INSN(TAG##f,"if (!Pu4) "OPER,ATTRIBS(A_ARCHV2),DESCR,{if(fLSBOLDNOT(PuV)){SEMANTICS;} else {CANCEL;}})\
+Q6INSN(TAG##tnew,"if (Pu4.new) " OPER,ATTRIBS(A_ARCHV2),DESCR,{if(fLSBNEW(PuN)){SEMANTICS;} else {CANCEL;}})\
+Q6INSN(TAG##fnew,"if (!Pu4.new) "OPER,ATTRIBS(A_ARCHV2),DESCR,{if(fLSBNEWNOT(PuN)){SEMANTICS;} else {CANCEL;}})
+
+COND_ALU(A2_padd,"Rd32=add(Rs32,Rt32)","Conditionally Add 32-bit registers",RdV=RsV+RtV)
+COND_ALU(A2_psub,"Rd32=sub(Rt32,Rs32)","Conditionally Subtract 32-bit registers",RdV=RtV-RsV)
+COND_ALU(A2_paddi,"Rd32=add(Rs32,#s8)","Conditionally Add Register and immediate",fIMMEXT(siV); RdV=RsV+siV)
+COND_ALU(A2_pxor,"Rd32=xor(Rs32,Rt32)","Conditionally XOR registers",RdV=RsV^RtV);
+COND_ALU(A2_pand,"Rd32=and(Rs32,Rt32)","Conditionally AND registers",RdV=RsV&RtV);
+COND_ALU(A2_por,"Rd32=or(Rs32,Rt32)","Conditionally OR registers",RdV=RsV|RtV);
+
+COND_ALU(A4_psxtb,"Rd32=sxtb(Rs32)","Conditionally sign-extend byte", RdV=fSXTN(8,32,RsV));
+COND_ALU(A4_pzxtb,"Rd32=zxtb(Rs32)","Conditionally zero-extend byte", RdV=fZXTN(8,32,RsV));
+COND_ALU(A4_psxth,"Rd32=sxth(Rs32)","Conditionally sign-extend halfword", RdV=fSXTN(16,32,RsV));
+COND_ALU(A4_pzxth,"Rd32=zxth(Rs32)","Conditionally zero-extend halfword", RdV=fZXTN(16,32,RsV));
+COND_ALU(A4_paslh,"Rd32=aslh(Rs32)","Conditionally zero-extend halfword", RdV=RsV<<16);
+COND_ALU(A4_pasrh,"Rd32=asrh(Rs32)","Conditionally zero-extend halfword", RdV=RsV>>16);
+
+
+DEF_V2_MAPPING(A2_tfrt,   "if (Pu4) Rd32=Rs32","if (Pu4) Rd32=add(Rs32,#0)")
+DEF_V2_MAPPING(A2_tfrf,   "if (!Pu4) Rd32=Rs32","if (!Pu4) Rd32=add(Rs32,#0)")
+DEF_V2_MAPPING(A2_tfrtnew,"if (Pu4.new) Rd32=Rs32","if (Pu4.new) Rd32=add(Rs32,#0)")
+DEF_V2_MAPPING(A2_tfrfnew,"if (!Pu4.new) Rd32=Rs32","if (!Pu4.new) Rd32=add(Rs32,#0)")
+
+
+DEF_V2_MAPPING(A2_tfrpt,   "if (Pu4) Rdd32=Rss32","if (Pu4) Rdd32=combine(Rss.H32,Rss.L32)")
+DEF_V2_MAPPING(A2_tfrpf,   "if (!Pu4) Rdd32=Rss32","if (!Pu4) Rdd32=combine(Rss.H32,Rss.L32)")
+DEF_V2_MAPPING(A2_tfrptnew,"if (Pu4.new) Rdd32=Rss32","if (Pu4.new) Rdd32=combine(Rss.H32,Rss.L32)")
+DEF_V2_MAPPING(A2_tfrpfnew,"if (!Pu4.new) Rdd32=Rss32","if (!Pu4.new) Rdd32=combine(Rss.H32,Rss.L32)")
+
+
+Q6INSN(A2_addsat,"Rd32=add(Rs32,Rt32):sat",ATTRIBS(),
+"Add 32-bit registers with saturation",
+{ RdV=fSAT(fSE32_64(RsV)+fSE32_64(RtV)); })
+
+Q6INSN(A2_subsat,"Rd32=sub(Rt32,Rs32):sat",ATTRIBS(),
+"Subtract 32-bit registers with saturation",
+{ RdV=fSAT(fSE32_64(RtV) - fSE32_64(RsV)); })
+
+
+Q6INSN(A2_addi,"Rd32=add(Rs32,#s16)",ATTRIBS(),
+"Add a signed immediate to a register",
+{ fIMMEXT(siV); RdV=RsV+siV;})
+
+
+Q6INSN(C4_addipc,"Rd32=add(pc,#u6)",ATTRIBS(),
+"Add immediate to PC",
+{ RdV=fREAD_PC()+fIMMEXT(uiV);})
+
+
+
+/**********************************************/
+/* Single-precision HL forms		  */
+/* These insns and the SP mpy are the ones    */
+/* that can do .HL stuff		      */
+/**********************************************/
+#define STD_HL_INSN(TAG,OPER,AOPER,ATR,SEM)\
+Q6INSN(A2_##TAG##_ll, OPER"(Rt.L32,Rs.L32)"AOPER,    ATR,"",{SEM(fGETHALF(0,RtV),fGETHALF(0,RsV));})\
+Q6INSN(A2_##TAG##_lh, OPER"(Rt.L32,Rs.H32)"AOPER,    ATR,"",{SEM(fGETHALF(0,RtV),fGETHALF(1,RsV));})\
+Q6INSN(A2_##TAG##_hl, OPER"(Rt.H32,Rs.L32)"AOPER,    ATR,"",{SEM(fGETHALF(1,RtV),fGETHALF(0,RsV));})\
+Q6INSN(A2_##TAG##_hh, OPER"(Rt.H32,Rs.H32)"AOPER,    ATR,"",{SEM(fGETHALF(1,RtV),fGETHALF(1,RsV));})
+
+#define SUBSTD_HL_INSN(TAG,OPER,AOPER,ATR,SEM)\
+Q6INSN(A2_##TAG##_ll, OPER"(Rt.L32,Rs.L32)"AOPER,    ATR,"",{SEM(fGETHALF(0,RtV),fGETHALF(0,RsV));})\
+Q6INSN(A2_##TAG##_hl, OPER"(Rt.L32,Rs.H32)"AOPER,    ATR,"",{SEM(fGETHALF(0,RtV),fGETHALF(1,RsV));})
+
+
+#undef HLSEM
+#define HLSEM(A,B) RdV=fSXTN(16,32,(A+B))
+SUBSTD_HL_INSN(addh_l16,"Rd32=add","",ATTRIBS(),HLSEM)
+
+#undef HLSEM
+#define HLSEM(A,B) RdV=fSATH(A+B)
+SUBSTD_HL_INSN(addh_l16_sat,"Rd32=add",":sat",ATTRIBS(),HLSEM)
+
+#undef HLSEM
+#define HLSEM(A,B) RdV=fSXTN(16,32,(A-B))
+SUBSTD_HL_INSN(subh_l16,"Rd32=sub","",ATTRIBS(),HLSEM)
+
+#undef HLSEM
+#define HLSEM(A,B) RdV=fSATH(A-B)
+SUBSTD_HL_INSN(subh_l16_sat,"Rd32=sub",":sat",ATTRIBS(),HLSEM)
+
+#undef HLSEM
+#define HLSEM(A,B) RdV=(A+B)<<16
+STD_HL_INSN(addh_h16,"Rd32=add",":<<16",ATTRIBS(),HLSEM)
+
+#undef HLSEM
+#define HLSEM(A,B) RdV=(fSATH(A+B))<<16
+STD_HL_INSN(addh_h16_sat,"Rd32=add",":sat:<<16",ATTRIBS(),HLSEM)
+
+#undef HLSEM
+#define HLSEM(A,B) RdV=(A-B)<<16
+STD_HL_INSN(subh_h16,"Rd32=sub",":<<16",ATTRIBS(),HLSEM)
+
+#undef HLSEM
+#define HLSEM(A,B) RdV=(fSATH(A-B))<<16
+STD_HL_INSN(subh_h16_sat,"Rd32=sub",":sat:<<16",ATTRIBS(),HLSEM)
+
+
+
+
+Q6INSN(A2_aslh,"Rd32=aslh(Rs32)",ATTRIBS(),
+"Arithmetic Shift Left by Halfword",{ RdV=RsV<<16; })
+
+Q6INSN(A2_asrh,"Rd32=asrh(Rs32)",ATTRIBS(),
+"Arithmetic Shift Right by Halfword",{ RdV=RsV>>16; })
+
+
+/* 64-bit versions */
+
+Q6INSN(A2_addp,"Rdd32=add(Rss32,Rtt32)",ATTRIBS(),
+"Add",
+{ RddV=RssV+RttV;})
+
+Q6INSN(A2_addpsat,"Rdd32=add(Rss32,Rtt32):sat",ATTRIBS(A_ARCHV3),
+"Add",
+{ fADDSAT64(RddV,RssV,RttV);})
+
+Q6INSN(A2_addspl,"Rdd32=add(Rss32,Rtt32):raw:lo",ATTRIBS(A_ARCHV3),
+"Add",
+{ RddV=RttV+fSXTN(32,64,fGETWORD(0,RssV));})
+
+Q6INSN(A2_addsph,"Rdd32=add(Rss32,Rtt32):raw:hi",ATTRIBS(A_ARCHV3),
+"Add",
+{ RddV=RttV+fSXTN(32,64,fGETWORD(1,RssV));})
+
+DEF_V3_COND_MAPPING(A2_addsp,"Rdd32=add(Rs32,Rtt32)","Rs32 & 1","Rdd32=add(Rss32,Rtt32):raw:hi","Rdd32=add(Rss32,Rtt32):raw:lo")
+
+Q6INSN(A2_subp,"Rdd32=sub(Rtt32,Rss32)",ATTRIBS(),
+"Sub",
+{ RddV=RttV-RssV;})
+
+/* 64-bit with carry */
+
+Q6INSN(A4_addp_c,"Rdd32=add(Rss32,Rtt32,Px4):carry",ATTRIBS(A_RESTRICT_LATEPRED,A_NOTE_LATEPRED),"Add with Carry",
+{
+  fPREDUSE_TIMING();
+  RddV = RssV + RttV + fLSBOLD(PxV);
+  PxV = f8BITSOF(fCARRY_FROM_ADD(RssV,RttV,fLSBOLD(PxV)));
+  fHIDE(MARK_LATE_PRED_WRITE(PxN))
+})
+
+Q6INSN(A4_subp_c,"Rdd32=sub(Rss32,Rtt32,Px4):carry",ATTRIBS(A_RESTRICT_LATEPRED,A_NOTE_LATEPRED),"Sub with Carry",
+{
+  fPREDUSE_TIMING();
+  RddV = RssV + ~RttV + fLSBOLD(PxV);
+  PxV = f8BITSOF(fCARRY_FROM_ADD(RssV,~RttV,fLSBOLD(PxV)));
+  fHIDE(MARK_LATE_PRED_WRITE(PxN))
+})
+
+
+/* NEG and ABS */
+
+DEF_MAPPING(A2_neg,"Rd32=neg(Rs32)","Rd32=sub(#0,Rs32)")
+
+
+Q6INSN(A2_negsat,"Rd32=neg(Rs32):sat",ATTRIBS(),
+"Arithmetic negate register", { RdV = fSAT(-fCAST8s(RsV)); })
+
+Q6INSN(A2_abs,"Rd32=abs(Rs32)",ATTRIBS(),
+"Absolute Value register", { RdV = fABS(RsV); })
+
+Q6INSN(A2_abssat,"Rd32=abs(Rs32):sat",ATTRIBS(),
+"Arithmetic negate register", { RdV = fSAT(fABS(fCAST4_8s(RsV))); })
+
+Q6INSN(A2_vconj,"Rdd32=vconj(Rss32):sat",ATTRIBS(A_ARCHV2),
+"Vector Complex conjugate of Rss",
+{  fSETHALF(1,RddV,fSATN(16,-fGETHALF(1,RssV)));
+   fSETHALF(0,RddV,fGETHALF(0,RssV));
+   fSETHALF(3,RddV,fSATN(16,-fGETHALF(3,RssV)));
+   fSETHALF(2,RddV,fGETHALF(2,RssV));
+})
+
+
+/* 64-bit versions */
+
+Q6INSN(A2_negp,"Rdd32=neg(Rss32)",ATTRIBS(),
+"Arithmetic negate register", { RddV = -RssV; })
+
+Q6INSN(A2_absp,"Rdd32=abs(Rss32)",ATTRIBS(),
+"Absolute Value register", { RddV = fABS(RssV); })
+
+
+/* MIN and MAX  R */
+
+Q6INSN(A2_max,"Rd32=max(Rs32,Rt32)",ATTRIBS(),
+"Maximum of two registers",
+{ RdV = fMAX(RsV,RtV); })
+
+Q6INSN(A2_maxu,"Rd32=maxu(Rs32,Rt32)",ATTRIBS(A_INTRINSIC_RETURNS_UNSIGNED),
+"Maximum of two registers (unsigned)",
+{ RdV = fMAX(fCAST4u(RsV),fCAST4u(RtV)); })
+
+Q6INSN(A2_min,"Rd32=min(Rt32,Rs32)",ATTRIBS(),
+"Minimum of two registers",
+{ RdV = fMIN(RtV,RsV); })
+
+Q6INSN(A2_minu,"Rd32=minu(Rt32,Rs32)",ATTRIBS(A_INTRINSIC_RETURNS_UNSIGNED),
+"Minimum of two registers (unsigned)",
+{ RdV = fMIN(fCAST4u(RtV),fCAST4u(RsV)); })
+
+/* MIN and MAX Pairs */
+#if 1
+Q6INSN(A2_maxp,"Rdd32=max(Rss32,Rtt32)",ATTRIBS(A_ARCHV3),
+"Maximum of two register pairs",
+{ RddV = fMAX(RssV,RttV); })
+
+Q6INSN(A2_maxup,"Rdd32=maxu(Rss32,Rtt32)",ATTRIBS(A_INTRINSIC_RETURNS_UNSIGNED,A_ARCHV3),
+"Maximum of two register pairs (unsigned)",
+{ RddV = fMAX(fCAST8u(RssV),fCAST8u(RttV)); })
+
+Q6INSN(A2_minp,"Rdd32=min(Rtt32,Rss32)",ATTRIBS(A_ARCHV3),
+"Minimum of two register pairs",
+{ RddV = fMIN(RttV,RssV); })
+
+Q6INSN(A2_minup,"Rdd32=minu(Rtt32,Rss32)",ATTRIBS(A_INTRINSIC_RETURNS_UNSIGNED,A_ARCHV3),
+"Minimum of two register pairs (unsigned)",
+{ RddV = fMIN(fCAST8u(RttV),fCAST8u(RssV)); })
+#endif
+
+/**********************************************/
+/* Register and Immediate Transfers	   */
+/**********************************************/
+
+Q6INSN(A2_nop,"nop",ATTRIBS(A_IT_NOP),
+"Nop (32-bit encoding)",
+ fHIDE( { fNOP_EXECUTED }  ))
+
+
+Q6INSN(A4_ext,"immext(#u26:6)",ATTRIBS(A_IT_EXTENDER),
+"This instruction carries the 26 most-significant immediate bits for the next instruction",
+{ fHIDE(); })
+
+
+Q6INSN(A2_tfr,"Rd32=Rs32",ATTRIBS(),
+"tfr register",{ RdV=RsV;})
+
+Q6INSN(A2_tfrsi,"Rd32=#s16",ATTRIBS(),
+"transfer signed immediate to register",{ fIMMEXT(siV); RdV=siV;})
+
+DEF_MAPPING(A2_tfrp,"Rdd32=Rss32","Rdd32=combine(Rss.H32,Rss.L32)")
+DEF_V2_COND_MAPPING(A2_tfrpi,"Rdd32=#s8","#s8<0","Rdd32=combine(#-1,#s8)","Rdd32=combine(#0,#s8)")
+DEF_MAPPING(A2_zxtb,"Rd32=zxtb(Rs32)","Rd32=and(Rs32,#255)")
+
+Q6INSN(A2_sxtb,"Rd32=sxtb(Rs32)",ATTRIBS(),
+"Sign extend byte", {RdV = fSXTN(8,32,RsV);})
+
+Q6INSN(A2_zxth,"Rd32=zxth(Rs32)",ATTRIBS(),
+"Zero extend half", {RdV = fZXTN(16,32,RsV);})
+
+Q6INSN(A2_sxth,"Rd32=sxth(Rs32)",ATTRIBS(),
+"Sign extend half", {RdV = fSXTN(16,32,RsV);})
+
+Q6INSN(A2_combinew,"Rdd32=combine(Rs32,Rt32)",ATTRIBS(A_ROPS_2),
+"Combine two words into a register pair",
+{ fSETWORD(0,RddV,RtV);
+  fSETWORD(1,RddV,RsV);
+})
+
+Q6INSN(A4_combineri,"Rdd32=combine(Rs32,#s8)",ATTRIBS(A_ROPS_2),
+"Combine a word and an immediate into a register pair",
+{ fIMMEXT(siV); fSETWORD(0,RddV,siV);
+  fSETWORD(1,RddV,RsV);
+})
+
+Q6INSN(A4_combineir,"Rdd32=combine(#s8,Rs32)",ATTRIBS(A_ROPS_2),
+"Combine a word and an immediate into a register pair",
+{ fIMMEXT(siV); fSETWORD(0,RddV,RsV);
+  fSETWORD(1,RddV,siV);
+})
+
+
+
+Q6INSN(A2_combineii,"Rdd32=combine(#s8,#S8)",ATTRIBS(A_ARCHV2,A_ROPS_2),
+"Set two small immediates",
+{ fIMMEXT(siV); fSETWORD(0,RddV,SiV); fSETWORD(1,RddV,siV); })
+
+Q6INSN(A4_combineii,"Rdd32=combine(#s8,#U6)",ATTRIBS(A_ROPS_2),"Set two small immediates",
+{ fIMMEXT(UiV); fSETWORD(0,RddV,UiV); fSETWORD(1,RddV,siV); })
+
+
+Q6INSN(A2_combine_hh,"Rd32=combine(Rt.H32,Rs.H32)",ATTRIBS(),
+"Combine two halfs into a register", {RdV = (fGETUHALF(1,RtV)<<16) | fGETUHALF(1,RsV);})
+
+Q6INSN(A2_combine_hl,"Rd32=combine(Rt.H32,Rs.L32)",ATTRIBS(),
+"Combine two halfs into a register", {RdV = (fGETUHALF(1,RtV)<<16) | fGETUHALF(0,RsV);})
+
+Q6INSN(A2_combine_lh,"Rd32=combine(Rt.L32,Rs.H32)",ATTRIBS(),
+"Combine two halfs into a register", {RdV = (fGETUHALF(0,RtV)<<16) | fGETUHALF(1,RsV);})
+
+Q6INSN(A2_combine_ll,"Rd32=combine(Rt.L32,Rs.L32)",ATTRIBS(),
+"Combine two halfs into a register", {RdV = (fGETUHALF(0,RtV)<<16) | fGETUHALF(0,RsV);})
+
+Q6INSN(A2_tfril,"Rx.L32=#u16",ATTRIBS(),
+"Set low 16-bits, leave upper 16 unchanged",{ fSETHALF(0,RxV,uiV);})
+
+Q6INSN(A2_tfrih,"Rx.H32=#u16",ATTRIBS(),
+"Set high 16-bits, leave low 16 unchanged",{ fSETHALF(1,RxV,uiV);})
+
+Q6INSN(A2_tfrcrr,"Rd32=Cs32",ATTRIBS(),
+"transfer control register to general register",{ RdV=CsV;})
+
+Q6INSN(A2_tfrrcr,"Cd32=Rs32",ATTRIBS(),
+"transfer general register to control register",{ CdV=RsV;})
+
+Q6INSN(A4_tfrcpp,"Rdd32=Css32",ATTRIBS(),
+"transfer control register to general register",{ RddV=CssV;})
+
+Q6INSN(A4_tfrpcp,"Cdd32=Rss32",ATTRIBS(),
+"transfer general register to control register",{ CddV=RssV;})
+
+
+/**********************************************/
+/* Logicals				   */
+/**********************************************/
+
+Q6INSN(A2_and,"Rd32=and(Rs32,Rt32)",ATTRIBS(),
+"logical AND",{ RdV=RsV&RtV;})
+
+Q6INSN(A2_or,"Rd32=or(Rs32,Rt32)",ATTRIBS(),
+"logical OR",{ RdV=RsV|RtV;})
+
+Q6INSN(A2_xor,"Rd32=xor(Rs32,Rt32)",ATTRIBS(),
+"logical XOR",{ RdV=RsV^RtV;})
+
+DEF_MAPPING(A2_not,"Rd32=not(Rs32)","Rd32=sub(#-1,Rs32)")
+
+Q6INSN(M2_xor_xacc,"Rx32^=xor(Rs32,Rt32)",ATTRIBS(A_ARCHV2),
+"logical XOR with XOR accumulation",{ RxV^=RsV^RtV;})
+
+Q6INSN(M4_xor_xacc,"Rxx32^=xor(Rss32,Rtt32)",,
+"logical XOR with XOR accumulation",{ RxxV^=RssV^RttV;})
+
+
+
+Q6INSN(A4_andn,"Rd32=and(Rt32,~Rs32)",,
+"And-Not", { RdV = (RtV & ~RsV); })
+
+Q6INSN(A4_orn,"Rd32=or(Rt32,~Rs32)",,
+"Or-Not", { RdV = (RtV | ~RsV); })
+
+
+Q6INSN(A4_andnp,"Rdd32=and(Rtt32,~Rss32)",,
+"And-Not", { RddV = (RttV & ~RssV); })
+
+Q6INSN(A4_ornp,"Rdd32=or(Rtt32,~Rss32)",,
+"Or-Not", { RddV = (RttV | ~RssV); })
+
+
+
+
+/********************/
+/* Compound add-add */
+/********************/
+
+Q6INSN(S4_addaddi,"Rd32=add(Rs32,add(Ru32,#s6))",ATTRIBS(A_ROPS_2),
+	"3-input add",
+	{ RdV = RsV + RuV + fIMMEXT(siV); })
+
+
+Q6INSN(S4_subaddi,"Rd32=add(Rs32,sub(#s6,Ru32))",ATTRIBS(A_ROPS_2),
+	"3-input sub",
+	{ RdV = RsV - RuV + fIMMEXT(siV); })
+
+
+
+/****************************/
+/* Compound logical-logical */
+/****************************/
+
+Q6INSN(M4_and_and,"Rx32&=and(Rs32,Rt32)",ATTRIBS(A_ROPS_2),
+"Compound And-And", { RxV &= (RsV & RtV); })
+
+Q6INSN(M4_and_andn,"Rx32&=and(Rs32,~Rt32)",ATTRIBS(A_ROPS_2),
+"Compound And-Andn", { RxV &= (RsV & ~RtV); })
+
+Q6INSN(M4_and_or,"Rx32&=or(Rs32,Rt32)",ATTRIBS(A_ROPS_2),
+"Compound And-Or", { RxV &= (RsV | RtV); })
+
+Q6INSN(M4_and_xor,"Rx32&=xor(Rs32,Rt32)",ATTRIBS(A_ROPS_2),
+"Compound And-xor", { RxV &= (RsV ^ RtV); })
+
+
+
+Q6INSN(M4_or_and,"Rx32|=and(Rs32,Rt32)",ATTRIBS(A_ROPS_2),
+"Compound Or-And", { RxV |= (RsV & RtV); })
+
+Q6INSN(M4_or_andn,"Rx32|=and(Rs32,~Rt32)",ATTRIBS(A_ROPS_2),
+"Compound Or-AndN", { RxV |= (RsV & ~RtV); })
+
+Q6INSN(M4_or_or,"Rx32|=or(Rs32,Rt32)",ATTRIBS(A_ROPS_2),
+"Compound Or-Or", { RxV |= (RsV | RtV); })
+
+Q6INSN(M4_or_xor,"Rx32|=xor(Rs32,Rt32)",ATTRIBS(A_ROPS_2),
+"Compound Or-xor", { RxV |= (RsV ^ RtV); })
+
+
+Q6INSN(S4_or_andix,"Rx32=or(Ru32,and(Rx32,#s10))",ATTRIBS(A_ROPS_2),
+"Compound Or-And", { RxV = RuV | (RxV & fIMMEXT(siV)); })
+
+Q6INSN(S4_or_andi,"Rx32|=and(Rs32,#s10)",ATTRIBS(A_ROPS_2),
+"Compound Or-And", { RxV = RxV | (RsV & fIMMEXT(siV)); })
+
+Q6INSN(S4_or_ori,"Rx32|=or(Rs32,#s10)",ATTRIBS(A_ROPS_2),
+"Compound Or-And", { RxV = RxV | (RsV | fIMMEXT(siV)); })
+
+
+
+
+Q6INSN(M4_xor_and,"Rx32^=and(Rs32,Rt32)",ATTRIBS(A_ROPS_2),
+"Compound Xor-And", { RxV ^= (RsV & RtV); })
+
+Q6INSN(M4_xor_or,"Rx32^=or(Rs32,Rt32)",ATTRIBS(A_ROPS_2),
+"Compound Xor-Or", { RxV ^= (RsV | RtV); })
+
+Q6INSN(M4_xor_andn,"Rx32^=and(Rs32,~Rt32)",ATTRIBS(A_ROPS_2),
+"Compound Xor-And", { RxV ^= (RsV & ~RtV); })
+
+
+
+
+
+
+Q6INSN(A2_subri,"Rd32=sub(#s10,Rs32)",ATTRIBS(A_ARCHV2),
+"Subtract register from immediate",{ fIMMEXT(siV); RdV=siV-RsV;})
+
+Q6INSN(A2_andir,"Rd32=and(Rs32,#s10)",ATTRIBS(A_ARCHV2),
+"logical AND with immediate",{ fIMMEXT(siV); RdV=RsV&siV;})
+
+Q6INSN(A2_orir,"Rd32=or(Rs32,#s10)",ATTRIBS(A_ARCHV2),
+"logical OR with immediate",{ fIMMEXT(siV); RdV=RsV|siV;})
+
+
+
+
+Q6INSN(A2_andp,"Rdd32=and(Rss32,Rtt32)",ATTRIBS(),
+"logical AND pair",{ RddV=RssV&RttV;})
+
+Q6INSN(A2_orp,"Rdd32=or(Rss32,Rtt32)",ATTRIBS(),
+"logical OR pair",{ RddV=RssV|RttV;})
+
+Q6INSN(A2_xorp,"Rdd32=xor(Rss32,Rtt32)",ATTRIBS(),
+"logical eXclusive OR pair",{ RddV=RssV^RttV;})
+
+Q6INSN(A2_notp,"Rdd32=not(Rss32)",ATTRIBS(),
+"logical NOT pair",{ RddV=~RssV;})
+
+Q6INSN(A2_sxtw,"Rdd32=sxtw(Rs32)",ATTRIBS(),
+"Sign extend 32-bit word to 64-bit pair",
+{ RddV = fCAST4_8s(RsV); })
+
+Q6INSN(A2_sat,"Rd32=sat(Rss32)",ATTRIBS(),
+"Saturate to 32-bit Signed",
+{ RdV = fSAT(RssV); })
+
+Q6INSN(A2_roundsat,"Rd32=round(Rss32):sat",ATTRIBS(),
+"Round & Saturate to 32-bit Signed",
+{ fHIDE(size8s_t tmp;) fADDSAT64(tmp,RssV,0x080000000ULL); RdV = fGETWORD(1,tmp); })
+
+Q6INSN(A2_sath,"Rd32=sath(Rs32)",ATTRIBS(),
+"Saturate to 16-bit Signed",
+{ RdV = fSATH(RsV); })
+
+Q6INSN(A2_satuh,"Rd32=satuh(Rs32)",ATTRIBS(),
+"Saturate to 16-bit Unsigned",
+{ RdV = fSATUH(RsV); })
+
+Q6INSN(A2_satub,"Rd32=satub(Rs32)",ATTRIBS(),
+"Saturate to 8-bit Unsigned",
+{ RdV = fSATUB(RsV); })
+
+Q6INSN(A2_satb,"Rd32=satb(Rs32)",ATTRIBS(A_ARCHV2),
+"Saturate to 8-bit Signed",
+{ RdV = fSATB(RsV); })
+
+/**********************************************/
+/* Vector Add				 */
+/**********************************************/
+
+Q6INSN(A2_vaddub,"Rdd32=vaddub(Rss32,Rtt32)",ATTRIBS(),
+"Add vector of bytes",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBYTE(i,RddV,(fGETUBYTE(i,RssV)+fGETUBYTE(i,RttV)));
+	}
+})
+DEF_MAPPING(A2_vaddb_map,"Rdd32=vaddb(Rss32,Rtt32)","Rdd32=vaddub(Rss32,Rtt32)")
+
+Q6INSN(A2_vaddubs,"Rdd32=vaddub(Rss32,Rtt32):sat",ATTRIBS(),
+"Add vector of bytes",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBYTE(i,RddV,fSATUN(8,fGETUBYTE(i,RssV)+fGETUBYTE(i,RttV)));
+	}
+})
+
+Q6INSN(A2_vaddh,"Rdd32=vaddh(Rss32,Rtt32)",ATTRIBS(),
+"Add vector of half integers",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV,fGETHALF(i,RssV)+fGETHALF(i,RttV));
+	}
+})
+
+Q6INSN(A2_vaddhs,"Rdd32=vaddh(Rss32,Rtt32):sat",ATTRIBS(),
+"Add vector of half integers with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV,fSATN(16,fGETHALF(i,RssV)+fGETHALF(i,RttV)));
+	}
+})
+
+Q6INSN(A2_vadduhs,"Rdd32=vadduh(Rss32,Rtt32):sat",ATTRIBS(),
+"Add vector of unsigned half integers with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV,fSATUN(16,fGETUHALF(i,RssV)+fGETUHALF(i,RttV)));
+	}
+})
+
+Q6INSN(A5_vaddhubs,"Rd32=vaddhub(Rss32,Rtt32):sat",ATTRIBS(),
+"Add vector of half integers with saturation and pack to unsigned bytes",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  		fSETBYTE(i,RdV,fSATUB(fGETHALF(i,RssV)+fGETHALF(i,RttV)));
+	}
+})
+
+Q6INSN(A2_vaddw,"Rdd32=vaddw(Rss32,Rtt32)",ATTRIBS(),
+"Add vector of words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,fGETWORD(i,RssV)+fGETWORD(i,RttV));
+	}
+})
+
+Q6INSN(A2_vaddws,"Rdd32=vaddw(Rss32,Rtt32):sat",ATTRIBS(),
+"Add vector of words with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,fSATN(32,fGETWORD(i,RssV)+fGETWORD(i,RttV)));
+	}
+})
+
+
+
+Q6INSN(S4_vxaddsubw,"Rdd32=vxaddsubw(Rss32,Rtt32):sat",ATTRIBS(),
+"Cross vector add-sub words with saturation",
+{
+	fSETWORD(0,RddV,fSAT(fGETWORD(0,RssV)+fGETWORD(1,RttV)));
+	fSETWORD(1,RddV,fSAT(fGETWORD(1,RssV)-fGETWORD(0,RttV)));
+})
+Q6INSN(S4_vxsubaddw,"Rdd32=vxsubaddw(Rss32,Rtt32):sat",ATTRIBS(),
+"Cross vector sub-add words with saturation",
+{
+	fSETWORD(0,RddV,fSAT(fGETWORD(0,RssV)-fGETWORD(1,RttV)));
+	fSETWORD(1,RddV,fSAT(fGETWORD(1,RssV)+fGETWORD(0,RttV)));
+})
+
+
+
+Q6INSN(S4_vxaddsubh,"Rdd32=vxaddsubh(Rss32,Rtt32):sat",ATTRIBS(),
+"Cross vector add-sub halfwords with saturation",
+{
+	fSETHALF(0,RddV,fSATH(fGETHALF(0,RssV)+fGETHALF(1,RttV)));
+	fSETHALF(1,RddV,fSATH(fGETHALF(1,RssV)-fGETHALF(0,RttV)));
+
+	fSETHALF(2,RddV,fSATH(fGETHALF(2,RssV)+fGETHALF(3,RttV)));
+	fSETHALF(3,RddV,fSATH(fGETHALF(3,RssV)-fGETHALF(2,RttV)));
+
+})
+Q6INSN(S4_vxsubaddh,"Rdd32=vxsubaddh(Rss32,Rtt32):sat",ATTRIBS(),
+"Cross vector sub-add halfwords with saturation",
+{
+	fSETHALF(0,RddV,fSATH(fGETHALF(0,RssV)-fGETHALF(1,RttV)));
+	fSETHALF(1,RddV,fSATH(fGETHALF(1,RssV)+fGETHALF(0,RttV)));
+
+	fSETHALF(2,RddV,fSATH(fGETHALF(2,RssV)-fGETHALF(3,RttV)));
+	fSETHALF(3,RddV,fSATH(fGETHALF(3,RssV)+fGETHALF(2,RttV)));
+})
+
+
+
+
+Q6INSN(S4_vxaddsubhr,"Rdd32=vxaddsubh(Rss32,Rtt32):rnd:>>1:sat",ATTRIBS(),
+"Cross vector add-sub halfwords with shift, round, and saturation",
+{
+	fSETHALF(0,RddV,fSATH((fGETHALF(0,RssV)+fGETHALF(1,RttV)+1)>>1));
+	fSETHALF(1,RddV,fSATH((fGETHALF(1,RssV)-fGETHALF(0,RttV)+1)>>1));
+
+	fSETHALF(2,RddV,fSATH((fGETHALF(2,RssV)+fGETHALF(3,RttV)+1)>>1));
+	fSETHALF(3,RddV,fSATH((fGETHALF(3,RssV)-fGETHALF(2,RttV)+1)>>1));
+
+})
+Q6INSN(S4_vxsubaddhr,"Rdd32=vxsubaddh(Rss32,Rtt32):rnd:>>1:sat",ATTRIBS(),
+"Cross vector sub-add halfwords with shift, round, and saturation",
+{
+	fSETHALF(0,RddV,fSATH((fGETHALF(0,RssV)-fGETHALF(1,RttV)+1)>>1));
+	fSETHALF(1,RddV,fSATH((fGETHALF(1,RssV)+fGETHALF(0,RttV)+1)>>1));
+
+	fSETHALF(2,RddV,fSATH((fGETHALF(2,RssV)-fGETHALF(3,RttV)+1)>>1));
+	fSETHALF(3,RddV,fSATH((fGETHALF(3,RssV)+fGETHALF(2,RttV)+1)>>1));
+})
+
+
+
+
+
+/**********************************************/
+/* 1/2 Vector operations		      */
+/**********************************************/
+
+
+Q6INSN(A2_svavgh,"Rd32=vavgh(Rs32,Rt32)",ATTRIBS(A_ARCHV2),
+"Avg vector of half integers",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,((fGETHALF(i,RsV)+fGETHALF(i,RtV))>>1));
+	}
+})
+
+Q6INSN(A2_svavghs,"Rd32=vavgh(Rs32,Rt32):rnd",ATTRIBS(A_ARCHV2),
+"Avg vector of half integers with rounding",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,((fGETHALF(i,RsV)+fGETHALF(i,RtV)+1)>>1));
+	}
+})
+
+
+
+Q6INSN(A2_svnavgh,"Rd32=vnavgh(Rt32,Rs32)",ATTRIBS(A_ARCHV2),
+"Avg vector of half integers",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,((fGETHALF(i,RtV)-fGETHALF(i,RsV))>>1));
+	}
+})
+
+
+Q6INSN(A2_svaddh,"Rd32=vaddh(Rs32,Rt32)",ATTRIBS(),
+"Add vector of half integers",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,fGETHALF(i,RsV)+fGETHALF(i,RtV));
+	}
+})
+
+Q6INSN(A2_svaddhs,"Rd32=vaddh(Rs32,Rt32):sat",ATTRIBS(),
+"Add vector of half integers with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,fSATN(16,fGETHALF(i,RsV)+fGETHALF(i,RtV)));
+	}
+})
+
+Q6INSN(A2_svadduhs,"Rd32=vadduh(Rs32,Rt32):sat",ATTRIBS(),
+"Add vector of unsigned half integers with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,fSATUN(16,fGETUHALF(i,RsV)+fGETUHALF(i,RtV)));
+	}
+})
+
+
+Q6INSN(A2_svsubh,"Rd32=vsubh(Rt32,Rs32)",ATTRIBS(),
+"Sub vector of half integers",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,fGETHALF(i,RtV)-fGETHALF(i,RsV));
+	}
+})
+
+Q6INSN(A2_svsubhs,"Rd32=vsubh(Rt32,Rs32):sat",ATTRIBS(),
+"Sub vector of half integers with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,fSATN(16,fGETHALF(i,RtV)-fGETHALF(i,RsV)));
+	}
+})
+
+Q6INSN(A2_svsubuhs,"Rd32=vsubuh(Rt32,Rs32):sat",ATTRIBS(),
+"Sub vector of unsigned half integers with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,fSATUN(16,fGETUHALF(i,RtV)-fGETUHALF(i,RsV)));
+	}
+})
+
+
+
+
+/**********************************************/
+/* Vector Reduce Add			  */
+/**********************************************/
+
+Q6INSN(A2_vraddub,"Rdd32=vraddub(Rss32,Rtt32)",ATTRIBS(),
+"Sum: two vectors of unsigned bytes",
+{
+	fHIDE(int i;)
+	RddV = 0;
+	for (i=0;i<4;i++) {
+		fSETWORD(0,RddV,(fGETWORD(0,RddV) + (fGETUBYTE(i,RssV)+fGETUBYTE(i,RttV))));
+	}
+	for (i=4;i<8;i++) {
+		fSETWORD(1,RddV,(fGETWORD(1,RddV) + (fGETUBYTE(i,RssV)+fGETUBYTE(i,RttV))));
+	}
+})
+
+Q6INSN(A2_vraddub_acc,"Rxx32+=vraddub(Rss32,Rtt32)",ATTRIBS(),
+"Sum: two vectors of unsigned bytes",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		fSETWORD(0,RxxV,(fGETWORD(0,RxxV) + (fGETUBYTE(i,RssV)+fGETUBYTE(i,RttV))));
+	}
+	for (i = 4; i < 8; i++) {
+		fSETWORD(1,RxxV,(fGETWORD(1,RxxV) + (fGETUBYTE(i,RssV)+fGETUBYTE(i,RttV))));
+	}
+    fACC();\
+})
+
+
+
+Q6INSN(M2_vraddh,"Rd32=vraddh(Rss32,Rtt32)",ATTRIBS(A_ARCHV3),
+"Sum: two vectors of halves",
+{
+	fHIDE(int i;)
+	RdV = 0;
+	for (i=0;i<4;i++) {
+		RdV += (fGETHALF(i,RssV)+fGETHALF(i,RttV));
+	}
+})
+
+Q6INSN(M2_vradduh,"Rd32=vradduh(Rss32,Rtt32)",ATTRIBS(A_ARCHV3),
+"Sum: two vectors of unsigned halves",
+{
+	fHIDE(int i;)
+	RdV = 0;
+	for (i=0;i<4;i++) {
+		RdV += (fGETUHALF(i,RssV)+fGETUHALF(i,RttV));
+	}
+})
+
+/**********************************************/
+/* Vector Sub				 */
+/**********************************************/
+
+Q6INSN(A2_vsubub,"Rdd32=vsubub(Rtt32,Rss32)",ATTRIBS(),
+"Sub vector of bytes",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBYTE(i,RddV,(fGETUBYTE(i,RttV)-fGETUBYTE(i,RssV)));
+	}
+})
+DEF_MAPPING(A2_vsubb_map,"Rdd32=vsubb(Rss32,Rtt32)","Rdd32=vsubub(Rss32,Rtt32)")
+
+Q6INSN(A2_vsububs,"Rdd32=vsubub(Rtt32,Rss32):sat",ATTRIBS(),
+"Sub vector of bytes",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBYTE(i,RddV,fSATUN(8,fGETUBYTE(i,RttV)-fGETUBYTE(i,RssV)));
+	}
+})
+
+Q6INSN(A2_vsubh,"Rdd32=vsubh(Rtt32,Rss32)",ATTRIBS(),
+"Sub vector of half integers",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV,fGETHALF(i,RttV)-fGETHALF(i,RssV));
+	}
+})
+
+Q6INSN(A2_vsubhs,"Rdd32=vsubh(Rtt32,Rss32):sat",ATTRIBS(),
+"Sub vector of half integers with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV,fSATN(16,fGETHALF(i,RttV)-fGETHALF(i,RssV)));
+	}
+})
+
+Q6INSN(A2_vsubuhs,"Rdd32=vsubuh(Rtt32,Rss32):sat",ATTRIBS(),
+"Sub vector of unsigned half integers with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV,fSATUN(16,fGETUHALF(i,RttV)-fGETUHALF(i,RssV)));
+	}
+})
+
+Q6INSN(A2_vsubw,"Rdd32=vsubw(Rtt32,Rss32)",ATTRIBS(),
+"Sub vector of words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,fGETWORD(i,RttV)-fGETWORD(i,RssV));
+	}
+})
+
+Q6INSN(A2_vsubws,"Rdd32=vsubw(Rtt32,Rss32):sat",ATTRIBS(),
+"Sub vector of words with saturation",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,fSATN(32,fGETWORD(i,RttV)-fGETWORD(i,RssV)));
+	}
+})
+
+
+
+
+/**********************************************/
+/* Vector Abs				 */
+/**********************************************/
+
+Q6INSN(A2_vabsh,"Rdd32=vabsh(Rss32)",ATTRIBS(),
+"Negate vector of half integers",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV,fABS(fGETHALF(i,RssV)));
+	}
+})
+
+Q6INSN(A2_vabshsat,"Rdd32=vabsh(Rss32):sat",ATTRIBS(),
+"Negate vector of half integers",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV,fSATH(fABS(fGETHALF(i,RssV))));
+	}
+})
+
+Q6INSN(A2_vabsw,"Rdd32=vabsw(Rss32)",ATTRIBS(),
+"Absolute Value vector of words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,fABS(fGETWORD(i,RssV)));
+	}
+})
+
+Q6INSN(A2_vabswsat,"Rdd32=vabsw(Rss32):sat",ATTRIBS(),
+"Absolute Value vector of words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,fSAT(fABS(fGETWORD(i,RssV))));
+	}
+})
+
+/**********************************************/
+/* Vector SAD				 */
+/**********************************************/
+
+
+Q6INSN(M2_vabsdiffw,"Rdd32=vabsdiffw(Rtt32,Rss32)",ATTRIBS(A_ARCHV2),
+"Absolute Differences: vector of words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,fABS(fGETWORD(i,RttV) - fGETWORD(i,RssV)));
+	}
+})
+
+Q6INSN(M2_vabsdiffh,"Rdd32=vabsdiffh(Rtt32,Rss32)",ATTRIBS(A_ARCHV2),
+"Absolute Differences: vector of halfwords",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV,fABS(fGETHALF(i,RttV) - fGETHALF(i,RssV)));
+	}
+})
+
+Q6INSN(M6_vabsdiffb,"Rdd32=vabsdiffb(Rtt32,Rss32)",ATTRIBS(),
+"Absolute Differences: vector of halfwords",
+{
+	fHIDE(int i;)
+	for (i=0;i<8;i++) {
+	  fSETBYTE(i,RddV,fABS(fGETBYTE(i,RttV) - fGETBYTE(i,RssV)));
+	}
+})
+
+Q6INSN(M6_vabsdiffub,"Rdd32=vabsdiffub(Rtt32,Rss32)",ATTRIBS(),
+"Absolute Differences: vector of halfwords",
+{
+	fHIDE(int i;)
+	for (i=0;i<8;i++) {
+	  fSETBYTE(i,RddV,fABS(fGETUBYTE(i,RttV) - fGETUBYTE(i,RssV)));
+	}
+})
+
+
+
+Q6INSN(A2_vrsadub,"Rdd32=vrsadub(Rss32,Rtt32)",ATTRIBS(),
+"Sum of Absolute Differences: vector of unsigned bytes",
+{
+	fHIDE(int i;)
+	RddV = 0;
+	for (i = 0; i < 4; i++) {
+		fSETWORD(0,RddV,(fGETWORD(0,RddV) + fABS((fGETUBYTE(i,RssV) - fGETUBYTE(i,RttV)))));
+	}
+	for (i = 4; i < 8; i++) {
+		fSETWORD(1,RddV,(fGETWORD(1,RddV) + fABS((fGETUBYTE(i,RssV) - fGETUBYTE(i,RttV)))));
+	}
+})
+
+Q6INSN(A2_vrsadub_acc,"Rxx32+=vrsadub(Rss32,Rtt32)",ATTRIBS(),
+"Sum of Absolute Differences: vector of unsigned bytes",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		fSETWORD(0,RxxV,(fGETWORD(0,RxxV) + fABS((fGETUBYTE(i,RssV) - fGETUBYTE(i,RttV)))));
+	}
+	for (i = 4; i < 8; i++) {
+		fSETWORD(1,RxxV,(fGETWORD(1,RxxV) + fABS((fGETUBYTE(i,RssV) - fGETUBYTE(i,RttV)))));
+	}
+    fACC();\
+})
+
+
+/**********************************************/
+/* Vector Average			     */
+/**********************************************/
+
+Q6INSN(A2_vavgub,"Rdd32=vavgub(Rss32,Rtt32)",ATTRIBS(),
+"Average vector of unsigned bytes",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBYTE(i,RddV,((fGETUBYTE(i,RssV) + fGETUBYTE(i,RttV))>>1));
+	}
+})
+
+Q6INSN(A2_vavguh,"Rdd32=vavguh(Rss32,Rtt32)",ATTRIBS(),
+"Average vector of unsigned halfwords",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+		fSETHALF(i,RddV,(fGETUHALF(i,RssV)+fGETUHALF(i,RttV))>>1);
+	}
+})
+
+Q6INSN(A2_vavgh,"Rdd32=vavgh(Rss32,Rtt32)",ATTRIBS(),
+"Average vector of halfwords",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+		fSETHALF(i,RddV,(fGETHALF(i,RssV)+fGETHALF(i,RttV))>>1);
+	}
+})
+
+Q6INSN(A2_vnavgh,"Rdd32=vnavgh(Rtt32,Rss32)",ATTRIBS(),
+"Negative Average vector of halfwords",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+		fSETHALF(i,RddV,(fGETHALF(i,RttV)-fGETHALF(i,RssV))>>1);
+	}
+})
+
+Q6INSN(A2_vavgw,"Rdd32=vavgw(Rss32,Rtt32)",ATTRIBS(),
+"Average vector of words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+		fSETWORD(i,RddV,(fSXTN(32,33,fGETWORD(i,RssV))+fSXTN(32,33,fGETWORD(i,RttV)))>>1);
+	}
+})
+
+Q6INSN(A2_vnavgw,"Rdd32=vnavgw(Rtt32,Rss32)",ATTRIBS(A_ARCHV2),
+"Average vector of words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+		fSETWORD(i,RddV,(fSXTN(32,33,fGETWORD(i,RttV))-fSXTN(32,33,fGETWORD(i,RssV)))>>1);
+	}
+})
+
+Q6INSN(A2_vavgwr,"Rdd32=vavgw(Rss32,Rtt32):rnd",ATTRIBS(),
+"Average vector of words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+		fSETWORD(i,RddV,(fSXTN(32,33,fGETWORD(i,RssV))+fSXTN(32,33,fGETWORD(i,RttV))+1)>>1);
+	}
+})
+
+Q6INSN(A2_vnavgwr,"Rdd32=vnavgw(Rtt32,Rss32):rnd:sat",ATTRIBS(A_ARCHV2),
+"Average vector of words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+		fSETWORD(i,RddV,fSAT((fSXTN(32,33,fGETWORD(i,RttV))-fSXTN(32,33,fGETWORD(i,RssV))+1)>>1));
+	}
+})
+
+Q6INSN(A2_vavgwcr,"Rdd32=vavgw(Rss32,Rtt32):crnd",ATTRIBS(A_ARCHV2),
+"Average vector of words with convergent rounding",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+		fSETWORD(i,RddV,(fCRND(fSXTN(32,33,fGETWORD(i,RssV))+fSXTN(32,33,fGETWORD(i,RttV)))>>1));
+	}
+})
+
+Q6INSN(A2_vnavgwcr,"Rdd32=vnavgw(Rtt32,Rss32):crnd:sat",ATTRIBS(A_ARCHV2),
+"Average negative vector of words with convergent rounding",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+		fSETWORD(i,RddV,fSAT(fCRND(fSXTN(32,33,fGETWORD(i,RttV))-fSXTN(32,33,fGETWORD(i,RssV)))>>1));
+	}
+})
+
+Q6INSN(A2_vavghcr,"Rdd32=vavgh(Rss32,Rtt32):crnd",ATTRIBS(A_ARCHV2),
+"Average vector of halfwords with conv rounding",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+		fSETHALF(i,RddV,fCRND(fGETHALF(i,RssV)+fGETHALF(i,RttV))>>1);
+	}
+})
+
+Q6INSN(A2_vnavghcr,"Rdd32=vnavgh(Rtt32,Rss32):crnd:sat",ATTRIBS(A_ARCHV2),
+"Average negative vector of halfwords with conv rounding",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+		fSETHALF(i,RddV,fSATH(fCRND(fGETHALF(i,RttV)-fGETHALF(i,RssV))>>1));
+	}
+})
+
+
+Q6INSN(A2_vavguw,"Rdd32=vavguw(Rss32,Rtt32)",ATTRIBS(),
+"Average vector of unsigned words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+		fSETWORD(i,RddV,(fZXTN(32,33,fGETUWORD(i,RssV))+fZXTN(32,33,fGETUWORD(i,RttV)))>>1);
+	}
+})
+
+Q6INSN(A2_vavguwr,"Rdd32=vavguw(Rss32,Rtt32):rnd",ATTRIBS(),
+"Average vector of unsigned words",
+{
+	fHIDE(int i;)
+	for (i=0;i<2;i++) {
+		fSETWORD(i,RddV,(fZXTN(32,33,fGETUWORD(i,RssV))+fZXTN(32,33,fGETUWORD(i,RttV))+1)>>1);
+	}
+})
+
+Q6INSN(A2_vavgubr,"Rdd32=vavgub(Rss32,Rtt32):rnd",ATTRIBS(),
+"Average vector of unsigned bytes",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBYTE(i,RddV,((fGETUBYTE(i,RssV)+fGETUBYTE(i,RttV)+1)>>1));
+	}
+})
+
+Q6INSN(A2_vavguhr,"Rdd32=vavguh(Rss32,Rtt32):rnd",ATTRIBS(),
+"Average vector of unsigned halfwords with rounding",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+		fSETHALF(i,RddV,(fGETUHALF(i,RssV)+fGETUHALF(i,RttV)+1)>>1);
+	}
+})
+
+Q6INSN(A2_vavghr,"Rdd32=vavgh(Rss32,Rtt32):rnd",ATTRIBS(),
+"Average vector of halfwords with rounding",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+		fSETHALF(i,RddV,(fGETHALF(i,RssV)+fGETHALF(i,RttV)+1)>>1);
+	}
+})
+
+Q6INSN(A2_vnavghr,"Rdd32=vnavgh(Rtt32,Rss32):rnd:sat",ATTRIBS(A_ARCHV2),
+"Negative Average vector of halfwords with rounding",
+{
+	fHIDE(int i;)
+	for (i=0;i<4;i++) {
+		fSETHALF(i,RddV,fSATH((fGETHALF(i,RttV)-fGETHALF(i,RssV)+1)>>1));
+	}
+})
+
+
+/* Rounding Instruction */
+
+Q6INSN(A4_round_ri,"Rd32=round(Rs32,#u5)",ATTRIBS(),"Round", {RdV = fRNDN(RsV,uiV)>>uiV; })
+Q6INSN(A4_round_rr,"Rd32=round(Rs32,Rt32)",ATTRIBS(),"Round", {RdV = fRNDN(RsV,fZXTN(5,32,RtV))>>fZXTN(5,32,RtV); })
+Q6INSN(A4_round_ri_sat,"Rd32=round(Rs32,#u5):sat",ATTRIBS(),"Round", {RdV = (fSAT(fRNDN(RsV,uiV)))>>uiV; })
+Q6INSN(A4_round_rr_sat,"Rd32=round(Rs32,Rt32):sat",ATTRIBS(),"Round", {RdV = (fSAT(fRNDN(RsV,fZXTN(5,32,RtV))))>>fZXTN(5,32,RtV); })
+
+
+Q6INSN(A4_cround_ri,"Rd32=cround(Rs32,#u5)",ATTRIBS(),"Convergent Round", {RdV = fCRNDN(RsV,uiV); })
+Q6INSN(A4_cround_rr,"Rd32=cround(Rs32,Rt32)",ATTRIBS(),"Convergent Round", {RdV = fCRNDN(RsV,fZXTN(5,32,RtV)); })
+
+
+#define CROUND(DST,SRC,SHIFT) \
+	fHIDE(size16s_t rndbit_128;)\
+	fHIDE(size16s_t tmp128;)\
+	fHIDE(size16s_t src_128;)\
+	if (SHIFT == 0) { \
+		DST = SRC;\
+	} else if ((SRC & (size8s_t)((1LL << (SHIFT - 1)) - 1LL)) == 0) { \
+		src_128 = fCAST8S_16S(SRC);\
+		rndbit_128 = fCAST8S_16S(1LL);\
+		rndbit_128 = fSHIFTL128(rndbit_128, SHIFT);\
+		rndbit_128 = fAND128(rndbit_128, src_128);\
+		rndbit_128 = fSHIFTR128(rndbit_128, 1);\
+		tmp128 = fADD128(src_128, rndbit_128);\
+		tmp128 = fSHIFTR128(tmp128, SHIFT);\
+		DST =  fCAST16S_8S(tmp128);\
+	} else {\
+		size16s_t rndbit_128 =  fCAST8S_16S((1LL << (SHIFT - 1))); \
+		size16s_t src_128 =  fCAST8S_16S(SRC); \
+		size16s_t tmp128 = fADD128(src_128, rndbit_128);\
+		tmp128 = fSHIFTR128(tmp128, SHIFT);\
+		DST =  fCAST16S_8S(tmp128);\
+	}
+
+Q6INSN(A7_croundd_ri,"Rdd32=cround(Rss32,#u6)",ATTRIBS(),"Convergent Round",
+{
+CROUND(RddV,RssV,uiV);
+fEXTENSION_AUDIO();})
+
+Q6INSN(A7_croundd_rr,"Rdd32=cround(Rss32,Rt32)",ATTRIBS(),"Convergent Round",
+{
+CROUND(RddV,RssV,fZXTN(6,32,RtV));
+fEXTENSION_AUDIO();})
+
+
+
+
+
+
+
+
+
+Q6INSN(A7_clip,"Rd32=clip(Rs32,#u5)",ATTRIBS(),"Clip to  #s5", 	{   fCLIP(RdV,RsV,uiV); fEXTENSION_AUDIO();})
+Q6INSN(A7_vclip,"Rdd32=vclip(Rss32,#u5)",ATTRIBS(),"Clip to  #s5",
+{
+fHIDE(size4s_t tmp;)
+fCLIP(tmp, fGETWORD(0, RssV), uiV);
+fSETWORD(0, RddV, tmp);
+fCLIP(tmp,fGETWORD(1, RssV), uiV);
+fSETWORD(1, RddV, tmp);
+fEXTENSION_AUDIO();
+}
+)
+
+
+
+/**********************************************/
+/* V4: Cross Vector Min/Max		   */
+/**********************************************/
+
+
+#define VRMINORMAX(TAG,STR,OP,SHORTTYPE,SETTYPE,GETTYPE,NEL,SHIFT) \
+Q6INSN(A4_vr##TAG##SHORTTYPE,"Rxx32=vr"#TAG#SHORTTYPE"(Rss32,Ru32)",ATTRIBS(), \
+"Choose " STR " elements of a vector", \
+{ \
+	fHIDE(int i; size8s_t TAG; size4s_t addr;) \
+	TAG = fGET##GETTYPE(0,RxxV); \
+	addr = fGETWORD(1,RxxV); \
+	for (i = 0; i < NEL; i++) { \
+		if (TAG OP fGET##GETTYPE(i,RssV)) { \
+			TAG = fGET##GETTYPE(i,RssV); \
+			addr = RuV | i<<SHIFT; \
+		} \
+	} \
+	fSETWORD(0,RxxV,TAG); \
+	fSETWORD(1,RxxV,addr); \
+})
+
+#define RMINMAX(SHORTTYPE,SETTYPE,GETTYPE,NEL,SHIFT) \
+VRMINORMAX(min,"minimum",>,SHORTTYPE,SETTYPE,GETTYPE,NEL,SHIFT) \
+VRMINORMAX(max,"maximum",<,SHORTTYPE,SETTYPE,GETTYPE,NEL,SHIFT)
+
+
+RMINMAX(h,HALF,HALF,4,1)
+RMINMAX(uh,HALF,UHALF,4,1)
+RMINMAX(w,WORD,WORD,2,2)
+RMINMAX(uw,WORD,UWORD,2,2)
+
+#undef RMINMAX
+#undef VRMINORMAX
+
+/**********************************************/
+/* Vector Min/Max			     */
+/**********************************************/
+
+#define VMINORMAX(TAG,STR,FUNC,SHORTTYPE,SETTYPE,GETTYPE,NEL) \
+Q6INSN(A2_v##TAG##SHORTTYPE,"Rdd32=v"#TAG#SHORTTYPE"(Rtt32,Rss32)",ATTRIBS(), \
+"Choose " STR " elements of two vectors", \
+{ \
+	fHIDE(int i;) \
+	for (i = 0; i < NEL; i++) { \
+		fSET##SETTYPE(i,RddV,FUNC(fGET##GETTYPE(i,RttV),fGET##GETTYPE(i,RssV))); \
+	} \
+})
+
+#define VMINORMAX3(TAG,STR,FUNC,SHORTTYPE,SETTYPE,GETTYPE,NEL) \
+Q6INSN(A6_v##TAG##SHORTTYPE##3,"Rxx32=v"#TAG#SHORTTYPE"3(Rtt32,Rss32)",ATTRIBS(), \
+"Choose " STR " elements of two vectors", \
+{ \
+	fHIDE(int i;) \
+	for (i = 0; i < NEL; i++) { \
+		fSET##SETTYPE(i,RxxV,FUNC(fGET##GETTYPE(i,RxxV),FUNC(fGET##GETTYPE(i,RttV),fGET##GETTYPE(i,RssV)))); \
+	} \
+})
+
+#define MINMAX(SHORTTYPE,SETTYPE,GETTYPE,NEL) \
+VMINORMAX(min,"minimum",fMIN,SHORTTYPE,SETTYPE,GETTYPE,NEL) \
+VMINORMAX(max,"maximum",fMAX,SHORTTYPE,SETTYPE,GETTYPE,NEL)
+
+MINMAX(b,BYTE,BYTE,8)
+MINMAX(ub,BYTE,UBYTE,8)
+MINMAX(h,HALF,HALF,4)
+MINMAX(uh,HALF,UHALF,4)
+MINMAX(w,WORD,WORD,2)
+MINMAX(uw,WORD,UWORD,2)
+
+#undef MINMAX
+#undef VMINORMAX
+#undef VMINORMAX3
+
+
+Q6INSN(A5_ACS,"Rxx32,Pe4=vacsh(Rss32,Rtt32)",ATTRIBS(A_NOTE_LATEPRED,A_RESTRICT_LATEPRED),
+"Add Compare and Select elements of two vectors, record the maximums and the decisions ",
+{
+        fHIDE(int i;)
+        fHIDE(int xv;)
+        fHIDE(int sv;)
+        fHIDE(int tv;)
+        for (i = 0; i < 4; i++) {
+                xv = (int) fGETHALF(i,RxxV);
+                sv = (int) fGETHALF(i,RssV);
+                tv = (int) fGETHALF(i,RttV);
+                xv = xv + tv;           //assumes 17bit datapath
+                sv = sv - tv;           //assumes 17bit datapath
+                fSETBIT(i*2,  PeV,  (xv > sv));
+                fSETBIT(i*2+1,PeV,  (xv > sv));
+		fSETHALF(i,   RxxV, fSATH(fMAX(xv,sv)));
+        }
+})
+
+Q6INSN(A6_vminub_RdP,"Rdd32,Pe4=vminub(Rtt32,Rss32)",ATTRIBS(A_NOTE_LATEPRED,A_RESTRICT_LATEPRED),
+"Vector minimum of bytes, records minimum and decision vector",
+{
+        fHIDE(int i;)
+        for (i = 0; i < 8; i++) {
+			fSETBIT(i, PeV,     (fGETUBYTE(i,RttV) > fGETUBYTE(i,RssV)));
+			fSETBYTE(i,RddV,fMIN(fGETUBYTE(i,RttV),fGETUBYTE(i,RssV)));
+        }
+})
+
+/**********************************************/
+/* Vector Min/Max			     */
+/**********************************************/
+
+
+Q6INSN(A4_modwrapu,"Rd32=modwrap(Rs32,Rt32)",ATTRIBS(),
+"Wrap to an unsigned modulo buffer",
+{
+	if (RsV < 0) {
+	    RdV = RsV + fCAST4u(RtV);
+	} else if (fCAST4u(RsV) >= fCAST4u(RtV)) {
+	    RdV = RsV - fCAST4u(RtV);
+	} else {
+ 	 		 RdV = RsV;
+    }
+})
+
diff --git a/target/hexagon/imported/branch.idef b/target/hexagon/imported/branch.idef
new file mode 100644
index 0000000..ed066f6
--- /dev/null
+++ b/target/hexagon/imported/branch.idef
@@ -0,0 +1,344 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+
+/*********************************************/
+/* Jump instructions			 */
+/*********************************************/
+
+#define A_COFSTD A_RESTRICT_LOOP_LA,A_RESTRICT_COF_MAX1
+#define A_JDIR A_JUMP,A_DIRECT,A_COFSTD,A_BRANCHADDER
+#define A_CJNEWDIR A_JUMP,A_DIRECT,A_CJUMP,A_COFSTD,A_BRANCHADDER,A_DOTNEW,A_ROPS_2
+#define A_CJOLDDIR A_JUMP,A_DIRECT,A_CJUMP,A_COFSTD,A_BRANCHADDER,A_DOTOLD
+#define A_NEWVALUEJ A_JUMP,A_DIRECT,A_CJUMP,A_COFSTD,A_BRANCHADDER,A_RESTRICT_COF_MAX1,A_DOTNEWVALUE,A_MEMLIKE_PACKET_RULES,A_RESTRICT_SINGLE_MEM_FIRST,A_ROPS_2
+#define A_JINDIR A_JUMP,A_INDIRECT,A_COFSTD
+#define A_JINDIRNEW A_JUMP,A_INDIRECT,A_COFSTD,A_CJUMP,A_DOTNEW
+#define A_JINDIROLD A_JUMP,A_INDIRECT,A_COFSTD,A_CJUMP,A_DOTOLD
+
+#define A_RELAX_COND A_RESTRICT_COF_MAX1,A_RELAX_COF_2ND,A_RELAX_COF_1ST
+#define A_RELAX_UNC A_RESTRICT_COF_MAX1,A_RELAX_COF_2ND
+
+Q6INSN(J2_jump,"jump #r22:2",ATTRIBS(A_JDIR,A_RELAX_UNC), "direct unconditional jump",
+{fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);})
+
+Q6INSN(J2_jumpr,"jumpr Rs32",ATTRIBS(A_JINDIR), "indirect unconditional jump",
+{fJUMPR(RsN,RsV,COF_TYPE_JUMPR);})
+
+#define OLDCOND_JUMP(TAG,OPER,OPER2,ATTRIB,DESCR,SEMANTICS) \
+Q6INSN(TAG##t,"if (Pu4) "OPER":nt "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBOLD(PuV),,SPECULATE_NOT_TAKEN,12,0); if (fLSBOLD(PuV)) { SEMANTICS; }}) \
+Q6INSN(TAG##f,"if (!Pu4) "OPER":nt "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBOLDNOT(PuV),,SPECULATE_NOT_TAKEN,12,0); if (fLSBOLDNOT(PuV)) { SEMANTICS; }}) \
+Q6INSN(TAG##tpt,"if (Pu4) "OPER":t "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBOLD(PuV),,SPECULATE_TAKEN,12,0); if (fLSBOLD(PuV)) { SEMANTICS; }}) \
+Q6INSN(TAG##fpt,"if (!Pu4) "OPER":t "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBOLDNOT(PuV),,SPECULATE_TAKEN,12,0); if (fLSBOLDNOT(PuV)) { SEMANTICS; }}) \
+DEF_MAPPING(TAG##t_nopred_map,"if (Pu4) "OPER" "OPER2,"if (Pu4) "OPER":nt "OPER2) \
+DEF_MAPPING(TAG##f_nopred_map,"if (!Pu4) "OPER" "OPER2,"if (!Pu4) "OPER":nt "OPER2)
+
+OLDCOND_JUMP(J2_jump,"jump","#r15:2",ATTRIBS(A_CJOLDDIR,A_NOTE_CONDITIONAL,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump",
+fIMMEXT(riV);fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);)
+
+OLDCOND_JUMP(J2_jumpr,"jumpr","Rs32",ATTRIBS(A_JINDIROLD,A_NOTE_CONDITIONAL,A_PRED_BIT_12),"indirect conditional jump",
+fJUMPR(RsN,RsV,COF_TYPE_JUMPR);)
+
+#define NEWCOND_JUMP(TAG,OPER,OPER2,ATTRIB,DESCR,SEMANTICS)\
+Q6INSN(TAG##tnew,"if (Pu4.new) "OPER":nt "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBNEW(PuN),, SPECULATE_NOT_TAKEN , 12,0)} {if(fLSBNEW(PuN)){SEMANTICS;}})\
+Q6INSN(TAG##fnew,"if (!Pu4.new) "OPER":nt "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBNEWNOT(PuN),, SPECULATE_NOT_TAKEN , 12,0)} {if(fLSBNEWNOT(PuN)){SEMANTICS;}})\
+Q6INSN(TAG##tnewpt,"if (Pu4.new) "OPER":t "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBNEW(PuN),, SPECULATE_TAKEN , 12,0)} {if(fLSBNEW(PuN)){SEMANTICS;}})\
+Q6INSN(TAG##fnewpt,"if (!Pu4.new) "OPER":t "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBNEWNOT(PuN),, SPECULATE_TAKEN , 12,0)} {if(fLSBNEWNOT(PuN)){SEMANTICS;}})
+
+NEWCOND_JUMP(J2_jump,"jump","#r15:2",ATTRIBS(A_CJNEWDIR,A_NOTE_CONDITIONAL,A_ARCHV2,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump",
+fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMPNEW);)
+
+NEWCOND_JUMP(J2_jumpr,"jumpr","Rs32",ATTRIBS(A_JINDIRNEW,A_NOTE_CONDITIONAL,A_ARCHV3,A_PRED_BIT_12),"indirect conditional jump",
+fJUMPR(RsN,RsV,COF_TYPE_JUMPR);)
+
+
+
+Q6INSN(J4_hintjumpr,"hintjr(Rs32)",ATTRIBS(A_JINDIR,A_HINTJR),"hint indirect conditional jump",
+{fHINTJR(RsV);})
+
+
+/*********************************************/
+/* Compound Compare-Jumps		    */
+/*********************************************/
+Q6INSN(J2_jumprz,"if (Rs32!=#0) jump:nt #r13:2",ATTRIBS(A_NOTE_DEPRECATED,A_CJNEWDIR,A_ARCHV3,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump if register true",
+{fBRANCH_SPECULATE_STALL((RsV!=0), , SPECULATE_NOT_TAKEN,12,0) if (RsV != 0) { fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+Q6INSN(J2_jumprnz,"if (Rs32==#0) jump:nt #r13:2",ATTRIBS(A_NOTE_DEPRECATED,A_CJNEWDIR,A_ARCHV3,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump if register false",
+{fBRANCH_SPECULATE_STALL((RsV==0), , SPECULATE_NOT_TAKEN,12,0) if (RsV == 0) {fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+Q6INSN(J2_jumprzpt,"if (Rs32!=#0) jump:t #r13:2",ATTRIBS(A_NOTE_DEPRECATED,A_CJNEWDIR,A_ARCHV3,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump if register true",
+{fBRANCH_SPECULATE_STALL((RsV!=0), , SPECULATE_TAKEN,12,0) if (RsV != 0) { fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+Q6INSN(J2_jumprnzpt,"if (Rs32==#0) jump:t #r13:2",ATTRIBS(A_NOTE_DEPRECATED,A_CJNEWDIR,A_ARCHV3,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump if register false",
+{fBRANCH_SPECULATE_STALL((RsV==0), , SPECULATE_TAKEN,12,0) if (RsV == 0) {fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+Q6INSN(J2_jumprgtez,"if (Rs32>=#0) jump:nt #r13:2",ATTRIBS(A_NOTE_DEPRECATED,A_CJNEWDIR,A_ARCHV3,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump if register greater or equal to zero",
+{fBRANCH_SPECULATE_STALL((RsV>=0), , SPECULATE_NOT_TAKEN,12,0) if (RsV>=0) { fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+Q6INSN(J2_jumprgtezpt,"if (Rs32>=#0) jump:t #r13:2",ATTRIBS(A_NOTE_DEPRECATED,A_CJNEWDIR,A_ARCHV3,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump if register greater or equal to zero",
+{fBRANCH_SPECULATE_STALL((RsV>=0), , SPECULATE_TAKEN,12,0) if (RsV>=0) { fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+Q6INSN(J2_jumprltez,"if (Rs32<=#0) jump:nt #r13:2",ATTRIBS(A_NOTE_DEPRECATED,A_CJNEWDIR,A_ARCHV3,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump if register less than or equal to zero",
+{fBRANCH_SPECULATE_STALL((RsV<=0), , SPECULATE_NOT_TAKEN,12,0) if (RsV<=0) { fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+Q6INSN(J2_jumprltezpt,"if (Rs32<=#0) jump:t #r13:2",ATTRIBS(A_NOTE_DEPRECATED,A_CJNEWDIR,A_ARCHV3,A_RELAX_COND,A_PRED_BIT_12),"direct conditional jump if register less than or equal to zero",
+{fBRANCH_SPECULATE_STALL((RsV<=0), , SPECULATE_TAKEN,12,0) if (RsV<=0) { fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+
+
+/*********************************************/
+/* V4 Compound Compare-Jumps		 */
+/*********************************************/
+
+
+/* V4 compound compare jumps (CJ) */
+#define STD_CMPJUMP(TAG,TST,TSTSEM)\
+Q6INSN(J4_##TAG##_tp0_jump_nt, "p0="TST"; if (p0.new) jump:nt #r9:2", ATTRIBS(A_CJNEWDIR,A_RELAX_COND,A_NEWCMPJUMP,A_PRED_BIT_13),"compound compare-jump", {fPART1(fWRITE_P0(f8BITSOF(TSTSEM)))  fBRANCH_SPECULATE_STALL(fLSBNEW0,,SPECULATE_NOT_TAKEN,13,0)  if (fLSBNEW0) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})\
+Q6INSN(J4_##TAG##_fp0_jump_nt, "p0="TST"; if (!p0.new) jump:nt #r9:2", ATTRIBS(A_CJNEWDIR,A_RELAX_COND,A_NEWCMPJUMP,A_PRED_BIT_13),"compound compare-jump",{fPART1(fWRITE_P0(f8BITSOF(TSTSEM)))  fBRANCH_SPECULATE_STALL(fLSBNEW0NOT,,SPECULATE_NOT_TAKEN,13,0) if (fLSBNEW0NOT) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})\
+Q6INSN(J4_##TAG##_tp0_jump_t,  "p0="TST"; if (p0.new) jump:t #r9:2", ATTRIBS(A_CJNEWDIR,A_RELAX_COND,A_NEWCMPJUMP,A_PRED_BIT_13),"compound compare-jump",  {fPART1(fWRITE_P0(f8BITSOF(TSTSEM)))  fBRANCH_SPECULATE_STALL(fLSBNEW0,,SPECULATE_TAKEN,13,0)      if (fLSBNEW0) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})\
+Q6INSN(J4_##TAG##_fp0_jump_t,  "p0="TST"; if (!p0.new) jump:t #r9:2", ATTRIBS(A_CJNEWDIR,A_RELAX_COND,A_NEWCMPJUMP,A_PRED_BIT_13),"compound compare-jump", {fPART1(fWRITE_P0(f8BITSOF(TSTSEM)))  fBRANCH_SPECULATE_STALL(fLSBNEW0NOT,,SPECULATE_TAKEN,13,0)     if (fLSBNEW0NOT) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})\
+Q6INSN(J4_##TAG##_tp1_jump_nt, "p1="TST"; if (p1.new) jump:nt #r9:2", ATTRIBS(A_CJNEWDIR,A_RELAX_COND,A_NEWCMPJUMP,A_PRED_BIT_13),"compound compare-jump", {fPART1(fWRITE_P1(f8BITSOF(TSTSEM)))  fBRANCH_SPECULATE_STALL(fLSBNEW1,,SPECULATE_NOT_TAKEN,13,0)  if (fLSBNEW1) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})\
+Q6INSN(J4_##TAG##_fp1_jump_nt, "p1="TST"; if (!p1.new) jump:nt #r9:2", ATTRIBS(A_CJNEWDIR,A_RELAX_COND,A_NEWCMPJUMP,A_PRED_BIT_13),"compound compare-jump",{fPART1(fWRITE_P1(f8BITSOF(TSTSEM)))  fBRANCH_SPECULATE_STALL(fLSBNEW1NOT,,SPECULATE_NOT_TAKEN,13,0) if (fLSBNEW1NOT) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})\
+Q6INSN(J4_##TAG##_tp1_jump_t,  "p1="TST"; if (p1.new) jump:t #r9:2", ATTRIBS(A_CJNEWDIR,A_RELAX_COND,A_NEWCMPJUMP,A_PRED_BIT_13),"compound compare-jump",  {fPART1(fWRITE_P1(f8BITSOF(TSTSEM)))  fBRANCH_SPECULATE_STALL(fLSBNEW1,,SPECULATE_TAKEN,13,0)      if (fLSBNEW1) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})\
+Q6INSN(J4_##TAG##_fp1_jump_t,  "p1="TST"; if (!p1.new) jump:t #r9:2", ATTRIBS(A_CJNEWDIR,A_RELAX_COND,A_NEWCMPJUMP,A_PRED_BIT_13),"compound compare-jump", {fPART1(fWRITE_P1(f8BITSOF(TSTSEM)))  fBRANCH_SPECULATE_STALL(fLSBNEW1NOT,,SPECULATE_TAKEN,13,0)     if (fLSBNEW1NOT) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+
+STD_CMPJUMP(cmpeqi,"cmp.eq(Rs16,#U5)",(RsV==UiV))
+STD_CMPJUMP(cmpgti,"cmp.gt(Rs16,#U5)",(RsV>UiV))
+STD_CMPJUMP(cmpgtui,"cmp.gtu(Rs16,#U5)",(fCAST4u(RsV)>UiV))
+
+STD_CMPJUMP(cmpeqn1,"cmp.eq(Rs16,#-1)",(RsV==-1))
+STD_CMPJUMP(cmpgtn1,"cmp.gt(Rs16,#-1)",(RsV>-1))
+STD_CMPJUMP(tstbit0,"tstbit(Rs16,#0)",(RsV & 1))
+
+STD_CMPJUMP(cmpeq,"cmp.eq(Rs16,Rt16)",(RsV==RtV))
+STD_CMPJUMP(cmpgt,"cmp.gt(Rs16,Rt16)",(RsV>RtV))
+STD_CMPJUMP(cmpgtu,"cmp.gtu(Rs16,Rt16)",(fCAST4u(RsV)>RtV))
+
+
+
+/* V4 jump and transfer (CJ) */
+Q6INSN(J4_jumpseti,"Rd16=#U6 ; jump #r9:2",ATTRIBS(A_JDIR,A_RELAX_UNC), "direct unconditional jump and set register to immediate",
+{fIMMEXT(riV); fPCALIGN(riV); RdV=UiV; fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);})
+
+Q6INSN(J4_jumpsetr,"Rd16=Rs16 ; jump #r9:2",ATTRIBS(A_JDIR,A_RELAX_UNC), "direct unconditional jump and transfer register",
+{fIMMEXT(riV); fPCALIGN(riV); RdV=RsV; fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);})
+
+
+/* V4 new-value jumps (NCJ) */
+#define STD_CMPJUMPNEWRS(TAG,TST,TSTSEM)\
+Q6INSN(J4_##TAG##_jumpnv_t, "if ("TST") jump:t #r9:2", ATTRIBS(A_NEWVALUEJ,A_RESTRICT_NOSLOT1_STORE,A_RESTRICT_SLOT0ONLY,A_PRED_BIT_13),"compound compare-jump",{fBRANCH_SPECULATE_STALL(TSTSEM,,SPECULATE_TAKEN,13,0);if (TSTSEM) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})\
+Q6INSN(J4_##TAG##_jumpnv_nt,"if ("TST") jump:nt #r9:2",ATTRIBS(A_NEWVALUEJ,A_RESTRICT_NOSLOT1_STORE,A_RESTRICT_SLOT0ONLY,A_PRED_BIT_13),"compound compare-jump",{fBRANCH_SPECULATE_STALL(TSTSEM,,SPECULATE_NOT_TAKEN,13,0); if (TSTSEM) {fIMMEXT(riV); fPCALIGN(riV); fBRANCH(fREAD_PC()+riV,COF_TYPE_JUMP);}})
+
+
+
+
+STD_CMPJUMPNEWRS(cmpeqi_t,"cmp.eq(Ns8.new,#U5)",(fNEWREG(NsN)==(UiV)))
+STD_CMPJUMPNEWRS(cmpeqi_f,"!cmp.eq(Ns8.new,#U5)",(fNEWREG(NsN)!=(UiV)))
+STD_CMPJUMPNEWRS(cmpgti_t,"cmp.gt(Ns8.new,#U5)",(fNEWREG(NsN)>(UiV)))
+STD_CMPJUMPNEWRS(cmpgti_f,"!cmp.gt(Ns8.new,#U5)",!(fNEWREG(NsN)>(UiV)))
+STD_CMPJUMPNEWRS(cmpgtui_t,"cmp.gtu(Ns8.new,#U5)",(fCAST4u(fNEWREG(NsN))>(UiV)))
+STD_CMPJUMPNEWRS(cmpgtui_f,"!cmp.gtu(Ns8.new,#U5)",!(fCAST4u(fNEWREG(NsN))>(UiV)))
+
+
+STD_CMPJUMPNEWRS(cmpeqn1_t,"cmp.eq(Ns8.new,#-1)",(fNEWREG(NsN)==(-1)))
+STD_CMPJUMPNEWRS(cmpeqn1_f,"!cmp.eq(Ns8.new,#-1)",(fNEWREG(NsN)!=(-1)))
+STD_CMPJUMPNEWRS(cmpgtn1_t,"cmp.gt(Ns8.new,#-1)",(fNEWREG(NsN)>(-1)))
+STD_CMPJUMPNEWRS(cmpgtn1_f,"!cmp.gt(Ns8.new,#-1)",!(fNEWREG(NsN)>(-1)))
+STD_CMPJUMPNEWRS(tstbit0_t,"tstbit(Ns8.new,#0)",((fNEWREG(NsN)) & 1))
+STD_CMPJUMPNEWRS(tstbit0_f,"!tstbit(Ns8.new,#0)",!((fNEWREG(NsN)) & 1))
+
+
+STD_CMPJUMPNEWRS(cmpeq_t, "cmp.eq(Ns8.new,Rt32)", (fNEWREG(NsN)==RtV))
+STD_CMPJUMPNEWRS(cmpgt_t, "cmp.gt(Ns8.new,Rt32)", (fNEWREG(NsN)>RtV))
+STD_CMPJUMPNEWRS(cmpgtu_t,"cmp.gtu(Ns8.new,Rt32)",(fCAST4u(fNEWREG(NsN))>fCAST4u(RtV)))
+STD_CMPJUMPNEWRS(cmplt_t, "cmp.gt(Rt32,Ns8.new)", (RtV>fNEWREG(NsN)))
+STD_CMPJUMPNEWRS(cmpltu_t,"cmp.gtu(Rt32,Ns8.new)",(fCAST4u(RtV)>fCAST4u(fNEWREG(NsN))))
+STD_CMPJUMPNEWRS(cmpeq_f, "!cmp.eq(Ns8.new,Rt32)", (fNEWREG(NsN)!=RtV))
+STD_CMPJUMPNEWRS(cmpgt_f, "!cmp.gt(Ns8.new,Rt32)", !(fNEWREG(NsN)>RtV))
+STD_CMPJUMPNEWRS(cmpgtu_f,"!cmp.gtu(Ns8.new,Rt32)",!(fCAST4u(fNEWREG(NsN))>fCAST4u(RtV)))
+STD_CMPJUMPNEWRS(cmplt_f, "!cmp.gt(Rt32,Ns8.new)", !(RtV>fNEWREG(NsN)))
+STD_CMPJUMPNEWRS(cmpltu_f,"!cmp.gtu(Rt32,Ns8.new)",!(fCAST4u(RtV)>fCAST4u(fNEWREG(NsN))))
+
+
+
+
+
+/*********************************************/
+/* Subroutine Call instructions	      */
+/*********************************************/
+
+#define CALL_NOTES A_NOTE_PACKET_PC,A_NOTE_PACKET_NPC,A_NOTE_RELATIVE_ADDRESS
+#define CDIR_STD   A_CALL,A_DIRECT,CALL_NOTES,A_COFSTD,A_BRANCHADDER
+#define CINDIR_STD A_CALL,A_INDIRECT,A_COFSTD
+
+Q6INSN(J2_call,"call #r22:2",ATTRIBS(CDIR_STD,A_RELAX_UNC), "direct unconditional call",
+{fIMMEXT(riV); fPCALIGN(riV); fCALL(fREAD_PC()+riV); })
+
+Q6INSN(J2_callt,"if (Pu4) call #r15:2",ATTRIBS(CDIR_STD,A_NOTE_CONDITIONAL,A_RELAX_COND,A_PRED_BIT_12),"direct conditional call if true",
+{fIMMEXT(riV); fPCALIGN(riV); fBRANCH_SPECULATE_STALL(fLSBOLD(PuV),,SPECULATE_NOT_TAKEN,12,0); if (fLSBOLD(PuV)) { fCALL(fREAD_PC()+riV); }})
+
+Q6INSN(J2_callf,"if (!Pu4) call #r15:2",ATTRIBS(CDIR_STD,A_NOTE_CONDITIONAL,A_RELAX_COND,A_PRED_BIT_12),"direct conditional call if false",
+{fIMMEXT(riV); fPCALIGN(riV); fBRANCH_SPECULATE_STALL(fLSBOLDNOT(PuV),,SPECULATE_NOT_TAKEN,12,0);if (fLSBOLDNOT(PuV)) { fCALL(fREAD_PC()+riV); }})
+
+Q6INSN(J2_callr,"callr Rs32",ATTRIBS(CINDIR_STD), "indirect unconditional call",
+{ fCALLR(RsV); })
+
+Q6INSN(J2_callrt,"if (Pu4) callr Rs32",ATTRIBS(CINDIR_STD,A_NOTE_CONDITIONAL,A_PRED_BIT_12),"indirect conditional call if true",
+{fBRANCH_SPECULATE_STALL(fLSBOLD(PuV),,SPECULATE_NOT_TAKEN,12,0);if (fLSBOLD(PuV)) { fCALLR(RsV); }})
+
+Q6INSN(J2_callrf,"if (!Pu4) callr Rs32",ATTRIBS(CINDIR_STD,A_NOTE_CONDITIONAL,A_PRED_BIT_12),"indirect conditional call if false",
+{fBRANCH_SPECULATE_STALL(fLSBOLDNOT(PuV),,SPECULATE_NOT_TAKEN,12,0);if (fLSBOLDNOT(PuV)) { fCALLR(RsV); }})
+
+
+
+
+#define ATTRIB_HWLOOP_SETUP A_NOTE_PACKET_NPC,A_NOTE_RELATIVE_ADDRESS,A_NOTE_PACKET_PC,A_NOTE_LA_RESTRICT,A_RESTRICT_LOOP_LA,A_BRANCHADDER,A_IT_HWLOOP,A_RELAX_COF_1ST,A_RELAX_COF_2ND
+#define ATTRIB_HWLOOP A_NOTE_PACKET_NPC,A_NOTE_PACKET_PC,A_RESTRICT_NOCOF,A_NOTE_NOCOF_RESTRICT,A_ROPS_3
+
+/*********************************************/
+/* HW Loop instructions		      */
+/*********************************************/
+
+Q6INSN(J2_loop0r,"loop0(#r7:2,Rs32)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_HWLOOP0_SETUP),"Initialize HW loop 0",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS0(/*sa,lc*/ fREAD_PC()+riV, RsV);
+  fSET_LPCFG(0);
+})
+
+Q6INSN(J2_loop1r,"loop1(#r7:2,Rs32)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_HWLOOP1_SETUP),"Initialize HW loop 1",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS1(/*sa,lc*/ fREAD_PC()+riV, RsV);
+})
+
+Q6INSN(J2_loop0i,"loop0(#r7:2,#U10)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_HWLOOP0_SETUP),"Initialize HW loop 0",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS0(/*sa,lc*/ fREAD_PC()+riV, UiV);
+  fSET_LPCFG(0);
+})
+
+Q6INSN(J2_loop1i,"loop1(#r7:2,#U10)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_HWLOOP1_SETUP),"Initialize HW loop 1",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS1(/*sa,lc*/ fREAD_PC()+riV, UiV);
+})
+
+
+Q6INSN(J2_ploop1sr,"p3=sp1loop0(#r7:2,Rs32)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_ARCHV2,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED,A_HWLOOP0_SETUP),"Initialize HW loop 0",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS0(/*sa,lc*/ fREAD_PC()+riV, RsV);
+  fSET_LPCFG(1);
+  fWRITE_P3_LATE(0);
+})
+Q6INSN(J2_ploop1si,"p3=sp1loop0(#r7:2,#U10)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_ARCHV2,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED,A_HWLOOP0_SETUP),"Initialize HW loop 0",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS0(/*sa,lc*/ fREAD_PC()+riV, UiV);
+  fSET_LPCFG(1);
+  fWRITE_P3_LATE(0);
+})
+
+Q6INSN(J2_ploop2sr,"p3=sp2loop0(#r7:2,Rs32)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_ARCHV2,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED,A_HWLOOP0_SETUP),"Initialize HW loop 0",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS0(/*sa,lc*/ fREAD_PC()+riV, RsV);
+  fSET_LPCFG(2);
+  fWRITE_P3_LATE(0);
+})
+Q6INSN(J2_ploop2si,"p3=sp2loop0(#r7:2,#U10)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_ARCHV2,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED,A_HWLOOP0_SETUP),"Initialize HW loop 0",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS0(/*sa,lc*/ fREAD_PC()+riV, UiV);
+  fSET_LPCFG(2);
+  fWRITE_P3_LATE(0);
+})
+
+Q6INSN(J2_ploop3sr,"p3=sp3loop0(#r7:2,Rs32)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_ARCHV2,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED,A_HWLOOP0_SETUP),"Initialize HW loop 0",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS0(/*sa,lc*/ fREAD_PC()+riV, RsV);
+  fSET_LPCFG(3);
+  fWRITE_P3_LATE(0);
+})
+Q6INSN(J2_ploop3si,"p3=sp3loop0(#r7:2,#U10)",ATTRIBS(ATTRIB_HWLOOP_SETUP,A_ARCHV2,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED,A_HWLOOP0_SETUP),"Initialize HW loop 0",
+{ fIMMEXT(riV); fPCALIGN(riV);
+  fWRITE_LOOP_REGS0(/*sa,lc*/ fREAD_PC()+riV, UiV);
+  fSET_LPCFG(3);
+  fWRITE_P3_LATE(0);
+})
+
+
+
+Q6INSN(J2_endloop01,"endloop01",ATTRIBS(ATTRIB_HWLOOP,A_HWLOOP0_END,A_HWLOOP1_END,A_RESTRICT_LATEPRED),"Loopend for inner and outer loop",
+{
+
+  /* V2: With predicate control */
+  if (fGET_LPCFG) {
+    fHIDE( if (fGET_LPCFG >= 2) { fWRITE_P3(fNOATTRIB_READ_P3()); } else )
+    if (fGET_LPCFG==1) {
+       fWRITE_P3(0xff);
+    }
+    fSET_LPCFG(fGET_LPCFG-1);
+    fHIDE(MARK_LATE_PRED_WRITE(3))
+  }
+
+  /* check if iterate */
+  if (fREAD_LC0>1) {
+    fBRANCH(fREAD_SA0,COF_TYPE_LOOPEND0);
+    fHIDE(fLOOPSTATS(fREAD_SA0);)
+    /* decrement loop count */
+    fWRITE_LC0(fREAD_LC0-1);
+  } else {
+    /* check if iterate */
+    if (fREAD_LC1>1) {
+      fBRANCH(fREAD_SA1,COF_TYPE_LOOPEND1);
+      fHIDE(fLOOPSTATS(fREAD_SA1);)
+      /* decrement loop count */
+      fWRITE_LC1(fREAD_LC1-1);
+    }
+  }
+
+})
+
+Q6INSN(J2_endloop0,"endloop0",ATTRIBS(ATTRIB_HWLOOP,A_HWLOOP0_END,A_RESTRICT_LATEPRED),"Loopend for inner loop",
+{
+
+  /* V2: With predicate control */
+  if (fGET_LPCFG) {
+    fHIDE( if (fGET_LPCFG >= 2) { fWRITE_P3(fNOATTRIB_READ_P3()); } else )
+    if (fGET_LPCFG==1) {
+       fWRITE_P3(0xff);
+    }
+    fHIDE(MARK_LATE_PRED_WRITE(3))
+    fSET_LPCFG(fGET_LPCFG-1);
+  }
+
+  /* check if iterate */
+  if (fREAD_LC0>1) {
+    fBRANCH(fREAD_SA0,COF_TYPE_LOOPEND0);
+    fHIDE(fLOOPSTATS(fREAD_SA0);)
+    /* decrement loop count */
+    fWRITE_LC0(fREAD_LC0-1);
+  }
+})
+
+Q6INSN(J2_endloop1,"endloop1",ATTRIBS(ATTRIB_HWLOOP,A_HWLOOP1_END),"Loopend for outer loop",
+{
+  /* check if iterate */
+  if (fREAD_LC1>1) {
+    fBRANCH(fREAD_SA1,COF_TYPE_LOOPEND1);
+    fHIDE(fLOOPSTATS(fREAD_SA1);)
+    /* decrement loop count */
+    fWRITE_LC1(fREAD_LC1-1);
+  }
+})
+
+
diff --git a/target/hexagon/imported/compare.idef b/target/hexagon/imported/compare.idef
new file mode 100644
index 0000000..0f05307
--- /dev/null
+++ b/target/hexagon/imported/compare.idef
@@ -0,0 +1,639 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Compare Instructions
+ */
+
+
+
+/*********************************************/
+/* Scalar compare instructions	       */
+/*********************************************/
+
+Q6INSN(C2_cmpeq,"Pd4=cmp.eq(Rs32,Rt32)",ATTRIBS(),
+"Compare for Equal",
+{PdV=f8BITSOF(RsV==RtV);})
+
+Q6INSN(C2_cmpgt,"Pd4=cmp.gt(Rs32,Rt32)",ATTRIBS(),
+"Compare for signed Greater Than",
+{PdV=f8BITSOF(RsV>RtV);})
+
+Q6INSN(C2_cmpgtu,"Pd4=cmp.gtu(Rs32,Rt32)",ATTRIBS(),
+"Compare for Greater Than Unsigned",
+{PdV=f8BITSOF(fCAST4u(RsV)>fCAST4u(RtV));})
+
+Q6INSN(C2_cmpeqp,"Pd4=cmp.eq(Rss32,Rtt32)",ATTRIBS(),
+"Compare for Equal",
+{PdV=f8BITSOF(RssV==RttV);})
+
+Q6INSN(C2_cmpgtp,"Pd4=cmp.gt(Rss32,Rtt32)",ATTRIBS(),
+"Compare for signed Greater Than",
+{PdV=f8BITSOF(RssV>RttV);})
+
+Q6INSN(C2_cmpgtup,"Pd4=cmp.gtu(Rss32,Rtt32)",ATTRIBS(),
+"Compare for Greater Than Unsigned",
+{PdV=f8BITSOF(fCAST8u(RssV)>fCAST8u(RttV));})
+
+
+
+
+/*********************************************/
+/* Compare and put result in GPR	     */
+/*  typically for function I/O	       */
+/*********************************************/
+
+Q6INSN(A4_rcmpeqi,"Rd32=cmp.eq(Rs32,#s8)",ATTRIBS(),
+"Compare for Equal",
+{fIMMEXT(siV); RdV=(RsV==siV); })
+
+Q6INSN(A4_rcmpneqi,"Rd32=!cmp.eq(Rs32,#s8)",ATTRIBS(),
+"Compare for Equal",
+{fIMMEXT(siV); RdV=(RsV!=siV); })
+
+
+Q6INSN(A4_rcmpeq,"Rd32=cmp.eq(Rs32,Rt32)",ATTRIBS(),
+"Compare for Equal",
+{RdV=(RsV==RtV); })
+
+Q6INSN(A4_rcmpneq,"Rd32=!cmp.eq(Rs32,Rt32)",ATTRIBS(),
+"Compare for Equal",
+{RdV=(RsV!=RtV); })
+
+
+
+/*********************************************/
+/* Scalar compare instructions	       */
+/*********************************************/
+
+
+Q6INSN(C2_bitsset,"Pd4=bitsset(Rs32,Rt32)",ATTRIBS(A_ARCHV2),
+"Compare for selected bits set",
+{PdV=f8BITSOF((RsV&RtV)==RtV);})
+
+Q6INSN(C2_bitsclr,"Pd4=bitsclr(Rs32,Rt32)",ATTRIBS(A_ARCHV2),
+"Compare for selected bits clear",
+{PdV=f8BITSOF((RsV&RtV)==0);})
+
+
+Q6INSN(C4_nbitsset,"Pd4=!bitsset(Rs32,Rt32)",ATTRIBS(A_ARCHV2),
+"Compare for selected bits set",
+{PdV=f8BITSOF((RsV&RtV)!=RtV);})
+
+Q6INSN(C4_nbitsclr,"Pd4=!bitsclr(Rs32,Rt32)",ATTRIBS(A_ARCHV2),
+"Compare for selected bits clear",
+{PdV=f8BITSOF((RsV&RtV)!=0);})
+
+
+
+/*********************************************/
+/* Scalar compare instructions W/ immediate  */
+/*********************************************/
+
+Q6INSN(C2_cmpeqi,"Pd4=cmp.eq(Rs32,#s10)",ATTRIBS(),
+"Compare for Equal",
+{fIMMEXT(siV); PdV=f8BITSOF(RsV==siV);})
+
+Q6INSN(C2_cmpgti,"Pd4=cmp.gt(Rs32,#s10)",ATTRIBS(),
+"Compare for signed Greater Than",
+{fIMMEXT(siV); PdV=f8BITSOF(RsV>siV);})
+
+Q6INSN(C2_cmpgtui,"Pd4=cmp.gtu(Rs32,#u9)",ATTRIBS(),
+"Compare for Greater Than Unsigned",
+{fIMMEXT(uiV); PdV=f8BITSOF(fCAST4u(RsV)>fCAST4u(uiV));})
+
+DEF_MAPPING(C2_cmpgei,"Pd4=cmp.ge(Rs32,#s8)","Pd4=cmp.gt(Rs32,#s8-1)")
+DEF_COND_MAPPING(C2_cmpgeui,"Pd4=cmp.geu(Rs32,#u8)","#u8==0","Pd4=cmp.eq(Rs32,Rs32)","Pd4=cmp.gtu(Rs32,#u8-1)")
+
+/* Just for fun */
+DEF_V2_MAPPING(C2_cmplt,"Pd4=cmp.lt(Rs32,Rt32)","Pd4=cmp.gt(Rt32,Rs32)")
+DEF_V2_MAPPING(C2_cmpltu,"Pd4=cmp.ltu(Rs32,Rt32)","Pd4=cmp.gtu(Rt32,Rs32)")
+
+
+Q6INSN(C2_bitsclri,"Pd4=bitsclr(Rs32,#u6)",ATTRIBS(A_ARCHV2),
+"Compare for selected bits clear",
+{PdV=f8BITSOF((RsV&uiV)==0);})
+
+Q6INSN(C4_nbitsclri,"Pd4=!bitsclr(Rs32,#u6)",ATTRIBS(A_ARCHV2),
+"Compare for selected bits clear",
+{PdV=f8BITSOF((RsV&uiV)!=0);})
+
+
+
+
+Q6INSN(C4_cmpneqi,"Pd4=!cmp.eq(Rs32,#s10)",ATTRIBS(), "Compare for Not Equal", {fIMMEXT(siV); PdV=f8BITSOF(RsV!=siV);})
+Q6INSN(C4_cmpltei,"Pd4=!cmp.gt(Rs32,#s10)",ATTRIBS(), "Compare for Less Than or Equal", {fIMMEXT(siV); PdV=f8BITSOF(RsV<=siV);})
+Q6INSN(C4_cmplteui,"Pd4=!cmp.gtu(Rs32,#u9)",ATTRIBS(), "Compare for Less Than or Equal Unsigned", {fIMMEXT(uiV); PdV=f8BITSOF(fCAST4u(RsV)<=fCAST4u(uiV));})
+
+Q6INSN(C4_cmpneq,"Pd4=!cmp.eq(Rs32,Rt32)",ATTRIBS(), "And-Compare for Equal", {PdV=f8BITSOF(RsV!=RtV);})
+Q6INSN(C4_cmplte,"Pd4=!cmp.gt(Rs32,Rt32)",ATTRIBS(), "And-Compare for signed Greater Than", {PdV=f8BITSOF(RsV<=RtV);})
+Q6INSN(C4_cmplteu,"Pd4=!cmp.gtu(Rs32,Rt32)",ATTRIBS(), "And-Compare for Greater Than Unsigned", {PdV=f8BITSOF(fCAST4u(RsV)<=fCAST4u(RtV));})
+
+
+
+
+
+/* Predicate Logical Operations */
+
+Q6INSN(C2_and,"Pd4=and(Pt4,Ps4)",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Predicate AND",
+{fPREDUSE_TIMING();PdV=PsV & PtV;})
+
+Q6INSN(C2_or,"Pd4=or(Pt4,Ps4)",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Predicate OR",
+{fPREDUSE_TIMING();PdV=PsV | PtV;})
+
+Q6INSN(C2_xor,"Pd4=xor(Ps4,Pt4)",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Predicate XOR",
+{fPREDUSE_TIMING();PdV=PsV ^ PtV;})
+
+Q6INSN(C2_andn,"Pd4=and(Pt4,!Ps4)",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Predicate AND NOT",
+{fPREDUSE_TIMING();PdV=PtV & (~PsV);})
+
+Q6INSN(C2_not,"Pd4=not(Ps4)",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Logical NOT Predicate",
+{fPREDUSE_TIMING();PdV=~PsV;})
+
+Q6INSN(C2_orn,"Pd4=or(Pt4,!Ps4)",ATTRIBS(A_ARCHV2,A_CRSLOT23,A_NOTE_CRSLOT23),
+"Predicate OR NOT",
+{fPREDUSE_TIMING();PdV=PtV | (~PsV);})
+
+
+
+
+
+Q6INSN(C4_and_and,"Pd4=and(Ps4,and(Pt4,Pu4))",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Compound And-And", { fPREDUSE_TIMING();PdV = PsV & PtV & PuV; })
+
+Q6INSN(C4_and_or,"Pd4=and(Ps4,or(Pt4,Pu4))",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Compound And-Or", { fPREDUSE_TIMING();PdV = PsV &  (PtV | PuV); })
+
+Q6INSN(C4_or_and,"Pd4=or(Ps4,and(Pt4,Pu4))",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Compound Or-And", { fPREDUSE_TIMING();PdV = PsV | (PtV & PuV); })
+
+Q6INSN(C4_or_or,"Pd4=or(Ps4,or(Pt4,Pu4))",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Compound Or-Or", { fPREDUSE_TIMING();PdV = PsV | PtV | PuV; })
+
+
+
+Q6INSN(C4_and_andn,"Pd4=and(Ps4,and(Pt4,!Pu4))",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Compound And-And", { fPREDUSE_TIMING();PdV = PsV & PtV & (~PuV); })
+
+Q6INSN(C4_and_orn,"Pd4=and(Ps4,or(Pt4,!Pu4))",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Compound And-Or", { fPREDUSE_TIMING();PdV = PsV &  (PtV | (~PuV)); })
+
+Q6INSN(C4_or_andn,"Pd4=or(Ps4,and(Pt4,!Pu4))",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Compound Or-And", { fPREDUSE_TIMING();PdV = PsV | (PtV & (~PuV)); })
+
+Q6INSN(C4_or_orn,"Pd4=or(Ps4,or(Pt4,!Pu4))",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Compound Or-Or", { fPREDUSE_TIMING();PdV = PsV | PtV | (~PuV); })
+
+
+
+
+
+DEF_V2_MAPPING(C2_pxfer_map,"Pd4=Ps4","Pd4=or(Ps4,Ps4)")
+
+
+Q6INSN(C2_any8,"Pd4=any8(Ps4)",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Logical ANY of low 8 predicate bits",
+{ fPREDUSE_TIMING();PsV ? (PdV=0xff) : (PdV=0x00); })
+
+Q6INSN(C2_all8,"Pd4=all8(Ps4)",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Logical ALL of low 8 predicate bits",
+{ fPREDUSE_TIMING();(PsV==0xff) ? (PdV=0xff) : (PdV=0x00); })
+
+Q6INSN(C2_vitpack,"Rd32=vitpack(Ps4,Pt4)",ATTRIBS(),
+"Pack the odd and even bits of two predicate registers",
+{ fPREDUSE_TIMING();RdV = (PsV&0x55) | (PtV&0xAA); })
+
+/* Mux instructions */
+
+Q6INSN(C2_mux,"Rd32=mux(Pu4,Rs32,Rt32)",ATTRIBS(),
+"Scalar MUX",
+{ fPREDUSE_TIMING();(fLSBOLD(PuV)) ? (RdV=RsV):(RdV=RtV); })
+
+
+Q6INSN(C2_cmovenewit,"if (Pu4.new) Rd32=#s12",ATTRIBS(A_ARCHV2),
+"Scalar conditional move",
+{ fIMMEXT(siV); if (fLSBNEW(PuN)) RdV=siV; else CANCEL;})
+
+Q6INSN(C2_cmovenewif,"if (!Pu4.new) Rd32=#s12",ATTRIBS(A_ARCHV2),
+"Scalar conditional move",
+{ fIMMEXT(siV); if (fLSBNEWNOT(PuN)) RdV=siV; else CANCEL;})
+
+Q6INSN(C2_cmoveit,"if (Pu4) Rd32=#s12",ATTRIBS(A_ARCHV2),
+"Scalar conditional move",
+{ fIMMEXT(siV); if (fLSBOLD(PuV)) RdV=siV; else CANCEL;})
+
+Q6INSN(C2_cmoveif,"if (!Pu4) Rd32=#s12",ATTRIBS(A_ARCHV2),
+"Scalar conditional move",
+{ fIMMEXT(siV); if (fLSBOLDNOT(PuV)) RdV=siV; else CANCEL;})
+
+
+
+Q6INSN(C2_ccombinewnewt,"if (Pu4.new) Rdd32=combine(Rs32,Rt32)",ATTRIBS(A_ROPS_2,A_ARCHV2),
+"Conditionally combine two words into a register pair",
+{ if (fLSBNEW(PuN)) {
+    fSETWORD(0,RddV,RtV);
+    fSETWORD(1,RddV,RsV);
+  } else {CANCEL;}
+})
+
+Q6INSN(C2_ccombinewnewf,"if (!Pu4.new) Rdd32=combine(Rs32,Rt32)",ATTRIBS(A_ARCHV2,A_ROPS_2),
+"Conditionally combine two words into a register pair",
+{ if (fLSBNEWNOT(PuN)) {
+    fSETWORD(0,RddV,RtV);
+    fSETWORD(1,RddV,RsV);
+  } else {CANCEL;}
+})
+
+Q6INSN(C2_ccombinewt,"if (Pu4) Rdd32=combine(Rs32,Rt32)",ATTRIBS(A_ARCHV2,A_ROPS_2),
+"Conditionally combine two words into a register pair",
+{ if (fLSBOLD(PuV)) {
+    fSETWORD(0,RddV,RtV);
+    fSETWORD(1,RddV,RsV);
+  } else {CANCEL;}
+})
+
+Q6INSN(C2_ccombinewf,"if (!Pu4) Rdd32=combine(Rs32,Rt32)",ATTRIBS(A_ARCHV2,A_ROPS_2),
+"Conditionally combine two words into a register pair",
+{ if (fLSBOLDNOT(PuV)) {
+    fSETWORD(0,RddV,RtV);
+    fSETWORD(1,RddV,RsV);
+  } else {CANCEL;}
+})
+
+
+
+Q6INSN(C2_muxii,"Rd32=mux(Pu4,#s8,#S8)",ATTRIBS(A_ARCHV2),
+"Scalar MUX immediates",
+{ fPREDUSE_TIMING();fIMMEXT(siV); (fLSBOLD(PuV)) ? (RdV=siV):(RdV=SiV); })
+
+
+
+Q6INSN(C2_muxir,"Rd32=mux(Pu4,Rs32,#s8)",ATTRIBS(A_ARCHV2),
+"Scalar MUX register immediate",
+{ fPREDUSE_TIMING();fIMMEXT(siV); (fLSBOLD(PuV)) ? (RdV=RsV):(RdV=siV); })
+
+
+Q6INSN(C2_muxri,"Rd32=mux(Pu4,#s8,Rs32)",ATTRIBS(A_ARCHV2),
+"Scalar MUX register immediate",
+{ fPREDUSE_TIMING();fIMMEXT(siV); (fLSBOLD(PuV)) ? (RdV=siV):(RdV=RsV); })
+
+
+
+Q6INSN(C2_vmux,"Rdd32=vmux(Pu4,Rss32,Rtt32)",ATTRIBS(),
+"Vector MUX",
+{   fPREDUSE_TIMING();
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBYTE(i,RddV,(fGETBIT(i,PuV)?(fGETBYTE(i,RssV)):(fGETBYTE(i,RttV))));
+	}
+})
+
+Q6INSN(C2_mask,"Rdd32=mask(Pt4)",ATTRIBS(),
+"Vector Mask Generation",
+{   fPREDUSE_TIMING();
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBYTE(i,RddV,(fGETBIT(i,PtV)?(0xff):(0x00)));
+	}
+})
+
+/* VCMP */
+
+Q6INSN(A2_vcmpbeq,"Pd4=vcmpb.eq(Rss32,Rtt32)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBIT(i,PdV,(fGETBYTE(i,RssV) == fGETBYTE(i,RttV)));
+	}
+})
+
+Q6INSN(A4_vcmpbeqi,"Pd4=vcmpb.eq(Rss32,#u8)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBIT(i,PdV,(fGETUBYTE(i,RssV) == uiV));
+	}
+})
+
+Q6INSN(A4_vcmpbeq_any,"Pd4=any8(vcmpb.eq(Rss32,Rtt32))",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	PdV = 0;
+	for (i = 0; i < 8; i++) {
+		if (fGETBYTE(i,RssV) == fGETBYTE(i,RttV)) PdV = 0xff;
+	}
+})
+
+Q6INSN(A6_vcmpbeq_notany,"Pd4=!any8(vcmpb.eq(Rss32,Rtt32))",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	PdV = 0;
+	for (i = 0; i < 8; i++) {
+		if (fGETBYTE(i,RssV) == fGETBYTE(i,RttV)) PdV = 0xff;
+	}
+	PdV = ~PdV;
+})
+
+Q6INSN(A2_vcmpbgtu,"Pd4=vcmpb.gtu(Rss32,Rtt32)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBIT(i,PdV,(fGETUBYTE(i,RssV) > fGETUBYTE(i,RttV)));
+	}
+})
+
+Q6INSN(A4_vcmpbgtui,"Pd4=vcmpb.gtu(Rss32,#u7)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBIT(i,PdV,(fGETUBYTE(i,RssV) > uiV));
+	}
+})
+
+Q6INSN(A4_vcmpbgt,"Pd4=vcmpb.gt(Rss32,Rtt32)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBIT(i,PdV,(fGETBYTE(i,RssV) > fGETBYTE(i,RttV)));
+	}
+})
+
+Q6INSN(A4_vcmpbgti,"Pd4=vcmpb.gt(Rss32,#s8)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 8; i++) {
+		fSETBIT(i,PdV,(fGETBYTE(i,RssV) > siV));
+	}
+})
+
+
+
+Q6INSN(A4_cmpbeq,"Pd4=cmpb.eq(Rs32,Rt32)",ATTRIBS(),
+"Compare bytes ",
+{
+	PdV=f8BITSOF(fGETBYTE(0,RsV) == fGETBYTE(0,RtV));
+})
+
+Q6INSN(A4_cmpbeqi,"Pd4=cmpb.eq(Rs32,#u8)",ATTRIBS(),
+"Compare bytes ",
+{
+	PdV=f8BITSOF(fGETUBYTE(0,RsV) == uiV);
+})
+
+Q6INSN(A4_cmpbgtu,"Pd4=cmpb.gtu(Rs32,Rt32)",ATTRIBS(),
+"Compare bytes ",
+{
+	PdV=f8BITSOF(fGETUBYTE(0,RsV) > fGETUBYTE(0,RtV));
+})
+
+Q6INSN(A4_cmpbgtui,"Pd4=cmpb.gtu(Rs32,#u7)",ATTRIBS(),
+"Compare bytes ",
+{
+	fIMMEXT(uiV);
+	PdV=f8BITSOF(fGETUBYTE(0,RsV) > fCAST4u(uiV));
+})
+
+Q6INSN(A4_cmpbgt,"Pd4=cmpb.gt(Rs32,Rt32)",ATTRIBS(),
+"Compare bytes ",
+{
+	PdV=f8BITSOF(fGETBYTE(0,RsV) > fGETBYTE(0,RtV));
+})
+
+Q6INSN(A4_cmpbgti,"Pd4=cmpb.gt(Rs32,#s8)",ATTRIBS(),
+"Compare bytes ",
+{
+	PdV=f8BITSOF(fGETBYTE(0,RsV) > siV);
+})
+
+Q6INSN(A2_vcmpheq,"Pd4=vcmph.eq(Rss32,Rtt32)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		fSETBIT(i*2,PdV,  (fGETHALF(i,RssV) == fGETHALF(i,RttV)));
+		fSETBIT(i*2+1,PdV,(fGETHALF(i,RssV) == fGETHALF(i,RttV)));
+	}
+})
+
+Q6INSN(A2_vcmphgt,"Pd4=vcmph.gt(Rss32,Rtt32)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		fSETBIT(i*2,  PdV,  (fGETHALF(i,RssV) > fGETHALF(i,RttV)));
+		fSETBIT(i*2+1,PdV,  (fGETHALF(i,RssV) > fGETHALF(i,RttV)));
+	}
+})
+
+Q6INSN(A2_vcmphgtu,"Pd4=vcmph.gtu(Rss32,Rtt32)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		fSETBIT(i*2,  PdV,  (fGETUHALF(i,RssV) > fGETUHALF(i,RttV)));
+		fSETBIT(i*2+1,PdV,  (fGETUHALF(i,RssV) > fGETUHALF(i,RttV)));
+	}
+})
+
+Q6INSN(A4_vcmpheqi,"Pd4=vcmph.eq(Rss32,#s8)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		fSETBIT(i*2,PdV,  (fGETHALF(i,RssV) == siV));
+		fSETBIT(i*2+1,PdV,(fGETHALF(i,RssV) == siV));
+	}
+})
+
+Q6INSN(A4_vcmphgti,"Pd4=vcmph.gt(Rss32,#s8)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		fSETBIT(i*2,  PdV,  (fGETHALF(i,RssV) > siV));
+		fSETBIT(i*2+1,PdV,  (fGETHALF(i,RssV) > siV));
+	}
+})
+
+
+Q6INSN(A4_vcmphgtui,"Pd4=vcmph.gtu(Rss32,#u7)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		fSETBIT(i*2,  PdV,  (fGETUHALF(i,RssV) > uiV));
+		fSETBIT(i*2+1,PdV,  (fGETUHALF(i,RssV) > uiV));
+	}
+})
+
+Q6INSN(A4_cmpheq,"Pd4=cmph.eq(Rs32,Rt32)",ATTRIBS(),
+"Compare halfwords ",
+{
+	PdV=f8BITSOF(fGETHALF(0,RsV) == fGETHALF(0,RtV));
+})
+
+Q6INSN(A4_cmphgt,"Pd4=cmph.gt(Rs32,Rt32)",ATTRIBS(),
+"Compare halfwords ",
+{
+	PdV=f8BITSOF(fGETHALF(0,RsV) > fGETHALF(0,RtV));
+})
+
+Q6INSN(A4_cmphgtu,"Pd4=cmph.gtu(Rs32,Rt32)",ATTRIBS(),
+"Compare halfwords ",
+{
+	PdV=f8BITSOF(fGETUHALF(0,RsV) > fGETUHALF(0,RtV));
+})
+
+Q6INSN(A4_cmpheqi,"Pd4=cmph.eq(Rs32,#s8)",ATTRIBS(),
+"Compare halfwords ",
+{
+	fIMMEXT(siV);
+	PdV=f8BITSOF(fGETHALF(0,RsV) == siV);
+})
+
+Q6INSN(A4_cmphgti,"Pd4=cmph.gt(Rs32,#s8)",ATTRIBS(),
+"Compare halfwords ",
+{
+	fIMMEXT(siV);
+	PdV=f8BITSOF(fGETHALF(0,RsV) > siV);
+})
+
+Q6INSN(A4_cmphgtui,"Pd4=cmph.gtu(Rs32,#u7)",ATTRIBS(),
+"Compare halfwords ",
+{
+	fIMMEXT(uiV);
+	PdV=f8BITSOF(fGETUHALF(0,RsV) > fCAST4u(uiV));
+})
+
+Q6INSN(A2_vcmpweq,"Pd4=vcmpw.eq(Rss32,Rtt32)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fSETBITS(3,0,PdV,(fGETWORD(0,RssV)==fGETWORD(0,RttV)));
+	fSETBITS(7,4,PdV,(fGETWORD(1,RssV)==fGETWORD(1,RttV)));
+})
+
+Q6INSN(A2_vcmpwgt,"Pd4=vcmpw.gt(Rss32,Rtt32)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fSETBITS(3,0,PdV,(fGETWORD(0,RssV)>fGETWORD(0,RttV)));
+	fSETBITS(7,4,PdV,(fGETWORD(1,RssV)>fGETWORD(1,RttV)));
+})
+
+Q6INSN(A2_vcmpwgtu,"Pd4=vcmpw.gtu(Rss32,Rtt32)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fSETBITS(3,0,PdV,(fGETUWORD(0,RssV)>fGETUWORD(0,RttV)));
+	fSETBITS(7,4,PdV,(fGETUWORD(1,RssV)>fGETUWORD(1,RttV)));
+})
+
+Q6INSN(A4_vcmpweqi,"Pd4=vcmpw.eq(Rss32,#s8)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fSETBITS(3,0,PdV,(fGETWORD(0,RssV)==siV));
+	fSETBITS(7,4,PdV,(fGETWORD(1,RssV)==siV));
+})
+
+Q6INSN(A4_vcmpwgti,"Pd4=vcmpw.gt(Rss32,#s8)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fSETBITS(3,0,PdV,(fGETWORD(0,RssV)>siV));
+	fSETBITS(7,4,PdV,(fGETWORD(1,RssV)>siV));
+})
+
+Q6INSN(A4_vcmpwgtui,"Pd4=vcmpw.gtu(Rss32,#u7)",ATTRIBS(),
+"Compare elements of two vectors ",
+{
+	fSETBITS(3,0,PdV,(fGETUWORD(0,RssV)>fCAST4u(uiV)));
+	fSETBITS(7,4,PdV,(fGETUWORD(1,RssV)>fCAST4u(uiV)));
+})
+
+Q6INSN(A4_boundscheck_hi,"Pd4=boundscheck(Rss32,Rtt32):raw:hi",ATTRIBS(),
+"Detect if a register is within bounds",
+{
+	fHIDE(size4u_t src;)
+	src = fGETUWORD(1,RssV);
+	PdV = f8BITSOF((fCAST4u(src) >= fGETUWORD(0,RttV)) && (fCAST4u(src) < fGETUWORD(1,RttV)));
+})
+
+Q6INSN(A4_boundscheck_lo,"Pd4=boundscheck(Rss32,Rtt32):raw:lo",ATTRIBS(),
+"Detect if a register is within bounds",
+{
+	fHIDE(size4u_t src;)
+	src = fGETUWORD(0,RssV);
+	PdV = f8BITSOF((fCAST4u(src) >= fGETUWORD(0,RttV)) && (fCAST4u(src) < fGETUWORD(1,RttV)));
+})
+
+DEF_V4_COND_MAPPING(A4_boundscheck,"Pd4=boundscheck(Rs32,Rtt32)","Rs32 & 1","Pd4=boundscheck(Rss32,Rtt32):raw:hi","Pd4=boundscheck(Rss32,Rtt32):raw:lo")
+
+Q6INSN(A4_tlbmatch,"Pd4=tlbmatch(Rss32,Rt32)",ATTRIBS(A_NOTE_LATEPRED,A_RESTRICT_LATEPRED),
+"Detect if a VA/ASID matches a TLB entry",
+{
+	fHIDE(size4u_t TLBHI; size4u_t TLBLO; size4u_t MASK; size4u_t SIZE;)
+	MASK = 0x07ffffff;
+	TLBLO = fGETUWORD(0,RssV);
+	TLBHI = fGETUWORD(1,RssV);
+	SIZE = fMIN(6,fCL1_4(~fBREV_4(TLBLO)));
+	MASK &= (0xffffffff << 2*SIZE);
+	PdV = f8BITSOF(fGETBIT(31,TLBHI) && ((TLBHI & MASK) == (RtV & MASK)));
+	fHIDE(MARK_LATE_PRED_WRITE(PdN))
+})
+
+Q6INSN(C2_tfrpr,"Rd32=Ps4",ATTRIBS(),
+"Transfer predicate to general register", { fPREDUSE_TIMING();RdV = fZXTN(8,32,PsV); })
+
+Q6INSN(C2_tfrrp,"Pd4=Rs32",ATTRIBS(),
+"Transfer general register to Predicate", { PdV = fGETUBYTE(0,RsV); })
+
+Q6INSN(C4_fastcorner9,"Pd4=fastcorner9(Ps4,Pt4)",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Determine whether the predicate sources define a corner",
+{   fPREDUSE_TIMING();
+	fHIDE(size4u_t tmp = 0; size4u_t i;)
+	fSETHALF(0,tmp,(PsV<<8)|PtV);
+	fSETHALF(1,tmp,(PsV<<8)|PtV);
+	for (i = 1; i < 9; i++) {
+		tmp &= tmp >> 1;
+	}
+	PdV = f8BITSOF(tmp != 0);
+})
+
+Q6INSN(C4_fastcorner9_not,"Pd4=!fastcorner9(Ps4,Pt4)",ATTRIBS(A_CRSLOT23,A_NOTE_CRSLOT23),
+"Determine whether the predicate sources define a corner",
+{
+    fPREDUSE_TIMING();
+	fHIDE(size4u_t tmp = 0; size4u_t i;)
+	fSETHALF(0,tmp,(PsV<<8)|PtV);
+	fSETHALF(1,tmp,(PsV<<8)|PtV);
+	for (i = 1; i < 9; i++) {
+		tmp &= tmp >> 1;
+	}
+	PdV = f8BITSOF(tmp == 0);
+})
+
+
diff --git a/target/hexagon/imported/float.idef b/target/hexagon/imported/float.idef
new file mode 100644
index 0000000..ee9e940
--- /dev/null
+++ b/target/hexagon/imported/float.idef
@@ -0,0 +1,498 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Floating-Point Instructions
+ */
+
+/*************************************/
+/* Scalar FP			 */
+/*************************************/
+Q6INSN(F2_sfadd,"Rd32=sfadd(Rs32,Rt32)",ATTRIBS(),
+"Floating-Point Add",
+{ RdV=fUNFLOAT(fFLOAT(RsV)+fFLOAT(RtV));})
+
+Q6INSN(F2_sfsub,"Rd32=sfsub(Rs32,Rt32)",ATTRIBS(),
+"Floating-Point Subtract",
+{ RdV=fUNFLOAT(fFLOAT(RsV)-fFLOAT(RtV));})
+
+Q6INSN(F2_sfmpy,"Rd32=sfmpy(Rs32,Rt32)",ATTRIBS(),
+"Floating-Point Multiply",
+{ RdV=fUNFLOAT(fSFMPY(fFLOAT(RsV),fFLOAT(RtV)));})
+
+Q6INSN(F2_sffma,"Rx32+=sfmpy(Rs32,Rt32)",ATTRIBS(),
+"Floating-Point Fused Multiply Add",
+{ RxV=fUNFLOAT(fFMAF(fFLOAT(RsV),fFLOAT(RtV),fFLOAT(RxV)));})
+
+Q6INSN(F2_sffma_sc,"Rx32+=sfmpy(Rs32,Rt32,Pu4):scale",ATTRIBS(),
+"Floating-Point Fused Multiply Add w/ Additional Scaling (2**Pu)",
+{
+    fPREDUSE_TIMING();
+	fHIDE(size4s_t tmp;)
+	fCHECKSFNAN3(RxV,RxV,RsV,RtV);
+	tmp=fUNFLOAT(fFMAFX(fFLOAT(RsV),fFLOAT(RtV),fFLOAT(RxV),PuV));
+	if (!((fFLOAT(RxV) == 0.0) && fISZEROPROD(fFLOAT(RsV),fFLOAT(RtV)))) RxV = tmp;
+})
+
+Q6INSN(F2_sffms,"Rx32-=sfmpy(Rs32,Rt32)",ATTRIBS(),
+"Floating-Point Fused Multiply Add",
+{ RxV=fUNFLOAT(fFMAF(-fFLOAT(RsV),fFLOAT(RtV),fFLOAT(RxV))); })
+
+Q6INSN(F2_sffma_lib,"Rx32+=sfmpy(Rs32,Rt32):lib",ATTRIBS(),
+"Floating-Point Fused Multiply Add for Library Routines",
+{ fFPSETROUND_NEAREST(); fHIDE(int infinp; int infminusinf; size4s_t tmp;)
+	infminusinf = ((isinf(fFLOAT(RxV))) &&
+		(fISINFPROD(fFLOAT(RsV),fFLOAT(RtV))) &&
+		(fGETBIT(31,RsV ^ RxV ^ RtV) != 0));
+	infinp = (isinf(fFLOAT(RxV))) || (isinf(fFLOAT(RtV))) || (isinf(fFLOAT(RsV)));
+	fCHECKSFNAN3(RxV,RxV,RsV,RtV);
+	tmp=fUNFLOAT(fFMAF(fFLOAT(RsV),fFLOAT(RtV),fFLOAT(RxV)));
+	if (!((fFLOAT(RxV) == 0.0) && fISZEROPROD(fFLOAT(RsV),fFLOAT(RtV)))) RxV = tmp;
+	fFPCANCELFLAGS();
+	if (isinf(fFLOAT(RxV)) && !infinp) RxV = RxV - 1;
+	if (infminusinf) RxV = 0;
+})
+
+Q6INSN(F2_sffms_lib,"Rx32-=sfmpy(Rs32,Rt32):lib",ATTRIBS(),
+"Floating-Point Fused Multiply Add for Library Routines",
+{ fFPSETROUND_NEAREST(); fHIDE(int infinp; int infminusinf; size4s_t tmp;)
+	infminusinf = ((isinf(fFLOAT(RxV))) &&
+		(fISINFPROD(fFLOAT(RsV),fFLOAT(RtV))) &&
+		(fGETBIT(31,RsV ^ RxV ^ RtV) == 0));
+	infinp = (isinf(fFLOAT(RxV))) || (isinf(fFLOAT(RtV))) || (isinf(fFLOAT(RsV)));
+	fCHECKSFNAN3(RxV,RxV,RsV,RtV);
+	tmp=fUNFLOAT(fFMAF(-fFLOAT(RsV),fFLOAT(RtV),fFLOAT(RxV)));
+	if (!((fFLOAT(RxV) == 0.0) && fISZEROPROD(fFLOAT(RsV),fFLOAT(RtV)))) RxV = tmp;
+	fFPCANCELFLAGS();
+	if (isinf(fFLOAT(RxV)) && !infinp) RxV = RxV - 1;
+	if (infminusinf) RxV = 0;
+})
+
+
+Q6INSN(F2_sfcmpeq,"Pd4=sfcmp.eq(Rs32,Rt32)",ATTRIBS(),
+"Floating Point Compare for Equal",
+{PdV=f8BITSOF(fFLOAT(RsV)==fFLOAT(RtV));})
+
+Q6INSN(F2_sfcmpgt,"Pd4=sfcmp.gt(Rs32,Rt32)",ATTRIBS(),
+"Floating-Point Compare for Greater Than",
+{PdV=f8BITSOF(fFLOAT(RsV)>fFLOAT(RtV));})
+
+/* cmpge is not the same as !cmpgt(swapops) in IEEE */
+
+Q6INSN(F2_sfcmpge,"Pd4=sfcmp.ge(Rs32,Rt32)",ATTRIBS(),
+"Floating-Point Compare for Greater Than / Equal To",
+{PdV=f8BITSOF(fFLOAT(RsV)>=fFLOAT(RtV));})
+
+/* Everyone seems to have this... */
+
+Q6INSN(F2_sfcmpuo,"Pd4=sfcmp.uo(Rs32,Rt32)",ATTRIBS(),
+"Floating-Point Compare for Unordered",
+{PdV=f8BITSOF(isunordered(fFLOAT(RsV),fFLOAT(RtV)));})
+
+
+Q6INSN(F2_sfmax,"Rd32=sfmax(Rs32,Rt32)",ATTRIBS(),
+"Maximum of Floating-Point values",
+{ RdV = fUNFLOAT(fSF_MAX(fFLOAT(RsV),fFLOAT(RtV))); })
+
+Q6INSN(F2_sfmin,"Rd32=sfmin(Rs32,Rt32)",ATTRIBS(),
+"Minimum of Floating-Point values",
+{ RdV = fUNFLOAT(fSF_MIN(fFLOAT(RsV),fFLOAT(RtV))); })
+
+
+Q6INSN(F2_sfclass,"Pd4=sfclass(Rs32,#u5)",ATTRIBS(),
+"Classify Floating-Point Value",
+{
+	fHIDE(int class;)
+	PdV = 0;
+	class = fpclassify(fFLOAT(RsV));
+	/* Is the value zero? */
+	if (fGETBIT(0,uiV) && (class == FP_ZERO)) PdV = 0xff;
+	if (fGETBIT(1,uiV) && (class == FP_NORMAL)) PdV = 0xff;
+	if (fGETBIT(2,uiV) && (class == FP_SUBNORMAL)) PdV = 0xff;
+	if (fGETBIT(3,uiV) && (class == FP_INFINITE)) PdV = 0xff;
+	if (fGETBIT(4,uiV) && (class == FP_NAN)) PdV = 0xff;
+	fFPCANCELFLAGS();
+})
+
+/* Range: +/- (1.0 .. 1+(63/64)) * 2**(-6 .. +9) */
+/* More immediate bits should probably be used for more precision? */
+
+Q6INSN(F2_sfimm_p,"Rd32=sfmake(#u10):pos",ATTRIBS(A_SFMAKE),
+"Make Floating Point Value",
+{
+	RdV = (127 - 6) << 23;
+	RdV += uiV << 17;
+})
+
+Q6INSN(F2_sfimm_n,"Rd32=sfmake(#u10):neg",ATTRIBS(A_SFMAKE),
+"Make Floating Point Value",
+{
+	RdV = (127 - 6) << 23;
+	RdV += (uiV << 17);
+	RdV |= (1 << 31);
+})
+
+
+Q6INSN(F2_sfdivcheat,"Rd32=sfdivcheat(Rs32,Rt32)",ATTRIBS(A_FAKEINSN),
+"Cheating Instruction for Divide Testing",
+{
+	RdV = fUNFLOAT(fFLOAT(RsV) / fFLOAT(RtV));
+})
+
+Q6INSN(F2_sfrecipa,"Rd32,Pe4=sfrecipa(Rs32,Rt32)",ATTRIBS(A_NOTE_COMPAT_ACCURACY,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED),
+"Reciprocal Approximation for Division",
+{
+	fHIDE(int idx;)
+	fHIDE(int adjust;)
+	fHIDE(int mant;)
+	fHIDE(int exp;)
+	if (fSF_RECIP_COMMON(RsV,RtV,RdV,adjust)) {
+		PeV = adjust;
+		idx = (RtV >> 16) & 0x7f;
+		mant = (fSF_RECIP_LOOKUP(idx) << 15) | 1;
+		exp = fSF_BIAS() - (fSF_GETEXP(RtV) - fSF_BIAS()) - 1;
+		RdV = fMAKESF(fGETBIT(31,RtV),exp,mant);
+	}
+})
+
+Q6INSN(F2_sffixupn,"Rd32=sffixupn(Rs32,Rt32)",ATTRIBS(),
+"Fix Up Numerator",
+{
+	fHIDE(int adjust;)
+	fSF_RECIP_COMMON(RsV,RtV,RdV,adjust);
+	RdV = RsV;
+})
+
+Q6INSN(F2_sffixupd,"Rd32=sffixupd(Rs32,Rt32)",ATTRIBS(),
+"Fix Up Denominator",
+{
+	fHIDE(int adjust;)
+	fSF_RECIP_COMMON(RsV,RtV,RdV,adjust);
+	RdV = RtV;
+})
+
+Q6INSN(F2_sfinvsqrta,"Rd32,Pe4=sfinvsqrta(Rs32)",ATTRIBS(A_NOTE_COMPAT_ACCURACY,A_NOTE_LATEPRED,A_RESTRICT_LATEPRED),
+"Reciprocal Square Root Approximation",
+{
+	fHIDE(int idx;)
+	fHIDE(int adjust;)
+	fHIDE(int mant;)
+	fHIDE(int exp;)
+	if (fSF_INVSQRT_COMMON(RsV,RdV,adjust)) {
+		PeV = adjust;
+		idx = (RsV >> 17) & 0x7f;
+		mant = (fSF_INVSQRT_LOOKUP(idx) << 15);
+		exp = fSF_BIAS() - ((fSF_GETEXP(RsV) - fSF_BIAS()) >> 1) - 1;
+		RdV = fMAKESF(fGETBIT(31,RsV),exp,mant);
+	}
+})
+
+Q6INSN(F2_sffixupr,"Rd32=sffixupr(Rs32)",ATTRIBS(),
+"Fix Up Radicand",
+{
+	fHIDE(int adjust;)
+	fSF_INVSQRT_COMMON(RsV,RdV,adjust);
+	RdV = RsV;
+})
+
+Q6INSN(F2_sfsqrtcheat,"Rd32=sfsqrtcheat(Rs32)",ATTRIBS(A_FAKEINSN),
+"Cheating instruction for SQRT Testing",
+{
+	RdV = fUNFLOAT(sqrtf(fFLOAT(RsV)));
+})
+
+/*************************************/
+/* Scalar DP			 */
+/*************************************/
+Q6INSN(F2_dfadd,"Rdd32=dfadd(Rss32,Rtt32)",ATTRIBS(),
+"Floating-Point Add",
+{ RddV=fUNDOUBLE(fDOUBLE(RssV)+fDOUBLE(RttV));})
+
+Q6INSN(F2_dfsub,"Rdd32=dfsub(Rss32,Rtt32)",ATTRIBS(),
+"Floating-Point Subtract",
+{ RddV=fUNDOUBLE(fDOUBLE(RssV)-fDOUBLE(RttV));})
+
+Q6INSN(F2_dfmax,"Rdd32=dfmax(Rss32,Rtt32)",ATTRIBS(),
+"Maximum of Floating-Point values",
+{ RddV = fUNDOUBLE(fDF_MAX(fDOUBLE(RssV),fDOUBLE(RttV))); })
+
+Q6INSN(F2_dfmin,"Rdd32=dfmin(Rss32,Rtt32)",ATTRIBS(),
+"Minimum of Floating-Point values",
+{ RddV = fUNDOUBLE(fDF_MIN(fDOUBLE(RssV),fDOUBLE(RttV))); })
+
+Q6INSN(F2_dfmpyfix,"Rdd32=dfmpyfix(Rss32,Rtt32)",ATTRIBS(),
+"Fix Up Multiplicand for Multiplication",
+{
+	if (fDF_ISDENORM(RssV) && fDF_ISBIG(RttV) && fDF_ISNORMAL(RttV)) RddV = fUNDOUBLE(fDOUBLE(RssV) * 0x1.0p52);
+	else if (fDF_ISDENORM(RttV) && fDF_ISBIG(RssV) && fDF_ISNORMAL(RssV)) RddV = fUNDOUBLE(fDOUBLE(RssV) * 0x1.0p-52);
+	else RddV = RssV;
+})
+
+Q6INSN(F2_dfmpyll,"Rdd32=dfmpyll(Rss32,Rtt32)",ATTRIBS(),
+"Multiply low*low and shift off low 32 bits into sticky (in MSB)",
+{
+	fHIDE(size8u_t prod;)
+	prod = fMPY32UU(fGETUWORD(0,RssV),fGETUWORD(0,RttV));
+	RddV = (prod >> 32) << 1;
+	if (fGETUWORD(0,prod) != 0) fSETBIT(0,RddV,1);
+})
+
+Q6INSN(F2_dfmpylh,"Rxx32+=dfmpylh(Rss32,Rtt32)",ATTRIBS(),
+"Multiply low*high and accumulate",
+{
+	RxxV += (fGETUWORD(0,RssV) * (0x00100000 | fZXTN(20,64,fGETUWORD(1,RttV)))) << 1;
+})
+
+Q6INSN(F2_dfmpyhh,"Rxx32+=dfmpyhh(Rss32,Rtt32)",ATTRIBS(),
+"Multiply high*high and accumulate with L*H value",
+{
+	RxxV = fUNDOUBLE(fDF_MPY_HH(fDOUBLE(RssV),fDOUBLE(RttV),RxxV));
+})
+
+
+
+#ifdef ADD_DP_OPS
+Q6INSN(F2_dfmpy,"Rdd32=dfmpy(Rss32,Rtt32)",ATTRIBS(A_FAKEINSN),
+"Floating-Point Multiply",
+{ FLATENCY(4);RddV=fUNDOUBLE(fDFMPY(fDOUBLE(RssV),fDOUBLE(RttV)));})
+
+Q6INSN(F2_dffma,"Rxx32+=dfmpy(Rss32,Rtt32)",ATTRIBS(A_FAKEINSN),
+"Floating-Point Fused Multiply Add",
+{ FLATENCY(5);RxxV=fUNDOUBLE(fFMA(fDOUBLE(RssV),fDOUBLE(RttV),fDOUBLE(RxxV)));})
+
+Q6INSN(F2_dffms,"Rxx32-=dfmpy(Rss32,Rtt32)",ATTRIBS(A_FAKEINSN),
+"Floating-Point Fused Multiply Add",
+{ FLATENCY(5);RxxV=fUNDOUBLE(fFMA(-fDOUBLE(RssV),fDOUBLE(RttV),fDOUBLE(RxxV)));})
+
+Q6INSN(F2_dffma_lib,"Rxx32+=dfmpy(Rss32,Rtt32):lib",ATTRIBS(A_FAKEINSN),
+"Floating-Point Fused Multiply Add for Library Routines",
+{ FLATENCY(5);fFPSETROUND_NEAREST(); fHIDE(int infinp; int infminusinf;)
+  infinp = (isinf(fDOUBLE(RxxV))) || (isinf(fDOUBLE(RttV))) || (isinf(fDOUBLE(RssV)));
+  infminusinf = ((isinf(fDOUBLE(RxxV))) &&
+		(fISINFPROD(fDOUBLE(RssV),fDOUBLE(RttV))) &&
+		(fGETBIT(63,RssV ^ RxxV ^ RttV) != 0));
+  fCHECKDFNAN3(RxxV,RxxV,RssV,RttV);
+  if ((fDOUBLE(RssV) != 0.0) && (fDOUBLE(RttV) != 0.0)) {
+    RxxV=fUNDOUBLE(fFMA(fDOUBLE(RssV),fDOUBLE(RttV),fDOUBLE(RxxV)));
+  } else {
+    if (isinf(fDOUBLE(RssV)) || isinf(fDOUBLE(RttV))) RxxV = fDFNANVAL();
+  }
+  fFPCANCELFLAGS();
+  if (isinf(fDOUBLE(RxxV)) && !infinp) RxxV = RxxV - 1;
+  if (infminusinf) RxxV = 0;
+})
+
+
+Q6INSN(F2_dffms_lib,"Rxx32-=dfmpy(Rss32,Rtt32):lib",ATTRIBS(A_FAKEINSN),
+"Floating-Point Fused Multiply Add for Library Routines",
+{ FLATENCY(5); fFPSETROUND_NEAREST(); fHIDE(int infinp; int infminusinf;)
+  infinp = (isinf(fDOUBLE(RxxV))) || (isinf(fDOUBLE(RttV))) || (isinf(fDOUBLE(RssV)));
+  infminusinf = ((isinf(fDOUBLE(RxxV))) &&
+		(fISINFPROD(fDOUBLE(RssV),fDOUBLE(RttV))) &&
+		(fGETBIT(63,RssV ^ RxxV ^ RttV) == 0));
+  fCHECKDFNAN3(RxxV,RxxV,RssV,RttV);
+  if ((fDOUBLE(RssV) != 0.0) && (fDOUBLE(RttV) != 0.0)) {
+    RxxV=fUNDOUBLE(fFMA(-fDOUBLE(RssV),fDOUBLE(RttV),fDOUBLE(RxxV)));
+  } else {
+    if (isinf(fDOUBLE(RssV)) || isinf(fDOUBLE(RttV))) RxxV = fDFNANVAL();
+  }
+  fFPCANCELFLAGS();
+  if (isinf(fDOUBLE(RxxV)) && !infinp) RxxV = RxxV - 1;
+  if (infminusinf) RxxV = 0;
+})
+
+
+Q6INSN(F2_dffma_sc,"Rxx32+=dfmpy(Rss32,Rtt32,Pu4):scale",ATTRIBS(A_FAKEINSN),
+"Floating-Point Fused Multiply Add w/ Additional Scaling (2**(4*Pu))",
+{ FLATENCY(5);
+  fHIDE(size8s_t tmp;)
+  fCHECKDFNAN3(RxxV,RxxV,RssV,RttV);
+  tmp=fUNDOUBLE(fFMAX(fDOUBLE(RssV),fDOUBLE(RttV),fDOUBLE(RxxV),PuV));
+  if ((fDOUBLE(tmp) != fDOUBLE(RxxV)) || ((fDOUBLE(RssV) != 0.0) && (fDOUBLE(RttV) != 0.0))) RxxV = tmp;
+})
+
+#endif
+
+Q6INSN(F2_dfcmpeq,"Pd4=dfcmp.eq(Rss32,Rtt32)",ATTRIBS(),
+"Floating Point Compare for Equal",
+{PdV=f8BITSOF(fDOUBLE(RssV)==fDOUBLE(RttV));})
+
+Q6INSN(F2_dfcmpgt,"Pd4=dfcmp.gt(Rss32,Rtt32)",ATTRIBS(),
+"Floating-Point Compare for Greater Than",
+{PdV=f8BITSOF(fDOUBLE(RssV)>fDOUBLE(RttV));})
+
+
+/* cmpge is not the same as !cmpgt(swapops) in IEEE */
+
+Q6INSN(F2_dfcmpge,"Pd4=dfcmp.ge(Rss32,Rtt32)",ATTRIBS(),
+"Floating-Point Compare for Greater Than / Equal To",
+{PdV=f8BITSOF(fDOUBLE(RssV)>=fDOUBLE(RttV));})
+
+/* Everyone seems to have this... */
+
+Q6INSN(F2_dfcmpuo,"Pd4=dfcmp.uo(Rss32,Rtt32)",ATTRIBS(),
+"Floating-Point Compare for Unordered",
+{PdV=f8BITSOF(isunordered(fDOUBLE(RssV),fDOUBLE(RttV)));})
+
+
+Q6INSN(F2_dfclass,"Pd4=dfclass(Rss32,#u5)",ATTRIBS(),
+"Classify Floating-Point Value",
+{
+	fHIDE(int class;)
+	PdV = 0;
+	class = fpclassify(fDOUBLE(RssV));
+	/* Is the value zero? */
+	if (fGETBIT(0,uiV) && (class == FP_ZERO)) PdV = 0xff;
+	if (fGETBIT(1,uiV) && (class == FP_NORMAL)) PdV = 0xff;
+	if (fGETBIT(2,uiV) && (class == FP_SUBNORMAL)) PdV = 0xff;
+	if (fGETBIT(3,uiV) && (class == FP_INFINITE)) PdV = 0xff;
+	if (fGETBIT(4,uiV) && (class == FP_NAN)) PdV = 0xff;
+	fFPCANCELFLAGS();
+})
+
+
+/* Range: +/- (1.0 .. 1+(63/64)) * 2**(-6 .. +9) */
+/* More immediate bits should probably be used for more precision? */
+
+Q6INSN(F2_dfimm_p,"Rdd32=dfmake(#u10):pos",ATTRIBS(A_DFMAKE),
+"Make Floating Point Value",
+{
+	RddV = (1023ULL - 6) << 52;
+	RddV += (fHIDE((size8u_t))uiV) << 46;
+})
+
+Q6INSN(F2_dfimm_n,"Rdd32=dfmake(#u10):neg",ATTRIBS(A_DFMAKE),
+"Make Floating Point Value",
+{
+	RddV = (1023ULL - 6) << 52;
+	RddV += (fHIDE((size8u_t))uiV) << 46;
+	RddV |= ((1ULL) << 63);
+})
+
+
+#ifdef ADD_DP_OPS
+Q6INSN(F2_dfdivcheat,"Rdd32=dfdivcheat(Rss32,Rtt32)",ATTRIBS(A_FAKEINSN),
+"Cheating Instruction for Divide Testing",
+{
+	RddV = fUNDOUBLE(fDOUBLE(RssV) / fDOUBLE(RttV));
+})
+
+
+Q6INSN(F2_dfrecipa,"Rdd32,Pe4=dfrecipa(Rss32,Rtt32)",ATTRIBS(A_NOTE_COMPAT_ACCURACY,A_FAKEINSN),
+"Reciprocal Approximation for Division",
+{
+	fHIDE(int idx;)
+	fHIDE(int adjust;)
+	fHIDE(size8s_t mant;)
+	fHIDE(int exp;)
+	if (fDF_RECIP_COMMON(RssV,RttV,RddV,adjust)) {
+		PeV = adjust;
+		idx = (RttV >> 45) & 0x7f;
+		mant = (fDF_RECIP_LOOKUP(idx) << 44) | 1;
+		exp = fDF_BIAS() - (fDF_GETEXP(RttV) - fDF_BIAS()) - 1;
+		RddV = fMAKEDF(fGETBIT(63,RttV),exp,mant);
+	}
+})
+
+Q6INSN(F2_dffixupn,"Rdd32=dffixupn(Rss32,Rtt32)",ATTRIBS(A_FAKEINSN),
+"Fix Up Numerator",
+{
+	fHIDE(int adjust;)
+	fDF_RECIP_COMMON(RssV,RttV,RddV,adjust);
+	RddV = RssV;
+})
+
+Q6INSN(F2_dffixupd,"Rdd32=dffixupd(Rss32,Rtt32)",ATTRIBS(A_FAKEINSN),
+"Fix Up Denominator",
+{
+	fHIDE(int adjust;)
+	fDF_RECIP_COMMON(RssV,RttV,RddV,adjust);
+	RddV = RttV;
+})
+
+
+
+
+Q6INSN(F2_dfinvsqrta,"Rdd32,Pe4=dfinvsqrta(Rss32)",ATTRIBS(A_NOTE_COMPAT_ACCURACY,A_FAKEINSN),
+"Reciprocal Approximation for Division",
+{
+	fHIDE(int idx;)
+	fHIDE(int adjust;)
+	fHIDE(size8u_t mant;)
+	fHIDE(int exp;)
+	if (fDF_INVSQRT_COMMON(RssV,RddV,adjust)) {
+		PeV = adjust;
+		idx = (RssV >> 46) & 0x7f;
+		mant = (fDF_INVSQRT_LOOKUP(idx) << 44);
+		exp = fDF_BIAS() - ((fDF_GETEXP(RssV) - fDF_BIAS()) >> 1) - 1;
+		RddV = fMAKEDF(fGETBIT(63,RssV),exp,mant);
+	}
+})
+
+Q6INSN(F2_dffixupr,"Rdd32=dffixupr(Rss32)",ATTRIBS(A_FAKEINSN),
+"Fix Up Radicand",
+{
+	fHIDE(int adjust;)
+	fDF_INVSQRT_COMMON(RssV,RddV,adjust);
+	RddV = RssV;
+})
+
+Q6INSN(F2_dfsqrtcheat,"Rdd32=dfsqrtcheat(Rss32)",ATTRIBS(A_FAKEINSN),
+"Cheating instruction for SQRT Testing",
+{
+	RddV = fUNDOUBLE(sqrt(fDOUBLE(RssV)));
+})
+#endif
+
+
+
+
+
+/* CONVERSION */
+
+#define CONVERT(TAG,DEST,DESTV,SRC,SRCV,OUTCAST,OUTTYPE,INCAST,INTYPE,MODETAG,MODESYN,MODEBEH) \
+    Q6INSN(F2_conv_##TAG##MODETAG,#DEST"=convert_"#TAG"("#SRC")"#MODESYN,ATTRIBS(), \
+    "Floating point format conversion", \
+    { MODEBEH DESTV = OUTCAST(conv_##INTYPE##_to_##OUTTYPE(INCAST(SRCV))); })
+
+CONVERT(sf2df,Rdd32,RddV,Rs32,RsV,fUNDOUBLE,df,fFLOAT,sf,,,)
+CONVERT(df2sf,Rd32,RdV,Rss32,RssV,fUNFLOAT,sf,fDOUBLE,df,,,)
+
+#define ALLINTDST(TAGSTART,SRC,SRCV,INCAST,INTYPE,MODETAG,MODESYN,MODEBEH) \
+CONVERT(TAGSTART##uw,Rd32,RdV,SRC,SRCV,fCAST4u,4u,INCAST,INTYPE,MODETAG,MODESYN,MODEBEH) \
+CONVERT(TAGSTART##w,Rd32,RdV,SRC,SRCV,fCAST4s,4s,INCAST,INTYPE,MODETAG,MODESYN,MODEBEH) \
+CONVERT(TAGSTART##ud,Rdd32,RddV,SRC,SRCV,fCAST8u,8u,INCAST,INTYPE,MODETAG,MODESYN,MODEBEH) \
+CONVERT(TAGSTART##d,Rdd32,RddV,SRC,SRCV,fCAST8s,8s,INCAST,INTYPE,MODETAG,MODESYN,MODEBEH)
+
+#define ALLFPDST(TAGSTART,SRC,SRCV,INCAST,INTYPE,MODETAG,MODESYN,MODEBEH) \
+CONVERT(TAGSTART##sf,Rd32,RdV,SRC,SRCV,fUNFLOAT,sf,INCAST,INTYPE,MODETAG,MODESYN,MODEBEH) \
+CONVERT(TAGSTART##df,Rdd32,RddV,SRC,SRCV,fUNDOUBLE,df,INCAST,INTYPE,MODETAG,MODESYN,MODEBEH)
+
+#define ALLINTSRC(GEN,MODETAG,MODESYN,MODEBEH) \
+GEN(uw##2,Rs32,RsV,fCAST4u,4u,MODETAG,MODESYN,MODEBEH) \
+GEN(w##2,Rs32,RsV,fCAST4s,4s,MODETAG,MODESYN,MODEBEH) \
+GEN(ud##2,Rss32,RssV,fCAST8u,8u,MODETAG,MODESYN,MODEBEH) \
+GEN(d##2,Rss32,RssV,fCAST8s,8s,MODETAG,MODESYN,MODEBEH)
+
+#define ALLFPSRC(GEN,MODETAG,MODESYN,MODEBEH) \
+GEN(sf##2,Rs32,RsV,fFLOAT,sf,MODETAG,MODESYN,MODEBEH) \
+GEN(df##2,Rss32,RssV,fDOUBLE,df,MODETAG,MODESYN,MODEBEH)
+
+ALLINTSRC(ALLFPDST,,,)
+ALLFPSRC(ALLINTDST,,,)
+ALLFPSRC(ALLINTDST,_chop,:chop,fFPSETROUND_CHOP();)
+
diff --git a/target/hexagon/imported/ldst.idef b/target/hexagon/imported/ldst.idef
new file mode 100644
index 0000000..2b03bbc
--- /dev/null
+++ b/target/hexagon/imported/ldst.idef
@@ -0,0 +1,421 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Load and Store instruction definitions
+ */
+
+#define FRAME_EXPLICIT 1
+
+/* The set of addressing modes standard to all Load instructions */
+#define STD_LD_AMODES(TAG,OPER,DESCR,ATTRIB,SHFT,SEMANTICS,SCALE)\
+Q6INSN(L2_##TAG##_io,  OPER"(Rs32+#s11:"SHFT")",          ATTRIB,DESCR,{fIMMEXT(siV); fEA_RI(RsV,siV); SEMANTICS; })\
+Q6INSN(L4_##TAG##_ur,  OPER"(Rt32<<#u2+#U6)",             ATTRIB,DESCR,{fMUST_IMMEXT(UiV); fEA_IRs(UiV,RtV,uiV); SEMANTICS;})\
+Q6INSN(L4_##TAG##_ap,  OPER"(Re32=#U6)",                  ATTRIB,DESCR,{fMUST_IMMEXT(UiV); fEA_IMM(UiV); SEMANTICS; ReV=UiV; })\
+Q6INSN(L2_##TAG##_pr,  OPER"(Rx32++Mu2)",                 ATTRIB,DESCR,{fEA_REG(RxV); fPM_M(RxV,MuV); SEMANTICS;})\
+Q6INSN(L2_##TAG##_pbr, OPER"(Rx32++Mu2:brev)",            ATTRIB,DESCR,{fEA_BREVR(RxV); fPM_M(RxV,MuV); SEMANTICS;})\
+Q6INSN(L2_##TAG##_pi,  OPER"(Rx32++#s4:"SHFT")",          ATTRIB,DESCR,{fEA_REG(RxV); fPM_I(RxV,siV); SEMANTICS;})\
+Q6INSN(L2_##TAG##_pci, OPER"(Rx32++#s4:"SHFT":circ(Mu2))",ATTRIB,DESCR,{fEA_REG(RxV); fPM_CIRI(RxV,siV,MuV); SEMANTICS;})\
+Q6INSN(L2_##TAG##_pcr, OPER"(Rx32++I:circ(Mu2))",  ATTRIB,DESCR,{fEA_REG(RxV); fPM_CIRR(RxV,fREAD_IREG(MuV)<<SCALE,MuV); SEMANTICS;})\
+DEF_MAPPING(L2_##TAG##_zomap, OPER"(Rs32)", OPER"(Rs32+#0)");
+
+/* The set of 32-bit load instructions */
+STD_LD_AMODES(loadrub,"Rd32=memub","Load Unsigned Byte",ATTRIBS(A_MEMSIZE_1B,A_LOAD),"0",fLOAD(1,1,u,EA,RdV),0)
+STD_LD_AMODES(loadrb, "Rd32=memb", "Load signed Byte",ATTRIBS(A_MEMSIZE_1B,A_LOAD),"0",fLOAD(1,1,s,EA,RdV),0)
+STD_LD_AMODES(loadruh,"Rd32=memuh","Load unsigned Half integer",ATTRIBS(A_MEMSIZE_2B,A_LOAD),"1",fLOAD(1,2,u,EA,RdV),1)
+STD_LD_AMODES(loadrh, "Rd32=memh", "Load signed Half integer",ATTRIBS(A_MEMSIZE_2B,A_LOAD),"1",fLOAD(1,2,s,EA,RdV),1)
+STD_LD_AMODES(loadri, "Rd32=memw", "Load Word",ATTRIBS(A_MEMSIZE_4B,A_LOAD),"2",fLOAD(1,4,u,EA,RdV),2)
+STD_LD_AMODES(loadrd, "Rdd32=memd","Load Double integer",ATTRIBS(A_MEMSIZE_8B,A_LOAD),"3",fLOAD(1,8,u,EA,RddV),3)
+
+/* These instructions do a load an unpack */
+STD_LD_AMODES(loadbzw2, "Rd32=memubh", "Load Bytes and Vector Zero-Extend (unpack)",
+ATTRIBS(A_MEMSIZE_2B,A_LOAD),"1",
+{fHIDE(size2u_t tmpV; int i;)
+ fLOAD(1,2,u,EA,tmpV);
+ for (i=0;i<2;i++) {
+  fSETHALF(i,RdV,fGETUBYTE(i,tmpV));
+ }
+},1)
+
+STD_LD_AMODES(loadbzw4, "Rdd32=memubh", "Load Bytes and Vector Zero-Extend (unpack)",
+ATTRIBS(A_MEMSIZE_4B,A_LOAD),"2",
+{fHIDE(size4u_t tmpV; int i;)
+ fLOAD(1,4,u,EA,tmpV);
+ for (i=0;i<4;i++) {
+  fSETHALF(i,RddV,fGETUBYTE(i,tmpV));
+ }
+},2)
+
+
+
+/* These instructions do a load an unpack */
+STD_LD_AMODES(loadbsw2, "Rd32=membh", "Load Bytes and Vector Sign-Extend (unpack)",
+ATTRIBS(A_MEMSIZE_2B,A_LOAD),"1",
+{fHIDE(size2u_t tmpV; int i;)
+ fLOAD(1,2,u,EA,tmpV);
+ for (i=0;i<2;i++) {
+  fSETHALF(i,RdV,fGETBYTE(i,tmpV));
+ }
+},1)
+
+STD_LD_AMODES(loadbsw4, "Rdd32=membh", "Load Bytes and Vector Sign-Extend (unpack)",
+ATTRIBS(A_MEMSIZE_4B,A_LOAD),"2",
+{fHIDE(size4u_t tmpV; int i;)
+ fLOAD(1,4,u,EA,tmpV);
+ for (i=0;i<4;i++) {
+  fSETHALF(i,RddV,fGETBYTE(i,tmpV));
+ }
+},2)
+
+
+
+STD_LD_AMODES(loadalignh, "Ryy32=memh_fifo", "Load Half-word into shifted vector",
+ATTRIBS(A_MEMSIZE_2B,A_LOAD),"1",
+{
+ fHIDE(size8u_t tmpV;)
+ fLOAD(1,2,u,EA,tmpV);
+ RyyV = (((size8u_t)RyyV)>>16)|(tmpV<<48);
+},1)
+
+
+STD_LD_AMODES(loadalignb, "Ryy32=memb_fifo", "Load byte into shifted vector",
+ATTRIBS(A_MEMSIZE_1B,A_LOAD),"0",
+{
+ fHIDE(size8u_t tmpV;)
+ fLOAD(1,1,u,EA,tmpV);
+ RyyV = (((size8u_t)RyyV)>>8)|(tmpV<<56);
+},0)
+
+
+
+
+/* The set of addressing modes standard to all Store instructions */
+#define STD_ST_AMODES(TAG,DEST,OPER,DESCR,ATTRIB,SHFT,SEMANTICS,SCALE)\
+Q6INSN(S2_##TAG##_io,  OPER"(Rs32+#s11:"SHFT")="DEST,     ATTRIB,DESCR,{fIMMEXT(siV); fEA_RI(RsV,siV); SEMANTICS; })\
+Q6INSN(S2_##TAG##_pi,  OPER"(Rx32++#s4:"SHFT")="DEST,     ATTRIB,DESCR,{fEA_REG(RxV); fPM_I(RxV,siV); SEMANTICS; })\
+Q6INSN(S4_##TAG##_ap,  OPER"(Re32=#U6)="DEST,             ATTRIB,DESCR,{fMUST_IMMEXT(UiV); fEA_IMM(UiV); SEMANTICS; ReV=UiV; })\
+Q6INSN(S2_##TAG##_pr,  OPER"(Rx32++Mu2)="DEST,            ATTRIB,DESCR,{fEA_REG(RxV); fPM_M(RxV,MuV); SEMANTICS; })\
+Q6INSN(S4_##TAG##_ur,  OPER"(Ru32<<#u2+#U6)="DEST,            ATTRIB,DESCR,{fMUST_IMMEXT(UiV); fEA_IRs(UiV,RuV,uiV); SEMANTICS;})\
+Q6INSN(S2_##TAG##_pbr, OPER"(Rx32++Mu2:brev)="DEST,       ATTRIB,DESCR,{fEA_BREVR(RxV); fPM_M(RxV,MuV); SEMANTICS; })\
+Q6INSN(S2_##TAG##_pci, OPER"(Rx32++#s4:"SHFT":circ(Mu2))="DEST,  ATTRIB,DESCR,{fEA_REG(RxV); fPM_CIRI(RxV,siV,MuV); SEMANTICS;})\
+Q6INSN(S2_##TAG##_pcr, OPER"(Rx32++I:circ(Mu2))="DEST,  ATTRIB,DESCR,{fEA_REG(RxV); fPM_CIRR(RxV,fREAD_IREG(MuV)<<SCALE,MuV); SEMANTICS;})\
+DEF_MAPPING(S2_##TAG##_zomap, OPER"(Rs32)="DEST, OPER"(Rs32+#0)="DEST);
+
+
+/* The set of 32-bit store instructions */
+STD_ST_AMODES(storerb, "Rt32", "memb","Store Byte",ATTRIBS(A_MEMSIZE_1B,A_STORE),"0",fSTORE(1,1,EA,fGETBYTE(0,RtV)),0)
+STD_ST_AMODES(storerh, "Rt32", "memh","Store Half integer",ATTRIBS(A_MEMSIZE_2B,A_STORE),"1",fSTORE(1,2,EA,fGETHALF(0,RtV)),1)
+STD_ST_AMODES(storerf, "Rt.H32", "memh","Store Upper Half integer",ATTRIBS(A_MEMSIZE_2B,A_STORE),"1",fSTORE(1,2,EA,fGETHALF(1,RtV)),1)
+STD_ST_AMODES(storeri, "Rt32", "memw","Store Word",ATTRIBS(A_MEMSIZE_4B,A_STORE),"2",fSTORE(1,4,EA,RtV),2)
+STD_ST_AMODES(storerd, "Rtt32","memd","Store Double integer",ATTRIBS(A_MEMSIZE_8B,A_STORE),"3",fSTORE(1,8,EA,RttV),3)
+STD_ST_AMODES(storerinew, "Nt8.new", "memw","Store Word",ATTRIBS(A_NOTE_NEWVAL_SLOT0,A_NVSTORE,A_NOTE_NVSLOT0,A_MEMSIZE_4B,A_STORE,A_RESTRICT_NOSLOT1_STORE),"2",fSTORE(1,4,EA,fNEWREG_ST(NtN)),2)
+STD_ST_AMODES(storerbnew, "Nt8.new", "memb","Store Byte",ATTRIBS(A_NOTE_NEWVAL_SLOT0,A_NVSTORE,A_NOTE_NVSLOT0,A_MEMSIZE_1B,A_STORE,A_RESTRICT_NOSLOT1_STORE),"0",fSTORE(1,1,EA,fGETBYTE(0,fNEWREG_ST(NtN))),0)
+STD_ST_AMODES(storerhnew, "Nt8.new", "memh","Store Half integer",ATTRIBS(A_NOTE_NEWVAL_SLOT0,A_NVSTORE,A_NOTE_NVSLOT0,A_MEMSIZE_2B,A_STORE,A_RESTRICT_NOSLOT1_STORE),"1",fSTORE(1,2,EA,fGETHALF(0,fNEWREG_ST(NtN))),1)
+
+
+#if FRAME_EXPLICIT
+Q6INSN(S2_allocframe,"allocframe(Rx32,#u11:3):raw", ATTRIBS(A_MEMSIZE_8B,A_STORE,A_RESTRICT_SLOT0ONLY), "Allocate stack frame",
+{ fEA_RI(RxV,-8); fSTORE(1,8,EA,fFRAME_SCRAMBLE((fCAST8_8u(fREAD_LR()) << 32) | fCAST4_4u(fREAD_FP()))); fWRITE_FP(EA); fFRAMECHECK(EA-uiV,EA); RxV = EA-uiV; })
+
+DEF_MAPPING(S6_allocframe_to_raw,"allocframe(#u11:3)","allocframe(r29,#u11:3):raw")
+#else
+Q6INSN(S2_allocframe,"allocframe(#u11:3)", ATTRIBS(A_MEMSIZE_8B,A_STORE,A_RESTRICT_SLOT0ONLY), "Allocate stack frame",
+{ fEA_RI(fREAD_SP(),-8); fSTORE(1,8,EA,fFRAME_SCRAMBLE((fCAST8_8u(fREAD_LR()) << 32) | fCAST4_4u(fREAD_FP()))); fWRITE_FP(EA); fFRAMECHECK(EA-uiV,EA); fWRITE_SP(EA-uiV); })
+#endif
+
+#define A_RETURN A_RESTRICT_COF_MAX1,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOSLOT1_STORE,A_RET_TYPE,A_DEALLOCRET
+
+#if FRAME_EXPLICIT
+Q6INSN(L2_deallocframe,"Rdd32=deallocframe(Rs32):raw", ATTRIBS(A_MEMSIZE_8B,A_LOAD,A_DEALLOCFRAME), "Deallocate stack frame",
+{ fHIDE(size8u_t tmp;) fEA_REG(RsV);
+	fLOAD(1,8,u,EA,tmp);
+	RddV = fFRAME_UNSCRAMBLE(tmp);
+	fWRITE_SP(EA+8); })
+
+DEF_MAPPING(L6_deallocframe_map_to_raw,"deallocframe","r31:30=deallocframe(r30):raw")
+#else
+Q6INSN(L2_deallocframe,"deallocframe", ATTRIBS(A_MEMSIZE_8B,A_LOAD,A_DEALLOCFRAME), "Deallocate stack frame",
+{ fHIDE(size8u_t tmp;) fEA_REG(fREAD_FP());
+	fLOAD(1,8,u,EA,tmp);
+	tmp = fFRAME_UNSCRAMBLE(tmp);
+	fWRITE_LR(fGETWORD(1,tmp));
+	fWRITE_FP(fGETWORD(0,tmp));
+	fWRITE_SP(EA+8); })
+#endif
+
+#if FRAME_EXPLICIT
+Q6INSN(L4_return,"Rdd32=dealloc_return(Rs32):raw", ATTRIBS(A_ROPS_2,A_JINDIR,A_MEMSIZE_8B,A_LOAD,A_RETURN), "Deallocate stack frame and return",
+{ fHIDE(size8u_t tmp;) fEA_REG(RsV);
+	fLOAD(1,8,u,EA,tmp);
+	RddV = fFRAME_UNSCRAMBLE(tmp);
+	fWRITE_SP(EA+8);
+    fJUMPR(REG_LR,fGETWORD(1,RddV),COF_TYPE_JUMPR);})
+
+DEF_MAPPING(L6_return_map_to_raw,"dealloc_return","r31:30=dealloc_return(r30):raw")
+#else
+Q6INSN(L4_return,"dealloc_return", ATTRIBS(A_ROPS_2,A_JINDIR,A_MEMSIZE_8B,A_LOAD,A_RETURN), "Deallocate stack frame and return",
+{ fHIDE(size8u_t tmp;) fEA_REG(fREAD_FP());
+	fLOAD(1,8,u,EA,tmp);
+	tmp = fFRAME_UNSCRAMBLE(tmp);
+	fWRITE_LR(fGETWORD(1,tmp));
+	fWRITE_FP(fGETWORD(0,tmp));
+	fWRITE_SP(EA+8);
+    fJUMPR(REG_LR,fGETWORD(1,tmp),COF_TYPE_JUMPR);})
+#endif
+
+#define CONDSEM(SRCREG,STALLBITS0,STALLBITS1,PREDFUNC,PREDARG,STALLSPEC,PREDCOND) \
+{ \
+	fHIDE(size8u_t tmp;) \
+	fBRANCH_SPECULATE_STALL(PREDFUNC##PREDCOND(PREDARG),,STALLSPEC,STALLBITS0,STALLBITS1); \
+	fEA_REG(SRCREG); \
+	if (PREDFUNC##PREDCOND(PREDARG)) { \
+		fLOAD(1,8,u,EA,tmp); \
+		RddV = fFRAME_UNSCRAMBLE(tmp); \
+		fWRITE_SP(EA+8); \
+		fJUMPR(REG_LR,fGETWORD(1,RddV),COF_TYPE_JUMPR); \
+	} else { \
+		LOAD_CANCEL(EA); \
+	} \
+}
+
+#define COND_RETURN_TF(TG,TG2,DOTNEW,STALLBITS0,STALLBITS1,STALLSPEC,ATTRIBS,PREDFUNC,PREDARG,T_NT) \
+	Q6INSN(TG##_t##TG2,"if (Pv4"DOTNEW") Rdd32=dealloc_return(Rs32)"T_NT":raw",ATTRIBS,"deallocate stack frame and return", \
+		CONDSEM(RsV,STALLBITS0,STALLBITS1,PREDFUNC,PREDARG,STALLSPEC,)) \
+	Q6INSN(TG##_f##TG2,"if (!Pv4"DOTNEW") Rdd32=dealloc_return(Rs32)"T_NT":raw",ATTRIBS,"deallocate stack frame and return", \
+		CONDSEM(RsV,STALLBITS0,STALLBITS1,PREDFUNC##NOT,PREDARG,STALLSPEC,)) \
+	DEF_MAPPING(TG##_map_to_raw_t##TG2,"if (Pv4"DOTNEW") dealloc_return"T_NT,"if (Pv4"DOTNEW") r31:30=dealloc_return(r30)"T_NT":raw") \
+	DEF_MAPPING(TG##_map_to_raw_f##TG2,"if (!Pv4"DOTNEW") dealloc_return"T_NT,"if (!Pv4"DOTNEW") r31:30=dealloc_return(r30)"T_NT":raw")
+
+#define COND_RETURN_NEW(TG,STALLBITS0,STALLBITS1,ATTRIBS) \
+	COND_RETURN_TF(TG,new_pt,".new",12,0,SPECULATE_TAKEN,ATTRIBS,fLSBNEW,PvN,":t") \
+	COND_RETURN_TF(TG,new_pnt,".new",12,0,SPECULATE_NOT_TAKEN,ATTRIBS,fLSBNEW,PvN,":nt") \
+
+#define RETURN_ATTRIBS A_ROPS_2,A_MEMSIZE_8B,A_LOAD,A_RETURN
+
+COND_RETURN_TF(L4_return,,,7,0,SPECULATE_NOT_TAKEN,ATTRIBS(RETURN_ATTRIBS,A_JINDIROLD,A_PRED_BIT_7),fLSBOLD,PvV,)
+COND_RETURN_NEW(L4_return,12,0,ATTRIBS(RETURN_ATTRIBS,A_JINDIRNEW,A_PRED_BIT_12))
+
+
+
+
+Q6INSN(L2_loadw_locked,"Rd32=memw_locked(Rs32)", ATTRIBS(A_MEMSIZE_4B,A_LOAD,A_RESTRICT_SLOT0ONLY,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK), "Load word with lock",
+{ fEA_REG(RsV); fLOAD_LOCKED(1,4,u,EA,RdV) });
+
+
+Q6INSN(S2_storew_locked,"memw_locked(Rs32,Pd4)=Rt32", ATTRIBS(A_MEMSIZE_4B,A_STORE,A_RESTRICT_SLOT0ONLY,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED), "Store word with lock",
+{ fEA_REG(RsV); fSTORE_LOCKED(1,4,EA,RtV,PdV) fHIDE(MARK_LATE_PRED_WRITE(PdN)) });
+
+
+Q6INSN(L4_loadd_locked,"Rdd32=memd_locked(Rs32)", ATTRIBS(A_MEMSIZE_8B,A_LOAD,A_RESTRICT_SLOT0ONLY,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK), "Load double with lock",
+{ fEA_REG(RsV); fLOAD_LOCKED(1,8,u,EA,RddV) });
+
+Q6INSN(L4_loadw_phys,"Rd32=memw_phys(Rs32,Rt32)", ATTRIBS(A_PRIV,A_RESTRICT_SLOT0ONLY,A_NOTE_PRIV,A_MEMSIZE_4B,A_LOAD,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET), "Load word from physical address",
+{ fLOAD_PHYS(1,4,u,RsV,RtV,RdV); });
+
+
+Q6INSN(S4_stored_locked,"memd_locked(Rs32,Pd4)=Rtt32", ATTRIBS(A_MEMSIZE_8B,A_STORE,A_RESTRICT_SLOT0ONLY,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED), "Store word with lock",
+{ fEA_REG(RsV); fSTORE_LOCKED(1,8,EA,RttV,PdV) fHIDE(MARK_LATE_PRED_WRITE(PdN)) });
+
+
+
+
+
+/*****************************************************************/
+/*                                                               */
+/*                       Predicated LDST                         */
+/*                                                               */
+/*****************************************************************/
+
+#define STD_PLD_AMODES(TAG,OPER,DESCR,ATTRIB,SHFT,SHFTNUM,SEMANTICS)\
+Q6INSN(L4_##TAG##_rr,  OPER"(Rs32+Rt32<<#u2)",            ATTRIB,DESCR,{fEA_RRs(RsV,RtV,uiV); SEMANTICS;})\
+Q6INSN(L2_p##TAG##t_io, "if (Pt4) "OPER"(Rs32+#u6:"SHFT")",            ATTRIB,DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); if(fLSBOLD(PtV)){SEMANTICS;} else {LOAD_CANCEL(EA);}})\
+Q6INSN(L2_p##TAG##t_pi, "if (Pt4) "OPER"(Rx32++#s4:"SHFT")",           ATTRIB,DESCR,{fEA_REG(RxV); if(fLSBOLD(PtV)){ fPM_I(RxV,siV); SEMANTICS;} else {LOAD_CANCEL(EA);}})\
+Q6INSN(L2_p##TAG##f_io, "if (!Pt4) "OPER"(Rs32+#u6:"SHFT")",           ATTRIB,DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); if(fLSBOLDNOT(PtV)){ SEMANTICS; } else {LOAD_CANCEL(EA);}})\
+Q6INSN(L2_p##TAG##f_pi, "if (!Pt4) "OPER"(Rx32++#s4:"SHFT")",          ATTRIB,DESCR,{fEA_REG(RxV); if(fLSBOLDNOT(PtV)){ fPM_I(RxV,siV); SEMANTICS;} else {LOAD_CANCEL(EA);}})\
+Q6INSN(L2_p##TAG##tnew_io,"if (Pt4.new) "OPER"(Rs32+#u6:"SHFT")",ATTRIB,DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); if (fLSBNEW(PtN))  { SEMANTICS; } else {LOAD_CANCEL(EA);}})\
+Q6INSN(L2_p##TAG##fnew_io,"if (!Pt4.new) "OPER"(Rs32+#u6:"SHFT")",ATTRIB,DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); if (fLSBNEWNOT(PtN)) { SEMANTICS; } else {LOAD_CANCEL(EA);}})\
+Q6INSN(L4_p##TAG##t_rr, "if (Pv4) "OPER"(Rs32+Rt32<<#u2)",            ATTRIB,DESCR,{fEA_RRs(RsV,RtV,uiV); if(fLSBOLD(PvV)){ SEMANTICS;} else {LOAD_CANCEL(EA);}})\
+Q6INSN(L4_p##TAG##f_rr, "if (!Pv4) "OPER"(Rs32+Rt32<<#u2)",           ATTRIB,DESCR,{fEA_RRs(RsV,RtV,uiV); if(fLSBOLDNOT(PvV)){ SEMANTICS; } else {LOAD_CANCEL(EA);}})\
+Q6INSN(L4_p##TAG##tnew_rr,"if (Pv4.new) "OPER"(Rs32+Rt32<<#u2)",ATTRIB,DESCR,{fEA_RRs(RsV,RtV,uiV); if (fLSBNEW(PvN))  { SEMANTICS; } else {LOAD_CANCEL(EA);}})\
+Q6INSN(L4_p##TAG##fnew_rr,"if (!Pv4.new) "OPER"(Rs32+Rt32<<#u2)",ATTRIB,DESCR,{fEA_RRs(RsV,RtV,uiV); if (fLSBNEWNOT(PvN)) { SEMANTICS; } else {LOAD_CANCEL(EA);}})\
+Q6INSN(L2_p##TAG##tnew_pi, "if (Pt4.new) "OPER"(Rx32++#s4:"SHFT")",           ATTRIB,DESCR,{fEA_REG(RxV); if(fLSBNEW(PtN)){ fPM_I(RxV,siV); SEMANTICS;} else {LOAD_CANCEL(EA);}})\
+Q6INSN(L2_p##TAG##fnew_pi, "if (!Pt4.new) "OPER"(Rx32++#s4:"SHFT")",          ATTRIB,DESCR,{fEA_REG(RxV); if(fLSBNEWNOT(PtN)){ fPM_I(RxV,siV); SEMANTICS;} else {LOAD_CANCEL(EA);}})\
+Q6INSN(L4_p##TAG##t_abs, "if (Pt4) "OPER"(#u6)",            ATTRIB,DESCR,{fMUST_IMMEXT(uiV); fEA_IMM(uiV); if(fLSBOLD(PtV)){ SEMANTICS;} else {LOAD_CANCEL(EA);}})\
+Q6INSN(L4_p##TAG##f_abs, "if (!Pt4) "OPER"(#u6)",           ATTRIB,DESCR,{fMUST_IMMEXT(uiV); fEA_IMM(uiV); if(fLSBOLDNOT(PtV)){ SEMANTICS; } else {LOAD_CANCEL(EA);}})\
+Q6INSN(L4_p##TAG##tnew_abs,"if (Pt4.new) "OPER"(#u6)",ATTRIB,DESCR,{fMUST_IMMEXT(uiV); fEA_IMM(uiV);if (fLSBNEW(PtN))  { SEMANTICS; } else {LOAD_CANCEL(EA);}})\
+Q6INSN(L4_p##TAG##fnew_abs,"if (!Pt4.new) "OPER"(#u6)",ATTRIB,DESCR,{fMUST_IMMEXT(uiV); fEA_IMM(uiV);if (fLSBNEWNOT(PtN)) { SEMANTICS; } else {LOAD_CANCEL(EA);}})\
+DEF_MAPPING(L2_p##TAG##t_zomap, "if (Pt4) "OPER"(Rs32)", "if (Pt4) "OPER"(Rs32+#0)"); \
+DEF_MAPPING(L2_p##TAG##f_zomap, "if (!Pt4) "OPER"(Rs32)", "if (!Pt4) "OPER"(Rs32+#0)"); \
+DEF_MAPPING(L2_p##TAG##tnew_zomap, "if (Pt4.new) "OPER"(Rs32)", "if (Pt4.new) "OPER"(Rs32+#0)"); \
+DEF_MAPPING(L2_p##TAG##fnew_zomap, "if (!Pt4.new) "OPER"(Rs32)", "if (!Pt4.new) "OPER"(Rs32+#0)");
+
+
+
+/* The set of 32-bit predicated load instructions */
+STD_PLD_AMODES(loadrub,"Rd32=memub","Load Unsigned Byte",ATTRIBS(A_ARCHV2,A_MEMSIZE_1B,A_LOAD),"0",0,fLOAD(1,1,u,EA,RdV))
+STD_PLD_AMODES(loadrb, "Rd32=memb", "Load signed Byte",ATTRIBS(A_ARCHV2,A_MEMSIZE_1B,A_LOAD),"0",0,fLOAD(1,1,s,EA,RdV))
+STD_PLD_AMODES(loadruh,"Rd32=memuh","Load unsigned Half integer",ATTRIBS(A_ARCHV2,A_MEMSIZE_2B,A_LOAD),"1",1,fLOAD(1,2,u,EA,RdV))
+STD_PLD_AMODES(loadrh, "Rd32=memh", "Load signed Half integer",ATTRIBS(A_ARCHV2,A_MEMSIZE_2B,A_LOAD),"1",1,fLOAD(1,2,s,EA,RdV))
+STD_PLD_AMODES(loadri, "Rd32=memw", "Load Word",ATTRIBS(A_ARCHV2,A_MEMSIZE_4B,A_LOAD),"2",2,fLOAD(1,4,u,EA,RdV))
+STD_PLD_AMODES(loadrd, "Rdd32=memd","Load Double integer",ATTRIBS(A_ARCHV2,A_MEMSIZE_8B,A_LOAD),"3",3,fLOAD(1,8,u,EA,RddV))
+
+/* The set of addressing modes standard to all predicated store instructions */
+#define STD_PST_AMODES(TAG,DEST,OPER,DESCR,ATTRIB,SHFT,SHFTNUM,SEMANTICS)\
+Q6INSN(S4_##TAG##_rr,  OPER"(Rs32+Ru32<<#u2)="DEST,            ATTRIB,DESCR,{fEA_RRs(RsV,RuV,uiV); SEMANTICS;})\
+Q6INSN(S2_p##TAG##t_io, "if (Pv4) "OPER"(Rs32+#u6:"SHFT")="DEST,     ATTRIB,DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); if (fLSBOLD(PvV)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S2_p##TAG##t_pi, "if (Pv4) "OPER"(Rx32++#s4:"SHFT")="DEST,     ATTRIB,DESCR,{fEA_REG(RxV); if (fLSBOLD(PvV)){ fPM_I(RxV,siV); SEMANTICS;} else {STORE_CANCEL(EA);}})\
+Q6INSN(S2_p##TAG##f_io, "if (!Pv4) "OPER"(Rs32+#u6:"SHFT")="DEST,     ATTRIB,DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); if (fLSBOLDNOT(PvV)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S2_p##TAG##f_pi, "if (!Pv4) "OPER"(Rx32++#s4:"SHFT")="DEST,     ATTRIB,DESCR,{fEA_REG(RxV); if (fLSBOLDNOT(PvV)){ fPM_I(RxV,siV); SEMANTICS;} else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##t_rr, "if (Pv4) "OPER"(Rs32+Ru32<<#u2)="DEST,     ATTRIB,DESCR,{fEA_RRs(RsV,RuV,uiV); if (fLSBOLD(PvV)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##f_rr, "if (!Pv4) "OPER"(Rs32+Ru32<<#u2)="DEST,     ATTRIB,DESCR,{fEA_RRs(RsV,RuV,uiV); if (fLSBOLDNOT(PvV)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##tnew_io,"if (Pv4.new) "OPER"(Rs32+#u6:"SHFT")="DEST,ATTRIB,DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); if ( fLSBNEW(PvN)) { SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##fnew_io,"if (!Pv4.new) "OPER"(Rs32+#u6:"SHFT")="DEST,ATTRIB,DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); if (fLSBNEWNOT(PvN)) { SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##tnew_rr,"if (Pv4.new) "OPER"(Rs32+Ru32<<#u2)="DEST,ATTRIB,DESCR,{fEA_RRs(RsV,RuV,uiV); if ( fLSBNEW(PvN)) { SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##fnew_rr,"if (!Pv4.new) "OPER"(Rs32+Ru32<<#u2)="DEST,ATTRIB,DESCR,{fEA_RRs(RsV,RuV,uiV); if (fLSBNEWNOT(PvN)) { SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S2_p##TAG##tnew_pi, "if (Pv4.new) "OPER"(Rx32++#s4:"SHFT")="DEST,     ATTRIB,DESCR,{fEA_REG(RxV); if (fLSBNEW(PvN)){ fPM_I(RxV,siV); SEMANTICS;} else {STORE_CANCEL(EA);}})\
+Q6INSN(S2_p##TAG##fnew_pi, "if (!Pv4.new) "OPER"(Rx32++#s4:"SHFT")="DEST,     ATTRIB,DESCR,{fEA_REG(RxV); if (fLSBNEWNOT(PvN)){ fPM_I(RxV,siV); SEMANTICS;} else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##t_abs, "if (Pv4) "OPER"(#u6)="DEST,     ATTRIB,DESCR,{fMUST_IMMEXT(uiV); fEA_IMM(uiV); if (fLSBOLD(PvV)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##f_abs, "if (!Pv4) "OPER"(#u6)="DEST,     ATTRIB,DESCR,{fMUST_IMMEXT(uiV);fEA_IMM(uiV); if (fLSBOLDNOT(PvV)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##tnew_abs,"if (Pv4.new) "OPER"(#u6)="DEST,ATTRIB,DESCR,{fMUST_IMMEXT(uiV);fEA_IMM(uiV); if ( fLSBNEW(PvN)) { SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_p##TAG##fnew_abs,"if (!Pv4.new) "OPER"(#u6)="DEST,ATTRIB,DESCR,{fMUST_IMMEXT(uiV);fEA_IMM(uiV); if (fLSBNEWNOT(PvN)) { SEMANTICS; } else {STORE_CANCEL(EA);}})\
+DEF_MAPPING(S2_p##TAG##t_zomap, "if (Pv4) "OPER"(Rs32)="DEST, "if (Pv4) "OPER"(Rs32+#0)="DEST); \
+DEF_MAPPING(S2_p##TAG##f_zomap, "if (!Pv4) "OPER"(Rs32)="DEST, "if (!Pv4) "OPER"(Rs32+#0)="DEST); \
+DEF_MAPPING(S4_p##TAG##tnew_zomap, "if (Pv4.new) "OPER"(Rs32)="DEST, "if (Pv4.new) "OPER"(Rs32+#0)="DEST); \
+DEF_MAPPING(S4_p##TAG##fnew_zomap, "if (!Pv4.new) "OPER"(Rs32)="DEST, "if (!Pv4.new) "OPER"(Rs32+#0)="DEST);
+
+
+
+
+/* The set of 32-bit predicated store instructions */
+STD_PST_AMODES(storerb,"Rt32","memb","Store Byte",ATTRIBS(A_ARCHV2,A_MEMSIZE_1B,A_STORE),"0",0,fSTORE(1,1,EA,fGETBYTE(0,RtV)))
+STD_PST_AMODES(storerh,"Rt32","memh","Store Half integer",ATTRIBS(A_ARCHV2,A_MEMSIZE_2B,A_STORE),"1",1,fSTORE(1,2,EA,fGETHALF(0,RtV)))
+STD_PST_AMODES(storerf,"Rt.H32","memh","Store Upper Half integer",ATTRIBS(A_ARCHV2,A_MEMSIZE_2B,A_STORE),"1",1,fSTORE(1,2,EA,fGETHALF(1,RtV)))
+STD_PST_AMODES(storeri,"Rt32","memw","Store Word",ATTRIBS(A_ARCHV2,A_MEMSIZE_4B,A_STORE),"2",2,fSTORE(1,4,EA,RtV))
+STD_PST_AMODES(storerd,"Rtt32","memd","Store Double integer",ATTRIBS(A_ARCHV2,A_MEMSIZE_8B,A_STORE),"3",3,fSTORE(1,8,EA,RttV))
+STD_PST_AMODES(storerinew,"Nt8.new","memw","Store Word",ATTRIBS(A_ARCHV2,A_NOTE_NEWVAL_SLOT0,A_NVSTORE,A_NOTE_NVSLOT0,A_MEMSIZE_4B,A_STORE,A_RESTRICT_NOSLOT1_STORE),"2",2,fSTORE(1,4,EA,fNEWREG_ST(NtN)))
+STD_PST_AMODES(storerbnew,"Nt8.new","memb","Store Byte",ATTRIBS(A_ARCHV2,A_NOTE_NEWVAL_SLOT0,A_NVSTORE,A_NOTE_NVSLOT0,A_MEMSIZE_1B,A_STORE,A_RESTRICT_NOSLOT1_STORE),"0",0,fSTORE(1,1,EA,fGETBYTE(0,fNEWREG_ST(NtN))))
+STD_PST_AMODES(storerhnew,"Nt8.new","memh","Store Half integer",ATTRIBS(A_ARCHV2,A_NOTE_NEWVAL_SLOT0,A_NVSTORE,A_NOTE_NVSLOT0,A_MEMSIZE_2B,A_STORE,A_RESTRICT_NOSLOT1_STORE),"1",1,fSTORE(1,2,EA,fGETHALF(0,fNEWREG_ST(NtN))))
+
+
+
+
+/*****************************************************************/
+/*                                                               */
+/*                       Mem-Ops (Load-op-Store)                 */
+/*                                                               */
+/*****************************************************************/
+
+/* The set of 32-bit non-predicated mem-ops */
+#define STD_MEMOP_AMODES(TAG,OPER,DESCR,SEMANTICS)\
+Q6INSN(L4_##TAG##w_io,  "memw(Rs32+#u6:2)"OPER,     ATTRIBS(A_MEMOP,A_ROPS_3,A_MEMSIZE_4B,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOSLOT1_STORE),DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); fHIDE(size4s_t tmp;) fLOAD(1,4,s,EA,tmp); SEMANTICS;  fSTORE(1,4,EA,tmp); })\
+Q6INSN(L4_##TAG##b_io,  "memb(Rs32+#u6:0)"OPER,     ATTRIBS(A_MEMOP,A_ROPS_3,A_MEMSIZE_1B,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOSLOT1_STORE),DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); fHIDE(size4s_t tmp;) fLOAD(1,1,s,EA,tmp); SEMANTICS;  fSTORE(1,1,EA,tmp); })\
+Q6INSN(L4_##TAG##h_io,  "memh(Rs32+#u6:1)"OPER,     ATTRIBS(A_MEMOP,A_ROPS_3,A_MEMSIZE_2B,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOSLOT1_STORE),DESCR,{fIMMEXT(uiV); fEA_RI(RsV,uiV); fHIDE(size4s_t tmp;) fLOAD(1,2,s,EA,tmp); SEMANTICS;  fSTORE(1,2,EA,tmp); })\
+DEF_MAPPING(L4_##TAG##w_zomap, "memw(Rs32)"OPER, "memw(Rs32+#0)"OPER);\
+DEF_MAPPING(L4_##TAG##h_zomap, "memh(Rs32)"OPER, "memh(Rs32+#0)"OPER);\
+DEF_MAPPING(L4_##TAG##b_zomap, "memb(Rs32)"OPER, "memb(Rs32+#0)"OPER);
+
+
+
+STD_MEMOP_AMODES(add_memop, "+=Rt32", "Add Register to Memory Word", tmp += RtV)
+STD_MEMOP_AMODES(sub_memop, "-=Rt32", "Sub Register from Memory Word", tmp -= RtV)
+STD_MEMOP_AMODES(and_memop, "&=Rt32", "Logical AND Register to Memory Word", tmp &= RtV)
+STD_MEMOP_AMODES(or_memop, "|=Rt32", "Logical OR Register to Memory Word", tmp |= RtV)
+
+
+STD_MEMOP_AMODES(iadd_memop, "+=#U5", "Add Immediate to Memory Word", tmp += UiV)
+STD_MEMOP_AMODES(isub_memop, "-=#U5", "Sub Immediate to Memory Word", tmp -= UiV)
+STD_MEMOP_AMODES(iand_memop, "=clrbit(#U5)", "Clear a bit in memory", tmp &= (~(1<<UiV)))
+STD_MEMOP_AMODES(ior_memop,  "=setbit(#U5)", "Set a bit in memory", tmp |= (1<<UiV))
+
+
+/*****************************************************************/
+/*                                                               */
+/*                  V4 store immediates                          */
+/*                                                               */
+/*****************************************************************/
+/* Predicated Store immediates */
+#define V4_PSTI_AMODES(TAG,DEST,OPER,DESCR,ATTRIB,SHFT,SEMANTICS)\
+Q6INSN(S4_##TAG##t_io,"if (Pv4) "OPER"(Rs32+#u6:"SHFT")="DEST,ATTRIB,DESCR,{fEA_RI(RsV,uiV); if (fLSBOLD(PvV)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_##TAG##f_io,"if (!Pv4) "OPER"(Rs32+#u6:"SHFT")="DEST,ATTRIB,DESCR,{fEA_RI(RsV,uiV); if (fLSBOLDNOT(PvV)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_##TAG##tnew_io,"if (Pv4.new) "OPER"(Rs32+#u6:"SHFT")="DEST,ATTRIB,DESCR,{fEA_RI(RsV,uiV); if (fLSBNEW(PvN)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+Q6INSN(S4_##TAG##fnew_io,"if (!Pv4.new) "OPER"(Rs32+#u6:"SHFT")="DEST,ATTRIB,DESCR,{fEA_RI(RsV,uiV); if (fLSBNEWNOT(PvN)){ SEMANTICS; } else {STORE_CANCEL(EA);}})\
+DEF_MAPPING(S4_##TAG##t_zomap,"if (Pv4) "OPER"(Rs32)="DEST,"if (Pv4) "OPER"(Rs32+#0)="DEST);\
+DEF_MAPPING(S4_##TAG##f_zomap,"if (!Pv4) "OPER"(Rs32)="DEST,"if (!Pv4) "OPER"(Rs32+#0)="DEST);\
+DEF_MAPPING(S4_##TAG##tnew_zomap,"if (Pv4.new) "OPER"(Rs32)="DEST,"if (Pv4.new) "OPER"(Rs32+#0)="DEST);\
+DEF_MAPPING(S4_##TAG##fnew_zomap,"if (!Pv4.new) "OPER"(Rs32)="DEST,"if (!Pv4.new) "OPER"(Rs32+#0)="DEST);\
+
+/* The set of 32-bit store immediate instructions */
+V4_PSTI_AMODES(storeirb,"#S6","memb","Store Immediate Byte",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MEMSIZE_1B,A_STORE,A_STOREIMMED),"0",fIMMEXT(SiV); fSTORE(1,1,EA,SiV))
+V4_PSTI_AMODES(storeirh,"#S6","memh","Store Immediate Half integer",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MEMSIZE_2B,A_STORE,A_STOREIMMED),"1",fIMMEXT(SiV); fSTORE(1,2,EA,SiV))
+V4_PSTI_AMODES(storeiri,"#S6","memw","Store Immediate Word",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MEMSIZE_4B,A_STORE,A_STOREIMMED),"2",fIMMEXT(SiV); fSTORE(1,4,EA,SiV))
+
+
+/* Non-predicated store immediates */
+#define V4_STI_AMODES(TAG,DEST,OPER,DESCR,ATTRIB,SHFT,SEMANTICS)\
+Q6INSN(S4_##TAG##_io,  OPER"(Rs32+#u6:"SHFT")="DEST,  ATTRIB,DESCR,{fEA_RI(RsV,uiV); SEMANTICS; })\
+/*Q6INSN(S4_##TAG##_rr,  OPER"(Rs32+Rt32<<#u2)="DEST,   ATTRIB,DESCR,{fEA_RRs(RsV,RtV,uiV); SEMANTICS; })*/\
+DEF_MAPPING(S4_##TAG##_zomap,OPER"(Rs32)="DEST,OPER"(Rs32+#0)="DEST);
+
+/* The set of 32-bit store immediate instructions */
+V4_STI_AMODES(storeirb,"#S8","memb","Store Immediate Byte",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MEMSIZE_1B,A_STORE,A_STOREIMMED),"0",fIMMEXT(SiV); fSTORE(1,1,EA,SiV))
+V4_STI_AMODES(storeirh,"#S8","memh","Store Immediate Half integer",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MEMSIZE_2B,A_STORE,A_STOREIMMED),"1",fIMMEXT(SiV); fSTORE(1,2,EA,SiV))
+V4_STI_AMODES(storeiri,"#S8","memw","Store Immediate Word",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MEMSIZE_4B,A_STORE,A_STOREIMMED),"2",fIMMEXT(SiV); fSTORE(1,4,EA,SiV))
+
+
+
+
+
+
+
+/*****************************************************************/
+/*                                                               */
+/*                  V2 GP-relative LD/ST                         */
+/*                                                               */
+/*****************************************************************/
+
+#define STD_GPLD_AMODES(TAG,OPER,DESCR,ATTRIB,SHFT,SEMANTICS)\
+Q6INSN(L2_##TAG##gp, OPER"(gp+#u16:"SHFT")",   ATTRIB,DESCR,{fIMMEXT(uiV); fEA_GPI(uiV); SEMANTICS; })
+
+/* The set of 32-bit load instructions */
+STD_GPLD_AMODES(loadrub,"Rd32=memub","Load Unsigned Byte",ATTRIBS(A_MEMSIZE_1B,A_LOAD,A_ARCHV2),"0",fLOAD(1,1,u,EA,RdV))
+STD_GPLD_AMODES(loadrb, "Rd32=memb", "Load signed Byte",ATTRIBS(A_MEMSIZE_1B,A_LOAD,A_ARCHV2),"0",fLOAD(1,1,s,EA,RdV))
+STD_GPLD_AMODES(loadruh,"Rd32=memuh","Load unsigned Half integer",ATTRIBS(A_MEMSIZE_2B,A_LOAD,A_ARCHV2),"1",fLOAD(1,2,u,EA,RdV))
+STD_GPLD_AMODES(loadrh, "Rd32=memh", "Load signed Half integer",ATTRIBS(A_MEMSIZE_2B,A_LOAD,A_ARCHV2),"1",fLOAD(1,2,s,EA,RdV))
+STD_GPLD_AMODES(loadri, "Rd32=memw", "Load Word",ATTRIBS(A_MEMSIZE_4B,A_LOAD,A_ARCHV2),"2",fLOAD(1,4,u,EA,RdV))
+STD_GPLD_AMODES(loadrd, "Rdd32=memd","Load Double integer",ATTRIBS(A_MEMSIZE_8B,A_LOAD,A_ARCHV2),"3",fLOAD(1,8,u,EA,RddV))
+
+
+#define STD_GPST_AMODES(TAG,DEST,OPER,DESCR,ATTRIB,SHFT,SEMANTICS)\
+Q6INSN(S2_##TAG##gp, OPER"(gp+#u16:"SHFT")="DEST, ATTRIB,DESCR,{fIMMEXT(uiV); fEA_GPI(uiV); SEMANTICS; })
+
+/* The set of 32-bit store instructions */
+STD_GPST_AMODES(storerb, "Rt32", "memb","Store Byte",ATTRIBS(A_MEMSIZE_1B,A_STORE,A_ARCHV2),"0",fSTORE(1,1,EA,fGETBYTE(0,RtV)))
+STD_GPST_AMODES(storerh, "Rt32", "memh","Store Half integer",ATTRIBS(A_MEMSIZE_2B,A_STORE,A_ARCHV2),"1",fSTORE(1,2,EA,fGETHALF(0,RtV)))
+STD_GPST_AMODES(storerf, "Rt.H32", "memh","Store Upper Half integer",ATTRIBS(A_MEMSIZE_2B,A_STORE,A_ARCHV2),"1",fSTORE(1,2,EA,fGETHALF(1,RtV)))
+STD_GPST_AMODES(storeri, "Rt32", "memw","Store Word",ATTRIBS(A_MEMSIZE_4B,A_STORE,A_ARCHV2),"2",fSTORE(1,4,EA,RtV))
+STD_GPST_AMODES(storerd, "Rtt32","memd","Store Double integer",ATTRIBS(A_MEMSIZE_8B,A_STORE,A_ARCHV2),"3",fSTORE(1,8,EA,RttV))
+STD_GPST_AMODES(storerinew, "Nt8.new", "memw","Store Word",ATTRIBS(A_NOTE_NEWVAL_SLOT0,A_NVSTORE,A_NOTE_NVSLOT0,A_MEMSIZE_4B,A_STORE,A_RESTRICT_NOSLOT1_STORE,A_ARCHV2),"2",fSTORE(1,4,EA,fNEWREG_ST(NtN)))
+STD_GPST_AMODES(storerbnew, "Nt8.new", "memb","Store Byte",ATTRIBS(A_NOTE_NEWVAL_SLOT0,A_NVSTORE,A_NOTE_NVSLOT0,A_MEMSIZE_1B,A_STORE,A_RESTRICT_NOSLOT1_STORE,A_ARCHV2),"0",fSTORE(1,1,EA,fGETBYTE(0,fNEWREG_ST(NtN))))
+STD_GPST_AMODES(storerhnew, "Nt8.new", "memh","Store Half integer",ATTRIBS(A_NOTE_NEWVAL_SLOT0,A_NVSTORE,A_NOTE_NVSLOT0,A_MEMSIZE_2B,A_STORE,A_RESTRICT_NOSLOT1_STORE,A_ARCHV2),"1",fSTORE(1,2,EA,fGETHALF(0,fNEWREG_ST(NtN))))
+
+
+
+
+
+
+#undef FRAME_EXPLICIT
+
diff --git a/target/hexagon/imported/mpy.idef b/target/hexagon/imported/mpy.idef
new file mode 100644
index 0000000..05cd804
--- /dev/null
+++ b/target/hexagon/imported/mpy.idef
@@ -0,0 +1,1269 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Multiply Instructions
+ */
+
+
+#define STD_SP_MODES(TAG,OPER,ATR,DST,ACCSEM,SEM,OSEM,SATSEM,RNDSEM,ACCSYN)\
+Q6INSN(M2_##TAG##_hh_s0, OPER"(Rs.H32,Rt.H32)"OSEM,        ATR,"",{DST=SATSEM(RNDSEM(ACCSEM SEM(         fGETHALF(1,RsV),fGETHALF(1,RtV)))); ACCSYN;})\
+Q6INSN(M2_##TAG##_hh_s1, OPER"(Rs.H32,Rt.H32):<<1"OSEM,    ATR,"",{DST=SATSEM(RNDSEM(ACCSEM fSCALE(1,SEM(fGETHALF(1,RsV),fGETHALF(1,RtV))))); ACCSYN;})\
+Q6INSN(M2_##TAG##_hl_s0, OPER"(Rs.H32,Rt.L32)"OSEM,        ATR,"",{DST=SATSEM(RNDSEM(ACCSEM SEM(         fGETHALF(1,RsV),fGETHALF(0,RtV)))); ACCSYN;})\
+Q6INSN(M2_##TAG##_hl_s1, OPER"(Rs.H32,Rt.L32):<<1"OSEM,    ATR,"",{DST=SATSEM(RNDSEM(ACCSEM fSCALE(1,SEM(fGETHALF(1,RsV),fGETHALF(0,RtV))))); ACCSYN;})\
+Q6INSN(M2_##TAG##_lh_s0, OPER"(Rs.L32,Rt.H32)"OSEM,        ATR,"",{DST=SATSEM(RNDSEM(ACCSEM SEM(         fGETHALF(0,RsV),fGETHALF(1,RtV)))); ACCSYN;})\
+Q6INSN(M2_##TAG##_lh_s1, OPER"(Rs.L32,Rt.H32):<<1"OSEM,    ATR,"",{DST=SATSEM(RNDSEM(ACCSEM fSCALE(1,SEM(fGETHALF(0,RsV),fGETHALF(1,RtV))))); ACCSYN;})\
+Q6INSN(M2_##TAG##_ll_s0, OPER"(Rs.L32,Rt.L32)"OSEM,        ATR,"",{DST=SATSEM(RNDSEM(ACCSEM SEM(         fGETHALF(0,RsV),fGETHALF(0,RtV)))); ACCSYN;})\
+Q6INSN(M2_##TAG##_ll_s1, OPER"(Rs.L32,Rt.L32):<<1"OSEM,    ATR,"",{DST=SATSEM(RNDSEM(ACCSEM fSCALE(1,SEM(fGETHALF(0,RsV),fGETHALF(0,RtV))))); ACCSYN;})
+
+/*****************************************************/
+/* multiply 16x16->32 signed instructions            */
+/*****************************************************/
+STD_SP_MODES(mpy_acc,    "Rx32+=mpy", ,RxV,RxV+    ,fMPY16SS,          ,fPASS,fPASS,fACC())
+STD_SP_MODES(mpy_nac,    "Rx32-=mpy", ,RxV,RxV-    ,fMPY16SS,          ,fPASS,fPASS,fACC())
+STD_SP_MODES(mpy_acc_sat,"Rx32+=mpy", ,RxV,RxV+    ,fMPY16SS,":sat"    ,fSAT, fPASS,fACC())
+STD_SP_MODES(mpy_nac_sat,"Rx32-=mpy", ,RxV,RxV-    ,fMPY16SS,":sat"    ,fSAT, fPASS,fACC())
+STD_SP_MODES(mpy,        "Rd32=mpy",  ,RdV,        ,fMPY16SS,          ,fPASS,fPASS,)
+STD_SP_MODES(mpy_sat,    "Rd32=mpy",  ,RdV,        ,fMPY16SS,":sat"    ,fSAT, fPASS,)
+STD_SP_MODES(mpy_rnd,    "Rd32=mpy",  ,RdV,        ,fMPY16SS,":rnd"    ,fPASS,fROUND,)
+STD_SP_MODES(mpy_sat_rnd,"Rd32=mpy",  ,RdV,        ,fMPY16SS,":rnd:sat",fSAT, fROUND,)
+STD_SP_MODES(mpyd_acc,   "Rxx32+=mpy",,RxxV,RxxV+  ,fMPY16SS,          ,fPASS,fPASS,fACC())
+STD_SP_MODES(mpyd_nac,   "Rxx32-=mpy",,RxxV,RxxV-  ,fMPY16SS,          ,fPASS,fPASS,fACC())
+STD_SP_MODES(mpyd,       "Rdd32=mpy", ,RddV,       ,fMPY16SS,          ,fPASS,fPASS,)
+STD_SP_MODES(mpyd_rnd,   "Rdd32=mpy", ,RddV,       ,fMPY16SS,":rnd"    ,fPASS,fROUND,)
+
+
+/*****************************************************/
+/* multiply 16x16->32 unsigned instructions          */
+/*****************************************************/
+#define STD_USP_MODES(TAG,OPER,ATR,DST,ACCSEM,SEM,OSEM,SATSEM,RNDSEM,ACCSYN)\
+Q6INSN(M2_##TAG##_hh_s0, OPER"(Rs.H32,Rt.H32)"OSEM,        ATR,"",{DST=SATSEM(RNDSEM(ACCSEM SEM(         fGETUHALF(1,RsV),fGETUHALF(1,RtV)))); ACCSYN;})\
+Q6INSN(M2_##TAG##_hh_s1, OPER"(Rs.H32,Rt.H32):<<1"OSEM,    ATR,"",{DST=SATSEM(RNDSEM(ACCSEM fSCALE(1,SEM(fGETUHALF(1,RsV),fGETUHALF(1,RtV))))); ACCSYN;})\
+Q6INSN(M2_##TAG##_hl_s0, OPER"(Rs.H32,Rt.L32)"OSEM,        ATR,"",{DST=SATSEM(RNDSEM(ACCSEM SEM(         fGETUHALF(1,RsV),fGETUHALF(0,RtV)))); ACCSYN;})\
+Q6INSN(M2_##TAG##_hl_s1, OPER"(Rs.H32,Rt.L32):<<1"OSEM,    ATR,"",{DST=SATSEM(RNDSEM(ACCSEM fSCALE(1,SEM(fGETUHALF(1,RsV),fGETUHALF(0,RtV))))); ACCSYN;})\
+Q6INSN(M2_##TAG##_lh_s0, OPER"(Rs.L32,Rt.H32)"OSEM,        ATR,"",{DST=SATSEM(RNDSEM(ACCSEM SEM(         fGETUHALF(0,RsV),fGETUHALF(1,RtV))));} ACCSYN;)\
+Q6INSN(M2_##TAG##_lh_s1, OPER"(Rs.L32,Rt.H32):<<1"OSEM,    ATR,"",{DST=SATSEM(RNDSEM(ACCSEM fSCALE(1,SEM(fGETUHALF(0,RsV),fGETUHALF(1,RtV))))); ACCSYN;})\
+Q6INSN(M2_##TAG##_ll_s0, OPER"(Rs.L32,Rt.L32)"OSEM,        ATR,"",{DST=SATSEM(RNDSEM(ACCSEM SEM(         fGETUHALF(0,RsV),fGETUHALF(0,RtV)))); ACCSYN;})\
+Q6INSN(M2_##TAG##_ll_s1, OPER"(Rs.L32,Rt.L32):<<1"OSEM,    ATR,"",{DST=SATSEM(RNDSEM(ACCSEM fSCALE(1,SEM(fGETUHALF(0,RsV),fGETUHALF(0,RtV))))); ACCSYN;})
+
+STD_USP_MODES(mpyu_acc,    "Rx32+=mpyu", ,RxV,RxV+  ,fMPY16UU,          ,fPASS,fPASS,fACC())
+STD_USP_MODES(mpyu_nac,    "Rx32-=mpyu", ,RxV,RxV-  ,fMPY16UU,          ,fPASS,fPASS,fACC())
+STD_USP_MODES(mpyu,        "Rd32=mpyu",  ATTRIBS(A_INTRINSIC_RETURNS_UNSIGNED) ,RdV,  ,fMPY16UU, ,fPASS,fPASS,)
+STD_USP_MODES(mpyud_acc,   "Rxx32+=mpyu",,RxxV,RxxV+,fMPY16UU,          ,fPASS,fPASS,fACC())
+STD_USP_MODES(mpyud_nac,   "Rxx32-=mpyu",,RxxV,RxxV-,fMPY16UU,          ,fPASS,fPASS,fACC())
+STD_USP_MODES(mpyud,       "Rdd32=mpyu", ATTRIBS(A_INTRINSIC_RETURNS_UNSIGNED) ,RddV, ,fMPY16UU, ,fPASS,fPASS,)
+
+/**********************************************/
+/* mpy 16x#s8->32                             */
+/**********************************************/
+
+Q6INSN(M2_mpysip,"Rd32=+mpyi(Rs32,#u8)",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MPY),
+"32-bit Multiply by unsigned immediate",
+{ fIMMEXT(uiV); RdV=RsV*uiV; })
+
+Q6INSN(M2_mpysin,"Rd32=-mpyi(Rs32,#u8)",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MPY),
+"32-bit Multiply by unsigned immediate, negate result",
+{ RdV=RsV*-uiV; })
+
+DEF_V2_COND_MAPPING(M2_mpysmi,"Rd32=mpyi(Rs32,#m9)","((#m9<0) && (#m9>-256))","Rd32=-mpyi(Rs32,#m9*(-1))","Rd32=+mpyi(Rs32,#m9)")
+
+Q6INSN(M2_macsip,"Rx32+=mpyi(Rs32,#u8)",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MPY),
+"32-bit Multiply-Add by unsigned immediate",
+{ fIMMEXT(uiV); RxV=RxV + (RsV*uiV); fACC();})
+
+Q6INSN(M2_macsin,"Rx32-=mpyi(Rs32,#u8)",ATTRIBS(A_ARCHV2,A_ROPS_2,A_MPY),
+"32-bit Multiply-Subtract by unsigned immediate",
+{ fIMMEXT(uiV); RxV=RxV - (RsV*uiV); fACC();})
+
+
+/**********************************************/
+/* multiply/mac  32x32->64 instructions       */
+/**********************************************/
+Q6INSN(M2_dpmpyss_s0,    "Rdd32=mpy(Rs32,Rt32)", ATTRIBS(A_IT_MPY,A_IT_MPY_32),"Multiply 32x32",{RddV=fMPY32SS(RsV,RtV);})
+Q6INSN(M2_dpmpyss_acc_s0,"Rxx32+=mpy(Rs32,Rt32)",ATTRIBS(A_IT_MPY,A_IT_MPY_32,A_ROPS_2),"Multiply 32x32",{RxxV= RxxV + fMPY32SS(RsV,RtV); fACC();})
+Q6INSN(M2_dpmpyss_nac_s0,"Rxx32-=mpy(Rs32,Rt32)",ATTRIBS(A_IT_MPY,A_IT_MPY_32,A_ROPS_2),"Multiply 32x32",{RxxV= RxxV - fMPY32SS(RsV,RtV); fACC();})
+
+Q6INSN(M2_dpmpyuu_s0,    "Rdd32=mpyu(Rs32,Rt32)", ATTRIBS(A_IT_MPY,A_IT_MPY_32,A_INTRINSIC_RETURNS_UNSIGNED),"Multiply 32x32",{RddV=fMPY32UU(fCAST4u(RsV),fCAST4u(RtV));})
+Q6INSN(M2_dpmpyuu_acc_s0,"Rxx32+=mpyu(Rs32,Rt32)",ATTRIBS(A_IT_MPY,A_IT_MPY_32,A_ROPS_2),"Multiply 32x32",{RxxV= RxxV + fMPY32UU(fCAST4u(RsV),fCAST4u(RtV)); fACC();})
+Q6INSN(M2_dpmpyuu_nac_s0,"Rxx32-=mpyu(Rs32,Rt32)",ATTRIBS(A_IT_MPY,A_IT_MPY_32,A_ROPS_2),"Multiply 32x32",{RxxV= RxxV - fMPY32UU(fCAST4u(RsV),fCAST4u(RtV)); fACC();})
+
+
+/******************************************************/
+/* multiply/mac  32x32->32 (upper) instructions       */
+/******************************************************/
+Q6INSN(M2_mpy_up,        "Rd32=mpy(Rs32,Rt32)", ATTRIBS(A_IT_MPY,A_IT_MPY_32),"Multiply 32x32",{RdV=fMPY32SS(RsV,RtV)>>32;})
+Q6INSN(M2_mpy_up_s1,     "Rd32=mpy(Rs32,Rt32):<<1", ATTRIBS(A_IT_MPY,A_IT_MPY_32),"Multiply 32x32",{RdV=fMPY32SS(RsV,RtV)>>31;})
+Q6INSN(M2_mpy_up_s1_sat, "Rd32=mpy(Rs32,Rt32):<<1:sat", ATTRIBS(A_IT_MPY,A_IT_MPY_32),"Multiply 32x32",{RdV=fSAT(fMPY32SS(RsV,RtV)>>31);})
+Q6INSN(M2_mpyu_up,       "Rd32=mpyu(Rs32,Rt32)", ATTRIBS(A_IT_MPY,A_IT_MPY_32,A_INTRINSIC_RETURNS_UNSIGNED),"Multiply 32x32",{RdV=fMPY32UU(fCAST4u(RsV),fCAST4u(RtV))>>32;})
+Q6INSN(M2_mpysu_up,      "Rd32=mpysu(Rs32,Rt32)", ATTRIBS(A_IT_MPY,A_IT_MPY_32),"Multiply 32x32",{RdV=fMPY32SU(RsV,fCAST4u(RtV))>>32;})
+Q6INSN(M2_dpmpyss_rnd_s0,"Rd32=mpy(Rs32,Rt32):rnd", ATTRIBS(A_IT_MPY,A_IT_MPY_32),"Multiply 32x32",{RdV=(fMPY32SS(RsV,RtV)+fCONSTLL(0x80000000))>>32;})
+
+Q6INSN(M4_mac_up_s1_sat, "Rx32+=mpy(Rs32,Rt32):<<1:sat", ATTRIBS(A_IT_MPY,A_IT_MPY_32),"Multiply 32x32",{RxV=fSAT(  (fSE32_64(RxV)) + (fMPY32SS(RsV,RtV)>>31)); fACC();})
+Q6INSN(M4_nac_up_s1_sat, "Rx32-=mpy(Rs32,Rt32):<<1:sat", ATTRIBS(A_IT_MPY,A_IT_MPY_32),"Multiply 32x32",{RxV=fSAT(  (fSE32_64(RxV)) - (fMPY32SS(RsV,RtV)>>31)); fACC();})
+
+
+/**********************************************/
+/* 32x32->32 multiply (lower)                 */
+/**********************************************/
+
+Q6INSN(M2_mpyi,"Rd32=mpyi(Rs32,Rt32)",ATTRIBS(A_IT_MPY,A_IT_MPY_32,A_MPY),
+"Multiply Integer",
+{ RdV=RsV*RtV;})
+
+
+
+DEF_MAPPING(M2_mpyui,"Rd32=mpyui(Rs32,Rt32)","Rd32=mpyi(Rs32,Rt32)")
+
+
+Q6INSN(M2_maci,"Rx32+=mpyi(Rs32,Rt32)",ATTRIBS(A_IT_MPY,A_IT_MPY_32,A_ARCHV2,A_ROPS_2,A_MPY),
+"Multiply-Accumulate Integer",
+{ RxV=RxV + RsV*RtV; fACC();})
+
+Q6INSN(M2_mnaci,"Rx32-=mpyi(Rs32,Rt32)",ATTRIBS(A_IT_MPY,A_IT_MPY_32,A_ARCHV2,A_ROPS_2,A_MPY),
+"Multiply-Neg-Accumulate Integer",
+{ RxV=RxV - RsV*RtV; fACC();})
+
+/****** WHY ARE THESE IN MPY.IDEF? **********/
+
+Q6INSN(M2_acci,"Rx32+=add(Rs32,Rt32)",ATTRIBS(A_ROPS_2,A_ARCHV2),
+"Add with accumulate",
+{ RxV=RxV + RsV + RtV;})
+
+Q6INSN(M2_accii,"Rx32+=add(Rs32,#s8)",ATTRIBS(A_ROPS_2,A_ARCHV2),
+"Add with accumulate",
+{ fIMMEXT(siV); RxV=RxV + RsV + siV;})
+
+Q6INSN(M2_nacci,"Rx32-=add(Rs32,Rt32)",ATTRIBS(A_ROPS_2,A_ARCHV2),
+"Add with neg accumulate",
+{ RxV=RxV - (RsV + RtV);})
+
+Q6INSN(M2_naccii,"Rx32-=add(Rs32,#s8)",ATTRIBS(A_ROPS_2,A_ARCHV2),
+"Add with neg accumulate",
+{ fIMMEXT(siV); RxV=RxV - (RsV + siV);})
+
+Q6INSN(M2_subacc,"Rx32+=sub(Rt32,Rs32)",ATTRIBS(A_ROPS_2,A_ARCHV2),
+"Sub with accumulate",
+{ RxV=RxV + RtV - RsV;})
+
+
+
+
+Q6INSN(M4_mpyrr_addr,"Ry32=add(Ru32,mpyi(Ry32,Rs32))",ATTRIBS(A_ROPS_2,A_MPY),
+"Mpy by immed and add immed",
+{ RyV = RuV + RsV*RyV; fACC();})
+
+Q6INSN(M4_mpyri_addr_u2,"Rd32=add(Ru32,mpyi(#u6:2,Rs32))",ATTRIBS(A_ROPS_2,A_MPY),
+"Mpy by immed and add immed",
+{ RdV = RuV + RsV*uiV; fACC();})
+
+Q6INSN(M4_mpyri_addr,"Rd32=add(Ru32,mpyi(Rs32,#u6))",ATTRIBS(A_ROPS_2,A_MPY),
+"Mpy by immed and add immed",
+{ fIMMEXT(uiV); RdV = RuV + RsV*uiV; fACC();})
+
+
+
+Q6INSN(M4_mpyri_addi,"Rd32=add(#u6,mpyi(Rs32,#U6))",ATTRIBS(A_ROPS_2,A_MPY),
+"Mpy by immed and add immed",
+{ fIMMEXT(uiV); RdV = uiV + RsV*UiV;})
+
+
+
+Q6INSN(M4_mpyrr_addi,"Rd32=add(#u6,mpyi(Rs32,Rt32))",ATTRIBS(A_ROPS_2,A_MPY),
+"Mpy by immed and add immed",
+{ fIMMEXT(uiV); RdV = uiV + RsV*RtV;})
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+/**********************************************/
+/* vector mac  2x[16x16 -> 32]                */
+/**********************************************/
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ fSETWORD(0,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV)))));\
+  fSETWORD(1,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV)))));\
+}
+Q6INSN(M2_vmpy2s_s0,"Rdd32=vmpyh(Rs32,Rt32):sat",ATTRIBS(),"Vector Multiply",vmac_sema(0))
+Q6INSN(M2_vmpy2s_s1,"Rdd32=vmpyh(Rs32,Rt32):<<1:sat",ATTRIBS(),"Vector Multiply",vmac_sema(1))
+
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV)))));\
+  fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV)))));\
+  fACC();\
+}
+Q6INSN(M2_vmac2s_s0,"Rxx32+=vmpyh(Rs32,Rt32):sat",ATTRIBS(),"Vector Multiply",vmac_sema(0))
+Q6INSN(M2_vmac2s_s1,"Rxx32+=vmpyh(Rs32,Rt32):<<1:sat",ATTRIBS(),"Vector Multiply",vmac_sema(1))
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ fSETWORD(0,RddV,fSAT(fSCALE(N,fMPY16SU(fGETHALF(0,RsV),fGETUHALF(0,RtV)))));\
+  fSETWORD(1,RddV,fSAT(fSCALE(N,fMPY16SU(fGETHALF(1,RsV),fGETUHALF(1,RtV)))));\
+}
+Q6INSN(M2_vmpy2su_s0,"Rdd32=vmpyhsu(Rs32,Rt32):sat",ATTRIBS(),"Vector Multiply",vmac_sema(0))
+Q6INSN(M2_vmpy2su_s1,"Rdd32=vmpyhsu(Rs32,Rt32):<<1:sat",ATTRIBS(),"Vector Multiply",vmac_sema(1))
+
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + fSCALE(N,fMPY16SU(fGETHALF(0,RsV),fGETUHALF(0,RtV)))));\
+  fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + fSCALE(N,fMPY16SU(fGETHALF(1,RsV),fGETUHALF(1,RtV)))));\
+  fACC();\
+}
+Q6INSN(M2_vmac2su_s0,"Rxx32+=vmpyhsu(Rs32,Rt32):sat",ATTRIBS(),"Vector Multiply",vmac_sema(0))
+Q6INSN(M2_vmac2su_s1,"Rxx32+=vmpyhsu(Rs32,Rt32):<<1:sat",ATTRIBS(),"Vector Multiply",vmac_sema(1))
+
+
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ fSETHALF(1,RdV,fGETHALF(1,(fSAT(fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV))) + 0x8000))));\
+  fSETHALF(0,RdV,fGETHALF(1,(fSAT(fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV))) + 0x8000))));\
+}
+Q6INSN(M2_vmpy2s_s0pack,"Rd32=vmpyh(Rs32,Rt32):rnd:sat",ATTRIBS(A_ARCHV2),"Vector Multiply",vmac_sema(0))
+Q6INSN(M2_vmpy2s_s1pack,"Rd32=vmpyh(Rs32,Rt32):<<1:rnd:sat",ATTRIBS(A_ARCHV2),"Vector Multiply",vmac_sema(1))
+
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ fSETWORD(0,RxxV,fGETWORD(0,RxxV) + fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV)));\
+  fSETWORD(1,RxxV,fGETWORD(1,RxxV) + fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV)));\
+  fACC();\
+}
+Q6INSN(M2_vmac2,"Rxx32+=vmpyh(Rs32,Rt32)",ATTRIBS(A_ARCHV2),"Vector Multiply",vmac_sema(0))
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ fSETWORD(0,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV)))));\
+  fSETWORD(1,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV)))));\
+}
+Q6INSN(M2_vmpy2es_s0,"Rdd32=vmpyeh(Rss32,Rtt32):sat",ATTRIBS(),"Vector Multiply",vmac_sema(0))
+Q6INSN(M2_vmpy2es_s1,"Rdd32=vmpyeh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Vector Multiply",vmac_sema(1))
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV)))));\
+  fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV)))));\
+  fACC();\
+}
+Q6INSN(M2_vmac2es_s0,"Rxx32+=vmpyeh(Rss32,Rtt32):sat",ATTRIBS(),"Vector Multiply",vmac_sema(0))
+Q6INSN(M2_vmac2es_s1,"Rxx32+=vmpyeh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Vector Multiply",vmac_sema(1))
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ fSETWORD(0,RxxV,fGETWORD(0,RxxV) + fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV)));\
+  fSETWORD(1,RxxV,fGETWORD(1,RxxV) + fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV)));\
+  fACC();\
+}
+Q6INSN(M2_vmac2es,"Rxx32+=vmpyeh(Rss32,Rtt32)",ATTRIBS(A_ARCHV2),"Vector Multiply",vmac_sema(0))
+
+
+
+
+/********************************************************/
+/* vrmpyh, aka Big Mac, aka Mac Daddy, aka Mac-ac-ac-ac */
+/* vector mac  4x[16x16] + 64 ->64                      */
+/********************************************************/
+
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ RxxV = RxxV + fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV))\
+              + fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV))\
+              + fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV))\
+              + fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV));\
+  fACC();\
+}
+Q6INSN(M2_vrmac_s0,"Rxx32+=vrmpyh(Rss32,Rtt32)",ATTRIBS(),"Vector Multiply",vmac_sema(0))
+
+#undef vmac_sema
+#define vmac_sema(N)\
+{ RddV = fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV))\
+       + fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV))\
+       + fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV))\
+       + fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV));\
+}
+Q6INSN(M2_vrmpy_s0,"Rdd32=vrmpyh(Rss32,Rtt32)",ATTRIBS(),"Vector Multiply",vmac_sema(0))
+
+
+
+/******************************************************/
+/* vector dual macs. just like complex                */
+/******************************************************/
+
+
+/* With round&pack */
+#undef dmpy_sema
+#define dmpy_sema(N)\
+{ fSETHALF(0,RdV,fGETHALF(1,(fSAT(fSCALE(N,fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV))) + \
+                                  fSCALE(N,fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV))) + 0x8000))));\
+  fSETHALF(1,RdV,fGETHALF(1,(fSAT(fSCALE(N,fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV))) + \
+                                  fSCALE(N,fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV))) + 0x8000))));\
+}
+Q6INSN(M2_vdmpyrs_s0,"Rd32=vdmpy(Rss32,Rtt32):rnd:sat",ATTRIBS(),    "vector dual mac w/ round&pack",dmpy_sema(0))
+Q6INSN(M2_vdmpyrs_s1,"Rd32=vdmpy(Rss32,Rtt32):<<1:rnd:sat",ATTRIBS(),"vector dual mac w/ round&pack",dmpy_sema(1))
+
+
+
+
+
+/******************************************************/
+/* vector byte multiplies                             */
+/******************************************************/
+
+
+Q6INSN(M5_vrmpybuu,"Rdd32=vrmpybu(Rss32,Rtt32)",ATTRIBS(),
+ "vector dual mpy bytes",
+{
+  fSETWORD(0,RddV,(fMPY16SS(fGETUBYTE(0,RssV),fGETUBYTE(0,RttV)) +
+		   fMPY16SS(fGETUBYTE(1,RssV),fGETUBYTE(1,RttV)) +
+		   fMPY16SS(fGETUBYTE(2,RssV),fGETUBYTE(2,RttV)) +
+		   fMPY16SS(fGETUBYTE(3,RssV),fGETUBYTE(3,RttV))));
+  fSETWORD(1,RddV,(fMPY16SS(fGETUBYTE(4,RssV),fGETUBYTE(4,RttV)) +
+		   fMPY16SS(fGETUBYTE(5,RssV),fGETUBYTE(5,RttV)) +
+		   fMPY16SS(fGETUBYTE(6,RssV),fGETUBYTE(6,RttV)) +
+		   fMPY16SS(fGETUBYTE(7,RssV),fGETUBYTE(7,RttV))));
+ })
+
+Q6INSN(M5_vrmacbuu,"Rxx32+=vrmpybu(Rss32,Rtt32)",ATTRIBS(),
+ "vector dual mac bytes",
+{
+  fSETWORD(0,RxxV,(fGETWORD(0,RxxV) +
+	           fMPY16SS(fGETUBYTE(0,RssV),fGETUBYTE(0,RttV)) +
+		   fMPY16SS(fGETUBYTE(1,RssV),fGETUBYTE(1,RttV)) +
+		   fMPY16SS(fGETUBYTE(2,RssV),fGETUBYTE(2,RttV)) +
+		   fMPY16SS(fGETUBYTE(3,RssV),fGETUBYTE(3,RttV))));
+  fSETWORD(1,RxxV,(fGETWORD(1,RxxV) +
+		   fMPY16SS(fGETUBYTE(4,RssV),fGETUBYTE(4,RttV)) +
+		   fMPY16SS(fGETUBYTE(5,RssV),fGETUBYTE(5,RttV)) +
+		   fMPY16SS(fGETUBYTE(6,RssV),fGETUBYTE(6,RttV)) +
+		   fMPY16SS(fGETUBYTE(7,RssV),fGETUBYTE(7,RttV))));
+  fACC();\
+ })
+
+
+Q6INSN(M5_vrmpybsu,"Rdd32=vrmpybsu(Rss32,Rtt32)",ATTRIBS(),
+ "vector dual mpy bytes",
+{
+  fSETWORD(0,RddV,(fMPY16SS(fGETBYTE(0,RssV),fGETUBYTE(0,RttV)) +
+		   fMPY16SS(fGETBYTE(1,RssV),fGETUBYTE(1,RttV)) +
+		   fMPY16SS(fGETBYTE(2,RssV),fGETUBYTE(2,RttV)) +
+		   fMPY16SS(fGETBYTE(3,RssV),fGETUBYTE(3,RttV))));
+  fSETWORD(1,RddV,(fMPY16SS(fGETBYTE(4,RssV),fGETUBYTE(4,RttV)) +
+		   fMPY16SS(fGETBYTE(5,RssV),fGETUBYTE(5,RttV)) +
+		   fMPY16SS(fGETBYTE(6,RssV),fGETUBYTE(6,RttV)) +
+		   fMPY16SS(fGETBYTE(7,RssV),fGETUBYTE(7,RttV))));
+ })
+
+Q6INSN(M5_vrmacbsu,"Rxx32+=vrmpybsu(Rss32,Rtt32)",ATTRIBS(),
+ "vector dual mac bytes",
+{
+  fSETWORD(0,RxxV,(fGETWORD(0,RxxV) +
+		   fMPY16SS(fGETBYTE(0,RssV),fGETUBYTE(0,RttV)) +
+		   fMPY16SS(fGETBYTE(1,RssV),fGETUBYTE(1,RttV)) +
+		   fMPY16SS(fGETBYTE(2,RssV),fGETUBYTE(2,RttV)) +
+		   fMPY16SS(fGETBYTE(3,RssV),fGETUBYTE(3,RttV))));
+  fSETWORD(1,RxxV,(fGETWORD(1,RxxV) +
+		   fMPY16SS(fGETBYTE(4,RssV),fGETUBYTE(4,RttV)) +
+		   fMPY16SS(fGETBYTE(5,RssV),fGETUBYTE(5,RttV)) +
+		   fMPY16SS(fGETBYTE(6,RssV),fGETUBYTE(6,RttV)) +
+		   fMPY16SS(fGETBYTE(7,RssV),fGETUBYTE(7,RttV))));
+  fACC();\
+ })
+
+
+Q6INSN(M5_vmpybuu,"Rdd32=vmpybu(Rs32,Rt32)",ATTRIBS(),
+ "vector mpy bytes",
+{
+  fSETHALF(0,RddV,(fMPY16SS(fGETUBYTE(0,RsV),fGETUBYTE(0,RtV))));
+  fSETHALF(1,RddV,(fMPY16SS(fGETUBYTE(1,RsV),fGETUBYTE(1,RtV))));
+  fSETHALF(2,RddV,(fMPY16SS(fGETUBYTE(2,RsV),fGETUBYTE(2,RtV))));
+  fSETHALF(3,RddV,(fMPY16SS(fGETUBYTE(3,RsV),fGETUBYTE(3,RtV))));
+ })
+
+Q6INSN(M5_vmpybsu,"Rdd32=vmpybsu(Rs32,Rt32)",ATTRIBS(),
+ "vector mpy bytes",
+{
+  fSETHALF(0,RddV,(fMPY16SS(fGETBYTE(0,RsV),fGETUBYTE(0,RtV))));
+  fSETHALF(1,RddV,(fMPY16SS(fGETBYTE(1,RsV),fGETUBYTE(1,RtV))));
+  fSETHALF(2,RddV,(fMPY16SS(fGETBYTE(2,RsV),fGETUBYTE(2,RtV))));
+  fSETHALF(3,RddV,(fMPY16SS(fGETBYTE(3,RsV),fGETUBYTE(3,RtV))));
+ })
+
+
+Q6INSN(M5_vmacbuu,"Rxx32+=vmpybu(Rs32,Rt32)",ATTRIBS(),
+ "vector mac bytes",
+{
+  fSETHALF(0,RxxV,(fGETHALF(0,RxxV)+fMPY16SS(fGETUBYTE(0,RsV),fGETUBYTE(0,RtV))));
+  fSETHALF(1,RxxV,(fGETHALF(1,RxxV)+fMPY16SS(fGETUBYTE(1,RsV),fGETUBYTE(1,RtV))));
+  fSETHALF(2,RxxV,(fGETHALF(2,RxxV)+fMPY16SS(fGETUBYTE(2,RsV),fGETUBYTE(2,RtV))));
+  fSETHALF(3,RxxV,(fGETHALF(3,RxxV)+fMPY16SS(fGETUBYTE(3,RsV),fGETUBYTE(3,RtV))));
+  fACC();\
+ })
+
+Q6INSN(M5_vmacbsu,"Rxx32+=vmpybsu(Rs32,Rt32)",ATTRIBS(),
+ "vector mac bytes",
+{
+  fSETHALF(0,RxxV,(fGETHALF(0,RxxV)+fMPY16SS(fGETBYTE(0,RsV),fGETUBYTE(0,RtV))));
+  fSETHALF(1,RxxV,(fGETHALF(1,RxxV)+fMPY16SS(fGETBYTE(1,RsV),fGETUBYTE(1,RtV))));
+  fSETHALF(2,RxxV,(fGETHALF(2,RxxV)+fMPY16SS(fGETBYTE(2,RsV),fGETUBYTE(2,RtV))));
+  fSETHALF(3,RxxV,(fGETHALF(3,RxxV)+fMPY16SS(fGETBYTE(3,RsV),fGETUBYTE(3,RtV))));
+  fACC();\
+ })
+
+
+
+Q6INSN(M5_vdmpybsu,"Rdd32=vdmpybsu(Rss32,Rtt32):sat",ATTRIBS(),
+ "vector quad mpy bytes",
+{
+  fSETHALF(0,RddV,fSATN(16,(fMPY16SS(fGETBYTE(0,RssV),fGETUBYTE(0,RttV)) +
+         		   fMPY16SS(fGETBYTE(1,RssV),fGETUBYTE(1,RttV)))));
+  fSETHALF(1,RddV,fSATN(16,(fMPY16SS(fGETBYTE(2,RssV),fGETUBYTE(2,RttV)) +
+         		   fMPY16SS(fGETBYTE(3,RssV),fGETUBYTE(3,RttV)))));
+  fSETHALF(2,RddV,fSATN(16,(fMPY16SS(fGETBYTE(4,RssV),fGETUBYTE(4,RttV)) +
+         		   fMPY16SS(fGETBYTE(5,RssV),fGETUBYTE(5,RttV)))));
+  fSETHALF(3,RddV,fSATN(16,(fMPY16SS(fGETBYTE(6,RssV),fGETUBYTE(6,RttV)) +
+         		   fMPY16SS(fGETBYTE(7,RssV),fGETUBYTE(7,RttV)))));
+ })
+
+
+Q6INSN(M5_vdmacbsu,"Rxx32+=vdmpybsu(Rss32,Rtt32):sat",ATTRIBS(),
+ "vector quad mac bytes",
+{
+  fSETHALF(0,RxxV,fSATN(16,(fGETHALF(0,RxxV) +
+		   fMPY16SS(fGETBYTE(0,RssV),fGETUBYTE(0,RttV)) +
+		   fMPY16SS(fGETBYTE(1,RssV),fGETUBYTE(1,RttV)))));
+  fSETHALF(1,RxxV,fSATN(16,(fGETHALF(1,RxxV) +
+		   fMPY16SS(fGETBYTE(2,RssV),fGETUBYTE(2,RttV)) +
+		   fMPY16SS(fGETBYTE(3,RssV),fGETUBYTE(3,RttV)))));
+  fSETHALF(2,RxxV,fSATN(16,(fGETHALF(2,RxxV) +
+		   fMPY16SS(fGETBYTE(4,RssV),fGETUBYTE(4,RttV)) +
+		   fMPY16SS(fGETBYTE(5,RssV),fGETUBYTE(5,RttV)))));
+  fSETHALF(3,RxxV,fSATN(16,(fGETHALF(3,RxxV) +
+		   fMPY16SS(fGETBYTE(6,RssV),fGETUBYTE(6,RttV)) +
+		   fMPY16SS(fGETBYTE(7,RssV),fGETUBYTE(7,RttV)))));
+  fACC();\
+ })
+
+
+
+/* Full version */
+#undef dmpy_sema
+#define dmpy_sema(N)\
+{ fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV))) + \
+                     fSCALE(N,fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV)))));\
+  fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV))) + \
+                     fSCALE(N,fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV)))));\
+  fACC();\
+}
+Q6INSN(M2_vdmacs_s0,"Rxx32+=vdmpy(Rss32,Rtt32):sat",ATTRIBS(),    "",dmpy_sema(0))
+Q6INSN(M2_vdmacs_s1,"Rxx32+=vdmpy(Rss32,Rtt32):<<1:sat",ATTRIBS(),"",dmpy_sema(1))
+
+#undef dmpy_sema
+#define dmpy_sema(N)\
+{ fSETWORD(0,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV))) + \
+              fSCALE(N,fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV)))));\
+  fSETWORD(1,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV))) + \
+              fSCALE(N,fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV)))));\
+}
+
+Q6INSN(M2_vdmpys_s0,"Rdd32=vdmpy(Rss32,Rtt32):sat",ATTRIBS(),    "",dmpy_sema(0))
+Q6INSN(M2_vdmpys_s1,"Rdd32=vdmpy(Rss32,Rtt32):<<1:sat",ATTRIBS(),"",dmpy_sema(1))
+
+
+
+/******************************************************/
+/* complex multiply/mac with                          */
+/* real&imag are packed together and always saturated */
+/* to protect against overflow.                       */
+/******************************************************/
+
+#undef cmpy_sema
+#define cmpy_sema(N,CONJMINUS,CONJPLUS)\
+{ fSETHALF(1,RdV,fGETHALF(1,(fSAT(fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(0,RtV))) CONJMINUS \
+                                  fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(1,RtV))) + 0x8000))));\
+  fSETHALF(0,RdV,fGETHALF(1,(fSAT(fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV))) CONJPLUS \
+                                  fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV))) + 0x8000))));\
+}
+Q6INSN(M2_cmpyrs_s0,"Rd32=cmpy(Rs32,Rt32):rnd:sat",ATTRIBS(),    "Complex Multiply",cmpy_sema(0,+,-))
+Q6INSN(M2_cmpyrs_s1,"Rd32=cmpy(Rs32,Rt32):<<1:rnd:sat",ATTRIBS(),"Complex Multiply",cmpy_sema(1,+,-))
+
+
+Q6INSN(M2_cmpyrsc_s0,"Rd32=cmpy(Rs32,Rt32*):rnd:sat",ATTRIBS(A_ARCHV2),    "Complex Multiply",cmpy_sema(0,-,+))
+Q6INSN(M2_cmpyrsc_s1,"Rd32=cmpy(Rs32,Rt32*):<<1:rnd:sat",ATTRIBS(A_ARCHV2),"Complex Multiply",cmpy_sema(1,-,+))
+
+
+#undef cmpy_sema
+#define cmpy_sema(N,CONJMINUS,CONJPLUS)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(0,RtV))) CONJMINUS \
+                                          fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(1,RtV)))));\
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV))) CONJPLUS \
+                                          fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV)))));\
+  fACC();\
+}
+Q6INSN(M2_cmacs_s0,"Rxx32+=cmpy(Rs32,Rt32):sat",ATTRIBS(),    "Complex Multiply",cmpy_sema(0,+,-))
+Q6INSN(M2_cmacs_s1,"Rxx32+=cmpy(Rs32,Rt32):<<1:sat",ATTRIBS(),"Complex Multiply",cmpy_sema(1,+,-))
+
+/* EJP: Need mac versions w/ CONJ T? */
+Q6INSN(M2_cmacsc_s0,"Rxx32+=cmpy(Rs32,Rt32*):sat",ATTRIBS(A_ARCHV2),    "Complex Multiply",cmpy_sema(0,-,+))
+Q6INSN(M2_cmacsc_s1,"Rxx32+=cmpy(Rs32,Rt32*):<<1:sat",ATTRIBS(A_ARCHV2),"Complex Multiply",cmpy_sema(1,-,+))
+
+
+#undef cmpy_sema
+#define cmpy_sema(N,CONJMINUS,CONJPLUS)\
+{ fSETWORD(1,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(0,RtV))) CONJMINUS \
+                       fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(1,RtV)))));\
+  fSETWORD(0,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV))) CONJPLUS \
+                       fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV)))));\
+}
+
+Q6INSN(M2_cmpys_s0,"Rdd32=cmpy(Rs32,Rt32):sat",ATTRIBS(),    "Complex Multiply",cmpy_sema(0,+,-))
+Q6INSN(M2_cmpys_s1,"Rdd32=cmpy(Rs32,Rt32):<<1:sat",ATTRIBS(),"Complex Multiply",cmpy_sema(1,+,-))
+
+Q6INSN(M2_cmpysc_s0,"Rdd32=cmpy(Rs32,Rt32*):sat",ATTRIBS(A_ARCHV2),    "Complex Multiply",cmpy_sema(0,-,+))
+Q6INSN(M2_cmpysc_s1,"Rdd32=cmpy(Rs32,Rt32*):<<1:sat",ATTRIBS(A_ARCHV2),"Complex Multiply",cmpy_sema(1,-,+))
+
+
+
+#undef cmpy_sema
+#define cmpy_sema(N,CONJMINUS,CONJPLUS)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) - (fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(0,RtV))) CONJMINUS \
+                                           fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(1,RtV))))));\
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) - (fSCALE(N,fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV))) CONJPLUS \
+                                           fSCALE(N,fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV))))));\
+  fACC();\
+}
+Q6INSN(M2_cnacs_s0,"Rxx32-=cmpy(Rs32,Rt32):sat",ATTRIBS(A_ARCHV2),    "Complex Multiply",cmpy_sema(0,+,-))
+Q6INSN(M2_cnacs_s1,"Rxx32-=cmpy(Rs32,Rt32):<<1:sat",ATTRIBS(A_ARCHV2),"Complex Multiply",cmpy_sema(1,+,-))
+
+/* EJP: need CONJ versions? */
+Q6INSN(M2_cnacsc_s0,"Rxx32-=cmpy(Rs32,Rt32*):sat",ATTRIBS(A_ARCHV2),    "Complex Multiply",cmpy_sema(0,-,+))
+Q6INSN(M2_cnacsc_s1,"Rxx32-=cmpy(Rs32,Rt32*):<<1:sat",ATTRIBS(A_ARCHV2),"Complex Multiply",cmpy_sema(1,-,+))
+
+
+/******************************************************/
+/* complex interpolation                              */
+/* Given a pair of complex values, scale by a,b, sum  */
+/* Saturate/shift1 and round/pack                     */
+/******************************************************/
+
+#undef vrcmpys_sema
+#define vrcmpys_sema(N,INWORD) \
+{ fSETWORD(1,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(1,RssV),fGETHALF(0,INWORD))) + \
+                       fSCALE(N,fMPY16SS(fGETHALF(3,RssV),fGETHALF(1,INWORD)))));\
+  fSETWORD(0,RddV,fSAT(fSCALE(N,fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,INWORD))) + \
+                       fSCALE(N,fMPY16SS(fGETHALF(2,RssV),fGETHALF(1,INWORD)))));\
+}
+
+
+
+Q6INSN(M2_vrcmpys_s1_h,"Rdd32=vrcmpys(Rss32,Rtt32):<<1:sat:raw:hi",ATTRIBS(A_ARCHV3), "Vector Reduce Complex Multiply by Scalar",vrcmpys_sema(1,fGETWORD(1,RttV)))
+Q6INSN(M2_vrcmpys_s1_l,"Rdd32=vrcmpys(Rss32,Rtt32):<<1:sat:raw:lo",ATTRIBS(A_ARCHV3), "Vector Reduce Complex Multiply by Scalar",vrcmpys_sema(1,fGETWORD(0,RttV)))
+
+DEF_V3_COND_MAPPING(M2_vrcmpys_s1,"Rdd32=vrcmpys(Rss32,Rt32):<<1:sat","Rt32 & 1","Rdd32=vrcmpys(Rss32,Rtt32):<<1:sat:raw:hi","Rdd32=vrcmpys(Rss32,Rtt32):<<1:sat:raw:lo")
+
+#undef vrcmpys_sema
+#define vrcmpys_sema(N,INWORD) \
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(1,RssV),fGETHALF(0,INWORD))) + \
+                       fSCALE(N,fMPY16SS(fGETHALF(3,RssV),fGETHALF(1,INWORD)))));\
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + fSCALE(N,fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,INWORD))) + \
+                       fSCALE(N,fMPY16SS(fGETHALF(2,RssV),fGETHALF(1,INWORD)))));\
+  fACC();\
+}
+
+
+
+Q6INSN(M2_vrcmpys_acc_s1_h,"Rxx32+=vrcmpys(Rss32,Rtt32):<<1:sat:raw:hi",ATTRIBS(A_ARCHV3), "Vector Reduce Complex Multiply by Scalar",vrcmpys_sema(1,fGETWORD(1,RttV)))
+Q6INSN(M2_vrcmpys_acc_s1_l,"Rxx32+=vrcmpys(Rss32,Rtt32):<<1:sat:raw:lo",ATTRIBS(A_ARCHV3), "Vector Reduce Complex Multiply by Scalar",vrcmpys_sema(1,fGETWORD(0,RttV)))
+
+DEF_V3_COND_MAPPING(M2_vrcmpys_acc_s1,"Rxx32+=vrcmpys(Rss32,Rt32):<<1:sat","Rt32 & 1","Rxx32+=vrcmpys(Rss32,Rtt32):<<1:sat:raw:hi","Rxx32+=vrcmpys(Rss32,Rtt32):<<1:sat:raw:lo")
+
+
+#undef vrcmpys_sema
+#define vrcmpys_sema(N,INWORD) \
+{ fSETHALF(1,RdV,fGETHALF(1,fSAT(fSCALE(N,fMPY16SS(fGETHALF(1,RssV),fGETHALF(0,INWORD))) + \
+                       fSCALE(N,fMPY16SS(fGETHALF(3,RssV),fGETHALF(1,INWORD))) + 0x8000)));\
+  fSETHALF(0,RdV,fGETHALF(1,fSAT(fSCALE(N,fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,INWORD))) + \
+                       fSCALE(N,fMPY16SS(fGETHALF(2,RssV),fGETHALF(1,INWORD))) + 0x8000)));\
+}
+
+Q6INSN(M2_vrcmpys_s1rp_h,"Rd32=vrcmpys(Rss32,Rtt32):<<1:rnd:sat:raw:hi",ATTRIBS(A_ARCHV3), "Vector Reduce Complex Multiply by Scalar",vrcmpys_sema(1,fGETWORD(1,RttV)))
+Q6INSN(M2_vrcmpys_s1rp_l,"Rd32=vrcmpys(Rss32,Rtt32):<<1:rnd:sat:raw:lo",ATTRIBS(A_ARCHV3), "Vector Reduce Complex Multiply by Scalar",vrcmpys_sema(1,fGETWORD(0,RttV)))
+
+DEF_V3_COND_MAPPING(M2_vrcmpys_s1rp,"Rd32=vrcmpys(Rss32,Rt32):<<1:rnd:sat","Rt32 & 1","Rd32=vrcmpys(Rss32,Rtt32):<<1:rnd:sat:raw:hi","Rd32=vrcmpys(Rss32,Rtt32):<<1:rnd:sat:raw:lo")
+
+#if 1
+/**************************************************************/
+/* mixed mode 32x16 vector dual multiplies                    */
+/*                                                            */
+/**************************************************************/
+
+/* SIGNED 32 x SIGNED 16 */
+
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + ((fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(2,RttV))))>>16)) ); \
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + ((fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(0,RttV))))>>16)) ); \
+  fACC();\
+}
+Q6INSN(M2_mmacls_s0,"Rxx32+=vmpyweh(Rss32,Rtt32):sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmacls_s1,"Rxx32+=vmpyweh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + ((fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(3,RttV))))>>16) )); \
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + ((fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(1,RttV))))>>16 ))); \
+  fACC();\
+}
+Q6INSN(M2_mmachs_s0,"Rxx32+=vmpywoh(Rss32,Rtt32):sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmachs_s1,"Rxx32+=vmpywoh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RddV,fSAT((fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(2,RttV))))>>16)); \
+  fSETWORD(0,RddV,fSAT((fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(0,RttV))))>>16)); \
+}
+Q6INSN(M2_mmpyl_s0,"Rdd32=vmpyweh(Rss32,Rtt32):sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmpyl_s1,"Rdd32=vmpyweh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RddV,fSAT((fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(3,RttV))))>>16)); \
+  fSETWORD(0,RddV,fSAT((fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(1,RttV))))>>16)); \
+}
+Q6INSN(M2_mmpyh_s0,"Rdd32=vmpywoh(Rss32,Rtt32):sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmpyh_s1,"Rdd32=vmpywoh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+
+/* With rounding */
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + ((fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(2,RttV)))+0x8000)>>16)) ); \
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + ((fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(0,RttV)))+0x8000)>>16)) ); \
+  fACC();\
+}
+Q6INSN(M2_mmacls_rs0,"Rxx32+=vmpyweh(Rss32,Rtt32):rnd:sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmacls_rs1,"Rxx32+=vmpyweh(Rss32,Rtt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + ((fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(3,RttV)))+0x8000)>>16) )); \
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + ((fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(1,RttV)))+0x8000)>>16 ))); \
+  fACC();\
+}
+Q6INSN(M2_mmachs_rs0,"Rxx32+=vmpywoh(Rss32,Rtt32):rnd:sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmachs_rs1,"Rxx32+=vmpywoh(Rss32,Rtt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RddV,fSAT((fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(2,RttV)))+0x8000)>>16)); \
+  fSETWORD(0,RddV,fSAT((fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(0,RttV)))+0x8000)>>16)); \
+}
+Q6INSN(M2_mmpyl_rs0,"Rdd32=vmpyweh(Rss32,Rtt32):rnd:sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmpyl_rs1,"Rdd32=vmpyweh(Rss32,Rtt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RddV,fSAT((fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(3,RttV)))+0x8000)>>16)); \
+  fSETWORD(0,RddV,fSAT((fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(1,RttV)))+0x8000)>>16)); \
+}
+Q6INSN(M2_mmpyh_rs0,"Rdd32=vmpywoh(Rss32,Rtt32):rnd:sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmpyh_rs1,"Rdd32=vmpywoh(Rss32,Rtt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+
+#undef mixmpy_sema
+#define mixmpy_sema(DEST,EQUALS,N,ACCSYN)\
+{ DEST EQUALS fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(2,RttV))) + fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(0,RttV))); ACCSYN;}
+
+Q6INSN(M4_vrmpyeh_s0,"Rdd32=vrmpyweh(Rss32,Rtt32)",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(RddV,=,0,))
+Q6INSN(M4_vrmpyeh_s1,"Rdd32=vrmpyweh(Rss32,Rtt32):<<1",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(RddV,=,1,))
+Q6INSN(M4_vrmpyeh_acc_s0,"Rxx32+=vrmpyweh(Rss32,Rtt32)",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(RxxV,+=,0,fACC()))
+Q6INSN(M4_vrmpyeh_acc_s1,"Rxx32+=vrmpyweh(Rss32,Rtt32):<<1",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(RxxV,+=,1,fACC()))
+
+#undef mixmpy_sema
+#define mixmpy_sema(DEST,EQUALS,N,ACCSYN)\
+{ DEST EQUALS fSCALE(N,fMPY3216SS(fGETWORD(1,RssV),fGETHALF(3,RttV))) + fSCALE(N,fMPY3216SS(fGETWORD(0,RssV),fGETHALF(1,RttV))); ACCSYN;}
+
+Q6INSN(M4_vrmpyoh_s0,"Rdd32=vrmpywoh(Rss32,Rtt32)",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(RddV,=,0,))
+Q6INSN(M4_vrmpyoh_s1,"Rdd32=vrmpywoh(Rss32,Rtt32):<<1",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(RddV,=,1,))
+Q6INSN(M4_vrmpyoh_acc_s0,"Rxx32+=vrmpywoh(Rss32,Rtt32)",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(RxxV,+=,0,fACC()))
+Q6INSN(M4_vrmpyoh_acc_s1,"Rxx32+=vrmpywoh(Rss32,Rtt32):<<1",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(RxxV,+=,1,fACC()))
+
+
+
+
+
+
+#undef mixmpy_sema
+#define mixmpy_sema(N,H,RND)\
+{  RdV = fSAT((fSCALE(N,fMPY3216SS(RsV,fGETHALF(H,RtV)))RND)>>16); \
+}
+Q6INSN(M2_hmmpyl_rs1,"Rd32=mpy(Rs32,Rt.L32):<<1:rnd:sat",ATTRIBS(A_ARCHV2),"Mixed Precision Multiply",mixmpy_sema(1,0,+0x8000))
+Q6INSN(M2_hmmpyh_rs1,"Rd32=mpy(Rs32,Rt.H32):<<1:rnd:sat",ATTRIBS(A_ARCHV2),"Mixed Precision Multiply",mixmpy_sema(1,1,+0x8000))
+Q6INSN(M2_hmmpyl_s1,"Rd32=mpy(Rs32,Rt.L32):<<1:sat",ATTRIBS(A_ARCHV2),"Mixed Precision Multiply",mixmpy_sema(1,0,))
+Q6INSN(M2_hmmpyh_s1,"Rd32=mpy(Rs32,Rt.H32):<<1:sat",ATTRIBS(A_ARCHV2),"Mixed Precision Multiply",mixmpy_sema(1,1,))
+
+
+
+
+
+
+
+
+
+/* SIGNED 32 x UNSIGNED 16 */
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + ((fSCALE(N,fMPY3216SU(fGETWORD(1,RssV),fGETUHALF(2,RttV))))>>16)) ); \
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + ((fSCALE(N,fMPY3216SU(fGETWORD(0,RssV),fGETUHALF(0,RttV))))>>16)) ); \
+  fACC();\
+}
+Q6INSN(M2_mmaculs_s0,"Rxx32+=vmpyweuh(Rss32,Rtt32):sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmaculs_s1,"Rxx32+=vmpyweuh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + ((fSCALE(N,fMPY3216SU(fGETWORD(1,RssV),fGETUHALF(3,RttV))))>>16) )); \
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + ((fSCALE(N,fMPY3216SU(fGETWORD(0,RssV),fGETUHALF(1,RttV))))>>16 ))); \
+  fACC();\
+}
+Q6INSN(M2_mmacuhs_s0,"Rxx32+=vmpywouh(Rss32,Rtt32):sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmacuhs_s1,"Rxx32+=vmpywouh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RddV,fSAT((fSCALE(N,fMPY3216SU(fGETWORD(1,RssV),fGETUHALF(2,RttV))))>>16)); \
+  fSETWORD(0,RddV,fSAT((fSCALE(N,fMPY3216SU(fGETWORD(0,RssV),fGETUHALF(0,RttV))))>>16)); \
+}
+Q6INSN(M2_mmpyul_s0,"Rdd32=vmpyweuh(Rss32,Rtt32):sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmpyul_s1,"Rdd32=vmpyweuh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RddV,fSAT((fSCALE(N,fMPY3216SU(fGETWORD(1,RssV),fGETUHALF(3,RttV))))>>16)); \
+  fSETWORD(0,RddV,fSAT((fSCALE(N,fMPY3216SU(fGETWORD(0,RssV),fGETUHALF(1,RttV))))>>16)); \
+}
+Q6INSN(M2_mmpyuh_s0,"Rdd32=vmpywouh(Rss32,Rtt32):sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmpyuh_s1,"Rdd32=vmpywouh(Rss32,Rtt32):<<1:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+
+/* With rounding */
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + ((fSCALE(N,fMPY3216SU(fGETWORD(1,RssV),fGETUHALF(2,RttV)))+0x8000)>>16)) ); \
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + ((fSCALE(N,fMPY3216SU(fGETWORD(0,RssV),fGETUHALF(0,RttV)))+0x8000)>>16)) ); \
+  fACC();\
+}
+Q6INSN(M2_mmaculs_rs0,"Rxx32+=vmpyweuh(Rss32,Rtt32):rnd:sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmaculs_rs1,"Rxx32+=vmpyweuh(Rss32,Rtt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RxxV,fSAT(fGETWORD(1,RxxV) + ((fSCALE(N,fMPY3216SU(fGETWORD(1,RssV),fGETUHALF(3,RttV)))+0x8000)>>16) )); \
+  fSETWORD(0,RxxV,fSAT(fGETWORD(0,RxxV) + ((fSCALE(N,fMPY3216SU(fGETWORD(0,RssV),fGETUHALF(1,RttV)))+0x8000)>>16 ))); \
+  fACC();\
+}
+Q6INSN(M2_mmacuhs_rs0,"Rxx32+=vmpywouh(Rss32,Rtt32):rnd:sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmacuhs_rs1,"Rxx32+=vmpywouh(Rss32,Rtt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RddV,fSAT((fSCALE(N,fMPY3216SU(fGETWORD(1,RssV),fGETUHALF(2,RttV)))+0x8000)>>16)); \
+  fSETWORD(0,RddV,fSAT((fSCALE(N,fMPY3216SU(fGETWORD(0,RssV),fGETUHALF(0,RttV)))+0x8000)>>16)); \
+}
+Q6INSN(M2_mmpyul_rs0,"Rdd32=vmpyweuh(Rss32,Rtt32):rnd:sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmpyul_rs1,"Rdd32=vmpyweuh(Rss32,Rtt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+#undef mixmpy_sema
+#define mixmpy_sema(N)\
+{ fSETWORD(1,RddV,fSAT((fSCALE(N,fMPY3216SU(fGETWORD(1,RssV),fGETUHALF(3,RttV)))+0x8000)>>16)); \
+  fSETWORD(0,RddV,fSAT((fSCALE(N,fMPY3216SU(fGETWORD(0,RssV),fGETUHALF(1,RttV)))+0x8000)>>16)); \
+}
+Q6INSN(M2_mmpyuh_rs0,"Rdd32=vmpywouh(Rss32,Rtt32):rnd:sat",ATTRIBS(),    "Mixed Precision Multiply",mixmpy_sema(0))
+Q6INSN(M2_mmpyuh_rs1,"Rdd32=vmpywouh(Rss32,Rtt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Multiply",mixmpy_sema(1))
+
+
+#endif
+
+
+
+/**************************************************************/
+/* complex mac with full 64-bit accum - no sat, no shift      */
+/* either do real or accum, never both                        */
+/**************************************************************/
+
+Q6INSN(M2_vrcmaci_s0,"Rxx32+=vrcmpyi(Rss32,Rtt32)",ATTRIBS(),"Vector Complex Mac Imaginary",
+{
+RxxV = RxxV + fMPY16SS(fGETHALF(1,RssV),fGETHALF(0,RttV)) + \
+              fMPY16SS(fGETHALF(0,RssV),fGETHALF(1,RttV)) + \
+              fMPY16SS(fGETHALF(3,RssV),fGETHALF(2,RttV)) + \
+              fMPY16SS(fGETHALF(2,RssV),fGETHALF(3,RttV));\
+fACC();\
+})
+
+Q6INSN(M2_vrcmacr_s0,"Rxx32+=vrcmpyr(Rss32,Rtt32)",ATTRIBS(),"Vector Complex Mac Real",
+{ RxxV = RxxV + fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV)) - \
+                fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV)) + \
+                fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV)) - \
+                fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV));\
+fACC();\
+})
+
+Q6INSN(M2_vrcmaci_s0c,"Rxx32+=vrcmpyi(Rss32,Rtt32*)",ATTRIBS(A_ARCHV2),"Vector Complex Mac Imaginary",
+{
+RxxV = RxxV + fMPY16SS(fGETHALF(1,RssV),fGETHALF(0,RttV)) - \
+              fMPY16SS(fGETHALF(0,RssV),fGETHALF(1,RttV)) + \
+              fMPY16SS(fGETHALF(3,RssV),fGETHALF(2,RttV)) - \
+              fMPY16SS(fGETHALF(2,RssV),fGETHALF(3,RttV));\
+fACC();\
+})
+
+Q6INSN(M2_vrcmacr_s0c,"Rxx32+=vrcmpyr(Rss32,Rtt32*)",ATTRIBS(A_ARCHV2),"Vector Complex Mac Real",
+{ RxxV = RxxV + fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV)) + \
+                fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV)) + \
+                fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV)) + \
+                fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV));\
+fACC();\
+})
+
+Q6INSN(M2_cmaci_s0,"Rxx32+=cmpyi(Rs32,Rt32)",ATTRIBS(),"Vector Complex Mac Imaginary",
+{
+RxxV = RxxV + fMPY16SS(fGETHALF(1,RsV),fGETHALF(0,RtV)) + \
+              fMPY16SS(fGETHALF(0,RsV),fGETHALF(1,RtV));
+fACC();\
+})
+
+Q6INSN(M2_cmacr_s0,"Rxx32+=cmpyr(Rs32,Rt32)",ATTRIBS(),"Vector Complex Mac Real",
+{ RxxV = RxxV + fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV)) - \
+                fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV));
+fACC();\
+})
+
+
+Q6INSN(M2_vrcmpyi_s0,"Rdd32=vrcmpyi(Rss32,Rtt32)",ATTRIBS(),"Vector Complex Mpy Imaginary",
+{
+RddV = fMPY16SS(fGETHALF(1,RssV),fGETHALF(0,RttV)) + \
+       fMPY16SS(fGETHALF(0,RssV),fGETHALF(1,RttV)) + \
+       fMPY16SS(fGETHALF(3,RssV),fGETHALF(2,RttV)) + \
+       fMPY16SS(fGETHALF(2,RssV),fGETHALF(3,RttV));\
+})
+
+Q6INSN(M2_vrcmpyr_s0,"Rdd32=vrcmpyr(Rss32,Rtt32)",ATTRIBS(),"Vector Complex Mpy Real",
+{ RddV = fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV)) - \
+         fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV)) + \
+         fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV)) - \
+         fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV));\
+})
+
+Q6INSN(M2_vrcmpyi_s0c,"Rdd32=vrcmpyi(Rss32,Rtt32*)",ATTRIBS(A_ARCHV2),"Vector Complex Mpy Imaginary",
+{
+RddV = fMPY16SS(fGETHALF(1,RssV),fGETHALF(0,RttV)) - \
+       fMPY16SS(fGETHALF(0,RssV),fGETHALF(1,RttV)) + \
+       fMPY16SS(fGETHALF(3,RssV),fGETHALF(2,RttV)) - \
+       fMPY16SS(fGETHALF(2,RssV),fGETHALF(3,RttV));\
+})
+
+Q6INSN(M2_vrcmpyr_s0c,"Rdd32=vrcmpyr(Rss32,Rtt32*)",ATTRIBS(A_ARCHV2),"Vector Complex Mpy Real",
+{ RddV = fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV)) + \
+         fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV)) + \
+         fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV)) + \
+         fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV));\
+})
+
+Q6INSN(M2_cmpyi_s0,"Rdd32=cmpyi(Rs32,Rt32)",ATTRIBS(),"Vector Complex Mpy Imaginary",
+{
+RddV = fMPY16SS(fGETHALF(1,RsV),fGETHALF(0,RtV)) + \
+       fMPY16SS(fGETHALF(0,RsV),fGETHALF(1,RtV));
+})
+
+Q6INSN(M2_cmpyr_s0,"Rdd32=cmpyr(Rs32,Rt32)",ATTRIBS(),"Vector Complex Mpy Real",
+{ RddV = fMPY16SS(fGETHALF(0,RsV),fGETHALF(0,RtV)) - \
+         fMPY16SS(fGETHALF(1,RsV),fGETHALF(1,RtV));
+})
+
+
+/**************************************************************/
+/* Complex mpy/mac with 2x32 bit accum, sat, shift            */
+/* 32x16 real or imag                                         */
+/**************************************************************/
+
+#if 1
+
+Q6INSN(M4_cmpyi_wh,"Rd32=cmpyiwh(Rss32,Rt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Complex Multiply",
+{
+ RdV = fSAT(  (  fMPY3216SS(fGETWORD(0,RssV),fGETHALF(1,RtV))
+               + fMPY3216SS(fGETWORD(1,RssV),fGETHALF(0,RtV))
+               + 0x4000)>>15);
+})
+
+
+Q6INSN(M4_cmpyr_wh,"Rd32=cmpyrwh(Rss32,Rt32):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Complex Multiply",
+{
+ RdV = fSAT(  (  fMPY3216SS(fGETWORD(0,RssV),fGETHALF(0,RtV))
+               - fMPY3216SS(fGETWORD(1,RssV),fGETHALF(1,RtV))
+               + 0x4000)>>15);
+})
+
+Q6INSN(M4_cmpyi_whc,"Rd32=cmpyiwh(Rss32,Rt32*):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Complex Multiply",
+{
+ RdV = fSAT(  (  fMPY3216SS(fGETWORD(1,RssV),fGETHALF(0,RtV))
+               - fMPY3216SS(fGETWORD(0,RssV),fGETHALF(1,RtV))
+               + 0x4000)>>15);
+})
+
+
+Q6INSN(M4_cmpyr_whc,"Rd32=cmpyrwh(Rss32,Rt32*):<<1:rnd:sat",ATTRIBS(),"Mixed Precision Complex Multiply",
+{
+ RdV = fSAT(  (  fMPY3216SS(fGETWORD(0,RssV),fGETHALF(0,RtV))
+               + fMPY3216SS(fGETWORD(1,RssV),fGETHALF(1,RtV))
+               + 0x4000)>>15);
+})
+
+
+#endif
+
+/**************************************************************/
+/* Vector mpy/mac with 2x32 bit accum, sat, shift             */
+/* either do real or imag,  never both                        */
+/**************************************************************/
+
+#undef VCMPYSEMI
+#define VCMPYSEMI(DST,ACC0,ACC1,SHIFT,SAT,ACCSYN) \
+	fSETWORD(0,DST,SAT(ACC0 fSCALE(SHIFT,fMPY16SS(fGETHALF(1,RssV),fGETHALF(0,RttV)) + \
+	                                    fMPY16SS(fGETHALF(0,RssV),fGETHALF(1,RttV))))); \
+	fSETWORD(1,DST,SAT(ACC1 fSCALE(SHIFT,fMPY16SS(fGETHALF(3,RssV),fGETHALF(2,RttV)) + \
+	                                    fMPY16SS(fGETHALF(2,RssV),fGETHALF(3,RttV))))); \
+    ACCSYN;\
+
+#undef VCMPYSEMR
+#define VCMPYSEMR(DST,ACC0,ACC1,SHIFT,SAT,ACCSYN) \
+	fSETWORD(0,DST,SAT(ACC0 fSCALE(SHIFT,fMPY16SS(fGETHALF(0,RssV),fGETHALF(0,RttV)) - \
+	                       fMPY16SS(fGETHALF(1,RssV),fGETHALF(1,RttV))))); \
+	fSETWORD(1,DST,SAT(ACC1 fSCALE(SHIFT,fMPY16SS(fGETHALF(2,RssV),fGETHALF(2,RttV)) - \
+	                       fMPY16SS(fGETHALF(3,RssV),fGETHALF(3,RttV))))); \
+    ACCSYN;\
+
+
+#undef VCMPYIR
+#define VCMPYIR(TAGBASE,DSTSYN,DSTVAL,ACCSEM,ACCVAL0,ACCVAL1,SHIFTSYN,SHIFTVAL,SATSYN,SATVAL,ACCSYN) \
+Q6INSN(M2_##TAGBASE##i,DSTSYN ACCSEM "vcmpyi(Rss32,Rtt32)" SHIFTSYN SATSYN,ATTRIBS(A_ARCHV2), \
+	"Vector Complex Multiply Imaginary", { VCMPYSEMI(DSTVAL,ACCVAL0,ACCVAL1,SHIFTVAL,SATVAL,ACCSYN); }) \
+Q6INSN(M2_##TAGBASE##r,DSTSYN ACCSEM "vcmpyr(Rss32,Rtt32)" SHIFTSYN SATSYN,ATTRIBS(A_ARCHV2), \
+	"Vector Complex Multiply Imaginary", { VCMPYSEMR(DSTVAL,ACCVAL0,ACCVAL1,SHIFTVAL,SATVAL,ACCSYN); })
+
+
+VCMPYIR(vcmpy_s0_sat_,"Rdd32",RddV,"=",,,"",0,":sat",fSAT,)
+VCMPYIR(vcmpy_s1_sat_,"Rdd32",RddV,"=",,,":<<1",1,":sat",fSAT,)
+VCMPYIR(vcmac_s0_sat_,"Rxx32",RxxV,"+=",fGETWORD(0,RxxV) + ,fGETWORD(1,RxxV) + ,"",0,":sat",fSAT,fACC())
+
+
+/**********************************************************************
+ *  Rotation  -- by 0, 90, 180, or 270 means mult by 1, J, -1, -J     *
+ *********************************************************************/
+
+Q6INSN(S2_vcrotate,"Rdd32=vcrotate(Rss32,Rt32)",ATTRIBS(A_ARCHV2),"Rotate complex value by multiple of PI/2",
+{
+	fHIDE(size1u_t tmp;)
+	tmp = fEXTRACTU_RANGE(RtV,1,0);
+	if (tmp == 0) { /* No rotation */
+		fSETHALF(0,RddV,fGETHALF(0,RssV));
+		fSETHALF(1,RddV,fGETHALF(1,RssV));
+	} else if (tmp == 1) { /* Multiply by -J */
+		fSETHALF(0,RddV,fGETHALF(1,RssV));
+		fSETHALF(1,RddV,fSATH(-fGETHALF(0,RssV)));
+	} else if (tmp == 2) { /* Multiply by J */
+		fSETHALF(0,RddV,fSATH(-fGETHALF(1,RssV)));
+		fSETHALF(1,RddV,fGETHALF(0,RssV));
+	} else { /* Multiply by -1 */
+		fHIDE(if (tmp != 3) fatal("C is broken");)
+		fSETHALF(0,RddV,fSATH(-fGETHALF(0,RssV)));
+		fSETHALF(1,RddV,fSATH(-fGETHALF(1,RssV)));
+	}
+	tmp = fEXTRACTU_RANGE(RtV,3,2);
+	if (tmp == 0) { /* No rotation */
+		fSETHALF(2,RddV,fGETHALF(2,RssV));
+		fSETHALF(3,RddV,fGETHALF(3,RssV));
+	} else if (tmp == 1) { /* Multiply by -J */
+		fSETHALF(2,RddV,fGETHALF(3,RssV));
+		fSETHALF(3,RddV,fSATH(-fGETHALF(2,RssV)));
+	} else if (tmp == 2) { /* Multiply by J */
+		fSETHALF(2,RddV,fSATH(-fGETHALF(3,RssV)));
+		fSETHALF(3,RddV,fGETHALF(2,RssV));
+	} else { /* Multiply by -1 */
+		fHIDE(if (tmp != 3) fatal("C is broken");)
+		fSETHALF(2,RddV,fSATH(-fGETHALF(2,RssV)));
+		fSETHALF(3,RddV,fSATH(-fGETHALF(3,RssV)));
+	}
+})
+
+
+Q6INSN(S4_vrcrotate_acc,"Rxx32+=vrcrotate(Rss32,Rt32,#u2)",ATTRIBS(),"Rotate and Reduce Bytes",
+{
+	fHIDE(int i; int tmpr; int tmpi; unsigned int control;)
+	fHIDE(int sumr; int sumi;)
+	sumr = 0;
+	sumi = 0;
+	control = fGETUBYTE(uiV,RtV);
+	for (i = 0; i < 8; i += 2) {
+		tmpr = fGETBYTE(i  ,RssV);
+		tmpi = fGETBYTE(i+1,RssV);
+		switch (control & 3) {
+			case 0: /* No Rotation */
+				sumr += tmpr;
+				sumi += tmpi;
+				break;
+			case 1: /* Multiply by -J */
+				sumr += tmpi;
+				sumi -= tmpr;
+				break;
+			case 2: /* Multiply by J */
+				sumr -= tmpi;
+				sumi += tmpr;
+				break;
+			case 3: /* Multiply by -1 */
+				sumr -= tmpr;
+				sumi -= tmpi;
+				break;
+			fHIDE(default: fatal("C is broken!");)
+		}
+		control = control >> 2;
+	}
+	fSETWORD(0,RxxV,fGETWORD(0,RxxV) + sumr);
+	fSETWORD(1,RxxV,fGETWORD(1,RxxV) + sumi);
+    fACC();\
+})
+
+Q6INSN(S4_vrcrotate,"Rdd32=vrcrotate(Rss32,Rt32,#u2)",ATTRIBS(),"Rotate and Reduce Bytes",
+{
+	fHIDE(int i; int tmpr; int tmpi; unsigned int control;)
+	fHIDE(int sumr; int sumi;)
+	sumr = 0;
+	sumi = 0;
+	control = fGETUBYTE(uiV,RtV);
+	for (i = 0; i < 8; i += 2) {
+		tmpr = fGETBYTE(i  ,RssV);
+		tmpi = fGETBYTE(i+1,RssV);
+		switch (control & 3) {
+			case 0: /* No Rotation */
+				sumr += tmpr;
+				sumi += tmpi;
+				break;
+			case 1: /* Multiply by -J */
+				sumr += tmpi;
+				sumi -= tmpr;
+				break;
+			case 2: /* Multiply by J */
+				sumr -= tmpi;
+				sumi += tmpr;
+				break;
+			case 3: /* Multiply by -1 */
+				sumr -= tmpr;
+				sumi -= tmpi;
+				break;
+			fHIDE(default: fatal("C is broken!");)
+		}
+		control = control >> 2;
+	}
+	fSETWORD(0,RddV,sumr);
+	fSETWORD(1,RddV,sumi);
+})
+
+
+Q6INSN(S2_vcnegh,"Rdd32=vcnegh(Rss32,Rt32)",ATTRIBS(),"Conditional Negate halfwords",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		if (fGETBIT(i,RtV)) {
+			fSETHALF(i,RddV,fSATH(-fGETHALF(i,RssV)));
+		} else {
+			fSETHALF(i,RddV,fGETHALF(i,RssV));
+		}
+	}
+})
+
+Q6INSN(S2_vrcnegh,"Rxx32+=vrcnegh(Rss32,Rt32)",ATTRIBS(),"Vector Reduce Conditional Negate halfwords",
+{
+	fHIDE(int i;)
+	for (i = 0; i < 4; i++) {
+		if (fGETBIT(i,RtV)) {
+			RxxV += -fGETHALF(i,RssV);
+		} else {
+			RxxV += fGETHALF(i,RssV);
+		}
+	}
+    fACC();
+})
+
+
+/**********************************************************************
+ *  Finite-field multiplies.  Written by David Hoyle                  *
+ *********************************************************************/
+
+Q6INSN(M4_pmpyw,"Rdd32=pmpyw(Rs32,Rt32)",ATTRIBS(),"Polynomial 32bit Multiplication with Addition in GF(2)",
+{
+        fHIDE(int i; unsigned int y;)
+        fHIDE(unsigned long long x; unsigned long long prod;)
+        x = fGETUWORD(0, RsV);
+        y = fGETUWORD(0, RtV);
+
+        prod = 0;
+        for(i=0; i < 32; i++) {
+            if((y >> i) & 1) prod ^= (x << i);
+        }
+	RddV = prod;
+})
+
+Q6INSN(M4_vpmpyh,"Rdd32=vpmpyh(Rs32,Rt32)",ATTRIBS(),"Dual Polynomial 16bit Multiplication with Addition in GF(2)",
+{
+        fHIDE(int i; unsigned int x0; unsigned int x1;)
+        fHIDE(unsigned int y0; unsigned int y1;)
+        fHIDE(unsigned int prod0; unsigned int prod1;)
+
+        x0 = fGETUHALF(0, RsV);
+        x1 = fGETUHALF(1, RsV);
+        y0 = fGETUHALF(0, RtV);
+        y1 = fGETUHALF(1, RtV);
+
+        prod0 = prod1 = 0;
+        for(i=0; i < 16; i++) {
+            if((y0 >> i) & 1) prod0 ^= (x0 << i);
+            if((y1 >> i) & 1) prod1 ^= (x1 << i);
+        }
+        fSETHALF(0,RddV,fGETUHALF(0,prod0));
+        fSETHALF(1,RddV,fGETUHALF(0,prod1));
+        fSETHALF(2,RddV,fGETUHALF(1,prod0));
+        fSETHALF(3,RddV,fGETUHALF(1,prod1));
+})
+
+Q6INSN(M4_pmpyw_acc,"Rxx32^=pmpyw(Rs32,Rt32)",ATTRIBS(),"Polynomial 32bit Multiplication with Addition in GF(2)",
+{
+        fHIDE(int i; unsigned int y;)
+        fHIDE(unsigned long long x; unsigned long long prod;)
+        x = fGETUWORD(0, RsV);
+        y = fGETUWORD(0, RtV);
+
+        prod = 0;
+        for(i=0; i < 32; i++) {
+            if((y >> i) & 1) prod ^= (x << i);
+        }
+	RxxV ^= prod;
+})
+
+Q6INSN(M4_vpmpyh_acc,"Rxx32^=vpmpyh(Rs32,Rt32)",ATTRIBS(),"Dual Polynomial 16bit Multiplication with Addition in GF(2)",
+{
+        fHIDE(int i; unsigned int x0; unsigned int x1;)
+        fHIDE(unsigned int y0; unsigned int y1;)
+        fHIDE(unsigned int prod0; unsigned int prod1;)
+
+        x0 = fGETUHALF(0, RsV);
+        x1 = fGETUHALF(1, RsV);
+        y0 = fGETUHALF(0, RtV);
+        y1 = fGETUHALF(1, RtV);
+
+        prod0 = prod1 = 0;
+        for(i=0; i < 16; i++) {
+            if((y0 >> i) & 1) prod0 ^= (x0 << i);
+            if((y1 >> i) & 1) prod1 ^= (x1 << i);
+        }
+        fSETHALF(0,RxxV,fGETUHALF(0,RxxV) ^ fGETUHALF(0,prod0));
+        fSETHALF(1,RxxV,fGETUHALF(1,RxxV) ^ fGETUHALF(0,prod1));
+        fSETHALF(2,RxxV,fGETUHALF(2,RxxV) ^ fGETUHALF(1,prod0));
+        fSETHALF(3,RxxV,fGETUHALF(3,RxxV) ^ fGETUHALF(1,prod1));
+})
+
+
+/* V70: TINY CORE */
+
+#define CMPY64(TAG,NAME,DESC,OPERAND1,OP,W0,W1,W2,W3) \
+Q6INSN(M7_##TAG,"Rdd32=" NAME "(Rss32," OPERAND1 ")",ATTRIBS(A_RESTRICT_SLOT3ONLY,A_RESTRICT_NOSLOT2_MPY),"Complex Multiply 64-bit " DESC,    { RddV  = (fMPY32SS(fGETWORD(W0, RssV), fGETWORD(W1, RttV)) OP fMPY32SS(fGETWORD(W2, RssV), fGETWORD(W3, RttV))); fEXTENSION_AUDIO();})\
+Q6INSN(M7_##TAG##_acc,"Rxx32+=" NAME "(Rss32,"OPERAND1")",ATTRIBS(A_RESTRICT_SLOT3ONLY,A_RESTRICT_NOSLOT2_MPY,A_NOTE_NOSLOT2_MPY),"Complex Multiply-Accumulate 64-bit " DESC, { RxxV += (fMPY32SS(fGETWORD(W0, RssV), fGETWORD(W1, RttV)) OP fMPY32SS(fGETWORD(W2, RssV), fGETWORD(W3, RttV))); fEXTENSION_AUDIO(); fACC(); })
+
+CMPY64(dcmpyrw, "cmpyrw","Real","Rtt32" ,-,0,0,1,1)
+CMPY64(dcmpyrwc,"cmpyrw","Real","Rtt32*",+,0,0,1,1)
+CMPY64(dcmpyiw, "cmpyiw","Imag","Rtt32" ,+,0,1,1,0)
+CMPY64(dcmpyiwc,"cmpyiw","Imag","Rtt32*",-,1,0,0,1)
+
+DEF_MAPPING(M7_vdmpy,"Rdd32=vdmpyw(Rss32,Rtt32)","Rdd32=cmpyrw(Rss32,Rtt32*)")
+DEF_MAPPING(M7_vdmpy_acc,"Rxx32+=vdmpyw(Rss32,Rtt32)","Rxx32+=cmpyrw(Rss32,Rtt32*)")
+
+
+
+#define CMPY128(TAG, NAME, OPERAND1, WORD0, WORD1, WORD2, WORD3, OP) \
+Q6INSN(M7_##TAG,"Rd32=" NAME "(Rss32,"OPERAND1"):<<1:sat",ATTRIBS(A_RESTRICT_SLOT3ONLY,A_RESTRICT_NOSLOT2_MPY,A_NOTE_NOSLOT2_MPY),"Complex Multiply 32-bit result real",  \
+{ \
+fHIDE(size16s_t acc128;)\
+fHIDE(size16s_t tmp128;)\
+fHIDE(size8s_t acc64;)\
+tmp128 = fCAST8S_16S(fMPY32SS(fGETWORD(WORD0, RssV), fGETWORD(WORD1, RttV)));\
+acc128 = fCAST8S_16S(fMPY32SS(fGETWORD(WORD2, RssV), fGETWORD(WORD3, RttV)));\
+acc128 = OP(tmp128,acc128);\
+acc128 = fSHIFTR128(acc128, 31);\
+acc64 =  fCAST16S_8S(acc128);\
+RdV = fSATW(acc64);\
+fEXTENSION_AUDIO();})
+
+
+CMPY128(wcmpyrw, "cmpyrw", "Rtt32", 0, 0, 1, 1, fSUB128)
+CMPY128(wcmpyrwc, "cmpyrw", "Rtt32*", 0, 0, 1, 1, fADD128)
+CMPY128(wcmpyiw, "cmpyiw", "Rtt32", 0, 1, 1, 0, fADD128)
+CMPY128(wcmpyiwc, "cmpyiw", "Rtt32*", 1, 0, 0, 1, fSUB128)
+
+
+#define CMPY128RND(TAG, NAME, OPERAND1, WORD0, WORD1, WORD2, WORD3, OP) \
+Q6INSN(M7_##TAG##_rnd,"Rd32=" NAME "(Rss32,"OPERAND1"):<<1:rnd:sat",ATTRIBS(A_RESTRICT_SLOT3ONLY,A_RESTRICT_NOSLOT2_MPY,A_NOTE_NOSLOT2_MPY),"Complex Multiply 32-bit result real",  \
+{ \
+fHIDE(size16s_t acc128;)\
+fHIDE(size16s_t tmp128;)\
+fHIDE(size16s_t const128;)\
+fHIDE(size8s_t acc64;)\
+tmp128 = fCAST8S_16S(fMPY32SS(fGETWORD(WORD0, RssV), fGETWORD(WORD1, RttV)));\
+acc128 = fCAST8S_16S(fMPY32SS(fGETWORD(WORD2, RssV), fGETWORD(WORD3, RttV)));\
+const128 = fCAST8S_16S(fCONSTLL(0x40000000));\
+acc128 = OP(tmp128,acc128);\
+acc128 = fADD128(acc128,const128);\
+acc128 = fSHIFTR128(acc128, 31);\
+acc64 =  fCAST16S_8S(acc128);\
+RdV = fSATW(acc64);\
+fEXTENSION_AUDIO();})
+
+CMPY128RND(wcmpyrw, "cmpyrw", "Rtt32", 0, 0, 1, 1, fSUB128)
+CMPY128RND(wcmpyrwc, "cmpyrw", "Rtt32*", 0, 0, 1, 1, fADD128)
+CMPY128RND(wcmpyiw, "cmpyiw", "Rtt32", 0, 1, 1, 0, fADD128)
+CMPY128RND(wcmpyiwc, "cmpyiw", "Rtt32*", 1, 0, 0, 1, fSUB128)
+
+
+
+
diff --git a/target/hexagon/imported/shift.idef b/target/hexagon/imported/shift.idef
new file mode 100644
index 0000000..81c440f
--- /dev/null
+++ b/target/hexagon/imported/shift.idef
@@ -0,0 +1,1211 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * S-type Instructions
+ */
+
+/**********************************************/
+/* SHIFTS                                     */
+/**********************************************/
+
+/* NOTE: Rdd = Rs *right* shifts don't make sense */
+/* NOTE: Rd[d] = Rs[s] *right* shifts with saturation don't make sense */
+
+
+#define RSHIFTTYPES(TAGEND,REGD,REGS,REGSTYPE,ACC,ACCSRC,SAT,SATOPT,ATTRS) \
+Q6INSN(S2_asr_r_##TAGEND,#REGD "32" #ACC "=asr(" #REGS "32,Rt32)" #SATOPT,ATTRIBS(ATTRS), \
+	"Arithmetic Shift Right by Register", \
+	{  \
+                fHIDE(size4s_t) shamt=fSXTN(7,32,RtV);\
+		REGD##V = SAT(ACCSRC ACC fBIDIR_ASHIFTR(REGS##V,shamt,REGSTYPE));  \
+	})\
+\
+Q6INSN(S2_asl_r_##TAGEND,#REGD "32" #ACC "=asl(" #REGS "32,Rt32)" #SATOPT,ATTRIBS(ATTRS), \
+	"Arithmetic Shift Left by Register", \
+	{  \
+                fHIDE(size4s_t) shamt=fSXTN(7,32,RtV);\
+		REGD##V = SAT(ACCSRC ACC fBIDIR_ASHIFTL(REGS##V,shamt,REGSTYPE));  \
+	})\
+\
+Q6INSN(S2_lsr_r_##TAGEND,#REGD "32" #ACC "=lsr(" #REGS "32,Rt32)" #SATOPT,ATTRIBS(ATTRS), \
+	"Logical Shift Right by Register", \
+	{  \
+                fHIDE(size4s_t) shamt=fSXTN(7,32,RtV);\
+		REGD##V = SAT(ACCSRC ACC fBIDIR_LSHIFTR(REGS##V,shamt,REGSTYPE));  \
+	})\
+\
+Q6INSN(S2_lsl_r_##TAGEND,#REGD "32" #ACC "=lsl(" #REGS "32,Rt32)" #SATOPT,ATTRIBS(ATTRS), \
+	"Logical Shift Left by Register", \
+	{  \
+                fHIDE(size4s_t) shamt=fSXTN(7,32,RtV);\
+		REGD##V = SAT(ACCSRC ACC fBIDIR_LSHIFTL(REGS##V,shamt,REGSTYPE));  \
+	})
+
+RSHIFTTYPES(r,Rd,Rs,4_8,,,fECHO,,)
+RSHIFTTYPES(p,Rdd,Rss,8_8,,,fECHO,,)
+RSHIFTTYPES(r_acc,Rx,Rs,4_8,+,RxV,fECHO,,A_ROPS_2)
+RSHIFTTYPES(p_acc,Rxx,Rss,8_8,+,RxxV,fECHO,,A_ROPS_2)
+RSHIFTTYPES(r_nac,Rx,Rs,4_8,-,RxV,fECHO,,A_ROPS_2)
+RSHIFTTYPES(p_nac,Rxx,Rss,8_8,-,RxxV,fECHO,,A_ROPS_2)
+
+RSHIFTTYPES(r_and,Rx,Rs,4_8,&,RxV,fECHO,,A_ROPS_2)
+RSHIFTTYPES(r_or,Rx,Rs,4_8,|,RxV,fECHO,,A_ROPS_2)
+RSHIFTTYPES(p_and,Rxx,Rss,8_8,&,RxxV,fECHO,,A_ROPS_2)
+RSHIFTTYPES(p_or,Rxx,Rss,8_8,|,RxxV,fECHO,,A_ROPS_2)
+RSHIFTTYPES(p_xor,Rxx,Rss,8_8,^,RxxV,fECHO,,A_ROPS_2)
+
+
+#undef RSHIFTTYPES
+
+/* Register shift with saturation */
+#define RSATSHIFTTYPES(TAGEND,REGD,REGS,REGSTYPE) \
+Q6INSN(S2_asr_r_##TAGEND,#REGD "32" "=asr(" #REGS "32,Rt32):sat",ATTRIBS(), \
+	"Arithmetic Shift Right by Register", \
+	{  \
+                fHIDE(size4s_t) shamt=fSXTN(7,32,RtV);\
+		REGD##V = fBIDIR_ASHIFTR_SAT(REGS##V,shamt,REGSTYPE);  \
+	})\
+\
+Q6INSN(S2_asl_r_##TAGEND,#REGD "32" "=asl(" #REGS "32,Rt32):sat",ATTRIBS(), \
+	"Arithmetic Shift Left by Register", \
+	{  \
+                fHIDE(size4s_t) shamt=fSXTN(7,32,RtV);\
+		REGD##V = fBIDIR_ASHIFTL_SAT(REGS##V,shamt,REGSTYPE);  \
+	})
+
+RSATSHIFTTYPES(r_sat,Rd,Rs,4_8)
+
+
+
+
+
+#define ISHIFTTYPES(TAGEND,SIZE,REGD,REGS,REGSTYPE,ACC,ACCSRC,SAT,SATOPT,ATTRS) \
+Q6INSN(S2_asr_i_##TAGEND,#REGD "32" #ACC "=asr(" #REGS "32,#u" #SIZE ")" #SATOPT,ATTRIBS(ATTRS), \
+	"Arithmetic Shift Right by Immediate", \
+	{ REGD##V = SAT(ACCSRC ACC fASHIFTR(REGS##V,uiV,REGSTYPE)); }) \
+\
+Q6INSN(S2_lsr_i_##TAGEND,#REGD "32" #ACC "=lsr(" #REGS "32,#u" #SIZE ")" #SATOPT,ATTRIBS(ATTRS), \
+	"Logical Shift Right by Immediate", \
+	{ REGD##V = SAT(ACCSRC ACC fLSHIFTR(REGS##V,uiV,REGSTYPE)); }) \
+\
+Q6INSN(S2_asl_i_##TAGEND,#REGD "32" #ACC "=asl(" #REGS "32,#u" #SIZE ")" #SATOPT,ATTRIBS(ATTRS), \
+	"Shift Left by Immediate", \
+	{ REGD##V = SAT(ACCSRC ACC fASHIFTL(REGS##V,uiV,REGSTYPE)); }) \
+Q6INSN(S6_rol_i_##TAGEND,#REGD "32" #ACC "=rol(" #REGS "32,#u" #SIZE ")" #SATOPT,ATTRIBS(ATTRS), \
+	"Rotate Left by Immediate", \
+	{ REGD##V = SAT(ACCSRC ACC fROTL(REGS##V,uiV,REGSTYPE)); })
+
+
+#define ISHIFTTYPES_ONLY_ASL(TAGEND,SIZE,REGD,REGS,REGSTYPE,ACC,ACCSRC,SAT,SATOPT) \
+Q6INSN(S2_asl_i_##TAGEND,#REGD "32" #ACC "=asl(" #REGS "32,#u" #SIZE ")" #SATOPT,ATTRIBS(), \
+	"", \
+	{ REGD##V = SAT(ACCSRC ACC fASHIFTL(REGS##V,uiV,REGSTYPE)); })
+
+#define ISHIFTTYPES_ONLY_ASR(TAGEND,SIZE,REGD,REGS,REGSTYPE,ACC,ACCSRC,SAT,SATOPT) \
+Q6INSN(S2_asr_i_##TAGEND,#REGD "32" #ACC "=asr(" #REGS "32,#u" #SIZE ")" #SATOPT,ATTRIBS(), \
+	"", \
+	{ REGD##V = SAT(ACCSRC ACC fASHIFTR(REGS##V,uiV,REGSTYPE)); })
+
+
+#define ISHIFTTYPES_NOASR(TAGEND,SIZE,REGD,REGS,REGSTYPE,ACC,ACCSRC,SAT,SATOPT) \
+Q6INSN(S2_lsr_i_##TAGEND,#REGD "32" #ACC "=lsr(" #REGS "32,#u" #SIZE ")" #SATOPT,ATTRIBS(), \
+	"Logical Shift Right by Register", \
+	{ REGD##V = SAT(ACCSRC ACC fLSHIFTR(REGS##V,uiV,REGSTYPE)); }) \
+Q6INSN(S2_asl_i_##TAGEND,#REGD "32" #ACC "=asl(" #REGS "32,#u" #SIZE ")" #SATOPT,ATTRIBS(), \
+	"Shift Left by Register", \
+	{ REGD##V = SAT(ACCSRC ACC fASHIFTL(REGS##V,uiV,REGSTYPE)); }) \
+Q6INSN(S6_rol_i_##TAGEND,#REGD "32" #ACC "=rol(" #REGS "32,#u" #SIZE ")" #SATOPT,ATTRIBS(), \
+	"Rotate Left by Immediate", \
+	{ REGD##V = SAT(ACCSRC ACC fROTL(REGS##V,uiV,REGSTYPE)); })
+
+
+
+ISHIFTTYPES(r,5,Rd,Rs,4_4,,,fECHO,,)
+ISHIFTTYPES(p,6,Rdd,Rss,8_8,,,fECHO,,)
+ISHIFTTYPES(r_acc,5,Rx,Rs,4_4,+,RxV,fECHO,,A_ROPS_2)
+ISHIFTTYPES(p_acc,6,Rxx,Rss,8_8,+,RxxV,fECHO,,A_ROPS_2)
+ISHIFTTYPES(r_nac,5,Rx,Rs,4_4,-,RxV,fECHO,,A_ROPS_2)
+ISHIFTTYPES(p_nac,6,Rxx,Rss,8_8,-,RxxV,fECHO,,A_ROPS_2)
+
+ISHIFTTYPES_NOASR(r_xacc,5,Rx,Rs,4_4,^, RxV,fECHO,)
+ISHIFTTYPES_NOASR(p_xacc,6,Rxx,Rss,8_8,^, RxxV,fECHO,)
+
+ISHIFTTYPES(r_and,5,Rx,Rs,4_4,&,RxV,fECHO,,A_ROPS_2)
+ISHIFTTYPES(r_or,5,Rx,Rs,4_4,|,RxV,fECHO,,A_ROPS_2)
+ISHIFTTYPES(p_and,6,Rxx,Rss,8_8,&,RxxV,fECHO,,A_ROPS_2)
+ISHIFTTYPES(p_or,6,Rxx,Rss,8_8,|,RxxV,fECHO,,A_ROPS_2)
+
+ISHIFTTYPES_ONLY_ASL(r_sat,5,Rd,Rs,4_8,,,fSAT,:sat)
+
+
+Q6INSN(S2_asr_i_r_rnd,"Rd32=asr(Rs32,#u5):rnd",ATTRIBS(),
+       "Shift right with round",
+       { RdV = fASHIFTR(((fASHIFTR(RsV,uiV,4_8))+1),1,8_8); })
+
+
+DEF_V2_COND_MAPPING(S2_asr_i_r_rnd_goodsyntax,"Rd32=asrrnd(Rs32,#u5)","#u5==0","Rd32=Rs32","Rd32=asr(Rs32,#u5-1):rnd")
+
+
+
+
+Q6INSN(S2_asr_i_p_rnd,"Rdd32=asr(Rss32,#u6):rnd",ATTRIBS(), "Shift right with round",
+{ fHIDE(size8u_t tmp;)
+  fHIDE(size8u_t rnd;)
+  tmp = fASHIFTR(RssV,uiV,8_8);
+  rnd = tmp & 1;
+  RddV = fASHIFTR(tmp,1,8_8) + rnd; })
+
+
+DEF_V5_COND_MAPPING(S2_asr_i_p_rnd_goodsyntax,"Rdd32=asrrnd(Rss32,#u6)","#u6==0","Rdd32=Rss32","Rdd32=asr(Rss32,#u6-1):rnd")
+
+
+
+Q6INSN(S4_lsli,"Rd32=lsl(#s6,Rt32)",ATTRIBS(), "Shift an immediate left by register amount",
+{
+	fHIDE(size4s_t) shamt = fSXTN(7,32,RtV);
+	RdV = fBIDIR_LSHIFTL(siV,shamt,4_8);
+})
+
+
+
+
+Q6INSN(S2_addasl_rrri,"Rd32=addasl(Rt32,Rs32,#u3)",ATTRIBS(),
+	"Shift left by small amount and add",
+	{ RdV = RtV + fASHIFTL(RsV,uiV,4_4); })
+
+
+
+#define SHIFTOPI(TAGEND,INNEROP,INNERSEM)\
+Q6INSN(S4_andi_##TAGEND,"Rx32=and(#u8,"INNEROP")",,"Shift-op",{RxV=fIMMEXT(uiV)&INNERSEM;})\
+Q6INSN(S4_ori_##TAGEND, "Rx32=or(#u8,"INNEROP")",,"Shift-op",{RxV=fIMMEXT(uiV)|INNERSEM;})\
+Q6INSN(S4_addi_##TAGEND,"Rx32=add(#u8,"INNEROP")",,"Shift-op",{RxV=fIMMEXT(uiV)+INNERSEM;})\
+Q6INSN(S4_subi_##TAGEND,"Rx32=sub(#u8,"INNEROP")",,"Shift-op",{RxV=fIMMEXT(uiV)-INNERSEM;})
+
+
+SHIFTOPI(asl_ri,"asl(Rx32,#U5)",(RxV<<UiV))
+SHIFTOPI(lsr_ri,"lsr(Rx32,#U5)",(((unsigned int)RxV)>>UiV))
+
+
+/**********************************************/
+/* PERMUTES                                   */
+/**********************************************/
+Q6INSN(S2_valignib,"Rdd32=valignb(Rtt32,Rss32,#u3)",
+ATTRIBS(), "Vector align bytes",
+{
+  RddV = (fLSHIFTR(RssV,uiV*8,8_8))|(fASHIFTL(RttV,((8-uiV)*8),8_8));
+})
+
+Q6INSN(S2_valignrb,"Rdd32=valignb(Rtt32,Rss32,Pu4)",
+ATTRIBS(), "Align with register",
+{ fPREDUSE_TIMING();RddV = fLSHIFTR(RssV,(PuV&0x7)*8,8_8)|(fASHIFTL(RttV,(8-(PuV&0x7))*8,8_8));})
+
+Q6INSN(S2_vspliceib,"Rdd32=vspliceb(Rss32,Rtt32,#u3)",
+ATTRIBS(), "Vector splice bytes",
+{ RddV = fASHIFTL(RttV,uiV*8,8_8) | fZXTN(uiV*8,64,RssV); })
+
+Q6INSN(S2_vsplicerb,"Rdd32=vspliceb(Rss32,Rtt32,Pu4)",
+ATTRIBS(), "Splice with register",
+{ fPREDUSE_TIMING();RddV = fASHIFTL(RttV,(PuV&7)*8,8_8) | fZXTN((PuV&7)*8,64,RssV); })
+
+Q6INSN(S2_vsplatrh,"Rdd32=vsplath(Rs32)",
+ATTRIBS(), "Vector splat halfwords from register",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV, fGETHALF(0,RsV));
+        }
+})
+
+
+Q6INSN(S2_vsplatrb,"Rd32=vsplatb(Rs32)",
+ATTRIBS(), "Vector splat bytes from register",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETBYTE(i,RdV, fGETBYTE(0,RsV));
+        }
+})
+
+Q6INSN(S6_vsplatrbp,"Rdd32=vsplatb(Rs32)",
+ATTRIBS(), "Vector splat bytes from register",
+{
+        fHIDE(int i;)
+        for (i=0;i<8;i++) {
+	  fSETBYTE(i,RddV, fGETBYTE(0,RsV));
+        }
+})
+
+
+
+/**********************************************/
+/* Insert/Extract[u]                          */
+/**********************************************/
+
+Q6INSN(S2_insert,"Rx32=insert(Rs32,#u5,#U5)",
+ATTRIBS(), "Insert bits",
+{
+        fHIDE(int) width=uiV;
+        fHIDE(int) offset=UiV;
+	/* clear bits in Rxx where new bits go */
+	RxV &= ~(((fCONSTLL(1)<<width)-1)<<offset);
+	/* OR in new bits */
+	RxV |= ((RsV & ((fCONSTLL(1)<<width)-1)) << offset);
+})
+
+
+Q6INSN(S2_tableidxb,"Rx32=tableidxb(Rs32,#u4,#S6):raw",
+ATTRIBS(A_ARCHV2), "Extract and insert bits",
+{
+        fHIDE(int) width=uiV;
+        fHIDE(int) offset=SiV;
+        fHIDE(int) field = fEXTRACTU_BIDIR(RsV,width,offset);
+        fINSERT_BITS(RxV,width,0,field);
+})
+
+Q6INSN(S2_tableidxh,"Rx32=tableidxh(Rs32,#u4,#S6):raw",
+ATTRIBS(A_ARCHV2), "Extract and insert bits",
+{
+        fHIDE(int) width=uiV;
+        fHIDE(int) offset=SiV+1;
+        fHIDE(int) field = fEXTRACTU_BIDIR(RsV,width,offset);
+        fINSERT_BITS(RxV,width,1,field);
+})
+
+Q6INSN(S2_tableidxw,"Rx32=tableidxw(Rs32,#u4,#S6):raw",
+ATTRIBS(A_ARCHV2), "Extract and insert bits",
+{
+        fHIDE(int) width=uiV;
+        fHIDE(int) offset=SiV+2;
+        fHIDE(int) field = fEXTRACTU_BIDIR(RsV,width,offset);
+        fINSERT_BITS(RxV,width,2,field);
+})
+
+Q6INSN(S2_tableidxd,"Rx32=tableidxd(Rs32,#u4,#S6):raw",
+ATTRIBS(A_ARCHV2), "Extract and insert bits",
+{
+        fHIDE(int) width=uiV;
+        fHIDE(int) offset=SiV+3;
+        fHIDE(int) field = fEXTRACTU_BIDIR(RsV,width,offset);
+        fINSERT_BITS(RxV,width,3,field);
+})
+
+DEF_V2_MAPPING(S2_tableidxb_goodsyntax,"Rx32=tableidxb(Rs32,#u4,#U5)","Rx32=tableidxb(Rs32,#u4,#U5):raw")
+DEF_V2_MAPPING(S2_tableidxh_goodsyntax,"Rx32=tableidxh(Rs32,#u4,#U5)","Rx32=tableidxh(Rs32,#u4,#U5-1):raw")
+DEF_V2_MAPPING(S2_tableidxw_goodsyntax,"Rx32=tableidxw(Rs32,#u4,#U5)","Rx32=tableidxw(Rs32,#u4,#U5-2):raw")
+DEF_V2_MAPPING(S2_tableidxd_goodsyntax,"Rx32=tableidxd(Rs32,#u4,#U5)","Rx32=tableidxd(Rs32,#u4,#U5-3):raw")
+
+
+
+
+
+Q6INSN(A4_bitspliti,"Rdd32=bitsplit(Rs32,#u5)",
+ATTRIBS(), "Split a bitfield into two registers",
+{
+        fSETWORD(1,RddV,(fCAST4_4u(RsV)>>uiV));
+        fSETWORD(0,RddV,fZXTN(uiV,32,RsV));
+})
+
+Q6INSN(A4_bitsplit,"Rdd32=bitsplit(Rs32,Rt32)",
+ATTRIBS(), "Split a bitfield into two registers",
+{
+		fHIDE(size4u_t) shamt = fZXTN(5,32,RtV);
+        fSETWORD(1,RddV,(fCAST4_4u(RsV)>>shamt));
+        fSETWORD(0,RddV,fZXTN(shamt,32,RsV));
+})
+
+
+
+
+Q6INSN(S4_extract,"Rd32=extract(Rs32,#u5,#U5)",
+ATTRIBS(), "Extract signed bitfield",
+{
+        fHIDE(int) width=uiV;
+        fHIDE(int) offset=UiV;
+		RdV = fSXTN(width,32,(fCAST4_4u(RsV) >> offset));
+})
+
+
+Q6INSN(S2_extractu,"Rd32=extractu(Rs32,#u5,#U5)",
+ATTRIBS(), "Extract unsigned bitfield",
+{
+        fHIDE(int) width=uiV;
+        fHIDE(int) offset=UiV;
+		RdV = fZXTN(width,32,(fCAST4_4u(RsV) >> offset));
+})
+
+Q6INSN(S2_insertp,"Rxx32=insert(Rss32,#u6,#U6)",
+ATTRIBS(), "Insert bits",
+{
+        fHIDE(int) width=uiV;
+      	fHIDE(int) offset=UiV;
+		/* clear bits in Rxx where new bits go */
+		RxxV &= ~(((fCONSTLL(1)<<width)-1)<<offset);
+		/* OR in new bits */
+		RxxV |= ((RssV & ((fCONSTLL(1)<<width)-1)) << offset);
+})
+
+
+Q6INSN(S4_extractp,"Rdd32=extract(Rss32,#u6,#U6)",
+ATTRIBS(), "Extract signed bitfield",
+{
+        fHIDE(int) width=uiV;
+        fHIDE(int) offset=UiV;
+	RddV = fSXTN(width,64,(fCAST8_8u(RssV) >> offset));
+})
+
+
+Q6INSN(S2_extractup,"Rdd32=extractu(Rss32,#u6,#U6)",
+ATTRIBS(), "Extract unsigned bitfield",
+{
+        fHIDE(int) width=uiV;
+        fHIDE(int) offset=UiV;
+	RddV = fZXTN(width,64,(fCAST8_8u(RssV) >> offset));
+})
+
+
+
+
+Q6INSN(S2_mask,"Rd32=mask(#u5,#U5)",
+ATTRIBS(), "Form mask from immediate",
+{
+    RdV = ((1<<uiV)-1) << UiV;
+})
+
+
+
+
+
+Q6INSN(S2_insert_rp,"Rx32=insert(Rs32,Rtt32)",
+ATTRIBS(), "Insert bits",
+{
+        fHIDE(int) width=fZXTN(6,32,(fGETWORD(1,RttV)));
+        fHIDE(int) offset=fSXTN(7,32,(fGETWORD(0,RttV)));
+	fHIDE(size8u_t) mask = ((fCONSTLL(1)<<width)-1);
+	if (offset < 0) {
+		RxV = 0;
+	} else {
+		/* clear bits in Rxx where new bits go */
+		RxV &= ~(mask<<offset);
+		/* OR in new bits */
+		RxV |= ((RsV & mask) << offset);
+	}
+})
+
+
+Q6INSN(S4_extract_rp,"Rd32=extract(Rs32,Rtt32)",
+ATTRIBS(), "Extract signed bitfield",
+{
+        fHIDE(int) width=fZXTN(6,32,(fGETWORD(1,RttV)));
+        fHIDE(int) offset=fSXTN(7,32,(fGETWORD(0,RttV)));
+	RdV = fSXTN(width,64,fBIDIR_LSHIFTR(fCAST4_8u(RsV),offset,4_8));
+})
+
+
+
+Q6INSN(S2_extractu_rp,"Rd32=extractu(Rs32,Rtt32)",
+ATTRIBS(), "Extract unsigned bitfield",
+{
+        fHIDE(int) width=fZXTN(6,32,(fGETWORD(1,RttV)));
+        fHIDE(int) offset=fSXTN(7,32,(fGETWORD(0,RttV)));
+	RdV = fZXTN(width,64,fBIDIR_LSHIFTR(fCAST4_8u(RsV),offset,4_8));
+})
+
+Q6INSN(S2_insertp_rp,"Rxx32=insert(Rss32,Rtt32)",
+ATTRIBS(), "Insert bits",
+{
+        fHIDE(int) width=fZXTN(6,32,(fGETWORD(1,RttV)));
+        fHIDE(int) offset=fSXTN(7,32,(fGETWORD(0,RttV)));
+	fHIDE(size8u_t) mask = ((fCONSTLL(1)<<width)-1);
+	if (offset < 0) {
+		RxxV = 0;
+	} else {
+		/* clear bits in Rxx where new bits go */
+		RxxV &= ~(mask<<offset);
+		/* OR in new bits */
+		RxxV |= ((RssV & mask) << offset);
+	}
+})
+
+
+Q6INSN(S4_extractp_rp,"Rdd32=extract(Rss32,Rtt32)",
+ATTRIBS(), "Extract signed bitfield",
+{
+        fHIDE(int) width=fZXTN(6,32,(fGETWORD(1,RttV)));
+        fHIDE(int) offset=fSXTN(7,32,(fGETWORD(0,RttV)));
+	RddV = fSXTN(width,64,fBIDIR_LSHIFTR(fCAST8_8u(RssV),offset,8_8));
+})
+
+
+Q6INSN(S2_extractup_rp,"Rdd32=extractu(Rss32,Rtt32)",
+ATTRIBS(), "Extract unsigned bitfield",
+{
+        fHIDE(int) width=fZXTN(6,32,(fGETWORD(1,RttV)));
+        fHIDE(int) offset=fSXTN(7,32,(fGETWORD(0,RttV)));
+	RddV = fZXTN(width,64,fBIDIR_LSHIFTR(fCAST8_8u(RssV),offset,8_8));
+})
+
+/**********************************************/
+/* tstbit/setbit/clrbit                       */
+/**********************************************/
+
+Q6INSN(S2_tstbit_i,"Pd4=tstbit(Rs32,#u5)",
+ATTRIBS(), "Test a bit",
+{
+	PdV = f8BITSOF((RsV & (1<<uiV)) != 0);
+})
+
+Q6INSN(S4_ntstbit_i,"Pd4=!tstbit(Rs32,#u5)",
+ATTRIBS(), "Test a bit",
+{
+	PdV = f8BITSOF((RsV & (1<<uiV)) == 0);
+})
+
+Q6INSN(S2_setbit_i,"Rd32=setbit(Rs32,#u5)",
+ATTRIBS(), "Set a bit",
+{
+	RdV = (RsV | (1<<uiV));
+})
+
+Q6INSN(S2_togglebit_i,"Rd32=togglebit(Rs32,#u5)",
+ATTRIBS(), "Toggle a bit",
+{
+	RdV = (RsV ^ (1<<uiV));
+})
+
+Q6INSN(S2_clrbit_i,"Rd32=clrbit(Rs32,#u5)",
+ATTRIBS(), "Clear a bit",
+{
+	RdV = (RsV & (~(1<<uiV)));
+})
+
+
+
+/* using a register */
+Q6INSN(S2_tstbit_r,"Pd4=tstbit(Rs32,Rt32)",
+ATTRIBS(), "Test a bit",
+{
+	PdV = f8BITSOF((fCAST4_8u(RsV) & fBIDIR_LSHIFTL(1,fSXTN(7,32,RtV),4_8)) != 0);
+})
+
+Q6INSN(S4_ntstbit_r,"Pd4=!tstbit(Rs32,Rt32)",
+ATTRIBS(), "Test a bit",
+{
+	PdV = f8BITSOF((fCAST4_8u(RsV) & fBIDIR_LSHIFTL(1,fSXTN(7,32,RtV),4_8)) == 0);
+})
+
+Q6INSN(S2_setbit_r,"Rd32=setbit(Rs32,Rt32)",
+ATTRIBS(), "Set a bit",
+{
+	RdV = (RsV | fBIDIR_LSHIFTL(1,fSXTN(7,32,RtV),4_8));
+})
+
+Q6INSN(S2_togglebit_r,"Rd32=togglebit(Rs32,Rt32)",
+ATTRIBS(), "Toggle a bit",
+{
+	RdV = (RsV ^ fBIDIR_LSHIFTL(1,fSXTN(7,32,RtV),4_8));
+})
+
+Q6INSN(S2_clrbit_r,"Rd32=clrbit(Rs32,Rt32)",
+ATTRIBS(), "Clear a bit",
+{
+	RdV = (RsV & (~(fBIDIR_LSHIFTL(1,fSXTN(7,32,RtV),4_8))));
+})
+
+
+/**********************************************/
+/* vector shifting                            */
+/**********************************************/
+
+/* Half Vector Immediate Shifts */
+
+Q6INSN(S2_asr_i_vh,"Rdd32=vasrh(Rss32,#u4)",ATTRIBS(),
+	"Vector Arithmetic Shift Right by Immediate",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV, (fGETHALF(i,RssV)>>uiV));
+        }
+})
+
+
+Q6INSN(S2_lsr_i_vh,"Rdd32=vlsrh(Rss32,#u4)",ATTRIBS(),
+	"Vector Logical Shift Right by Immediate",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV, (fGETUHALF(i,RssV)>>uiV));
+        }
+})
+
+Q6INSN(S2_asl_i_vh,"Rdd32=vaslh(Rss32,#u4)",ATTRIBS(),
+	"Vector Arithmetic Shift Left by Immediate",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV, (fGETHALF(i,RssV)<<uiV));
+        }
+})
+
+/* Half Vector Register Shifts */
+
+Q6INSN(S2_asr_r_vh,"Rdd32=vasrh(Rss32,Rt32)",ATTRIBS(A_NOTE_OOBVSHIFT),
+	"Vector Arithmetic Shift Right by Register",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV, fBIDIR_ASHIFTR(fGETHALF(i,RssV),fSXTN(7,32,RtV),2_8));
+        }
+})
+
+Q6INSN(S5_asrhub_rnd_sat,"Rd32=vasrhub(Rss32,#u4):raw",,
+	"Vector Arithmetic Shift Right by Immediate with Round, Saturate, and Pack",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+			fSETBYTE(i,RdV, fSATUB( ((fGETHALF(i,RssV) >> uiV )+1)>>1  ));
+        }
+})
+
+DEF_V5_COND_MAPPING(S5_asrhub_rnd_sat_goodsyntax,"Rd32=vasrhub(Rss32,#u4):rnd:sat","#u4==0","Rd32=vsathub(Rss32)","Rd32=vasrhub(Rss32,#u4-1):raw")
+
+Q6INSN(S5_asrhub_sat,"Rd32=vasrhub(Rss32,#u4):sat",,
+	"Vector Arithmetic Shift Right by Immediate with Saturate and Pack",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+			fSETBYTE(i,RdV, fSATUB( fGETHALF(i,RssV) >> uiV ));
+        }
+})
+
+
+
+Q6INSN(S5_vasrhrnd,"Rdd32=vasrh(Rss32,#u4):raw",,
+	"Vector Arithmetic Shift Right by Immediate with Round",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+			fSETHALF(i,RddV, ( ((fGETHALF(i,RssV) >> uiV)+1)>>1  ));
+        }
+})
+
+DEF_V5_COND_MAPPING(S5_vasrhrnd_goodsyntax,"Rdd32=vasrh(Rss32,#u4):rnd","#u4==0","Rdd32=Rss32","Rdd32=vasrh(Rss32,#u4-1):raw")
+
+
+
+Q6INSN(S2_asl_r_vh,"Rdd32=vaslh(Rss32,Rt32)",ATTRIBS(A_NOTE_OOBVSHIFT),
+	"Vector Arithmetic Shift Left by Register",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV, fBIDIR_ASHIFTL(fGETHALF(i,RssV),fSXTN(7,32,RtV),2_8));
+        }
+})
+
+
+
+Q6INSN(S2_lsr_r_vh,"Rdd32=vlsrh(Rss32,Rt32)",ATTRIBS(A_NOTE_OOBVSHIFT),
+	"Vector Logical Shift Right by Register",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV, fBIDIR_LSHIFTR(fGETUHALF(i,RssV),fSXTN(7,32,RtV),2_8));
+        }
+})
+
+
+Q6INSN(S2_lsl_r_vh,"Rdd32=vlslh(Rss32,Rt32)",ATTRIBS(A_NOTE_OOBVSHIFT),
+	"Vector Logical Shift Left by Register",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETHALF(i,RddV, fBIDIR_LSHIFTL(fGETUHALF(i,RssV),fSXTN(7,32,RtV),2_8));
+        }
+})
+
+
+
+
+/* Word Vector Immediate Shifts */
+
+Q6INSN(S2_asr_i_vw,"Rdd32=vasrw(Rss32,#u5)",ATTRIBS(),
+	"Vector Arithmetic Shift Right by Immediate",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,(fGETWORD(i,RssV)>>uiV));
+        }
+})
+
+
+
+Q6INSN(S2_asr_i_svw_trun,"Rd32=vasrw(Rss32,#u5)",ATTRIBS(A_ARCHV2),
+	"Vector Arithmetic Shift Right by Immediate with Truncate and Pack",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,fGETHALF(0,(fGETWORD(i,RssV)>>uiV)));
+        }
+})
+
+Q6INSN(S2_asr_r_svw_trun,"Rd32=vasrw(Rss32,Rt32)",ATTRIBS(A_ARCHV2),
+	"Vector Arithmetic Shift Right truncate and Pack",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+ 	  fSETHALF(i,RdV,fGETHALF(0,fBIDIR_ASHIFTR(fGETWORD(i,RssV),fSXTN(7,32,RtV),4_8)));
+        }
+})
+
+
+Q6INSN(S2_lsr_i_vw,"Rdd32=vlsrw(Rss32,#u5)",ATTRIBS(),
+	"Vector Logical Shift Right by Immediate",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,(fGETUWORD(i,RssV)>>uiV));
+        }
+})
+
+Q6INSN(S2_asl_i_vw,"Rdd32=vaslw(Rss32,#u5)",ATTRIBS(),
+	"Vector Arithmetic Shift Left by Immediate",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+	  fSETWORD(i,RddV,(fGETWORD(i,RssV)<<uiV));
+        }
+})
+
+/* Word Vector Register Shifts */
+
+Q6INSN(S2_asr_r_vw,"Rdd32=vasrw(Rss32,Rt32)",ATTRIBS(A_NOTE_OOBVSHIFT),
+	"Vector Arithmetic Shift Right by Register",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+ 	  fSETWORD(i,RddV, fBIDIR_ASHIFTR(fGETWORD(i,RssV),fSXTN(7,32,RtV),4_8));
+        }
+})
+
+
+
+Q6INSN(S2_asl_r_vw,"Rdd32=vaslw(Rss32,Rt32)",ATTRIBS(A_NOTE_OOBVSHIFT),
+	"Vector Arithmetic Shift Left by Register",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+ 	  fSETWORD(i,RddV, fBIDIR_ASHIFTL(fGETWORD(i,RssV),fSXTN(7,32,RtV),4_8));
+        }
+})
+
+
+Q6INSN(S2_lsr_r_vw,"Rdd32=vlsrw(Rss32,Rt32)",ATTRIBS(A_NOTE_OOBVSHIFT),
+	"Vector Logical Shift Right by Register",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+ 	  fSETWORD(i,RddV, fBIDIR_LSHIFTR(fGETUWORD(i,RssV),fSXTN(7,32,RtV),4_8));
+        }
+})
+
+
+
+Q6INSN(S2_lsl_r_vw,"Rdd32=vlslw(Rss32,Rt32)",ATTRIBS(A_NOTE_OOBVSHIFT),
+	"Vector Logical Shift Left by Register",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+ 	  fSETWORD(i,RddV, fBIDIR_LSHIFTL(fGETUWORD(i,RssV),fSXTN(7,32,RtV),4_8));
+        }
+})
+
+
+
+
+
+/**********************************************/
+/* Vector SXT/ZXT/SAT/TRUN/RNDPACK            */
+/**********************************************/
+
+
+Q6INSN(S2_vrndpackwh,"Rd32=vrndwh(Rss32)",ATTRIBS(),
+"Round and Pack vector of words to Halfwords",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,fGETHALF(1,(fGETWORD(i,RssV)+0x08000)));
+        }
+})
+
+Q6INSN(S2_vrndpackwhs,"Rd32=vrndwh(Rss32):sat",ATTRIBS(),
+"Round and Pack vector of words to Halfwords",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+	  fSETHALF(i,RdV,fGETHALF(1,fSAT(fGETWORD(i,RssV)+0x08000)));
+        }
+})
+
+Q6INSN(S2_vsxtbh,"Rdd32=vsxtbh(Rs32)",ATTRIBS(A_ARCHV2),
+"Vector sign extend byte to half",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  fSETHALF(i,RddV,fGETBYTE(i,RsV));
+        }
+})
+
+Q6INSN(S2_vzxtbh,"Rdd32=vzxtbh(Rs32)",ATTRIBS(),
+"Vector zero extend byte to half",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  fSETHALF(i,RddV,fGETUBYTE(i,RsV));
+        }
+})
+
+Q6INSN(S2_vsathub,"Rd32=vsathub(Rss32)",ATTRIBS(),
+"Vector saturate half to unsigned byte",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  fSETBYTE(i,RdV,fSATUN(8,fGETHALF(i,RssV)));
+        }
+})
+
+
+
+
+
+Q6INSN(S2_svsathub,"Rd32=vsathub(Rs32)",ATTRIBS(A_ARCHV2),
+"Vector saturate half to unsigned byte",
+{
+	fSETBYTE(0,RdV,fSATUN(8,fGETHALF(0,RsV)));
+  	fSETBYTE(1,RdV,fSATUN(8,fGETHALF(1,RsV)));
+  	fSETBYTE(2,RdV,0);
+  	fSETBYTE(3,RdV,0);
+})
+
+Q6INSN(S2_svsathb,"Rd32=vsathb(Rs32)",ATTRIBS(A_ARCHV2),
+"Vector saturate half to signed byte",
+{
+	fSETBYTE(0,RdV,fSATN(8,fGETHALF(0,RsV)));
+  	fSETBYTE(1,RdV,fSATN(8,fGETHALF(1,RsV)));
+  	fSETBYTE(2,RdV,0);
+  	fSETBYTE(3,RdV,0);
+})
+
+
+Q6INSN(S2_vsathb,"Rd32=vsathb(Rss32)",ATTRIBS(A_ARCHV2),
+"Vector saturate half to signed byte",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  fSETBYTE(i,RdV,fSATN(8,fGETHALF(i,RssV)));
+        }
+})
+
+Q6INSN(S2_vtrunohb,"Rd32=vtrunohb(Rss32)",ATTRIBS(A_ARCHV2),
+"Vector truncate half to byte: take high",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  fSETBYTE(i,RdV,fGETBYTE(i*2+1,RssV));
+        }
+})
+
+Q6INSN(S2_vtrunewh,"Rdd32=vtrunewh(Rss32,Rtt32)",ATTRIBS(A_ARCHV2),
+"Vector truncate word to half: take low",
+{
+	fSETHALF(0,RddV,fGETHALF(0,RttV));
+	fSETHALF(1,RddV,fGETHALF(2,RttV));
+	fSETHALF(2,RddV,fGETHALF(0,RssV));
+	fSETHALF(3,RddV,fGETHALF(2,RssV));
+})
+
+Q6INSN(S2_vtrunowh,"Rdd32=vtrunowh(Rss32,Rtt32)",ATTRIBS(A_ARCHV2),
+"Vector truncate word to half: take high",
+{
+	fSETHALF(0,RddV,fGETHALF(1,RttV));
+	fSETHALF(1,RddV,fGETHALF(3,RttV));
+	fSETHALF(2,RddV,fGETHALF(1,RssV));
+	fSETHALF(3,RddV,fGETHALF(3,RssV));
+})
+
+
+Q6INSN(S2_vtrunehb,"Rd32=vtrunehb(Rss32)",ATTRIBS(),
+"Vector truncate half to byte: take low",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  fSETBYTE(i,RdV,fGETBYTE(i*2,RssV));
+        }
+})
+
+Q6INSN(S6_vtrunehb_ppp,"Rdd32=vtrunehb(Rss32,Rtt32)",ATTRIBS(),
+"Vector truncate half to byte: take low",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  		fSETBYTE(i,RddV,fGETBYTE(i*2,RttV));
+			fSETBYTE(i+4,RddV,fGETBYTE(i*2,RssV));
+        }
+})
+
+Q6INSN(S6_vtrunohb_ppp,"Rdd32=vtrunohb(Rss32,Rtt32)",ATTRIBS(),
+"Vector truncate half to byte: take high",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  		fSETBYTE(i,RddV,fGETBYTE(i*2+1,RttV));
+  	  		fSETBYTE(i+4,RddV,fGETBYTE(i*2+1,RssV));
+        }
+})
+
+Q6INSN(S2_vsxthw,"Rdd32=vsxthw(Rs32)",ATTRIBS(),
+"Vector sign extend half to word",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+  	  fSETWORD(i,RddV,fGETHALF(i,RsV));
+        }
+})
+
+Q6INSN(S2_vzxthw,"Rdd32=vzxthw(Rs32)",ATTRIBS(),
+"Vector zero extend half to word",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+  	  fSETWORD(i,RddV,fGETUHALF(i,RsV));
+        }
+})
+
+
+Q6INSN(S2_vsatwh,"Rd32=vsatwh(Rss32)",ATTRIBS(),
+"Vector saturate word to signed half",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+  	  fSETHALF(i,RdV,fSATN(16,fGETWORD(i,RssV)));
+        }
+})
+
+Q6INSN(S2_vsatwuh,"Rd32=vsatwuh(Rss32)",ATTRIBS(),
+"Vector saturate word to unsigned half",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+  	  fSETHALF(i,RdV,fSATUN(16,fGETWORD(i,RssV)));
+        }
+})
+
+/* Other misc insns of this type */
+
+Q6INSN(S2_packhl,"Rdd32=packhl(Rs32,Rt32)",ATTRIBS(),
+"Pack high halfwords and low halfwords together",
+{
+    fSETHALF(0,RddV,fGETHALF(0,RtV));
+    fSETHALF(1,RddV,fGETHALF(0,RsV));
+    fSETHALF(2,RddV,fGETHALF(1,RtV));
+    fSETHALF(3,RddV,fGETHALF(1,RsV));
+})
+
+Q6INSN(A2_swiz,"Rd32=swiz(Rs32)",ATTRIBS(A_ARCHV2),
+"Endian swap the bytes of Rs",
+{
+    fSETBYTE(0,RdV,fGETBYTE(3,RsV));
+    fSETBYTE(1,RdV,fGETBYTE(2,RsV));
+    fSETBYTE(2,RdV,fGETBYTE(1,RsV));
+    fSETBYTE(3,RdV,fGETBYTE(0,RsV));
+})
+
+
+
+/* Vector Sat without Packing */
+Q6INSN(S2_vsathub_nopack,"Rdd32=vsathub(Rss32)",ATTRIBS(),
+"Vector saturate half to unsigned byte",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  fSETHALF(i,RddV,fSATUN(8,fGETHALF(i,RssV)));
+        }
+})
+
+Q6INSN(S2_vsathb_nopack,"Rdd32=vsathb(Rss32)",ATTRIBS(A_ARCHV2),
+"Vector saturate half to signed byte without pack",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+  	  fSETHALF(i,RddV,fSATN(8,fGETHALF(i,RssV)));
+        }
+})
+
+Q6INSN(S2_vsatwh_nopack,"Rdd32=vsatwh(Rss32)",ATTRIBS(),
+"Vector saturate word to signed half",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+  	  fSETWORD(i,RddV,fSATN(16,fGETWORD(i,RssV)));
+        }
+})
+
+Q6INSN(S2_vsatwuh_nopack,"Rdd32=vsatwuh(Rss32)",ATTRIBS(),
+"Vector saturate word to unsigned half",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+  	  fSETWORD(i,RddV,fSATUN(16,fGETWORD(i,RssV)));
+        }
+})
+
+
+/**********************************************/
+/* Shuffle                                    */
+/**********************************************/
+
+
+Q6INSN(S2_shuffob,"Rdd32=shuffob(Rtt32,Rss32)",ATTRIBS(),
+"Shuffle high bytes together",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETBYTE(i*2  ,RddV,fGETBYTE(i*2+1,RssV));
+	  fSETBYTE(i*2+1,RddV,fGETBYTE(i*2+1,RttV));
+        }
+})
+
+Q6INSN(S2_shuffeb,"Rdd32=shuffeb(Rss32,Rtt32)",ATTRIBS(),
+"Shuffle low bytes together",
+{
+        fHIDE(int i;)
+        for (i=0;i<4;i++) {
+	  fSETBYTE(i*2  ,RddV,fGETBYTE(i*2,RttV));
+	  fSETBYTE(i*2+1,RddV,fGETBYTE(i*2,RssV));
+        }
+})
+
+Q6INSN(S2_shuffoh,"Rdd32=shuffoh(Rtt32,Rss32)",ATTRIBS(),
+"Shuffle high halves together",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+	  fSETHALF(i*2  ,RddV,fGETHALF(i*2+1,RssV));
+	  fSETHALF(i*2+1,RddV,fGETHALF(i*2+1,RttV));
+        }
+})
+
+Q6INSN(S2_shuffeh,"Rdd32=shuffeh(Rss32,Rtt32)",ATTRIBS(),
+"Shuffle low halves together",
+{
+        fHIDE(int i;)
+        for (i=0;i<2;i++) {
+	  fSETHALF(i*2  ,RddV,fGETHALF(i*2,RttV));
+	  fSETHALF(i*2+1,RddV,fGETHALF(i*2,RssV));
+        }
+})
+
+
+/**********************************************/
+/* Strange bit instructions                   */
+/**********************************************/
+
+Q6INSN(S5_popcountp,"Rd32=popcount(Rss32)",ATTRIBS(),
+"Population Count", { RdV = fCOUNTONES_8(RssV); })
+
+Q6INSN(S4_parity,"Rd32=parity(Rs32,Rt32)",,
+"Parity of Masked Value", { RdV = 1&fCOUNTONES_4(RsV & RtV); })
+
+Q6INSN(S2_parityp,"Rd32=parity(Rss32,Rtt32)",ATTRIBS(A_ARCHV2),
+"Parity of Masked Value", { RdV = 1&fCOUNTONES_8(RssV & RttV); })
+
+Q6INSN(S2_lfsp,"Rdd32=lfs(Rss32,Rtt32)",ATTRIBS(A_ARCHV2),
+"Parity of Masked Value", { RddV = (fCAST8u(RssV) >> 1) | (fCAST8u((1&fCOUNTONES_8(RssV & RttV)))<<63) ; })
+
+Q6INSN(S2_clbnorm,"Rd32=normamt(Rs32)",ATTRIBS(A_ARCHV2),
+"Count leading sign bits - 1", { if (RsV == 0) { RdV = 0; } else { RdV = (fMAX(fCL1_4(RsV),fCL1_4(~RsV)))-1;} })
+
+Q6INSN(S4_clbaddi,"Rd32=add(clb(Rs32),#s6)",ATTRIBS(A_ARCHV2),
+"Count leading sign bits then add signed number",
+{ RdV = (fMAX(fCL1_4(RsV),fCL1_4(~RsV)))+siV;} )
+
+Q6INSN(S4_clbpnorm,"Rd32=normamt(Rss32)",ATTRIBS(A_ARCHV2),
+"Count leading sign bits - 1", { if (RssV == 0) { RdV = 0; }
+else { RdV = (fMAX(fCL1_8(RssV),fCL1_8(~RssV)))-1;}})
+
+Q6INSN(S4_clbpaddi,"Rd32=add(clb(Rss32),#s6)",ATTRIBS(A_ARCHV2),
+"Count leading sign bits then add signed number",
+{ RdV = (fMAX(fCL1_8(RssV),fCL1_8(~RssV)))+siV;})
+
+
+
+Q6INSN(S2_cabacdecbin,"Rdd32=decbin(Rss32,Rtt32)",ATTRIBS(A_ARCHV3,A_NOTE_LATEPRED,A_RESTRICT_LATEPRED),"CABAC decode bin",
+{
+	fHIDE(size4u_t state;)
+	fHIDE(size4u_t valMPS;)
+	fHIDE(size4u_t bitpos;)
+	fHIDE(size4u_t range;)
+	fHIDE(size4u_t offset;)
+	fHIDE(size4u_t rLPS;)
+	fHIDE(size4u_t rMPS;)
+
+	state =  fEXTRACTU_RANGE( fGETWORD(1,RttV) ,5,0);
+	valMPS = fEXTRACTU_RANGE( fGETWORD(1,RttV) ,8,8);
+	bitpos = fEXTRACTU_RANGE( fGETWORD(0,RttV) ,4,0);
+        range =  fGETWORD(0,RssV);
+        offset = fGETWORD(1,RssV);
+
+        /* calculate rLPS */
+        range <<= bitpos;
+        offset <<= bitpos;
+        rLPS = rLPS_table_64x4[state][ (range >>29)&3];
+        rLPS  = rLPS << 23;   /* left aligned */
+
+        /* calculate rMPS */
+        rMPS= (range&0xff800000) - rLPS;
+
+        /* most probable region */
+        if (offset < rMPS) {
+            RddV = AC_next_state_MPS_64[state];
+            fINSERT_RANGE(RddV,8,8,valMPS);
+            fINSERT_RANGE(RddV,31,23,(rMPS>>23));
+            fSETWORD(1,RddV,offset);
+            fWRITE_P0(valMPS);
+
+
+        }
+        /* least probable region */
+        else {
+            RddV = AC_next_state_LPS_64[state];
+            fINSERT_RANGE(RddV,8,8,((!state)?(1-valMPS):(valMPS)));
+            fINSERT_RANGE(RddV,31,23,(rLPS>>23));
+            fSETWORD(1,RddV,(offset-rMPS));
+            fWRITE_P0((valMPS^1));
+        }
+        fHIDE(MARK_LATE_PRED_WRITE(0))
+})
+
+
+
+
+Q6INSN(S2_cabacencbin,"Rdd32=encbin(Rss32,Rtt32,Pu4)",ATTRIBS(A_FAKEINSN),"CABAC encode bin",
+{
+    fPREDUSE_TIMING();
+	fHIDE(size4u_t state;)
+	fHIDE(size4u_t valMPS;)
+	fHIDE(size4u_t bitpos;)
+	fHIDE(size4u_t range;)
+	fHIDE(size4u_t low;)
+	fHIDE(size4u_t rLPS;)
+	fHIDE(size4u_t rMPS;)
+	fHIDE(size4u_t bin;)
+
+	state =  fEXTRACTU_RANGE( fGETWORD(1,RttV) ,5,0);
+	valMPS = fEXTRACTU_RANGE( fGETWORD(1,RttV) ,8,8);
+	bitpos = fEXTRACTU_RANGE( fGETWORD(0,RttV) ,4,0);
+	range =  fGETWORD(0,RssV);
+	low = fGETWORD(1,RssV);
+	bin = fLSBOLD(PuV);
+
+	/* Strip of MSB zeros */
+	range <<= bitpos;
+	/* Mask off low bits */
+	range &= 0xFF800000U;
+
+        /* calculate rLPS */
+        rLPS = rLPS_table_64x4[state][(range>>29)&3];
+        rLPS  = rLPS << 23;   /* left aligned */
+
+	/* Calculate rMPS */
+	rMPS = range - rLPS;
+
+        /* most probable region */
+        if (bin == valMPS) {
+            RddV = AC_next_state_MPS_64[state];
+            fINSERT_RANGE(RddV,8,8,valMPS);
+            fINSERT_RANGE(RddV,31,23,(rMPS>>23));
+            fSETWORD(1,RddV,low);
+        }
+        /* least probable region */
+        else {
+            RddV = AC_next_state_LPS_64[state];
+            fINSERT_RANGE(RddV,8,8,((!state)?(1-valMPS):(valMPS)));
+            fINSERT_RANGE(RddV,31,23,(rLPS>>23));
+            fSETWORD(1,RddV,(low+(rMPS >> bitpos)));
+        }
+})
+
+
+
+
+Q6INSN(S2_clb,"Rd32=clb(Rs32)",ATTRIBS(),
+"Count leading bits", {RdV = fMAX(fCL1_4(RsV),fCL1_4(~RsV));})
+
+
+Q6INSN(S2_cl0,"Rd32=cl0(Rs32)",ATTRIBS(),
+"Count leading bits", {RdV = fCL1_4(~RsV);})
+
+Q6INSN(S2_cl1,"Rd32=cl1(Rs32)",ATTRIBS(),
+"Count leading bits", {RdV = fCL1_4(RsV);})
+
+Q6INSN(S2_clbp,"Rd32=clb(Rss32)",ATTRIBS(),
+"Count leading bits", {RdV = fMAX(fCL1_8(RssV),fCL1_8(~RssV));})
+
+Q6INSN(S2_cl0p,"Rd32=cl0(Rss32)",ATTRIBS(),
+"Count leading bits", {RdV = fCL1_8(~RssV);})
+
+Q6INSN(S2_cl1p,"Rd32=cl1(Rss32)",ATTRIBS(),
+"Count leading bits", {RdV = fCL1_8(RssV);})
+
+
+
+
+Q6INSN(S2_brev,	"Rd32=brev(Rs32)",   ATTRIBS(A_ARCHV2), "Bit Reverse",{RdV = fBREV_4(RsV);})
+Q6INSN(S2_brevp,"Rdd32=brev(Rss32)", ATTRIBS(), "Bit Reverse",{RddV = fBREV_8(RssV);})
+Q6INSN(S2_ct0,  "Rd32=ct0(Rs32)",    ATTRIBS(A_ARCHV2), "Count Trailing",{RdV = fCL1_4(~fBREV_4(RsV));})
+Q6INSN(S2_ct1,  "Rd32=ct1(Rs32)",    ATTRIBS(A_ARCHV2), "Count Trailing",{RdV = fCL1_4(fBREV_4(RsV));})
+Q6INSN(S2_ct0p, "Rd32=ct0(Rss32)",   ATTRIBS(), "Count Trailing",{RdV = fCL1_8(~fBREV_8(RssV));})
+Q6INSN(S2_ct1p, "Rd32=ct1(Rss32)",   ATTRIBS(), "Count Trailing",{RdV = fCL1_8(fBREV_8(RssV));})
+
+
+Q6INSN(S2_interleave,"Rdd32=interleave(Rss32)",ATTRIBS(A_ARCHV2),"Interleave bits",
+{RddV = fINTERLEAVE(fGETWORD(1,RssV),fGETWORD(0,RssV));})
+
+Q6INSN(S2_deinterleave,"Rdd32=deinterleave(Rss32)",ATTRIBS(A_ARCHV2),"Interleave bits",
+{RddV = fDEINTERLEAVE(RssV);})
+
+
+
+
+Q6INSN(S6_userinsn,"Rxx32=userinsn(Rss32,Rtt32,#u7)",ATTRIBS(A_FAKEINSN),
+"User Defined Instruction",
+{
+	if (thread->processor_ptr->options->userinsn_callback) {
+      RxxV = thread->processor_ptr->options->userinsn_callback(
+	      thread->system_ptr,thread->processor_ptr,
+	      thread->threadId,RssV,RttV,RxxV,uiV);
+    } else {
+		warn("User instruction encountered but no callback!");
+	}
+
+})
+
+
+
+
+
+
+
diff --git a/target/hexagon/imported/subinsns.idef b/target/hexagon/imported/subinsns.idef
new file mode 100644
index 0000000..08658df
--- /dev/null
+++ b/target/hexagon/imported/subinsns.idef
@@ -0,0 +1,152 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * sub-instructions
+ */
+
+
+
+/*****************************************************************/
+/*                                                               */
+/*                       A-type subinsns                         */
+/*                                                               */
+/*****************************************************************/
+
+Q6INSN(SA1_addi,     "Rx16=add(Rx16,#s7)",    ATTRIBS(A_SUBINSN),"Add",        { fIMMEXT(siV); RxV=RxV+siV;})
+Q6INSN(SA1_tfr,      "Rd16=Rs16",             ATTRIBS(A_SUBINSN),"Tfr",        { RdV=RsV;})
+Q6INSN(SA1_seti,     "Rd16=#u6",              ATTRIBS(A_SUBINSN),"Set immed",  { fIMMEXT(uiV); RdV=uiV;})
+Q6INSN(SA1_setin1,   "Rd16=#-1",              ATTRIBS(A_SUBINSN),"Set to -1",  { RdV=-1;})
+Q6INSN(SA1_clrtnew,  "if (p0.new) Rd16=#0",   ATTRIBS(A_SUBINSN),"clear if true", { if (fLSBNEW0) {RdV=0;} else {CANCEL;} })
+Q6INSN(SA1_clrfnew,  "if (!p0.new) Rd16=#0",  ATTRIBS(A_SUBINSN),"clear if false",{ if (fLSBNEW0NOT) {RdV=0;} else {CANCEL;} })
+Q6INSN(SA1_clrt,     "if (p0) Rd16=#0",       ATTRIBS(A_SUBINSN),"clear if true", { if (fLSBOLD(fREAD_P0())) {RdV=0;} else {CANCEL;} })
+Q6INSN(SA1_clrf,     "if (!p0) Rd16=#0",      ATTRIBS(A_SUBINSN),"clear if false",{ if (fLSBOLDNOT(fREAD_P0())) {RdV=0;} else {CANCEL;} })
+
+Q6INSN(SA1_addsp,    "Rd16=add(r29,#u6:2)",   ATTRIBS(A_SUBINSN),"Add",        { RdV=fREAD_SP()+uiV; })
+Q6INSN(SA1_inc,      "Rd16=add(Rs16,#1)",     ATTRIBS(A_SUBINSN),"Inc",        { RdV=RsV+1;})
+Q6INSN(SA1_dec,      "Rd16=add(Rs16,#-1)",    ATTRIBS(A_SUBINSN),"Dec",        { RdV=RsV-1;})
+Q6INSN(SA1_addrx,    "Rx16=add(Rx16,Rs16)",   ATTRIBS(A_SUBINSN,A_COMMUTES),"Add",        { RxV=RxV+RsV; })
+Q6INSN(SA1_zxtb,     "Rd16=and(Rs16,#255)",   ATTRIBS(A_SUBINSN),"Zxtb",       { RdV= fZXTN(8,32,RsV);})
+Q6INSN(SA1_and1,     "Rd16=and(Rs16,#1)",     ATTRIBS(A_SUBINSN),"And #1",     { RdV= RsV&1;})
+Q6INSN(SA1_sxtb,     "Rd16=sxtb(Rs16)",       ATTRIBS(A_SUBINSN),"Sxtb",       { RdV= fSXTN(8,32,RsV);})
+Q6INSN(SA1_zxth,     "Rd16=zxth(Rs16)",       ATTRIBS(A_SUBINSN),"Zxth",       { RdV= fZXTN(16,32,RsV);})
+Q6INSN(SA1_sxth,     "Rd16=sxth(Rs16)",       ATTRIBS(A_SUBINSN),"Sxth",       { RdV= fSXTN(16,32,RsV);})
+Q6INSN(SA1_combinezr,"Rdd8=combine(#0,Rs16)", ATTRIBS(A_SUBINSN,A_ROPS_2),"Combines",   { fSETWORD(0,RddV,RsV); fSETWORD(1,RddV,0); })
+Q6INSN(SA1_combinerz,"Rdd8=combine(Rs16,#0)", ATTRIBS(A_SUBINSN,A_ROPS_2),"Combines",   { fSETWORD(0,RddV,0); fSETWORD(1,RddV,RsV); })
+Q6INSN(SA1_combine0i,"Rdd8=combine(#0,#u2)", ATTRIBS(A_SUBINSN,A_ROPS_2),"Combines",   { fSETWORD(0,RddV,uiV); fSETWORD(1,RddV,0); })
+Q6INSN(SA1_combine1i,"Rdd8=combine(#1,#u2)", ATTRIBS(A_SUBINSN,A_ROPS_2),"Combines",   { fSETWORD(0,RddV,uiV); fSETWORD(1,RddV,1); })
+Q6INSN(SA1_combine2i,"Rdd8=combine(#2,#u2)", ATTRIBS(A_SUBINSN,A_ROPS_2),"Combines",   { fSETWORD(0,RddV,uiV); fSETWORD(1,RddV,2); })
+Q6INSN(SA1_combine3i,"Rdd8=combine(#3,#u2)", ATTRIBS(A_SUBINSN,A_ROPS_2),"Combines",   { fSETWORD(0,RddV,uiV); fSETWORD(1,RddV,3); })
+Q6INSN(SA1_cmpeqi,   "p0=cmp.eq(Rs16,#u2)",   ATTRIBS(A_SUBINSN),"CompareImmed",{fWRITE_P0(f8BITSOF(RsV==uiV));})
+
+
+
+
+/*****************************************************************/
+/*                                                               */
+/*                       Ld1/2 subinsns                          */
+/*                                                               */
+/*****************************************************************/
+
+Q6INSN(SL1_loadri_io,  "Rd16=memw(Rs16+#u4:2)", ATTRIBS(A_MEMSIZE_4B,A_LOAD,A_SUBINSN),"load word", {fEA_RI(RsV,uiV); fLOAD(1,4,u,EA,RdV);})
+Q6INSN(SL1_loadrub_io, "Rd16=memub(Rs16+#u4:0)",ATTRIBS(A_MEMSIZE_1B,A_LOAD,A_SUBINSN),"load byte", {fEA_RI(RsV,uiV); fLOAD(1,1,u,EA,RdV);})
+
+Q6INSN(SL2_loadrh_io,  "Rd16=memh(Rs16+#u3:1)", ATTRIBS(A_MEMSIZE_2B,A_LOAD,A_SUBINSN),"load half", {fEA_RI(RsV,uiV); fLOAD(1,2,s,EA,RdV);})
+Q6INSN(SL2_loadruh_io, "Rd16=memuh(Rs16+#u3:1)",ATTRIBS(A_MEMSIZE_2B,A_LOAD,A_SUBINSN),"load half", {fEA_RI(RsV,uiV); fLOAD(1,2,u,EA,RdV);})
+Q6INSN(SL2_loadrb_io,  "Rd16=memb(Rs16+#u3:0)", ATTRIBS(A_MEMSIZE_1B,A_LOAD,A_SUBINSN),"load byte", {fEA_RI(RsV,uiV); fLOAD(1,1,s,EA,RdV);})
+Q6INSN(SL2_loadri_sp,  "Rd16=memw(r29+#u5:2)",  ATTRIBS(A_MEMSIZE_4B,A_LOAD,A_SUBINSN),"load word", {fEA_RI(fREAD_SP(),uiV); fLOAD(1,4,u,EA,RdV);})
+Q6INSN(SL2_loadrd_sp,  "Rdd8=memd(r29+#u5:3)", ATTRIBS(A_MEMSIZE_8B,A_LOAD,A_SUBINSN),"load dword",{fEA_RI(fREAD_SP(),uiV); fLOAD(1,8,u,EA,RddV);})
+
+Q6INSN(SL2_deallocframe,"deallocframe", ATTRIBS(A_SUBINSN,A_MEMSIZE_8B,A_LOAD,A_DEALLOCFRAME), "Deallocate stack frame",
+{ fHIDE(size8u_t tmp;) fEA_REG(fREAD_FP());
+	fLOAD(1,8,u,EA,tmp);
+	tmp = fFRAME_UNSCRAMBLE(tmp);
+	fWRITE_LR(fGETWORD(1,tmp));
+	fWRITE_FP(fGETWORD(0,tmp));
+	fWRITE_SP(EA+8); })
+
+Q6INSN(SL2_return,"dealloc_return", ATTRIBS(A_JINDIR,A_SUBINSN,A_ROPS_2,A_MEMSIZE_8B,A_LOAD,A_RETURN,A_RESTRICT_SLOT0ONLY,A_RET_TYPE,A_DEALLOCRET), "Deallocate stack frame and return",
+{ fHIDE(size8u_t tmp;) fEA_REG(fREAD_FP());
+	fLOAD(1,8,u,EA,tmp);
+	tmp = fFRAME_UNSCRAMBLE(tmp);
+	fWRITE_LR(fGETWORD(1,tmp));
+	fWRITE_FP(fGETWORD(0,tmp));
+	fWRITE_SP(EA+8);
+    fJUMPR(REG_LR,fGETWORD(1,tmp),COF_TYPE_JUMPR);})
+
+Q6INSN(SL2_return_t,"if (p0) dealloc_return", ATTRIBS(A_JINDIROLD,A_SUBINSN,A_ROPS_2,A_MEMSIZE_8B,A_LOAD,A_RETURN,A_RESTRICT_SLOT0ONLY,A_RET_TYPE,A_PRED_BIT_4), "Deallocate stack frame and return",
+{ fHIDE(size8u_t tmp;); fBRANCH_SPECULATE_STALL(fLSBOLD(fREAD_P0()),, SPECULATE_NOT_TAKEN,4,0); fEA_REG(fREAD_FP()); if (fLSBOLD(fREAD_P0())) { fLOAD(1,8,u,EA,tmp); tmp = fFRAME_UNSCRAMBLE(tmp); fWRITE_LR(fGETWORD(1,tmp)); fWRITE_FP(fGETWORD(0,tmp)); fWRITE_SP(EA+8);
+  fJUMPR(REG_LR,fGETWORD(1,tmp),COF_TYPE_JUMPR);} else {LOAD_CANCEL(EA);} })
+
+Q6INSN(SL2_return_f,"if (!p0) dealloc_return", ATTRIBS(A_JINDIROLD,A_SUBINSN,A_ROPS_2,A_MEMSIZE_8B,A_LOAD,A_RETURN,A_RESTRICT_SLOT0ONLY,A_RET_TYPE,A_PRED_BIT_4), "Deallocate stack frame and return",
+{ fHIDE(size8u_t tmp;);fBRANCH_SPECULATE_STALL(fLSBOLDNOT(fREAD_P0()),, SPECULATE_NOT_TAKEN,4,0); fEA_REG(fREAD_FP()); if (fLSBOLDNOT(fREAD_P0())) { fLOAD(1,8,u,EA,tmp); tmp = fFRAME_UNSCRAMBLE(tmp); fWRITE_LR(fGETWORD(1,tmp)); fWRITE_FP(fGETWORD(0,tmp)); fWRITE_SP(EA+8);
+  fJUMPR(REG_LR,fGETWORD(1,tmp),COF_TYPE_JUMPR);} else {LOAD_CANCEL(EA);} })
+
+
+
+Q6INSN(SL2_return_tnew,"if (p0.new) dealloc_return:nt", ATTRIBS(A_JINDIRNEW,A_SUBINSN,A_ROPS_2,A_MEMSIZE_8B,A_LOAD,A_RETURN,A_RESTRICT_SLOT0ONLY,A_RET_TYPE,A_PRED_BIT_4), "Deallocate stack frame and return",
+{ fHIDE(size8u_t tmp;) fBRANCH_SPECULATE_STALL(fLSBNEW0,, SPECULATE_NOT_TAKEN , 4,3); fEA_REG(fREAD_FP()); if (fLSBNEW0) { fLOAD(1,8,u,EA,tmp); tmp = fFRAME_UNSCRAMBLE(tmp); fWRITE_LR(fGETWORD(1,tmp)); fWRITE_FP(fGETWORD(0,tmp)); fWRITE_SP(EA+8);
+  fJUMPR(REG_LR,fGETWORD(1,tmp),COF_TYPE_JUMPR);} else {LOAD_CANCEL(EA);} })
+
+Q6INSN(SL2_return_fnew,"if (!p0.new) dealloc_return:nt", ATTRIBS(A_JINDIRNEW,A_SUBINSN,A_ROPS_2,A_MEMSIZE_8B,A_LOAD,A_RETURN,A_RESTRICT_SLOT0ONLY,A_RET_TYPE,A_PRED_BIT_4), "Deallocate stack frame and return",
+{ fHIDE(size8u_t tmp;) fBRANCH_SPECULATE_STALL(fLSBNEW0NOT,, SPECULATE_NOT_TAKEN , 4,3); fEA_REG(fREAD_FP()); if (fLSBNEW0NOT) { fLOAD(1,8,u,EA,tmp); tmp = fFRAME_UNSCRAMBLE(tmp); fWRITE_LR(fGETWORD(1,tmp)); fWRITE_FP(fGETWORD(0,tmp)); fWRITE_SP(EA+8);
+  fJUMPR(REG_LR,fGETWORD(1,tmp),COF_TYPE_JUMPR);} else {LOAD_CANCEL(EA);} })
+
+
+Q6INSN(SL2_jumpr31,"jumpr r31",ATTRIBS(A_SUBINSN,A_JINDIR,A_RESTRICT_SLOT0ONLY,A_RET_TYPE),"indirect unconditional jump",
+{ fJUMPR(REG_LR,fREAD_LR(),COF_TYPE_JUMPR);})
+
+Q6INSN(SL2_jumpr31_t,"if (p0) jumpr r31",ATTRIBS(A_SUBINSN,A_JINDIROLD,A_NOTE_CONDITIONAL,A_RESTRICT_SLOT0ONLY,A_RET_TYPE,A_PRED_BIT_4),"indirect conditional jump if true",
+{fBRANCH_SPECULATE_STALL(fLSBOLD(fREAD_P0()),, SPECULATE_TAKEN,4,0); if (fLSBOLD(fREAD_P0())) {fJUMPR(REG_LR,fREAD_LR(),COF_TYPE_JUMPR);}})
+
+Q6INSN(SL2_jumpr31_f,"if (!p0) jumpr r31",ATTRIBS(A_SUBINSN,A_JINDIROLD,A_NOTE_CONDITIONAL,A_RESTRICT_SLOT0ONLY,A_RET_TYPE,A_PRED_BIT_4),"indirect conditional jump if false",
+{fBRANCH_SPECULATE_STALL(fLSBOLDNOT(fREAD_P0()),, SPECULATE_TAKEN,4,0); if (fLSBOLDNOT(fREAD_P0())) {fJUMPR(REG_LR,fREAD_LR(),COF_TYPE_JUMPR);}})
+
+
+
+Q6INSN(SL2_jumpr31_tnew,"if (p0.new) jumpr:nt r31",ATTRIBS(A_SUBINSN,A_JINDIRNEW,A_NOTE_CONDITIONAL,A_RESTRICT_SLOT0ONLY,A_RET_TYPE,A_PRED_BIT_4),"indirect conditional jump if true",
+{fBRANCH_SPECULATE_STALL(fLSBNEW0,, SPECULATE_NOT_TAKEN , 4,3); if (fLSBNEW0) {fJUMPR(REG_LR,fREAD_LR(),COF_TYPE_JUMPR);}})
+
+Q6INSN(SL2_jumpr31_fnew,"if (!p0.new) jumpr:nt r31",ATTRIBS(A_SUBINSN,A_JINDIRNEW,A_NOTE_CONDITIONAL,A_RESTRICT_SLOT0ONLY,A_RET_TYPE,A_PRED_BIT_4),"indirect conditional jump if false",
+{fBRANCH_SPECULATE_STALL(fLSBNEW0NOT,, SPECULATE_NOT_TAKEN , 4,3); if (fLSBNEW0NOT) {fJUMPR(REG_LR,fREAD_LR(),COF_TYPE_JUMPR);}})
+
+
+
+
+
+/*****************************************************************/
+/*                                                               */
+/*                       St1/2 subinsns                          */
+/*                                                               */
+/*****************************************************************/
+
+Q6INSN(SS1_storew_io,  "memw(Rs16+#u4:2)=Rt16", ATTRIBS(A_MEMSIZE_4B,A_STORE,A_SUBINSN), "store word", {fEA_RI(RsV,uiV); fSTORE(1,4,EA,RtV);})
+Q6INSN(SS1_storeb_io,  "memb(Rs16+#u4:0)=Rt16", ATTRIBS(A_MEMSIZE_1B,A_STORE,A_SUBINSN), "store byte", {fEA_RI(RsV,uiV); fSTORE(1,1,EA,fGETBYTE(0,RtV));})
+Q6INSN(SS2_storeh_io,  "memh(Rs16+#u3:1)=Rt16", ATTRIBS(A_MEMSIZE_2B,A_STORE,A_SUBINSN), "store half", {fEA_RI(RsV,uiV); fSTORE(1,2,EA,fGETHALF(0,RtV));})
+Q6INSN(SS2_stored_sp,  "memd(r29+#s6:3)=Rtt8", ATTRIBS(A_MEMSIZE_8B,A_STORE,A_SUBINSN), "store dword",{fEA_RI(fREAD_SP(),siV); fSTORE(1,8,EA,RttV);})
+Q6INSN(SS2_storew_sp,  "memw(r29+#u5:2)=Rt16",  ATTRIBS(A_MEMSIZE_4B,A_STORE,A_SUBINSN), "store word", {fEA_RI(fREAD_SP(),uiV); fSTORE(1,4,EA,RtV);})
+Q6INSN(SS2_storewi0,   "memw(Rs16+#u4:2)=#0", ATTRIBS(A_MEMSIZE_4B,A_STORE,A_SUBINSN,A_ROPS_2), "store word", {fEA_RI(RsV,uiV); fSTORE(1,4,EA,0);})
+Q6INSN(SS2_storebi0,   "memb(Rs16+#u4:0)=#0", ATTRIBS(A_MEMSIZE_1B,A_STORE,A_SUBINSN,A_ROPS_2), "store byte", {fEA_RI(RsV,uiV); fSTORE(1,1,EA,0);})
+Q6INSN(SS2_storewi1,   "memw(Rs16+#u4:2)=#1", ATTRIBS(A_MEMSIZE_4B,A_STORE,A_SUBINSN,A_ROPS_2), "store word", {fEA_RI(RsV,uiV); fSTORE(1,4,EA,1);})
+Q6INSN(SS2_storebi1,   "memb(Rs16+#u4:0)=#1", ATTRIBS(A_MEMSIZE_1B,A_STORE,A_SUBINSN,A_ROPS_2), "store byte", {fEA_RI(RsV,uiV); fSTORE(1,1,EA,1);})
+
+
+Q6INSN(SS2_allocframe,"allocframe(#u5:3)", ATTRIBS(A_SUBINSN,A_MEMSIZE_8B,A_STORE,A_RESTRICT_SLOT0ONLY), "Allocate stack frame",
+{ fEA_RI(fREAD_SP(),-8);  fSTORE(1,8,EA,fFRAME_SCRAMBLE((fCAST8_8u(fREAD_LR()) << 32) | fCAST4_4u(fREAD_FP()))); fWRITE_FP(EA); fFRAMECHECK(EA-uiV,EA); fWRITE_SP(EA-uiV); })
+
+
+
diff --git a/target/hexagon/imported/system.idef b/target/hexagon/imported/system.idef
new file mode 100644
index 0000000..86660c5
--- /dev/null
+++ b/target/hexagon/imported/system.idef
@@ -0,0 +1,302 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * System Interface Instructions
+ */
+
+
+
+/********************************************/
+/* User->OS interface                       */
+/********************************************/
+
+Q6INSN(J2_trap0,"trap0(#u8)",ATTRIBS(A_COF,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),
+"Trap to Operating System",
+	fTRAP(0,uiV);
+)
+
+Q6INSN(J2_trap1,"trap1(Rx32,#u8)",ATTRIBS(A_COF,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),
+"Trap to Operating System",
+	/*
+	 * Note: if RxV is not written, we get the same as the input.
+	 * Since trap1 is SOLO, this means the register will effectively not be updated
+	 */
+	if (!fTRAP1_VIRTINSN(uiV)) {
+		fTRAP(1,uiV);
+	} else if (uiV == 1) {
+		fVIRTINSN_RTE(uiV,RxV);
+	} else if (uiV == 3) {
+		fVIRTINSN_SETIE(uiV,RxV);
+	} else if (uiV == 4) {
+		fVIRTINSN_GETIE(uiV,RxV);
+	} else if (uiV == 6) {
+		fVIRTINSN_SPSWAP(uiV,RxV);
+	})
+
+DEF_MAPPING(J2_trap1_noregmap,"trap1(#u8)","trap1(R0,#u8)")
+
+Q6INSN(J2_pause,"pause(#u8)",ATTRIBS(A_COF,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),
+"Enter low-power state for #u8 cycles",{fPAUSE(uiV);})
+
+Q6INSN(J2_rte,  "rte", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NO_TIMING_LOG),
+"Return from Exception",
+{
+fHIDE(if((thread->timing_on) && (thread->status & EXEC_STATUS_REPLAY)) { return; })
+fHIDE(CALLBACK(thread->processor_ptr->options->rte_callback,
+      thread->system_ptr,thread->processor_ptr,
+      thread->threadId,0);)
+fCLEAR_RTE_EX();
+fBRANCH(fREAD_ELR(),COF_TYPE_RTE);})
+
+
+/********************************************/
+/* Interrupt Management                     */
+/********************************************/
+
+Q6INSN(Y2_swi,"swi(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK),"Software Interrupt",{DO_SWI(RsV);})
+Q6INSN(Y2_cswi,"cswi(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK),"Cancel Software Interrupt",{DO_CSWI(RsV);})
+Q6INSN(Y2_ciad,"ciad(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK),"Re-enable interrupt in IAD",{DO_CIAD(RsV);})
+Q6INSN(Y4_siad,"siad(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK),"Disable interrupt in IAD",{DO_SIAD(RsV);})
+Q6INSN(Y2_iassignr,"Rd32=iassignr(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK),"Read interrupt to thread assignments",{DO_IASSIGNR(RsV,RdV);})
+Q6INSN(Y2_iassignw,"iassignw(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK),"Write interrupt to thread assignments",{DO_IASSIGNW(RsV);})
+
+
+Q6INSN(Y2_getimask,"Rd32=getimask(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK),"Read imask register of another thread",
+{RdV = READ_IMASK(RsV & thread->processor_ptr->thread_system_mask); })
+
+Q6INSN(Y2_setimask,"setimask(Pt4,Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK),"Change imask register of another thread",
+{fPREDUSE_TIMING();WRITE_IMASK(PtV & thread->processor_ptr->thread_system_mask,RsV); })
+
+
+
+/********************************************/
+/* TLB management                           */
+/********************************************/
+
+Q6INSN(Y2_tlbw,"tlbw(Rss32,Rt32)", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),
+"Write TLB entry", {fTLBW(RtV,RssV);})
+
+Q6INSN(Y5_ctlbw,"Rd32=ctlbw(Rss32,Rt32)", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),
+"Conditional Write TLB entry",
+{
+  if (fTLB_ENTRY_OVERLAP( (1LL<<63) | RssV )) {
+        RdV=fTLB_ENTRY_OVERLAP_IDX( (1LL<<63) | RssV);
+     } else {
+        fTLBW(RtV,RssV);
+        RdV=0x80000000;
+     }
+})
+
+Q6INSN(Y5_tlboc,"Rd32=tlboc(Rss32)", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),
+"TLB overlap check",
+{
+  if (fTLB_ENTRY_OVERLAP( (1LL<<63) | RssV )) {
+        RdV=fTLB_ENTRY_OVERLAP_IDX( (1LL<<63) | RssV);
+     } else {
+        RdV=0x80000000;
+     }
+})
+
+Q6INSN(Y2_tlbr,"Rdd32=tlbr(Rs32)", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET), "Read TLB entry",
+{RddV = fTLBR(RsV);})
+
+Q6INSN(Y2_tlbp,"Rd32=tlbp(Rs32)", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET), "Probe TLB", {RdV=fTLBP(RsV);})
+
+Q6INSN(Y5_tlbasidi,"tlbinvasid(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET), "Invalidate ASID",
+{
+	fHIDE(int i;)
+    fHIDE(unsigned int NUM_TLB_ENTRIES = NUM_TLB_REGS(thread->processor_ptr);)
+	for (i = 0; i < NUM_TLB_ENTRIES; i++) {
+		if ((fGET_FIELD(fTLBR(i),PTE_G) == 0) &&
+			(fGET_FIELD(fTLBR(i),PTE_ASID) == fEXTRACTU_RANGE(RsV,26,20))) {
+			fTLBW(i,fTLBR(i) & ~(1ULL << 63));
+		}
+	}
+})
+
+Q6INSN(Y2_tlblock,"tlblock", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_NO_TIMING_LOG), "Lock TLB",
+{fSET_TLB_LOCK()})
+
+Q6INSN(Y2_tlbunlock,"tlbunlock", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET), "Unlock TLB",
+{fCLEAR_TLB_LOCK();})
+
+DEF_MAPPING(Y2_k1lock_map,"k1lock","tlblock")
+DEF_MAPPING(Y2_k1unlock_map,"k1unlock","tlbunlock")
+
+Q6INSN(Y2_k0lock,"k0lock", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_NO_TIMING_LOG), "Lock K0",
+{fSET_K0_LOCK()})
+
+Q6INSN(Y2_k0unlock,"k0unlock", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET), "Unlock K0",
+{fCLEAR_K0_LOCK();})
+
+/********************************************/
+/* Supervisor Reg Management                */
+/********************************************/
+
+DEF_V4_MAPPING(Y2_crswap_old,"crswap(Rx32,sgp)","crswap(Rx32,sgp0)")
+
+Q6INSN(Y2_crswap0,"crswap(Rx32,sgp0)",ATTRIBS(A_PRIV,A_NOTE_PRIV), "Swap system general pointer 0 with GPR",
+{fHIDE(size4s_t tmp;) tmp = RxV; RxV = READ_SGP0(); WRITE_SGP0(tmp);})
+Q6INSN(Y4_crswap1,"crswap(Rx32,sgp1)",ATTRIBS(A_PRIV,A_NOTE_PRIV), "Swap system general pointer 1 with GPR",
+{fHIDE(size4s_t tmp;) tmp = RxV; RxV = READ_SGP1(); WRITE_SGP1(tmp);})
+
+Q6INSN(Y4_crswap10,"crswap(Rxx32,sgp1:0)",ATTRIBS(A_PRIV,A_NOTE_PRIV), "Swap system general purpose 0/1 with GPR Pair",
+{fHIDE(size8s_t tmp;) tmp = RxxV; RxxV=READ_SGP10(); WRITE_SGP10(tmp);})
+
+Q6INSN(Y2_tfrscrr,"Rd32=Ss128",ATTRIBS(A_PRIV,A_NOTE_PRIV),"Transfer Supervisor Reg to GPR", {RdV=SsV;})
+Q6INSN(Y2_tfrsrcr,"Sd128=Rs32",ATTRIBS(A_PRIV,A_NOTE_PRIV),"Transfer GPR to Supervisor Reg", {SdV=RsV;})
+Q6INSN(Y4_tfrscpp,"Rdd32=Sss128",ATTRIBS(A_PRIV,A_NOTE_PRIV),"Transfer Supervisor Reg to GPR", {RddV=SssV;})
+Q6INSN(Y4_tfrspcp,"Sdd128=Rss32",ATTRIBS(A_PRIV,A_NOTE_PRIV),"Transfer GPR to Supervisor Reg", {SddV=RssV;})
+
+#ifdef PTWALK
+Q6INSN(Y6_tfracrr,"Rd32=As64",ATTRIBS(A_PRIV,A_NOTE_PRIV),"Transfer Address Reg to GPR", {RdV=AsV;})
+Q6INSN(Y6_tfrarcr,"Ad64=Rs32",ATTRIBS(A_PRIV,A_NOTE_PRIV),"Transfer GPR to Address Reg", {AdV=RsV;})
+Q6INSN(Y6_tfracpp,"Rdd32=Ass64",ATTRIBS(A_PRIV,A_NOTE_PRIV),"Transfer Address Reg to GPR", {RddV=AssV;})
+Q6INSN(Y6_tfrapcp,"Add64=Rss32",ATTRIBS(A_PRIV,A_NOTE_PRIV),"Transfer GPR to Address Reg", {AddV=RssV;})
+#endif
+
+Q6INSN(G4_tfrgcrr,"Rd32=Gs32",ATTRIBS(A_GUEST,A_NOTE_GUEST),"Transfer Guest Reg to GPR", {RdV=GsV;})
+Q6INSN(G4_tfrgrcr,"Gd32=Rs32",ATTRIBS(A_GUEST,A_NOTE_GUEST),"Transfer GPR to Guest Reg", {GdV=RsV;})
+Q6INSN(G4_tfrgcpp,"Rdd32=Gss32",ATTRIBS(A_GUEST,A_NOTE_GUEST),"Transfer Guest Reg to GPR", {RddV=GssV;})
+Q6INSN(G4_tfrgpcp,"Gdd32=Rss32",ATTRIBS(A_GUEST,A_NOTE_GUEST),"Transfer GPR to Guest Reg", {GddV=RssV;})
+
+
+
+Q6INSN(Y2_setprio,"setprio(Pt4,Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV),"Change TID Prio of another thread",
+{fPREDUSE_TIMING();WRITE_PRIO(PtV & thread->processor_ptr->thread_system_mask,RsV); })
+
+
+
+
+/********************************************/
+/* Power Management / Thread on/off         */
+/********************************************/
+Q6INSN(Y6_diag,"diag(Rs32)",ATTRIBS(),"Send value to Diag trace module",{
+})
+Q6INSN(Y6_diag0,"diag0(Rss32,Rtt32)",ATTRIBS(),"Send values of two register to DIAG Trace. Set X=0",{
+})
+Q6INSN(Y6_diag1,"diag1(Rss32,Rtt32)",ATTRIBS(),"Send values of two register to DIAG Trace.  Set X=1",{
+})
+
+
+Q6INSN(Y4_trace,"trace(Rs32)",ATTRIBS(A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK),"Send value to ETM trace",{
+    fDO_TRACE(RsV);
+})
+
+Q6INSN(Y2_stop,"stop(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),"Stop thread(s)",{
+    fHIDE(RsV=RsV;)
+    if (!fIN_DEBUG_MODE_NO_ISDB(fGET_TNUM())) fCLEAR_RUN_MODE(fGET_TNUM());
+})
+
+Q6INSN(Y4_nmi,"nmi(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_NO_TIMING_LOG),"Raise NMI on thread(s)",{
+    fDO_NMI(RsV);
+})
+
+Q6INSN(Y2_start,"start(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),"Start thread(s)",fSTART(RsV);)
+
+Q6INSN(Y2_wait,"wait(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_NO_TIMING_LOG),"Make thread(s) wait",{
+    fHIDE(RsV=RsV;)
+    if (!fIN_DEBUG_MODE(fGET_TNUM())) fSET_WAIT_MODE(fGET_TNUM());
+	fIN_DEBUG_MODE_WARN(fGET_TNUM());
+})
+
+Q6INSN(Y2_resume,"resume(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),"Make thread(s) stop waiting",fRESUME(RsV);)
+
+Q6INSN(Y2_break,"brkpt",ATTRIBS(A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),"Breakpoint",{fBREAK();})
+
+
+/********************************************/
+/* Cache Management                         */
+/********************************************/
+
+Q6INSN(Y2_ictagr,"Rd32=ictagr(Rs32)",ATTRIBS(A_ICOP,A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_CACHEOP,A_COPBYIDX,A_ICTAGOP),"Instruction Cache Tag Read",{fICTAGR(RsV,RdV,RdN);})
+Q6INSN(Y2_ictagw,"ictagw(Rs32,Rt32)",ATTRIBS(A_ICOP,A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_CACHEOP,A_COPBYIDX,A_ICTAGOP),"Instruction Cache Tag Write",{fICTAGW(RsV,RtV);})
+Q6INSN(Y2_icdataw,"icdataw(Rs32,Rt32)",ATTRIBS(A_ICOP,A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_CACHEOP,A_COPBYIDX,A_ICTAGOP),"Instruction Cache Data Write",{fICDATAW(RsV,RtV);})
+Q6INSN(Y2_icdatar,"Rd32=icdatar(Rs32)",ATTRIBS(A_ICOP,A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_CACHEOP,A_COPBYIDX,A_ICTAGOP),"Instruction Cache Data Read",{fICDATAR(RsV, RdV);})
+Q6INSN(Y2_icinva,"icinva(Rs32)",ATTRIBS(A_ICOP,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_CACHEOP,A_COPBYADDRESS,A_ICFLUSHOP),"Instruction Cache Invalidate Address",{fEA_REG(RsV); fICINVA(EA);})
+Q6INSN(Y2_icinvidx,"icinvidx(Rs32)",ATTRIBS(A_ICOP,A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_CACHEOP,A_COPBYIDX,A_ICFLUSHOP),"Instruction Cache Invalidate Index",{fICINVIDX(RsV);})
+Q6INSN(Y2_ickill,"ickill",ATTRIBS(A_ICOP,A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_CACHEOP,A_ICFLUSHOP),"Instruction Cache Invalidate",{fICKILL();})
+
+Q6INSN(Y2_isync,"isync",ATTRIBS(A_NOTE_NOPACKET,A_RESTRICT_NOPACKET),"Memory Synchronization",{fISYNC();})
+Q6INSN(Y2_barrier,"barrier",ATTRIBS(A_NOTE_NOPACKET,A_RESTRICT_SLOT0ONLY,A_RESTRICT_PACKET_AXOK),"Memory Barrier",{fBARRIER();})
+Q6INSN(Y2_syncht,"syncht",ATTRIBS(A_NOTE_NOPACKET,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOPACKET),"Memory Synchronization",{fSYNCH();})
+
+
+Q6INSN(Y2_dcfetchbo,"dcfetch(Rs32+#u11:3)",ATTRIBS(A_RESTRICT_PREFERSLOT0,A_DCFETCH,A_RESTRICT_NOSLOT1_STORE),"Data Cache Prefetch",{fEA_RI(RsV,uiV); fDCFETCH(EA);})
+DEF_MAPPING(Y2_dcfetch,  "dcfetch(Rs32)","dcfetch(Rs32+#0)")
+Q6INSN(Y2_dckill,"dckill",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOPACKET,A_CACHEOP,A_DCFLUSHOP),"Data Cache Invalidate",{fDCKILL();})
+
+
+Q6INSN(Y2_dczeroa,"dczeroa(Rs32)",ATTRIBS(A_STORE,A_RESTRICT_SLOT1_AOK,A_NOTE_SLOT1_AOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYADDRESS,A_DCZEROA),"Zero an aligned 32-byte cacheline",{fEA_REG(RsV); fDCZEROA(EA);})
+Q6INSN(Y2_dccleana,"dccleana(Rs32)",ATTRIBS(A_RESTRICT_SLOT1_AOK,A_NOTE_SLOT1_AOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYADDRESS,A_DCFLUSHOP),"Data Cache Clean Address",{fEA_REG(RsV); fDCCLEANA(EA);})
+Q6INSN(Y2_dccleanidx,"dccleanidx(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYIDX,A_DCFLUSHOP),"Data Cache Clean Index",{fDCCLEANIDX(RsV);})
+Q6INSN(Y2_dccleaninva,"dccleaninva(Rs32)",ATTRIBS(A_RESTRICT_SLOT1_AOK,A_NOTE_SLOT1_AOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYADDRESS,A_DCFLUSHOP),"Data Cache Clean and Invalidate Address",{fEA_REG(RsV); fDCCLEANINVA(EA);})
+Q6INSN(Y2_dccleaninvidx,"dccleaninvidx(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYIDX,A_DCFLUSHOP),"Data Cache Clean and Invalidate Index",{fDCCLEANINVIDX(RsV);})
+Q6INSN(Y2_dcinva,"dcinva(Rs32)",ATTRIBS(A_RESTRICT_SLOT1_AOK,A_NOTE_SLOT1_AOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYADDRESS,A_DCFLUSHOP),"Data Cache Invalidate Address",{fEA_REG(RsV); fDCCLEANINVA(EA);})
+Q6INSN(Y2_dcinvidx,"dcinvidx(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYIDX,A_DCFLUSHOP),"Data Cache Invalidate Index",{fDCINVIDX(RsV);})
+Q6INSN(Y2_dctagr,"Rd32=dctagr(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYIDX,A_DCTAGOP),"Data Cache Tag Read",{fDCTAGR(RsV,RdV,RdN);})
+Q6INSN(Y2_dctagw,"dctagw(Rs32,Rt32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_RESTRICT_SLOT0ONLY,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_CACHEOP,A_COPBYIDX,A_DCTAGOP),"Data Cache Tag Write",{fDCTAGW(RsV,RtV);})
+
+
+Q6INSN(Y2_l2kill,"l2kill",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOPACKET,A_CACHEOP,A_L2FLUSHOP),"L2 Cache Invalidate",{fL2KILL();})
+Q6INSN(Y4_l2tagw,"l2tagw(Rs32,Rt32)",ATTRIBS(A_PRIV,A_NOTE_BADTAG_UNDEF,A_NOTE_PRIV,A_RESTRICT_SLOT0ONLY,A_NOTE_NOPACKET,A_RESTRICT_NOPACKET,A_CACHEOP,A_COPBYIDX,A_L2TAGOP),"L2 Cache Tag Write",{fL2TAGW(RsV,RtV);})
+Q6INSN(Y4_l2tagr,"Rd32=l2tagr(Rs32)",ATTRIBS(A_PRIV,A_NOTE_BADTAG_UNDEF,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYIDX,A_L2TAGOP),"L2 Cache Tag Read",{fL2TAGR(RsV,RdV,RdN);})
+
+Q6INSN(Y2_l2cleaninvidx,"l2cleaninvidx(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYIDX,A_L2FLUSHOP),"L2 Cache Clean and Invalidate Index",{fL2CLEANINVIDX(RsV); })
+Q6INSN(Y5_l2cleanidx,"l2cleanidx(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYIDX,A_L2FLUSHOP),"L2 Cache Clean by Index",{fL2CLEANIDX(RsV); })
+Q6INSN(Y5_l2invidx,"l2invidx(Rs32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_AXOK,A_RESTRICT_PACKET_AXOK,A_RESTRICT_SLOT0ONLY,A_CACHEOP,A_COPBYIDX,A_L2FLUSHOP),"L2 Cache Invalidate by Index",{fL2INVIDX(RsV); })
+
+
+
+Q6INSN(Y4_l2fetch,"l2fetch(Rs32,Rt32)",ATTRIBS(A_RESTRICT_SLOT0ONLY,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK),"L2 Cache Prefetch",
+{ fL2FETCH(RsV,
+	(RtV&0xff), /*height*/
+	((RtV>>8)&0xff), /*width*/
+	((RtV>>16)&0xffff), /*stride*/
+	0); /*extra attrib flags*/
+})
+
+
+
+Q6INSN(Y5_l2fetch,"l2fetch(Rs32,Rtt32)",ATTRIBS(A_RESTRICT_SLOT0ONLY,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK),"L2 Cache Prefetch",
+{ fL2FETCH(RsV,
+	fGETUHALF(0,RttV), /*height*/
+	fGETUHALF(1,RttV), /*width*/
+	fGETUHALF(2,RttV), /*stride*/
+	fGETUHALF(3,RttV)); /*flags*/
+})
+
+Q6INSN(Y5_l2locka,"Pd4=l2locka(Rs32)", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_CACHEOP,A_COPBYADDRESS,A_RESTRICT_SLOT0ONLY,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK,A_RESTRICT_LATEPRED,A_NOTE_LATEPRED),
+"Lock L2 cache line by address", { fEA_REG(RsV); fL2LOCKA(EA,PdV,PdN); fHIDE(MARK_LATE_PRED_WRITE(PdN)) });
+
+
+Q6INSN(Y5_l2unlocka,"l2unlocka(Rs32)", ATTRIBS(A_PRIV,A_NOTE_PRIV,A_CACHEOP,A_COPBYADDRESS,A_RESTRICT_SLOT0ONLY,A_RESTRICT_PACKET_AXOK,A_NOTE_AXOK), "UnLock L2 cache line by address", { fEA_REG(RsV); fL2UNLOCKA(EA); })
+
+
+
+Q6INSN(Y5_l2gunlock,"l2gunlock",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOPACKET,A_CACHEOP,A_L2FLUSHOP),"L2 Global Unlock",{fL2UNLOCK();})
+
+Q6INSN(Y5_l2gclean,"l2gclean",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOPACKET,A_CACHEOP,A_L2FLUSHOP),"L2 Global Clean",{fL2CLEAN();})
+
+Q6INSN(Y5_l2gcleaninv,"l2gcleaninv",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOPACKET,A_CACHEOP,A_L2FLUSHOP),"L2 Global Clean and Invalidate",{fL2CLEANINV();})
+
+Q6INSN(Y6_l2gcleanpa,"l2gclean(Rtt32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOPACKET,A_CACHEOP,A_L2FLUSHOP),"L2 Global Clean by PA Range",{fL2CLEANPA(RttV);})
+
+Q6INSN(Y6_l2gcleaninvpa,"l2gcleaninv(Rtt32)",ATTRIBS(A_PRIV,A_NOTE_PRIV,A_NOTE_NOPACKET,A_RESTRICT_SLOT0ONLY,A_RESTRICT_NOPACKET,A_CACHEOP,A_L2FLUSHOP),"L2 Global Clean and Invalidate by PA Range",{fL2CLEANINVPA(RttV);})
+
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 17/67] Hexagon arch import - macro definitions
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (15 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 16/67] Hexagon arch import - instruction semantics definitions Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 18/67] Hexagon arch import - instruction encoding Taylor Simpson
                   ` (50 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Imported from the Hexagon architecture library
    imported/macros.def              Scalar core macro definitions
The macro definition files specify instruction attributes that are applied
to each instruction that reverences the macro.

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/imported/macros.def | 3970 ++++++++++++++++++++++++++++++++++++
 1 file changed, 3970 insertions(+)
 create mode 100755 target/hexagon/imported/macros.def

diff --git a/target/hexagon/imported/macros.def b/target/hexagon/imported/macros.def
new file mode 100755
index 0000000..ace0432
--- /dev/null
+++ b/target/hexagon/imported/macros.def
@@ -0,0 +1,3970 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+DEF_MACRO(
+	LIKELY,		/* NAME */
+	(X),		/* PARAMS */
+	"X",		/* sdescr */
+	"X",		/* ldescr */
+	__builtin_expect((X),1), /* BEH */
+	()		/* attribs */
+)
+
+DEF_MACRO(
+	UNLIKELY,	/* NAME */
+	(X),		/* PARAMS */
+	"X",		/* sdescr */
+	"X",		/* ldescr */
+	__builtin_expect((X),0), /* BEH */
+	()		/* attribs */
+)
+
+DEF_MACRO(
+	CANCEL, /* macro name */
+	, /* parameters */
+	"NOP", /* short description */
+	"NOP this instruction", /* long description */
+	{if (thread->last_pkt) thread->last_pkt->slot_cancelled |= (1<<insn->slot); return;} , /* behavior */
+	(A_CONDEXEC)
+)
+
+DEF_MACRO(
+	STORE_ZERO, /* macro name */
+	, /* parameters */
+	"Zero Store", /* short description */
+	"NOP this instruction", /* long description */
+	{if (thread->last_pkt) thread->last_pkt->slot_zero_byte_store |= (1<<insn->slot); } , /* behavior */
+	(A_CONDEXEC)
+)
+
+DEF_MACRO(
+	LOAD_CANCEL, /* macro name */
+	(EA), /* parameters */
+	"NOP", /* short description */
+	"NOP this instruction", /* long description */
+	{mem_general_load_cancelled(thread,EA,insn);CANCEL;} , /* behavior */
+	(A_CONDEXEC)
+)
+
+DEF_MACRO(
+	STORE_CANCEL, /* macro name */
+	(EA), /* parameters */
+	"NOP", /* short description */
+	"NOP this instruction", /* long description */
+	{mem_general_store_cancelled(thread,EA,insn);CANCEL;} , /* behavior */
+	(A_CONDEXEC)
+)
+
+DEF_MACRO(
+	IS_CANCELLED, /* macro name */
+	(SLOT), /* parameters */
+	"", /* short description */
+	"", /* long description */
+	((thread->last_pkt->slot_cancelled >> SLOT)&1) , /* behavior */
+	/* attrib */
+)
+
+DEF_MACRO(
+	fMAX, /* macro name */
+	(A,B), /* parameters */
+	"max(A,B)", /* short description */
+	"the larger of A and B", /* long description */
+	(((A) > (B)) ? (A) : (B)), /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fMIN, /* macro name */
+	(A,B), /* parameters */
+	"min(A,B)", /* short description */
+	"the smaller of A and B", /* long description */
+	(((A) < (B)) ? (A) : (B)), /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fABS, /* macro name */
+	(A), /* parameters */
+	"ABS(A)", /* short description */
+	"if A is less than zero then A=-A", /* long description */
+	(((A)<0)?(-(A)):(A)), /* behavior */
+	/* optional attributes */
+)
+
+
+/* Bit insert */
+DEF_MACRO(
+	fINSERT_BITS,(REG,WIDTH,OFFSET,INVAL),
+	"REG[(WIDTH-1+OFFSET):OFFSET]=INVAL",
+	"REG[(WIDTH-1+OFFSET):OFFSET]=INVAL",
+        {
+	   REG = ((REG) & ~(((fCONSTLL(1)<<(WIDTH))-1)<<(OFFSET))) | (((INVAL) & ((fCONSTLL(1)<<(WIDTH))-1)) << (OFFSET));
+        },
+	/* attribs */
+)
+
+/* Bit extract */
+DEF_MACRO(
+	fEXTRACTU_BITS,(INREG,WIDTH,OFFSET),
+	"INREG[(WIDTH+OFFSET-1):OFFSET]",
+	"INREG[(WIDTH+OFFSET-1):OFFSET]",
+	(fZXTN(WIDTH,32,(INREG >> OFFSET))),
+	/* attribs */
+)
+
+DEF_MACRO(
+	fEXTRACTU_BIDIR,(INREG,WIDTH,OFFSET),
+	"INREG[(WIDTH+OFFSET-1):OFFSET]",
+	"INREG[(WIDTH+OFFSET-1):OFFSET]",
+	(fZXTN(WIDTH,32,fBIDIR_LSHIFTR((INREG),(OFFSET),4_8))),
+	/* attribs */
+)
+
+DEF_MACRO(
+	fEXTRACTU_RANGE,(INREG,HIBIT,LOWBIT),
+	"INREG[HIBIT:LOWBIT]",
+	"INREG[HIBIT:LOWBIT]",
+	(fZXTN((HIBIT-LOWBIT+1),32,(INREG >> LOWBIT))),
+	/* attribs */
+)
+
+
+DEF_MACRO(
+	fINSERT_RANGE,(INREG,HIBIT,LOWBIT,INVAL),
+	"INREG[HIBIT:LOWBIT]=INVAL",
+	"INREG[HIBIT:LOWBIT]=INVAL",
+        {
+          int offset=LOWBIT;
+          int width=HIBIT-LOWBIT+1;
+  	  /* clear bits where new bits go */
+	  INREG &= ~(((fCONSTLL(1)<<width)-1)<<offset);
+	  /* OR in new bits */
+	  INREG |= ((INVAL & ((fCONSTLL(1)<<width)-1)) << offset);
+        },
+	/* attribs */
+)
+
+
+DEF_MACRO(
+	f8BITSOF,(VAL),
+	"VAL ? 0xff : 0x00",
+	"VAL ? 0xff : 0x00",
+	( (VAL) ? 0xff : 0x00),
+	/* attribs */
+)
+
+DEF_MACRO(
+	fLSBOLD,(VAL),
+	"VAL[0]",
+	"Least significant bit of VAL",
+	((VAL) & 1),
+	(A_DOTOLD)
+)
+
+DEF_MACRO(
+	fLSBNEW,(PNUM),
+	"PNUM.new[0]",
+	"Least significant bit of new PNUM",
+	predlog_read(thread,PNUM),
+	(A_DOTNEW)
+)
+
+DEF_MACRO(
+	fLSBNEW0,,
+	"P0.new[0]",
+	"Least significant bit of new P0",
+	predlog_read(thread,0),
+	(A_DOTNEW,A_IMPLICIT_READS_P0)
+)
+
+DEF_MACRO(
+	fLSBNEW1,,
+	"P1.new[0]",
+	"Least significant bit of new P1",
+	predlog_read(thread,1),
+	(A_DOTNEW,A_IMPLICIT_READS_P1)
+)
+
+DEF_MACRO(
+	fLSBOLDNOT,(VAL),
+	"!VAL[0]",
+	"!Least significant bit of VAL",
+	(!fLSBOLD(VAL)),
+	(A_DOTOLD,A_INVPRED)
+)
+
+DEF_MACRO(
+	fLSBNEWNOT,(PNUM),
+	"!PNUM.new[0]",
+	"!Least significant bit of new PNUM",
+	(!fLSBNEW(PNUM)),
+	(A_DOTNEW,A_INVPRED)
+)
+
+DEF_MACRO(
+	fLSBNEW0NOT,,
+	"!P0.new[0]",
+	"!Least significant bit of new P0",
+	(!fLSBNEW0),
+	(A_DOTNEW,A_INVPRED,A_IMPLICIT_READS_P0)
+)
+
+DEF_MACRO(
+	fLSBNEW1NOT,,
+	"P1.new[0]",
+	"Least significant bit of new P1",
+	(!fLSBNEW1),
+	(A_DOTNEW,A_INVPRED,A_IMPLICIT_READS_P1)
+)
+
+DEF_MACRO(
+	fNEWREG,(RNUM),
+	"RNUM.new",
+	"Register value produced in this packet",
+	({if (newvalue_missing(thread,RNUM) ||
+      IS_CANCELLED(insn->new_value_producer_slot)) CANCEL; reglog_read(thread,RNUM);}),
+	(A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY)
+)
+// Store new with a missing newvalue or cancelled goes out as a zero byte store in V65
+// take advantage of the fact that reglog_read returns zero for not valid rnum
+DEF_MACRO(
+	fNEWREG_ST,(RNUM),
+	"RNUM.new",
+	"Register value produced in this packet for store",
+	({if (newvalue_missing(thread,RNUM) ||
+      IS_CANCELLED(insn->new_value_producer_slot)) { STORE_ZERO; RNUM = -1; }; reglog_read(thread,RNUM);}),
+	(A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY)
+)
+
+DEF_MACRO(
+	fMEMZNEW,(RNUM),
+	"RNUM.new == 0",
+	"Newly loaded value is zero",
+	(RNUM == 0),
+	/* attribs */
+)
+
+DEF_MACRO(
+	fMEMNZNEW,(RNUM),
+	"RNUM.new != 0",
+	"Newly loaded value is non-zero",
+	(RNUM != 0),
+	/* attribs */
+)
+
+DEF_MACRO(
+	fVSATUVALN,(N,VAL),
+	"N-Bit Unsigned Saturation Value for sign of VAL",
+	"N-Bit Unsigned Saturation Value for sign of VAL",
+	({ ((VAL) < 0) ? 0 : ((1LL<<(N))-1);}),
+	(A_USATURATE)
+)
+
+DEF_MACRO(
+	fSATUVALN,(N,VAL),
+	"N-Bit Unsigned Saturation Value for sign of VAL",
+	"N-Bit Unsigned Saturation Value for sign of VAL",
+	({fSET_OVERFLOW(); ((VAL) < 0) ? 0 : ((1LL<<(N))-1);}),
+	(A_USATURATE)
+)
+
+DEF_MACRO(
+	fSATVALN,(N,VAL),
+	"N-Bit Saturation Value for sign of VAL",
+	"N-Bit Saturation Value for sign of VAL",
+	({fSET_OVERFLOW(); ((VAL) < 0) ? (-(1LL<<((N)-1))) : ((1LL<<((N)-1))-1);}),
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fVSATVALN,(N,VAL),
+	"N-Bit Saturation Value for sign of VAL",
+	"N-Bit Saturation Value for sign of VAL",
+	({((VAL) < 0) ? (-(1LL<<((N)-1))) : ((1LL<<((N)-1))-1);}),
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fZXTN, /* macro name */
+	(N,M,VAL),
+	"zxt_{N->M}(VAL)", /* short descr */
+	"Zero extend N bits from VAL",
+	((VAL) & ((1LL<<(N))-1)),
+	/* attribs */
+)
+
+DEF_MACRO(
+	fSXTN, /* macro name */
+	(N,M,VAL),
+	"sxt_{N->M}(VAL)", /* short descr */
+	"Sign extend N bits from VAL",
+	((fZXTN(N,M,VAL) ^ (1LL<<((N)-1))) - (1LL<<((N)-1))),
+	/* attribs */
+)
+
+DEF_MACRO(
+	fSATN,
+	(N,VAL),
+	"sat_##N(VAL)", /* short descr */
+	"Saturate VAL to N bits",
+	((fSXTN(N,64,VAL) == (VAL)) ? (VAL) : fSATVALN(N,VAL)),
+	(A_SATURATE)
+)
+DEF_MACRO(
+	fVSATN,
+	(N,VAL),
+	"sat_##N(VAL)", /* short descr */
+	"Saturate VAL to N bits",
+	((fSXTN(N,64,VAL) == (VAL)) ? (VAL) : fVSATVALN(N,VAL)),
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fADDSAT64,
+	(DST,A,B),
+	"DST=sat64(A+B)", /* short descr */
+	"DST=sat64(A+B)",
+	{
+	  size8u_t __a = fCAST8u(A);
+	  size8u_t __b = fCAST8u(B);
+	  size8u_t __sum = __a + __b;
+	  size8u_t __xor = __a ^ __b;
+	  const size8u_t __mask = 0x8000000000000000ULL;
+	  if (__xor & __mask) {
+		/* Opposite signs, OK */
+		DST = __sum;
+	  } else if ((__a ^ __sum) & __mask) {
+		/* Signs mismatch */
+		if (__sum & __mask) {
+			/* overflowed to negative, make max pos */
+			DST=0x7FFFFFFFFFFFFFFFLL; fSET_OVERFLOW();
+		} else {
+			/* overflowed to positive, make max neg */
+			DST=0x8000000000000000LL; fSET_OVERFLOW();
+		}
+	  } else {
+		/* signs did not mismatch, OK */
+		DST = __sum;
+	  }
+        },
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fVSATUN,
+	(N,VAL),
+	"usat_##N(VAL)", /* short descr */
+	"Saturate VAL to N bits",
+	((fZXTN(N,64,VAL) == (VAL)) ? (VAL) : fVSATUVALN(N,VAL)),
+	(A_USATURATE)
+)
+
+DEF_MACRO(
+	fSATUN,
+	(N,VAL),
+	"usat_##N(VAL)", /* short descr */
+	"Saturate VAL to N bits",
+	((fZXTN(N,64,VAL) == (VAL)) ? (VAL) : fSATUVALN(N,VAL)),
+	(A_USATURATE)
+)
+
+DEF_MACRO(
+	fSATH,
+	(VAL),
+	"sat_16(VAL)", /* short descr */
+	"Saturate VAL to a signed half",
+	(fSATN(16,VAL)),
+	(A_SATURATE)
+)
+
+
+DEF_MACRO(
+	fSATUH,
+	(VAL),
+	"usat_16(VAL)", /* short descr */
+	"Saturate VAL to an unsigned half",
+	(fSATUN(16,VAL)),
+	(A_USATURATE)
+)
+
+DEF_MACRO(
+	fVSATH,
+	(VAL),
+	"sat_16(VAL)", /* short descr */
+	"Saturate VAL to a signed half",
+	(fVSATN(16,VAL)),
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fVSATUH,
+	(VAL),
+	"usat_16(VAL)", /* short descr */
+	"Saturate VAL to a signed half",
+	(fVSATUN(16,VAL)),
+	(A_USATURATE)
+)
+
+
+DEF_MACRO(
+	fSATUB,
+	(VAL),
+	"usat_8(VAL)", /* short descr */
+	"Saturate VAL to an unsigned byte",
+	(fSATUN(8,VAL)),
+	(A_USATURATE)
+)
+DEF_MACRO(
+	fSATB,
+	(VAL),
+	"sat_8(VAL)", /* short descr */
+	"Saturate VAL to a signed byte",
+	(fSATN(8,VAL)),
+	(A_SATURATE)
+)
+
+
+DEF_MACRO(
+	fVSATUB,
+	(VAL),
+	"usat_8(VAL)", /* short descr */
+	"Saturate VAL to a unsigned byte",
+	(fVSATUN(8,VAL)),
+	(A_USATURATE)
+)
+DEF_MACRO(
+	fVSATB,
+	(VAL),
+	"sat_8(VAL)", /* short descr */
+	"Saturate VAL to a signed byte",
+	(fVSATN(8,VAL)),
+	(A_SATURATE)
+)
+
+
+
+
+/*************************************/
+/* immediate extension               */
+/*************************************/
+
+DEF_MACRO(
+	fIMMEXT,
+	(IMM),
+	"apply_extension(IMM)",
+	"Apply extension to IMM",
+	(IMM = IMM /* (insn->extension_valid?(fGET_EXTENSION | fZXTN(6,32,(IMM))):(IMM)) */ ),
+	(A_EXTENDABLE)
+)
+
+DEF_MACRO(
+	fMUST_IMMEXT,
+	(IMM),
+	"apply_extension(IMM)",
+	"Apply extension to IMM",
+	fIMMEXT(IMM),
+	(A_EXTENDABLE,A_MUST_EXTEND)
+)
+
+DEF_MACRO(
+	fPCALIGN,
+	(IMM),
+	"IMM=IMM & ~PCALIGN_MASK",
+	"",
+	IMM=(IMM & ~PCALIGN_MASK),
+	(A_EXTENDABLE)
+)
+
+DEF_MACRO(
+	fGET_EXTENSION,
+	,
+	"extension",
+	"extension",
+	insn->extension,
+	/* attrs */
+)
+
+/*************************************/
+/* Read and Write Implicit Regs      */
+/*************************************/
+
+DEF_MACRO(
+	fREAD_IREG, /* read link register */
+	(VAL), /* parameters */
+	"I", /* short description */
+	"I", /* long description */
+	(fSXTN(11,64,(((VAL) & 0xf0000000)>>21) | ((VAL>>17)&0x7f) )),          /* behavior */
+	()
+)
+
+
+DEF_MACRO(
+	fREAD_R0, /* read r0 */
+	(), /* parameters */
+	"R0", /* short description */
+	"R0", /* long description */
+	(READ_RREG(0)),          /* behavior */
+	(A_IMPLICIT_READS_R00)
+)
+
+DEF_MACRO(
+	fREAD_LR, /* read link register */
+	(), /* parameters */
+	"LR", /* short description */
+	"LR", /* long description */
+	(READ_RREG(REG_LR)),          /* behavior */
+	(A_IMPLICIT_READS_LR)
+)
+
+DEF_MACRO(
+	fREAD_SSR, /* read link register */
+	(), /* parameters */
+	"SSR", /* short description */
+	"SSR", /* long description */
+	(READ_RREG(REG_SSR)),          /* behavior */
+	()
+)
+
+DEF_MACRO(
+	fWRITE_R0, /* write r0 */
+	(A), /* parameters */
+	"R0=A", /* short description */
+	"R0=A", /* long description */
+	WRITE_RREG(0,A),          /* behavior */
+	(A_IMPLICIT_WRITES_R00)
+)
+
+DEF_MACRO(
+	fWRITE_LR, /* write lr */
+	(A), /* parameters */
+	"LR=A", /* short description */
+	"LR=A", /* long description */
+	WRITE_RREG(REG_LR,A),          /* behavior */
+	(A_IMPLICIT_WRITES_LR)
+)
+
+DEF_MACRO(
+	fWRITE_FP, /* write sp */
+	(A), /* parameters */
+	"FP=A", /* short description */
+	"FP=A", /* long description */
+	WRITE_RREG(REG_FP,A),          /* behavior */
+	(A_IMPLICIT_WRITES_FP)
+)
+
+DEF_MACRO(
+	fWRITE_SP, /* write sp */
+	(A), /* parameters */
+	"SP=A", /* short description */
+	"SP=A", /* long description */
+        WRITE_RREG(REG_SP,A),          /* behavior */
+	(A_IMPLICIT_WRITES_SP)
+)
+
+DEF_MACRO(
+	fWRITE_GOSP, /* write gosp */
+	(A), /* parameters */
+	"GOSP=A", /* short description */
+	"GOSP=A", /* long description */
+        WRITE_RREG(REG_GOSP,A),          /* behavior */
+	(A_IMPLICIT_WRITES_GOSP)
+)
+
+DEF_MACRO(
+	fWRITE_GP, /* write gp */
+	(A), /* parameters */
+	"GP=A", /* short description */
+	"GP=A", /* long description */
+	WRITE_RREG(REG_GP,A),          /* behavior */
+	(A_IMPLICIT_WRITES_GP)
+)
+
+DEF_MACRO(
+	fREAD_SP, /* read stack pointer */
+	(), /* parameters */
+	"SP", /* short description */
+	"SP", /* long description */
+	(READ_RREG(REG_SP)),          /* behavior */
+	(A_IMPLICIT_READS_SP)
+)
+
+DEF_MACRO(
+	fREAD_GOSP, /* read guest other stack pointer */
+	(), /* parameters */
+	"GOSP", /* short description */
+	"GOSP", /* long description */
+	(READ_RREG(REG_GOSP)),          /* behavior */
+	(A_IMPLICIT_READS_GOSP)
+)
+
+DEF_MACRO(
+	fREAD_GELR, /* read guest other stack pointer */
+	(), /* parameters */
+	"GELR", /* short description */
+	"GELR", /* long description */
+	(READ_RREG(REG_GELR)),          /* behavior */
+	(A_IMPLICIT_READS_GELR)
+)
+
+DEF_MACRO(
+	fREAD_GEVB, /* read guest other stack pointer */
+	(), /* parameters */
+	"GEVB", /* short description */
+	"GEVB", /* long description */
+	(READ_RREG(REG_GEVB)),          /* behavior */
+	(A_IMPLICIT_READS_GEVB)
+)
+
+DEF_MACRO(
+	fREAD_CSREG, /* read stack pointer */
+	(N), /* parameters */
+	"CS N", /* short description */
+	"CS N", /* long description */
+	(READ_RREG(REG_CSA+N)),          /* behavior */
+	(A_IMPLICIT_READS_CS)
+)
+
+DEF_MACRO(
+	fREAD_LC0, /* read loop count */
+	, /* parameters */
+	"LC0", /* short description */
+	"LC0", /* long description */
+	(READ_RREG(REG_LC0)),          /* behavior */
+	(A_IMPLICIT_READS_LC0)
+)
+
+DEF_MACRO(
+	fREAD_LC1, /* read loop count */
+	, /* parameters */
+	"LC1", /* short description */
+	"LC1", /* long description */
+	(READ_RREG(REG_LC1)),          /* behavior */
+	(A_IMPLICIT_READS_LC1)
+)
+
+DEF_MACRO(
+	fREAD_SA0, /* read start addr */
+	, /* parameters */
+	"SA0", /* short description */
+	"SA0", /* long description */
+	(READ_RREG(REG_SA0)),          /* behavior */
+	(A_IMPLICIT_READS_SA0)
+)
+
+DEF_MACRO(
+	fREAD_SA1, /* read start addr */
+	, /* parameters */
+	"SA1", /* short description */
+	"SA1", /* long description */
+	(READ_RREG(REG_SA1)),          /* behavior */
+	(A_IMPLICIT_READS_SA1)
+)
+
+
+DEF_MACRO(
+	fREAD_FP, /* read stack pointer */
+	(), /* parameters */
+	"FP", /* short description */
+	"FP", /* long description */
+	(READ_RREG(REG_FP)),          /* behavior */
+	(A_IMPLICIT_READS_FP)
+)
+
+DEF_MACRO(
+	fREAD_GP, /* read global pointer */
+	(), /* parameters */
+	"(Constant_extended ? (0) : GP)", /* short description */
+	"(Constant_extended ? (0) : GP)", /* long description */
+	(insn->extension_valid ? 0 : READ_RREG(REG_GP)),          /* behavior */
+	(A_IMPLICIT_READS_GP)
+)
+
+DEF_MACRO(
+	fREAD_PC, /* read PC */
+	(), /* parameters */
+	"PC", /* short description */
+	"PC", /* long description */
+	(READ_RREG(REG_PC)),          /* behavior */
+	(A_IMPLICIT_READS_PC)
+)
+
+DEF_MACRO(
+	fREAD_NPC, /* read PC */
+	(), /* parameters */
+	"NPC", /* short description */
+	"NPC", /* long description */
+	(thread->next_PC & (0xfffffffe)),          /* behavior */
+	()
+)
+
+DEF_MACRO(
+	fREAD_P0, /* read Predicate 0 */
+	(), /* parameters */
+	"P0", /* short description */
+	"P0", /* long description */
+	(READ_PREG(0)),          /* behavior */
+	(A_IMPLICIT_READS_P0)
+)
+
+DEF_MACRO(
+	fREAD_P3, /* read Predicate 0 */
+	(), /* parameters */
+	"P3", /* short description */
+	"P3", /* long description */
+	(READ_PREG(3)),          /* behavior */
+	(A_IMPLICIT_READS_P3)
+)
+
+DEF_MACRO(
+	fNOATTRIB_READ_P3, /* read Predicate 0 */
+	(), /* parameters */
+	"P3", /* short description */
+	"P3", /* long description */
+	(READ_PREG(3)),          /* behavior */
+	()
+)
+
+DEF_MACRO(
+	fINVALID,(),
+	"Invalid instruction!",
+	"Invalid instruction!",
+	(register_error_exception(thread,PRECISE_CAUSE_INVALID_PACKET,thread->Regs[REG_BADVA0],thread->Regs[REG_BADVA1],GET_SSR_FIELD(SSR_BVS),GET_SSR_FIELD(SSR_V0),GET_SSR_FIELD(SSR_V1),0)),
+	()
+)
+
+DEF_MACRO(
+	fCHECK_PCALIGN, (A),
+	"",
+	"",
+	/* EJP: if exception already detected, do not trigger pc unaligned exception since we will be rewinding PC anyway. */
+	/* Maybe this will screw up prioritization logic... but probably it is ok?  Otherwise we can ditch out of dealloc_return early if EXCEPTION_DETECTED */
+	if (((A) & PCALIGN_MASK)) {
+		register_error_exception(thread,PRECISE_CAUSE_PC_NOT_ALIGNED,thread->Regs[REG_BADVA0],thread->Regs[REG_BADVA1],GET_SSR_FIELD(SSR_BVS),GET_SSR_FIELD(SSR_V0),GET_SSR_FIELD(SSR_V1),0);
+	},
+	()
+)
+
+DEF_MACRO(
+	fCUREXT,(),
+	"",
+	"",
+	GET_SSR_FIELD(SSR_XA),
+	()
+)
+
+DEF_MACRO(
+	fCUREXT_WRAP,(EXT_NO),
+	"",
+	"",
+	{
+		EXT_NO = fCUREXT();
+		if (thread->processor_ptr->arch_proc_options->ext_contexts)
+			EXT_NO &= (thread->processor_ptr->arch_proc_options->ext_contexts-1);
+		else
+			EXT_NO = 0;
+		EXT_NO = thread->processor_ptr->features->QDSP6_CP_PRESENT ? (EXT_NO & 0x3) : (EXT_NO+4);
+	},
+	()
+)
+
+DEF_MACRO(
+	fWRITE_NPC, /* write next PC */
+	(A), /* parameters */
+	"PC=A", /* short description */
+	"PC=A", /* long description */
+	if (!thread->branch_taken) {
+           if (A != thread->next_PC) {
+             thread->next_pkt_guess=thread->last_pkt->taken_ptr;
+           }
+	   fCHECK_PCALIGN(A);
+           thread->branched = 1; thread->branch_taken = 1; thread->next_PC = A; \
+           thread->branch_offset = insn->encoding_offset; thread->branch_opcode = insn->opcode;
+        }
+         ,          /* behavior */
+	(A_IMPLICIT_WRITES_PC,A_COF)
+)
+
+
+DEF_MACRO(
+	fLOOPSTATS,
+	(A),
+	"",
+	"",
+	/* INC_TSTAT(tloopends);
+  	   INC_TSTATN(tloopend_same_line,(thread->Regs[REG_PC] & -32) == ((A) & -32));
+	   INC_TSTATN(tloopend_samepc,(thread->Regs[REG_PC]) == (A));*/ ,
+	()
+)
+
+
+DEF_MACRO(
+	fCOF_CALLBACK,
+	(LOC,TYPE),
+	"",
+	"",
+  {
+    thread->cof_log_to_va = (LOC);
+    thread->cof_log_coftype = (TYPE);
+    thread->cof_log_valid = 1;
+  },
+	()
+)
+
+DEF_MACRO(
+	fBRANCH,
+	(LOC,TYPE),
+	"PC=LOC",
+	"PC=LOC",
+	fWRITE_NPC(LOC); fCOF_CALLBACK(LOC,TYPE),
+	()
+)
+
+DEF_MACRO(
+	fTIME_JUMPR,
+	(REGNO,TARGET,TYPE),
+	"",
+	"",
+	{ sys_branch_return(thread, insn->slot, TARGET, REGNO); },
+	(A_INDIRECT)
+)
+
+
+DEF_MACRO(
+	fJUMPR,	/* A jumpr has executed */
+	(REGNO,TARGET,TYPE),
+	"PC=TARGET",
+	"PC=TARGET",
+	{ fTIME_JUMPR(REGNO,TARGET,TYPE); fBRANCH(TARGET,COF_TYPE_JUMPR);},
+	(A_INDIRECT)
+)
+
+DEF_MACRO(
+	fHINTJR,	/* A hintjr instruction has executed */
+	(TARGET),
+	"",
+	"",
+	{ },
+)
+
+DEF_MACRO(
+	fBP_RAS_CALL,
+	(A),
+	"",
+	"",
+	{ sys_branch_call(thread, insn->slot, A, fREAD_NPC()); },
+)
+
+DEF_MACRO(
+	fCALL,	/* Do a call */
+	(A),
+	"fWRITE_LR(fREAD_NPC()); fWRITE_NPC(A);",
+	"fWRITE_LR(fREAD_NPC()); fWRITE_NPC(A);",
+	if (!thread->branch_taken) {fBP_RAS_CALL(A); fWRITE_LR(fREAD_NPC()); fBRANCH(A,COF_TYPE_CALL);},
+	(A_IMPLICIT_WRITES_PC,A_COF,A_IMPLICIT_WRITES_LR,A_CALL)
+)
+
+DEF_MACRO(
+	fCALLR,	/* Do a call Register */
+	(A),
+	"fWRITE_LR(fREAD_NPC()); fWRITE_NPC(A);",
+	"fWRITE_LR(fREAD_NPC()); fWRITE_NPC(A);",
+	if (!thread->branch_taken) {fBP_RAS_CALL(A); fWRITE_LR(fREAD_NPC()); fBRANCH(A,COF_TYPE_CALLR);},
+	(A_IMPLICIT_WRITES_PC,A_COF,A_IMPLICIT_WRITES_LR,A_CALL)
+)
+
+DEF_MACRO(
+	fWRITE_LOOP_REGS0, /* write ln,sa,ea,lc */
+	(START,COUNT), /* parameters */
+	"SA0=START; LC0=COUNT", /* short description */
+	"SA0=START; LC0=COUNT", /* short description */
+	{WRITE_RREG(REG_LC0,COUNT);
+         WRITE_RREG(REG_SA0,START);},
+	(A_IMPLICIT_WRITES_LC0,A_IMPLICIT_WRITES_SA0)
+)
+
+DEF_MACRO(
+	fWRITE_LOOP_REGS1, /* write ln,sa,ea,lc */
+	(START,COUNT), /* parameters */
+	"SA1=START; LC1=COUNT", /* short description */
+	"SA1=START; LC1=COUNT", /* short description */
+	{WRITE_RREG(REG_LC1,COUNT);
+         WRITE_RREG(REG_SA1,START);},
+	(A_IMPLICIT_WRITES_LC1,A_IMPLICIT_WRITES_SA1)
+)
+
+DEF_MACRO(
+	fWRITE_LC0,
+	(VAL), /* parameters */
+	"LC0=VAL", /* short description */
+	"LC0=VAL",
+	WRITE_RREG(REG_LC0,VAL),
+	(A_IMPLICIT_WRITES_LC0)
+)
+
+DEF_MACRO(
+	fWRITE_LC1,
+	(VAL), /* parameters */
+	"LC1=VAL", /* short description */
+	"LC1=VAL",
+	WRITE_RREG(REG_LC1,VAL),
+	(A_IMPLICIT_WRITES_LC1)
+)
+
+DEF_MACRO(
+	fCARRY_FROM_ADD,
+	(A,B,C),
+	"carry_from_add(A,B,C)",
+	"carry_from_add(A,B,C)",
+	carry_from_add64(A,B,C),
+	/* NOTHING */
+)
+
+DEF_MACRO(
+	fSETCV_ADD,
+	(A,B,CARRY),
+	"SR[CV]=FLAGS(A+B)",
+	"SR[CV]=FLAGS(A+B)",
+	do {
+		SET_USR_FIELD(USR_C,gen_carry_add((A),(B),((A)+(B))));
+		SET_USR_FIELD(USR_V,gen_overflow_add((A),(B),((A)+(B))));
+	} while (0),
+	(A_IMPLICIT_WRITES_CVBITS,A_NOTE_CVFLAGS,A_RESTRICT_NOSRMOVE)
+)
+DEF_MACRO(
+	fSETCV_SUB,
+	(A,B,CARRY),
+	"SR[CV]=FLAGS(A-B)",
+	"SR[CV]=FLAGS(A-B)",
+	do {
+		SET_USR_FIELD(USR_C,gen_carry_add((A),(B),((A)-(B))));
+		SET_USR_FIELD(USR_V,gen_overflow_add((A),(B),((A)-(B))));
+	} while (0),
+	(A_IMPLICIT_WRITES_CVBITS,A_NOTE_CVFLAGS,A_RESTRICT_NOSRMOVE)
+)
+
+DEF_MACRO(
+	fSET_OVERFLOW,
+	(),
+	"USR.OVF=1",
+	"USR.OVF=1",
+	SET_USR_FIELD(USR_OVF,1),
+	(A_IMPLICIT_WRITES_SRBIT,A_NOTE_SR_OVF_WHEN_SATURATING,A_RESTRICT_NOSRMOVE)
+)
+
+DEF_MACRO(
+	fSET_LPCFG,
+	(VAL), /* parameters */
+	"USR.LPCFG=VAL",
+	"USR.LPCFG=VAL",
+	SET_USR_FIELD(USR_LPCFG,(VAL)),
+	(A_IMPLICIT_WRITES_LPCFG,A_RESTRICT_NOSRMOVE)
+)
+
+
+DEF_MACRO(
+	fGET_LPCFG,
+	, /* parameters */
+	"USR.LPCFG",
+	"USR.LPCFG",
+	(GET_USR_FIELD(USR_LPCFG)),
+	()
+)
+
+
+
+DEF_MACRO(
+	fWRITE_P0, /* write Predicate 0 */
+	(VAL), /* parameters */
+	"P0=VAL", /* short description */
+	"P0=VAL", /* long description */
+	WRITE_PREG(0,VAL),          /* behavior */
+	(A_IMPLICIT_WRITES_P0)
+)
+
+DEF_MACRO(
+	fWRITE_P1, /* write Predicate 0 */
+	(VAL), /* parameters */
+	"P1=VAL", /* short description */
+	"P1=VAL", /* long description */
+	WRITE_PREG(1,VAL),          /* behavior */
+	(A_IMPLICIT_WRITES_P1)
+)
+
+DEF_MACRO(
+	fWRITE_P2, /* write Predicate 0 */
+	(VAL), /* parameters */
+	"P2=VAL", /* short description */
+	"P2=VAL", /* long description */
+	WRITE_PREG(2,VAL),          /* behavior */
+	(A_IMPLICIT_WRITES_P2)
+)
+
+DEF_MACRO(
+	fWRITE_P3, /* write Predicate 0 */
+	(VAL), /* parameters */
+	"P3=VAL", /* short description */
+	"P3=VAL", /* long description */
+	WRITE_PREG(3,VAL),     /* behavior */
+	(A_IMPLICIT_WRITES_P3)
+)
+
+DEF_MACRO(
+	fWRITE_P3_LATE, /* write Predicate 0 */
+	(VAL), /* parameters */
+	"P3=VAL", /* short description */
+	"P3=VAL", /* long description */
+	{WRITE_PREG(3,VAL); fHIDE(MARK_LATE_PRED_WRITE(3))} ,          /* behavior */
+	(A_IMPLICIT_WRITES_P3,A_RESTRICT_LATEPRED)
+)
+
+
+DEF_MACRO(
+	fPART1, /* write Predicate 0 */
+	(WORK), /* parameters */
+	"WORK", /* short description */
+	"WORK", /* long description */
+	if (insn->part1) { WORK; return; },          /* behavior */
+	/* optional attributes */
+)
+
+
+/*************************************/
+/* Casting, Sign-Zero extension, etc */
+/*************************************/
+
+DEF_MACRO(
+	fCAST4u, /* macro name */
+	(A), /* parameters */
+	"A.uw[0]", /* short description */
+	"unsigned 32-bit A", /* long description */
+	((size4u_t)(A)),          /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST4s, /* macro name */
+	(A), /* parameters */
+	"A.s32", /* short description */
+	"signed 32-bit A", /* long description */
+	((size4s_t)(A)),          /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST8u, /* macro name */
+	(A), /* parameters */
+	"A.u64", /* short description */
+	"unsigned 64-bit A", /* long description */
+	((size8u_t)(A)),          /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST8s, /* macro name */
+	(A), /* parameters */
+	"A.s64", /* short description */
+	"signed 64-bit A", /* long description */
+	((size8s_t)(A)),          /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST2_2s, /* macro name */
+	(A), /* params */
+	"A",
+	"signed 16-bit A",
+	((size2s_t)(A)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST2_2u, /* macro name */
+	(A), /* params */
+	"A",
+	"unsigned 16-bit A",
+	((size2u_t)(A)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST4_4s, /* macro name */
+	(A), /* params */
+	"A",
+	"signed 32-bit A",
+	((size4s_t)(A)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST4_4u, /* macro name */
+	(A), /* params */
+	"A",
+	"unsigned 32-bit A",
+	((size4u_t)(A)),
+	/* optional attributes */
+)
+
+
+DEF_MACRO(
+	fCAST4_8s, /* macro name */
+	(A), /* params */
+	"fSXTN(32,64,A)",
+	"32-bit A sign-extended to signed 64-bit",
+	((size8s_t)((size4s_t)(A))),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST4_8u, /* macro name */
+	(A), /* params */
+	"fZXTN(32,64,A)",
+	"32-bit A zero-extended to unsigned 64-bit",
+	((size8u_t)((size4u_t)(A))),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST8_8s, /* macro name */
+	(A), /* params */
+	"A",
+	"signed 64-bit A",
+	((size8s_t)(A)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST8_8u, /* macro name */
+	(A), /* params */
+	"A",
+	"unsigned 64-bit A",
+	((size8u_t)(A)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST2_8s, /* macro name */
+	(A), /* params */
+	"fSXTN(16,64,A)",
+	"16-bit A sign-extended to signed 64-bit",
+	((size8s_t)((size2s_t)(A))),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST2_8u, /* macro name */
+	(A), /* params */
+	"fZXTN(16,64,A)",
+	"16-bit A zero-extended to unsigned 64-bit",
+	((size8u_t)((size2u_t)(A))),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fZE8_16, /* zero-extend 8 to 16 */
+	(A),
+	"A",
+	"Unsigned low 8 bits of A",
+	((size2s_t)((size1u_t)(A))),
+	/* optional attributes */
+)
+DEF_MACRO(
+	fSE8_16, /* sign-extend 8 to 16 */
+	(A),
+	"A",
+	"Signed low 8 bits of A",
+	((size2s_t)((size1s_t)(A))),
+	/* optional attributes */
+)
+
+
+DEF_MACRO(
+	fSE16_32, /* sign-extend 16 to 32 */
+	(A), /* parameters */
+	"A", /* short description */
+	"signed low 16-bits of A", /* long description */
+	((size4s_t)((size2s_t)(A))),          /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fZE16_32, /* zero-extend 16 to 32 */
+	(A), /* parameters */
+	"A", /* short description */
+	"signed low 16-bits of A", /* long description */
+	((size4u_t)((size2u_t)(A))),          /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fSE32_64,
+	(A), /* parameters */
+	"A", /* short description */
+	"A", /* long description */
+	( (size8s_t)((size4s_t)(A)) ),          /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fZE32_64,
+	(A), /* parameters */
+	"A", /* short description */
+	"zero-extend A from 32 to 64", /* long description */
+	( (size8u_t)((size4u_t)(A)) ),          /* behavior */
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fSE8_32, /* sign-extend 8 to 32 */
+	(A),
+	"A",
+	"Signed low 8 bits of A",
+	((size4s_t)((size1s_t)(A))),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fZE8_32, /* zero-extend 8 to 32 */
+	(A),
+	"A",
+	"Unsigned low 8 bits of A",
+	((size4s_t)((size1u_t)(A))),
+	/* optional attributes */
+)
+
+/*************************************/
+/* DSP arithmetic support            */
+/************************************/
+DEF_MACRO(
+	fMPY8UU, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"unsigned 8-bit multiply of A by B", /* long description */
+	(int)(fZE8_16(A)*fZE8_16(B)),     /* behavior */
+	(A_MPY)
+)
+DEF_MACRO(
+	fMPY8US, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"unsigned 8-bit multiply of A by signed B", /* long description */
+	(int)(fZE8_16(A)*fSE8_16(B)),     /* behavior */
+	(A_MPY)
+)
+DEF_MACRO(
+	fMPY8SU, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"signed 8-bit multiply of A by unsigned B", /* long description */
+	(int)(fSE8_16(A)*fZE8_16(B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY8SS, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"signed 8-bit multiply of A by B", /* long description */
+	(int)((short)(A)*(short)(B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY16SS, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"signed 16-bit multiply of A by B", /* long description */
+	fSE32_64(fSE16_32(A)*fSE16_32(B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY16UU, /* multiply unsigned half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"multiply unsigned A by unsigned B", /* long description */
+	fZE32_64(fZE16_32(A)*fZE16_32(B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY16SU, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"signed 16-bit A times unsigned 16-bit B", /* long description */
+	fSE32_64(fSE16_32(A)*fZE16_32(B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY16US, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"unsigned 16-bit A times signed 16-bit B", /* long description */
+	fMPY16SU(B,A),
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY32SS, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"signed 32-bit multiply of A by B", /* long description */
+	(fSE32_64(A)*fSE32_64(B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY32UU, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"unsigned 32-bit multiply of A by B", /* long description */
+	(fZE32_64(A)*fZE32_64(B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY32SU, /* multiply half integer */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"32-bit multiply of signed A by unsigned B", /* long description */
+	(fSE32_64(A)*fZE32_64(B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY3216SS, /* multiply mixed precision */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"signed 16-bit multiply of A by B", /* long description */
+	(fSE32_64(A)*fSXTN(16,64,B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMPY3216SU, /* multiply mixed precision */
+	(A,B), /* parameters */
+	"(A * B)", /* short description */
+	"signed 16-bit multiply of A by unsigned B", /* long description */
+	(fSE32_64(A)*fZXTN(16,64,B)),     /* behavior */
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fROUND, /* optional rounding */
+	(A), /* parameters */
+	"round(A)", /* short description */
+	"round(A)", /* long description */
+	(A+0x8000),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCLIP, /* optional rounding */
+	(DST,SRC,U), /* parameters */
+	"DST=MIN((1<<U)-1,MAX(SRC,-(1<<U)))", /* short description */
+	"DST=IN((1<<U)-1,MAX(SRC,-(1<<U)))", /* long description */
+	{ size4s_t maxv = (1<<U)-1;
+	 size4s_t minv = -(1<<U);
+	 DST = fMIN(maxv,fMAX(SRC,minv));
+	},
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCRND, /* optional rounding */
+	(A), /* parameters */
+	"convround(A)", /* short description */
+	"convround(A)", /* long description */
+	((((A)&0x3)==0x3)?((A)+1):((A))),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fRNDN, /* Rounding to a boundary */
+	(A,N), /* parameters */
+	"(N==0)?(A):round(A,2**(N-1))", /* short description */
+	"(N==0)?(A):round(A,2**(N-1))", /* long description */
+	((((N)==0)?(A):(((fSE32_64(A))+(1<<((N)-1)))))),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCRNDN, /* Rounding to a boundary */
+	(A,N), /* parameters */
+	"(N==0)?A:convround(A,2**(N-1))>>N", /* short description */
+	"(N==0)?A:convround(A,2**(N-1))>>N", /* long description */
+	(conv_round(A,N)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCRNDN64, /* Rounding to a boundary */
+	(A,N), /* parameters */
+	"(N==0)?A:convround(A,2**(N-1))>>N", /* short description */
+	"(N==0)?A:convround(A,2**(N-1))>>N", /* long description */
+	(conv_round64(A,N)),
+	/* optional attributes */
+)
+DEF_MACRO(
+	fADD128, /* Rounding to a boundary */
+	(A,B), /* parameters */
+	"A+B", /* short description */
+	"", /* long description */
+	(add128(A, B)),
+	/* optional attributes */
+)
+DEF_MACRO(
+	fSUB128, /* Rounding to a boundary */
+	(A,B), /* parameters */
+	"A-B", /* short description */
+	"", /* long description */
+	(sub128(A, B)),
+	/* optional attributes */
+)
+DEF_MACRO(
+	fSHIFTR128, /* Rounding to a boundary */
+	(A,B), /* parameters */
+	"(size8s_t) (A >> B)", /* short description */
+	"", /* long description */
+	(shiftr128(A, B)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fSHIFTL128, /* Rounding to a boundary */
+	(A,B), /* parameters */
+	"(A << B)", /* short description */
+	"", /* long description */
+	(shiftl128(A, B)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fAND128, /* Rounding to a boundary */
+	(A,B), /* parameters */
+	"(A & B)", /* short description */
+	"", /* long description */
+	(and128(A, B)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fCAST8S_16S, /* Rounding to a boundary */
+	(A), /* parameters */
+	"sxt_{64->128}(A)", /* short description */
+	"", /* long description */
+	(cast8s_to_16s(A)),
+	/* optional attributes */
+)
+DEF_MACRO(
+	fCAST16S_8S, /* Rounding to a boundary */
+	(A), /* parameters */
+	"sxt_{128->64}(A)", /* short description */
+	"", /* long description */
+	(cast16s_to_8s(A)),
+	/* optional attributes */
+)
+DEF_MACRO(
+	fCAST16S_4S, /* Rounding to a boundary */
+	(A), /* parameters */
+	"sxt_{128->32}(A)", /* short description */
+	"", /* long description */
+	(cast16s_to_4s(A)),
+	/* optional attributes */
+)
+
+
+
+DEF_MACRO(
+	fEA_RI, /* Calculate EA with Register + Immediate Offset */
+	(REG,IMM),
+	"EA=REG+IMM",	/* short descr */
+	"EA=REG+IMM",	/* long descr */
+	do { EA=REG+IMM; fDOCHKPAGECROSS(REG,EA); } while (0),
+	(A_EA_REG_PLUS_IMM)
+)
+
+DEF_MACRO(
+	fEA_RRs, /* Calculate EA with Register + Registers scaled Offset */
+	(REG,REG2,SCALE),
+	"EA=REG+(REG2<<SCALE)",	/* short descr */
+	"EA=REG+(REG2<<SCALE)",	/* long descr */
+	do { EA=REG+(REG2<<SCALE); fDOCHKPAGECROSS(REG,EA); } while (0),
+	(A_EA_REG_PLUS_REGSCALED)
+)
+
+DEF_MACRO(
+	fEA_IRs, /* Calculate EA with Immediate + Registers scaled Offset */
+	(IMM,REG,SCALE),
+	"EA=IMM+(REG<<SCALE)",	/* short descr */
+	"EA=IMM+(REG<<SCALE)",	/* long descr */
+	do { EA=IMM+(REG<<SCALE); fDOCHKPAGECROSS(IMM,EA); } while (0),
+	(A_EA_IMM_PLUS_REGSCALED)
+)
+
+DEF_MACRO(
+	fEA_IMM, /* Calculate EA with Immediate */
+	(IMM),
+	"EA=IMM",	/* short descr */
+	"EA=IMM",	/* long descr */
+	EA=IMM,
+	(A_EA_IMM_ONLY)
+)
+
+DEF_MACRO(
+	fEA_REG, /* Calculate EA with REGISTER */
+	(REG),
+	"EA=REG",	/* short descr */
+	"EA=REG",	/* long descr */
+	EA=REG,
+	(A_EA_REG_ONLY)
+)
+
+DEF_MACRO(
+	fEA_BREVR, /* Calculate EA with bit reversed bottom of REGISTER */
+	(REG),
+	"EA=fbrev(REG)",	/* short descr */
+	"EA=fbrev(REG)",	/* long descr */
+	EA=fbrev(REG),
+	(A_EA_BREV_REG)
+)
+
+DEF_MACRO(
+	fEA_GPI, /* Calculate EA with Global Poitner + Immediate */
+	(IMM),
+	"EA=fREAD_GP()+IMM",	/* short descr */
+	"EA=fREAD_GP()+IMM",	/* long descr */
+    do { EA=fREAD_GP()+IMM; fGP_DOCHKPAGECROSS(fREAD_GP(),EA); } while (0),
+	(A_EA_GP_IMM)
+)
+
+DEF_MACRO(
+	fPM_I, /* Post Modify Register by Immediate*/
+	(REG,IMM),
+	"REG=REG+IMM",	/* short descr */
+	"REG=REG+IMM",	/* long descr */
+	do { REG = REG + IMM; fTIMING_AIA(EA,REG); } while (0),
+	(A_PM_I)
+)
+
+DEF_MACRO(
+	fPM_M, /* Post Modify Register by M register */
+	(REG,MVAL),
+	"REG=REG+MVAL",	/* short descr */
+	"REG=REG+MVAL",	/* long descr */
+	do { REG = REG + MVAL; fTIMING_AIA(EA,REG); } while (0),
+	(A_PM_M)
+)
+
+DEF_MACRO(
+	fPM_CIRI, /* Post Modify Register using Circular arithmetic by Immediate */
+	(REG,IMM,MVAL),
+	"REG=fcirc_add(REG,IMM,MVAL)",	/* short descr */
+	"REG=fcirc_add(REG,IMM,MVAL)",	/* long descr */
+	do { fcirc_add(REG,siV,MuV); fTIMING_AIA(EA,REG); } while (0),
+	(A_PM_CIRI)
+)
+
+DEF_MACRO(
+	fPM_CIRR, /* Post Modify Register using Circular arithmetic by register */
+	(REG,VAL,MVAL),
+	"REG=fcirc_add(REG,VAL,MVAL)",	/* short descr */
+	"REG=fcirc_add(REG,VAL,MVAL)",	/* long descr */
+	do { fcirc_add(REG,VAL,MuV); fTIMING_AIA(EA,REG); } while (0),
+	(A_PM_CIRR)
+)
+
+
+
+DEF_MACRO(
+	fMODCIRCU, /* modulo power of 2, positive output  */
+	(N,P), /* parameters */
+	"N modulo 2^P", /* short description */
+	"N modulo 2^P ", /* long description */
+	((N) & ((1<<(P))-1)),
+	/* optional attributes */
+)
+
+DEF_MACRO(
+	fSCALE, /* scale by N */
+	(N,A), /* parameters */
+	"A<<N", /* short description */
+	"A<<N", /* long description */
+	(((size8s_t)(A))<<N),
+	/* optional attributes */
+)
+DEF_MACRO(
+	fVSATW, /* saturating to 32-bits*/
+	(A), /* parameters */
+	"sat_32(A)", /* short description */
+	"saturate A to 32-bits", /* long description */
+	fVSATN(32,((long long)A)),
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fSATW, /* saturating to 32-bits*/
+	(A), /* parameters */
+	"sat_32(A)", /* short description */
+	"saturate A to 32-bits", /* long description */
+	fSATN(32,((long long)A)),
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fVSAT, /* saturating to 32-bits*/
+	(A), /* parameters */
+	"sat_32(A)", /* short description */
+	"saturate A to 32-bits", /* long description */
+	fVSATN(32,(A)),
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fSAT, /* saturating to 32-bits*/
+	(A), /* parameters */
+	"sat_32(A)", /* short description */
+	"saturate A to 32-bits", /* long description */
+	fSATN(32,(A)),
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fSAT_ORIG_SHL, /* Saturating to 32-bits, with original value, for shift left */
+	(A,ORIG_REG), /* parameters */
+	"sat_32(A)", /* short description */
+	"saturate A to 32-bits for left shift", /* long description */
+	((((size4s_t)((fSAT(A)) ^ ((size4s_t)(ORIG_REG)))) < 0) ?
+		fSATVALN(32,((size4s_t)(ORIG_REG))) :
+		((((ORIG_REG) > 0) && ((A) == 0)) ?
+			fSATVALN(32,(ORIG_REG)) :
+			fSAT(A))),
+	(A_SATURATE)
+)
+
+DEF_MACRO(
+	fPASS,
+	(A),
+	"A", /* short description */
+	"", /* long description */
+	A,
+)
+
+DEF_MACRO(
+	fRND, /* saturating to 32-bits*/
+	(A), /* parameters */
+	"((A)+1)>>1", /* short description */
+	"round(A)", /* long description */
+	(((A)+1)>>1),
+)
+
+
+DEF_MACRO(
+	fBIDIR_SHIFTL,
+	(SRC,SHAMT,REGSTYPE),
+	"bidir_shiftl(SRC,SHAMT)",
+	"bidir_shiftl(SRC,SHAMT)",
+	(((SHAMT) < 0) ? ((fCAST##REGSTYPE(SRC) >> ((-(SHAMT))-1)) >>1) : (fCAST##REGSTYPE(SRC) << (SHAMT))),
+	(A_BIDIRSHIFTL)
+)
+
+DEF_MACRO(
+	fBIDIR_ASHIFTL,
+	(SRC,SHAMT,REGSTYPE),
+	"(SHAMT>0)?(fCAST##REGSTYPE##s(SRC)<<SHAMT):(fCAST##REGSTYPE##s(SRC)>>SHAMT)",
+	"bidir_asl(fCAST##REGSTYPE##s(SRC),SHAMT)",
+	fBIDIR_SHIFTL(SRC,SHAMT,REGSTYPE##s),
+	(A_BIDIRSHIFTL)
+)
+
+DEF_MACRO(
+	fBIDIR_LSHIFTL,
+	(SRC,SHAMT,REGSTYPE),
+	"(SHAMT>0)?(fCAST##REGSTYPE##u(SRC)<<SHAMT):(fCAST##REGSTYPE##u(SRC)>>>SHAMT)",
+	"bidir_lsl(fCAST##REGSTYPE##u(SRC),SHAMT)",
+	fBIDIR_SHIFTL(SRC,SHAMT,REGSTYPE##u),
+	(A_BIDIRSHIFTL)
+)
+
+DEF_MACRO(
+	fBIDIR_ASHIFTL_SAT,
+	(SRC,SHAMT,REGSTYPE),
+	"bidir_shiftl(SRC,SHAMT)",
+	"bidir_shiftl(SRC,SHAMT)",
+	(((SHAMT) < 0) ? ((fCAST##REGSTYPE##s(SRC) >> ((-(SHAMT))-1)) >>1) : fSAT_ORIG_SHL(fCAST##REGSTYPE##s(SRC) << (SHAMT),(SRC))),
+	(A_BIDIRSHIFTL)
+)
+
+
+DEF_MACRO(
+	fBIDIR_SHIFTR,
+	(SRC,SHAMT,REGSTYPE),
+	"bidir_shiftr(SRC,SHAMT)",
+	"bidir_shiftr(SRC,SHAMT)",
+	(((SHAMT) < 0) ? ((fCAST##REGSTYPE(SRC) << ((-(SHAMT))-1)) << 1) : (fCAST##REGSTYPE(SRC) >> (SHAMT))),
+	(A_BIDIRSHIFTR)
+)
+
+DEF_MACRO(
+	fBIDIR_ASHIFTR,
+	(SRC,SHAMT,REGSTYPE),
+	"(SHAMT>0)?(fCAST##REGSTYPE##s(SRC)>>SHAMT):(fCAST##REGSTYPE##s(SRC)<<SHAMT)",
+	"bidir_asr(fCAST##REGSTYPE##s(SRC),SHAMT)",
+	fBIDIR_SHIFTR(SRC,SHAMT,REGSTYPE##s),
+	(A_BIDIRSHIFTR)
+)
+
+DEF_MACRO(
+	fBIDIR_LSHIFTR,
+	(SRC,SHAMT,REGSTYPE),
+	"(SHAMT>0)?(fCAST##REGSTYPE##u(SRC)>>>SHAMT):(fCAST##REGSTYPE##u(SRC)<<SHAMT)",
+	"bidir_lsr(fCAST##REGSTYPE##u(SRC),SHAMT)",
+	fBIDIR_SHIFTR(SRC,SHAMT,REGSTYPE##u),
+	(A_BIDIRSHIFTR)
+)
+
+DEF_MACRO(
+	fBIDIR_ASHIFTR_SAT,
+	(SRC,SHAMT,REGSTYPE),
+	"bidir_shiftr(SRC,SHAMT)",
+	"bidir_shiftr(SRC,SHAMT)",
+	(((SHAMT) < 0) ? fSAT_ORIG_SHL((fCAST##REGSTYPE##s(SRC) << ((-(SHAMT))-1)) << 1,(SRC)) : (fCAST##REGSTYPE##s(SRC) >> (SHAMT))),
+	(A_BIDIRSHIFTR)
+)
+
+DEF_MACRO(
+	fASHIFTR,
+	(SRC,SHAMT,REGSTYPE),
+	"SRC >> SHAMT",
+	"asr(SRC,SHAMT)",
+	(fCAST##REGSTYPE##s(SRC) >> (SHAMT)),
+	/* */
+)
+
+DEF_MACRO(
+	fLSHIFTR,
+	(SRC,SHAMT,REGSTYPE),
+	"SRC >>> SHAMT",
+	"lsr(SRC,SHAMT)",
+	(((SHAMT) >= 64)?0:(fCAST##REGSTYPE##u(SRC) >> (SHAMT))),
+	/* */
+)
+
+DEF_MACRO(
+	fROTL,
+	(SRC,SHAMT,REGSTYPE),
+	"SRC <<_{R} SHAMT",
+	"rol(SRC,SHAMT)",
+	(((SHAMT)==0) ? (SRC) : ((fCAST##REGSTYPE##u(SRC) << (SHAMT)) | \
+		((fCAST##REGSTYPE##u(SRC) >> ((sizeof(SRC)*8)-(SHAMT)))))),
+	/* */
+)
+
+DEF_MACRO(
+	fROTR,
+	(SRC,SHAMT,REGSTYPE),
+	"SRC >>_{R} SHAMT",
+	"ror(SRC,SHAMT)",
+	(((SHAMT)==0) ? (SRC) : ((fCAST##REGSTYPE##u(SRC) >> (SHAMT)) | \
+		((fCAST##REGSTYPE##u(SRC) << ((sizeof(SRC)*8)-(SHAMT)))))),
+	/* */
+)
+
+DEF_MACRO(
+	fASHIFTL,
+	(SRC,SHAMT,REGSTYPE),
+	"fCAST##REGSTYPE##s(SRC) << SHAMT",
+	"asl(SRC,SHAMT)",
+	(((SHAMT) >= 64)?0:(fCAST##REGSTYPE##s(SRC) << (SHAMT))),
+	/* */
+)
+
+/*************************************/
+/* Floating-Point Support            */
+/************************************/
+
+DEF_MACRO(
+	fFLOAT, /* name */
+	(A), /* parameters */
+	"A", /* short description */
+	"A", /* long description */
+	({ union { float f; size4u_t i; } _fipun; _fipun.i = (A); _fipun.f; }),     /* behavior */
+	(A_FPOP,A_FPSINGLE,A_IMPLICIT_WRITES_FPFLAGS,A_IMPLICIT_READS_FPRND)
+)
+
+DEF_MACRO(
+	fUNFLOAT, /* multiply half integer */
+	(A), /* parameters */
+	"A", /* short description */
+	"A", /* long description */
+	({ union { float f; size4u_t i; } _fipun; _fipun.f = (A); isnan(_fipun.f) ? 0xFFFFFFFFU : _fipun.i; }),     /* behavior */
+	(A_FPOP,A_FPSINGLE,A_IMPLICIT_WRITES_FPFLAGS,A_IMPLICIT_READS_FPRND)
+)
+
+DEF_MACRO(
+	fSFNANVAL,(),
+	"NaN",
+	"NaN",
+	0xffffffff,
+	()
+)
+
+DEF_MACRO(
+	fSFINFVAL,(A),
+	"sign(A) * Inf",
+	"sign(A) * Inf",
+	(((A) & 0x80000000) | 0x7f800000),
+	()
+)
+
+DEF_MACRO(
+	fSFONEVAL,(A),
+	"sign(A) * 1.0",
+	"sign(A) * 1.0",
+	(((A) & 0x80000000) | fUNFLOAT(1.0)),
+	()
+)
+
+DEF_MACRO(
+	fCHECKSFNAN,(DST,A),
+	"if (isnan(A)) DST = NaN;",
+	"if (isnan(A)) DST = NaN;",
+	do {
+		if (isnan(fFLOAT(A))) {
+			if ((fGETBIT(22,A)) == 0) fRAISEFLAGS(FE_INVALID);
+			DST = fSFNANVAL();
+		}
+	} while (0),
+	()
+)
+
+DEF_MACRO(
+	fCHECKSFNAN3,(DST,A,B,C),
+	"if (isnan(A) || isnan(B) || isnan(C)) DST = NaN;",
+	"if (isnan(A) || isnan(B) || isnan(C)) DST = NaN;",
+	do {
+		fCHECKSFNAN(DST,A);
+		fCHECKSFNAN(DST,B);
+		fCHECKSFNAN(DST,C);
+	} while (0),
+	()
+)
+
+DEF_MACRO(
+	fSF_BIAS,(),
+	"127",
+	"single precision bias",
+	127,
+	()
+)
+
+DEF_MACRO(
+	fSF_MANTBITS,(),
+	"23",
+	"single mantissa bits",
+	23,
+	()
+)
+
+DEF_MACRO(
+	fSF_RECIP_LOOKUP,(IDX),
+	"recip_lut[IDX]",
+	"Single Precision reciprocal Estimate",
+	arch_recip_lookup(IDX),
+	()
+)
+
+DEF_MACRO(
+	fSF_INVSQRT_LOOKUP,(IDX),
+	"invsqrt_lut[IDX]",
+	"Single Precision reciprocal Estimate",
+	arch_invsqrt_lookup(IDX),
+	()
+)
+
+DEF_MACRO(
+	fSF_MUL_POW2,(A,B),
+	"A * 2**B",
+	"A * 2**B",
+	(fUNFLOAT(fFLOAT(A) * fFLOAT((fSF_BIAS() + (B)) << fSF_MANTBITS()))),
+	()
+)
+
+DEF_MACRO(
+	fSF_GETEXP,(A),
+	"exponent(A)",
+	"exponent(A)",
+	(((A) >> fSF_MANTBITS()) & 0xff),
+	()
+)
+
+DEF_MACRO(
+	fSF_MAXEXP,(),
+	"254",
+	"SF maximum exponent",
+	(254),
+	()
+)
+
+DEF_MACRO(
+	fSF_RECIP_COMMON,(N,D,O,A),
+	"(N,D,O,A)=recip_common(N,D)",
+	"(N,D,O,A)=recip_common(N,D)",
+	arch_sf_recip_common(&N,&D,&O,&A),
+	(A_FPOP,A_IMPLICIT_WRITES_FPFLAGS)
+)
+
+DEF_MACRO(
+	fSF_INVSQRT_COMMON,(N,O,A),
+	"(N,O,A)=invsqrt_common(N)",
+	"(N,O,A)=invsqrt_common(N)",
+	arch_sf_invsqrt_common(&N,&O,&A),
+	(A_FPOP,A_IMPLICIT_WRITES_FPFLAGS)
+)
+
+DEF_MACRO(
+	fFMAFX,(A,B,C,ADJ),
+	"fmaf(A,B,C) * 2**(ADJ)",
+	"Fused Multiply Add w/ Scaling: (A*B+C)*2**ADJ",
+	internal_fmafx(A,B,C,fSXTN(8,64,ADJ)),
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fFMAF,(A,B,C),
+	"fmaf(A,B,C)",
+	"Fused Multiply Add: (A*B+C)",
+	internal_fmafx(A,B,C,0),
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fSFMPY,(A,B),
+	"A*B",
+	"Multiply: A*B",
+	internal_mpyf(A,B),
+	(A_MPY)
+)
+
+DEF_MACRO(
+	fMAKESF,(SIGN,EXP,MANT),
+	"-1**SIGN * 1.MANT * 2**(EXP-BIAS)",
+	"-1**SIGN * 1.MANT * 2**(EXP-BIAS)",
+	((((SIGN) & 1) << 31) | (((EXP) & 0xff) << fSF_MANTBITS()) |
+		((MANT) & ((1<<fSF_MANTBITS())-1))),
+	()
+)
+
+
+DEF_MACRO(
+	fDOUBLE, /* multiply half integer */
+	(A), /* parameters */
+	"A", /* short description */
+	"A", /* long description */
+	({ union { double f; size8u_t i; } _fipun; _fipun.i = (A); _fipun.f; }),     /* behavior */
+	(A_FPOP,A_FPDOUBLE,A_IMPLICIT_WRITES_FPFLAGS,A_IMPLICIT_READS_FPRND)
+)
+
+DEF_MACRO(
+	fUNDOUBLE, /* multiply half integer */
+	(A), /* parameters */
+	"A", /* short description */
+	"A", /* long description */
+	({ union { double f; size8u_t i; } _fipun; _fipun.f = (A); isnan(_fipun.f) ? 0xFFFFFFFFFFFFFFFFULL : _fipun.i; }),     /* behavior */
+	(A_FPOP,A_FPDOUBLE,A_IMPLICIT_WRITES_FPFLAGS,A_IMPLICIT_READS_FPRND)
+)
+
+DEF_MACRO(
+	fDFNANVAL,(),
+	"NaN",
+	"NaN",
+	0xffffffffffffffffULL,
+	()
+)
+
+DEF_MACRO(
+	fDFINFVAL,(A),
+	"sign(A) * Inf",
+	"sign(A) * Inf",
+	(((A) & 0x8000000000000000ULL) | 0x7ff0000000000000ULL),
+	()
+)
+
+DEF_MACRO(
+	fDFONEVAL,(A),
+	"sign(A) * 1.0",
+	"sign(A) * 1.0",
+	(((A) & 0x8000000000000000ULL) | fUNDOUBLE(1.0)),
+	()
+)
+
+DEF_MACRO(
+	fCHECKDFNAN,(DST,A),
+	"if (isnan(A)) DST = NaN;",
+	"if (isnan(A)) DST = NaN;",
+	do {
+		if (isnan(fDOUBLE(A))) {
+			if ((fGETBIT(51,A)) == 0) fRAISEFLAGS(FE_INVALID);
+			DST = fDFNANVAL();
+		}
+	} while (0),
+	()
+)
+
+DEF_MACRO(
+	fCHECKDFNAN3,(DST,A,B,C),
+	"if (isnan(A) || isnan(B) || isnan(C)) DST = NaN;",
+	"if (isnan(A) || isnan(B) || isnan(C)) DST = NaN;",
+	do {
+		fCHECKDFNAN(DST,A);
+		fCHECKDFNAN(DST,B);
+		fCHECKDFNAN(DST,C);
+	} while (0),
+	()
+)
+
+DEF_MACRO(
+	fDF_BIAS,(),
+	"1023",
+	"double precision bias",
+	1023,
+	()
+)
+
+DEF_MACRO(
+	fDF_ISNORMAL,(X),
+	"is_normal(X)",
+	"is X normal?",
+	(fpclassify(fDOUBLE(X)) == FP_NORMAL),
+	()
+)
+
+DEF_MACRO(
+	fDF_ISDENORM,(X),
+	"is_denormal(X)",
+	"is X denormal?",
+	(fpclassify(fDOUBLE(X)) == FP_SUBNORMAL),
+	()
+)
+
+DEF_MACRO(
+	fDF_ISBIG,(X),
+	"(df_exponent(X) >= 512)",
+	"is X sufficiently large for mpyfix (exp >= 512)?",
+	(fDF_GETEXP(X) >= 512),
+	()
+)
+
+DEF_MACRO(
+	fDF_MANTBITS,(),
+	"52",
+	"single mantissa bits",
+	52,
+	()
+)
+
+DEF_MACRO(
+	fDF_RECIP_LOOKUP,(IDX),
+	"recip_lut[IDX]",
+	"Single Precision reciprocal Estimate",
+	(size8u_t)(arch_recip_lookup(IDX)),
+	()
+)
+
+DEF_MACRO(
+	fDF_INVSQRT_LOOKUP,(IDX),
+	"invsqrt_lut[IDX]",
+	"Single Precision reciprocal square root Estimate",
+	(size8u_t)(arch_invsqrt_lookup(IDX)),
+	()
+)
+
+DEF_MACRO(
+	fDF_MUL_POW2,(A,B),
+	"A * 2**B",
+	"A * 2**B",
+	(fUNDOUBLE(fDOUBLE(A) * fDOUBLE((0ULL + fDF_BIAS() + (B)) << fDF_MANTBITS()))),
+	()
+)
+
+DEF_MACRO(
+	fDF_GETEXP,(A),
+	"exponent(A)",
+	"exponent(A)",
+	(((A) >> fDF_MANTBITS()) & 0x7ff),
+	()
+)
+
+DEF_MACRO(
+	fDF_MAXEXP,(),
+	"2046",
+	"SF maximum exponent",
+	(2046),
+	()
+)
+
+DEF_MACRO(
+	fDF_RECIP_COMMON,(N,D,O,A),
+	"(N,D,O,A)=recip_common(N,D)",
+	"(N,D,O,A)=recip_common(N,D)",
+	arch_df_recip_common(&N,&D,&O,&A),
+	(A_FPOP)
+)
+
+DEF_MACRO(
+	fDF_INVSQRT_COMMON,(N,O,A),
+	"(N,O,A)=invsqrt_common(N)",
+	"(N,O,A)=invsqrt_common(N)",
+	arch_df_invsqrt_common(&N,&O,&A),
+	(A_FPOP)
+)
+
+DEF_MACRO(
+	fFMA, (A,B,C),
+	"fma(A,B,C)",
+	"Fused Multiply Add: A*B+C",
+	internal_fma(A,B,C),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fDFMPY, (A,B),
+	"A*B",
+	"Multiply: A*B",
+	internal_mpy(A,B),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fDF_MPY_HH, (A,B,ACC),
+	"A*B with partial product ACC",
+	"Multiply: A*B with partial product ACC",
+	internal_mpyhh(A,B,ACC),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fFMAX, (A,B,C,ADJ),
+	"fma(A,B,C)*2**(2*ADJ)",
+	"Fused Multiply Add: (A*B+C)*2**(2*ADJ)",
+	internal_fmax(A,B,C,fSXTN(8,64,ADJ)*2),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fMAKEDF,(SIGN,EXP,MANT),
+	"-1**SIGN * 1.MANT * 2**(EXP-BIAS)",
+	"-1**SIGN * 1.MANT * 2**(EXP-BIAS)",
+	((((SIGN) & 1ULL) << 63) | (((EXP) & 0x7ffULL) << fDF_MANTBITS()) |
+		((MANT) & ((1ULL<<fDF_MANTBITS())-1))),
+	()
+)
+
+DEF_MACRO(
+	fFPOP_START,
+	(),
+	"fpop_start",
+	"Begin floating point operation",
+	arch_fpop_start(thread),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fFPOP_END,
+	(),
+	"fpop_end",
+	"End floating point operation",
+	arch_fpop_end(thread),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fFPSETROUND_NEAREST,
+	(),
+	"round_to_nearest()",
+	"Set rounding mode to Nearest",
+	fesetround(FE_TONEAREST),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fFPSETROUND_CHOP,
+	(),
+	"round_to_zero()",
+	"Set rounding mode to Chop",
+	fesetround(FE_TOWARDZERO),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fFPCANCELFLAGS,
+	(),
+	"cancel_flags()",
+	"Do not update Floating Point Flags",
+	feclearexcept(FE_ALL_EXCEPT),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fISINFPROD,
+	(A,B),
+	"isinf(A*B)",
+	"True if the product of A and B is truly infinite",
+	((isinf(A) && isinf(B)) ||
+		(isinf(A) && isfinite(B) && ((B) != 0.0)) ||
+		(isinf(B) && isfinite(A) && ((A) != 0.0))),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fISZEROPROD,
+	(A,B),
+	"is_true_zero(A*B)",
+	"True if the product of A and B is truly zero",
+	((((A) == 0.0) && isfinite(B)) || (((B) == 0.0) && isfinite(A))),
+	/* nothing */
+)
+
+DEF_MACRO(
+	fRAISEFLAGS,
+	(A),
+	"fpflags |= A",
+	"Raise Floating Point Flags",
+	arch_raise_fpflag(A),
+	/* NOTHING */
+)
+
+DEF_MACRO(
+	fDF_MAX,
+	(A,B),
+	"fmax(A,B)",
+	"Floating Point Maximum",
+	(((A)==(B))
+		? fDOUBLE(fUNDOUBLE(A) & fUNDOUBLE(B))
+		: fmax(A,B)),
+	(A_FPOP)
+)
+
+DEF_MACRO(
+	fDF_MIN,
+	(A,B),
+	"fmin(A,B)",
+	"Floating Point Maximum",
+	(((A)==(B))
+		? fDOUBLE(fUNDOUBLE(A) | fUNDOUBLE(B))
+		: fmin(A,B)),
+	(A_FPOP)
+)
+
+DEF_MACRO(
+	fSF_MAX,
+	(A,B),
+	"fmaxf(A,B)",
+	"Floating Point Maximum",
+	(((A)==(B))
+		? fFLOAT(fUNFLOAT(A) & fUNFLOAT(B))
+		: fmaxf(A,B)),
+	(A_FPOP)
+)
+
+DEF_MACRO(
+	fSF_MIN,
+	(A,B),
+	"fmin(A,B)",
+	"Floating Point Maximum",
+	(((A)==(B))
+		? fFLOAT(fUNFLOAT(A) | fUNFLOAT(B))
+		: fminf(A,B)),
+	(A_FPOP)
+)
+
+/*************************************/
+/* Load/Store support                */
+/*************************************/
+
+DEF_MACRO(fMMU,(ADDR),
+	"ADDR",
+	"ADDR",
+	/* mmu xlate */ ADDR,
+	(A_EXCEPTION_TLB,A_EXCEPTION_ACCESS)
+)
+
+DEF_MACRO(fcirc_add,(REG,INCR,IMMED),
+	"REG=circ_add(REG,INCR,IMMED)",
+	"REG=circ_add(REG,INCR,IMMED)",
+	(REG=fcircadd(thread, REG,INCR,IMMED, fREAD_CSREG(MuN))),
+	(A_CIRCADDR,A_IMPLICIT_READS_CS)
+)
+
+DEF_MACRO(fbrev,(REG),
+	"REG.h[1] | brev(REG.h[0])",
+	"REG.h[1] | brev(REG.h[0])",
+	(fbrevaddr(REG)),
+	(A_BREVADDR)
+)
+
+
+DEF_MACRO(fLOAD,(NUM,SIZE,SIGN,EA,DST),
+	"DST = *EA",
+	"Get SIZE bytes from memory at EA and save in DST.",
+	{ DST = (size##SIZE##SIGN##_t)MEM_LOAD##SIZE(thread,EA,insn); },
+	(A_LOAD,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fMEMOP,(NUM,SIZE,SIGN,EA,FNTYPE,VALUE),
+	"DST = *EA",
+	"Get SIZE bytes from memory at EA and save in DST.",
+	{ memop##SIZE##_##FNTYPE(thread,EA,VALUE); },
+	(A_LOAD,A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fGET_FRAMEKEY,(),
+	"FRAMEKEY",
+	"FRAMEKEY",
+	READ_RREG(REG_FRAMEKEY),
+	(A_IMPLICIT_READS_FRAMEKEY)
+)
+
+DEF_MACRO(fFRAME_SCRAMBLE,(VAL),
+	"frame_scramble(VAL)",
+	"frame_scramble(VAL)",
+	((VAL) ^ (fCAST8u(fGET_FRAMEKEY()) << 32)),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fFRAME_UNSCRAMBLE,(VAL),
+	"frame_unscramble(VAL)",
+	"frame_unscramble(VAL)",
+	fFRAME_SCRAMBLE(VAL),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fFRAMECHECK,(ADDR,EA),
+	"frame_check_limit(ADDR)",
+	"frame_check_limit(ADDR)",
+	sys_check_framelimit(thread,ADDR,EA),
+	(A_IMPLICIT_READS_FRAMELIMIT)
+)
+
+DEF_MACRO(fLOAD_LOCKED,(NUM,SIZE,SIGN,EA,DST),
+	"DST = *EA; ",
+	"Get SIZE bytes from memory at EA and save in DST.",
+	{     DST = (size##SIZE##SIGN##_t)mem_load_locked(thread,EA,SIZE,insn);
+  },
+	(A_LOAD,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fLOAD_PHYS,(NUM,SIZE,SIGN,SRC1,SRC2,DST),
+	"DST = *((SRC1&0x7ff) | (SRC2<<11))",
+	"Load from physical address",
+	{     DST = (size##SIZE##SIGN##_t)mem_load_phys(thread,SRC1,SRC2,insn);
+  },
+	(A_LOAD,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+
+DEF_MACRO(fSTORE,(NUM,SIZE,EA,SRC),
+	"*EA = SRC",
+	"Store SIZE bytes from SRC into memory at EA",
+	{ MEM_STORE##SIZE(thread,EA,SRC,insn); },
+	(A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+
+DEF_MACRO(fSTORE_LOCKED,(NUM,SIZE,EA,SRC,PRED),
+	"if (lock_valid) { *EA = SRC; PRED = 0xff; lock_valid = 0; } else { PRED = 0; }",
+	"if the lock is valid, Store SIZE bytes from SRC into memory at EA, and invalidate the lock.  PRED is set to all 1s if the write actually took place.",
+	{ PRED = (mem_store_conditional(thread,EA,SRC,SIZE,insn) ? 0xff : 0); },
+	(A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fVTCM_MEMCPY,(DST,SRC,SIZE),
+	"for (i = 0; i <= SIZE; i++) { *(DST + i) = *(SRC + i); }",
+	"Copy SIZE+1 bytes from memory at SRC to memory at DST",
+	mem_vtcm_memcpy(thread, insn, DST, SRC, SIZE);,
+	/* ATTRIBS */
+)
+
+
+/*************************************/
+/* Permute Handler                   */
+/*************************************/
+
+DEF_MACRO(fPERMUTEH,(SRC0,SRC1,CTRL),
+	"permute(SRC0,SRC1,CTRL)",
+	"Permute SRC0 and SRC1 according to CTRL",
+	fpermuteh((SRC0),(SRC1),CTRL),
+	(A_PERM)
+)
+
+DEF_MACRO(fPERMUTEB,(SRC0,SRC1,CTRL),
+	"permute(SRC0,SRC1,CTRL)",
+	"Permute SRC0 and SRC1 according to CTRL",
+	fpermuteb((SRC0),(SRC1),CTRL),
+	(A_PERM)
+)
+
+/*************************************/
+/* Functions to help with bytes      */
+/*************************************/
+
+DEF_MACRO(fGETBYTE,(N,SRC),
+	"SRC.b[N]",
+	"Byte N from SRC",
+         ((size1s_t)((SRC>>((N)*8))&0xff)),
+	/* nothing */
+)
+
+DEF_MACRO(fGETUBYTE,(N,SRC),
+	"SRC.ub[N]",
+	"Byte N from SRC",
+         ((size1u_t)((SRC>>((N)*8))&0xff)),
+	/* nothing */
+)
+
+DEF_MACRO(fSETBYTE,(N,DST,VAL),
+	"DST.b[N]=VAL",
+	"Byte N from DST = VAL",
+	{
+	DST = (DST & ~(0x0ffLL<<((N)*8))) | (((size8u_t)((VAL) & 0x0ffLL)) << ((N)*8));
+	},
+	/* nothing */
+)
+
+DEF_MACRO(fGETHALF,(N,SRC),
+	"SRC.h[N]",
+	"Halfword N from SRC",
+         ((size2s_t)((SRC>>((N)*16))&0xffff)),
+	/* nothing */
+)
+
+DEF_MACRO(fGETUHALF,(N,SRC),
+	"SRC.uh[N]",
+	"Halfword N from SRC",
+         ((size2u_t)((SRC>>((N)*16))&0xffff)),
+	/* nothing */
+)
+
+DEF_MACRO(fSETHALF,(N,DST,VAL),
+	"DST.h[N]=VAL",
+	"Halfword N from DST = VAL",
+	{
+	DST = (DST & ~(0x0ffffLL<<((N)*16))) | (((size8u_t)((VAL) & 0x0ffff)) << ((N)*16));
+	},
+	/* nothing */
+)
+
+
+
+DEF_MACRO(fGETWORD,(N,SRC),
+	"SRC.w[N]",
+	"Word N from SRC",
+         ((size8s_t)((size4s_t)((SRC>>((N)*32))&0x0ffffffffLL))),
+	/* nothing */
+)
+
+DEF_MACRO(fGETUWORD,(N,SRC),
+	"SRC.uw[N]",
+	"Word N from SRC",
+         ((size8u_t)((size4u_t)((SRC>>((N)*32))&0x0ffffffffLL))),
+	/* nothing */
+)
+
+DEF_MACRO(fSETWORD,(N,DST,VAL),
+	"DST.w[N]=VAL",
+	"Word N from DST = VAL",
+	{
+	DST = (DST & ~(0x0ffffffffLL<<((N)*32))) | (((VAL) & 0x0ffffffffLL) << ((N)*32));
+	},
+	/* nothing */
+)
+
+DEF_MACRO(fACC,(),
+    "",
+    "",
+    ,
+    (A_ACC)
+)
+
+DEF_MACRO(fEXTENSION_AUDIO,(A),
+    "A",
+    "A",
+    A,
+    (A_EXTENSION_AUDIO,A_NOTE_EXTENSION_AUDIO)
+)
+
+DEF_MACRO(fSETBIT,(N,DST,VAL),
+	"DST.N = VAL",
+	"Set bit N in DST to VAL",
+	{
+	DST = (DST & ~(1ULL<<(N))) | (((size8u_t)(VAL))<<(N));
+	},
+	/* nothing */
+)
+
+DEF_MACRO(fGETBIT,(N,SRC),
+	"SRC.N",
+	"Get bit N from SRC",
+	(((SRC)>>N)&1),
+	/* nothing */
+)
+
+
+DEF_MACRO(fSETBITS,(HI,LO,DST,VAL),
+	"DST[HI:LO] = VAL",
+	"Set bits from HI to LO in DST to VAL",
+	do {
+        int j;
+        for (j=LO;j<=HI;j++) {
+          fSETBIT(j,DST,VAL);
+        }
+	} while (0),
+	/* nothing */
+)
+
+DEF_MACRO(fUNDEFINED,(),
+	"UNDEFINED",
+	"UNDEFINED",
+	warn("[UNDEFINED]: architecturally undefined"),
+	/* NOTHING */
+)
+
+/*************************************/
+/* Used for parity, etc........      */
+/*************************************/
+DEF_MACRO(fCOUNTONES_2,(VAL),
+	"count_ones(VAL)",
+	"Count the number of bits set in VAL",
+	count_ones_2(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fCOUNTONES_4,(VAL),
+	"count_ones(VAL)",
+	"Count the number of bits set in VAL",
+	count_ones_4(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fCOUNTONES_8,(VAL),
+	"count_ones(VAL)",
+	"Count the number of bits set in VAL",
+	count_ones_8(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fBREV_8,(VAL),
+	"reverse_bits(VAL)",
+	"Count the number of bits set in VAL",
+	reverse_bits_8(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fBREV_4,(VAL),
+	"reverse_bits(VAL)",
+	"Count the number of bits set in VAL",
+	reverse_bits_4(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fBREV_2,(VAL),
+	"reverse_bits(VAL)",
+	"Count the number of bits set in VAL",
+	reverse_bits_2(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fBREV_1,(VAL),
+	"reverse_bits(VAL)",
+	"Count the number of bits set in VAL",
+	reverse_bits_1(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fCL1_8,(VAL),
+	"count_leading_ones(VAL)",
+	"Count the number of bits set in VAL",
+	count_leading_ones_8(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fCL1_4,(VAL),
+	"count_leading_ones(VAL)",
+	"Count the number of bits set in VAL",
+	count_leading_ones_4(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fCL1_2,(VAL),
+	"count_leading_ones(VAL)",
+	"Count the number of bits set in VAL",
+	count_leading_ones_2(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fCL1_1,(VAL),
+	"count_leading_ones(VAL)",
+	"Count the number of bits set in VAL",
+	count_leading_ones_1(VAL),
+	/* nothing */
+)
+
+DEF_MACRO(fINTERLEAVE,(ODD,EVEN),
+	"interleave(ODD,EVEN)",
+	"Interleave odd bits from ODD with even bits from EVEN",
+	interleave(ODD,EVEN),
+	/* nothing */
+)
+
+DEF_MACRO(fDEINTERLEAVE,(MIXED),
+	"deinterleave(ODD,EVEN)",
+	"Deinterleave odd bits into high half even bits to low half",
+	deinterleave(MIXED),
+	/* nothing */
+)
+
+DEF_MACRO(fNORM16,(VAL),
+	"get norm of 16bit value(VAL)",
+	"the number of leading sign bits in VAL",
+        ((VAL == 0) ? (31) : (fMAX(fCL1_2( VAL),fCL1_2(~VAL))-1)),
+	/* nothing */
+)
+
+
+
+DEF_MACRO(fHIDE,(A),
+	"",
+	"",
+	A,
+	()
+)
+
+DEF_MACRO(fASM_MAP,(A,B),
+	"Assembler mapped to: B",
+	"Assembler mapped to: B",
+	fatal("ASM_MAP instruction " A "->" B " executed.");,
+	()
+)
+
+DEF_MACRO(fCOND_ASM_MAP,(A,C,X,Y),
+	"if (C) {Assembler mapped to: X;} else {Assembler mapped to: Y;}@",
+	"if (C) {Assembler mapped to: Y;} else {Assembler mapped to: Y;}@",
+	fatal("ASM_MAP instruction " A "->" X "/" Y " executed.");,
+	()
+)
+
+DEF_MACRO(fCONSTLL,(A),
+	"A",
+	"A",
+	A##LL,
+)
+
+DEF_MACRO(fCONSTULL,(A),
+	"A",
+	"A",
+	A##ULL,
+)
+
+/* Do the things in the parens, but don't print the parens. */
+DEF_MACRO(fECHO,(A),
+	"A",
+	"A",
+	(A),
+	/* nothing */
+)
+
+
+/********************************************/
+/* OS interface and stop/wait               */
+/********************************************/
+
+DEF_MACRO(RUNNABLE_THREADS_MAX,,
+	"THREADS_MAX",
+	"THREADS_MAX",
+	(thread->processor_ptr->runnable_threads_max),
+	()
+)
+
+DEF_MACRO(THREAD_IS_ON,(PROC,TNUM),
+	"THREAD IS ENABLE",
+	"Thread is enabled in the revid",
+	((PROC->arch_proc_options->thread_enable_mask>>TNUM) & 0x1),
+	()
+)
+
+DEF_MACRO(THREAD_EN_MASK,(PROC),
+	"THREAD IS ENABLE MASK",
+	"Thread is enabled in the revid",
+	((PROC->arch_proc_options->thread_enable_mask)),
+	()
+)
+
+
+
+DEF_MACRO(READ_IMASK,(TH),
+	"IMASK[TH]",
+	"IMASK[TH]",
+         (((TH) >= (thread->processor_ptr->runnable_threads_max)) ? 0 : (thread->processor_ptr->thread[TH]->Regs[REG_IMASK])),
+	(A_IMPLICIT_READS_IMASK_ANYTHREAD)
+)
+DEF_MACRO(WRITE_IMASK,(TH,VAL),
+	"IMASK[TH]=VAL",
+	"IMASK[TH]=VAL",
+	 if ((TH) < (thread->processor_ptr->runnable_threads_max)) { thread->processor_ptr->thread[TH]->Regs[REG_IMASK]=(VAL & reg_mutability[REG_IMASK] ); },
+	(A_IMPLICIT_WRITES_IMASK_ANYTHREAD)
+)
+
+
+DEF_MACRO(WRITE_PRIO,(TH,VAL),
+	"TID[TH].PRIO=VAL",
+	"TID[TH].PRIO=VAL",
+	{
+		if ((TH) < (thread->processor_ptr->runnable_threads_max)) {
+			size4u_t tid_reg = thread->processor_ptr->thread[TH]->Regs[REG_TID];
+			fINSERT_BITS(tid_reg, reg_field_info[STID_PRIO].width, reg_field_info[STID_PRIO].offset, VAL);
+			LOG_OTHER_THREAD_REG_WRITE(thread,REG_TID,tid_reg,TH);
+		}
+	},
+	(A_IMPLICIT_WRITES_STID_PRIO_ANYTHREAD)
+)
+
+
+DEF_MACRO(DO_IASSIGNW,(REG),
+	"IASSIGNW(REG)",
+	"IASSIGNW(REG)",
+	{
+        int i;
+        int intbitpos = ((REG>>16)&0xF);
+        for (i=0;i<RUNNABLE_THREADS_MAX;i++) {
+			if(( (thread->processor_ptr->arch_proc_options->thread_enable_mask>>i) & 0x1)) {
+				fINSERT_BITS(thread->processor_ptr->thread[i]->Regs[REG_IMASK],1, intbitpos, (REG>>i) & 1);
+           }
+		}
+	},
+	(A_IMPLICIT_WRITES_IMASK_ANYTHREAD)
+)
+
+
+
+
+DEF_MACRO(fDO_NMI,(SREG),
+	"Raise NMI on threads",
+	"Raise NMI on threads",
+	{
+		int i;
+		for (i=0;i<RUNNABLE_THREADS_MAX;i++) {
+			if( ( (thread->processor_ptr->arch_proc_options->thread_enable_mask>>i) & 0x1) ) {
+				if (SREG & (1<<i)) {
+					register_nmi_interrupt(thread->processor_ptr->thread[i]);
+				}
+			}
+		}
+	},
+)
+
+DEF_MACRO(fDO_TRACE,(SREG),
+	"Send value to ETM trace",
+	"Send value to ETM trace",
+	{
+		fHIDE(CALLBACK(thread->processor_ptr->options->trace_callback,
+			thread->system_ptr,thread->processor_ptr,
+			thread->threadId,SREG);)
+        },
+)
+
+DEF_MACRO(DO_IASSIGNR,(SREG,DREG),
+	"DREG=IASSIGNR(SREG)",
+	"DREG=IASSIGNR(SREG)",
+	{
+		int i;
+		int result=0;
+		int intbitpos = ((SREG>>16)&0xF);
+		for (i=0;i<RUNNABLE_THREADS_MAX;i++) {
+			if(( (thread->processor_ptr->arch_proc_options->thread_enable_mask>>i) & 0x1)) {
+				result |= (((thread->processor_ptr->thread[i]->Regs[REG_IMASK]>>intbitpos)&1)<<i);
+			}
+		}
+		DREG=result;
+	},
+	(A_IMPLICIT_READS_IMASK_ANYTHREAD)
+)
+
+#ifdef NEW_INTERRUPTS
+
+DEF_MACRO(DO_SWI,(REG),
+	"IPEND |= REG",
+	"IPEND |= REG",
+        {fHIDE(CALLBACK(thread->processor_ptr->options->swi_callback,
+         thread->system_ptr,thread->processor_ptr,
+         thread->threadId,REG));
+	 fLOG_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IPEND,fREAD_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IPEND)|(REG));
+        },
+	(A_EXCEPTION_SWI,A_IMPLICIT_READS_IPENDAD_IPEND,A_IMPLICIT_WRITES_IPENDAD_IPEND)
+)
+
+DEF_MACRO(DO_CSWI,(REG),
+	"IPEND &= ~REG",
+	"IPEND &= ~REG",
+	fLOG_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IPEND,fREAD_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IPEND) & ~(REG));,
+	(A_IMPLICIT_READS_IPENDAD_IPEND,A_IMPLICIT_WRITES_IPENDAD_IPEND)
+)
+
+DEF_MACRO(DO_CIAD,(VAL),
+	"IAD &= ~VAL",
+	"IAD &= ~VAL",
+	sys_ciad(thread,VAL);
+	fLOG_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IAD,fREAD_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IAD) & ~(VAL));,
+	(A_EXCEPTION_SWI,A_IMPLICIT_READS_IPENDAD_IAD,A_IMPLICIT_WRITES_IPENDAD_IAD)
+)
+
+DEF_MACRO(DO_SIAD,(VAL),
+	"IAD |= VAL",
+	"IAD |= VAL",
+	sys_siad(thread,VAL);
+	fLOG_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IAD,fREAD_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IAD) | (VAL));,
+	(A_EXCEPTION_SWI,A_IMPLICIT_READS_IPENDAD_IAD,A_IMPLICIT_WRITES_IPENDAD_IAD)
+)
+
+#else
+
+DEF_MACRO(DO_SWI,(REG),
+        "IPEND |= REG",
+        "IPEND |= REG",
+        {fHIDE(CALLBACK(thread->processor_ptr->options->swi_callback,
+         thread->system_ptr,thread->processor_ptr,
+         thread->threadId,REG));
+         LOG_GLOBAL_REG_WRITE(REG_IPEND,(GLOBAL_REG_READ(REG_IPEND) | (REG & GLOBAL_REG_READ(REG_IEL))));
+        },
+        (A_EXCEPTION_SWI)
+)
+
+DEF_MACRO(DO_CSWI,(REG),
+        "IPEND &= ~REG",
+        "IPEND &= ~REG",
+        LOG_GLOBAL_REG_WRITE(REG_IPEND,GLOBAL_REG_READ(REG_IPEND) & ~((REG) & GLOBAL_REG_READ(REG_IEL)));,
+        ()
+)
+
+DEF_MACRO(DO_CIAD,(VAL),
+        "IAD &= ~VAL",
+        "IAD &= ~VAL",
+        sys_ciad(thread,VAL); LOG_GLOBAL_REG_WRITE(REG_IAD,GLOBAL_REG_READ(REG_IAD) & ~(VAL));,
+        (A_EXCEPTION_SWI)
+)
+
+DEF_MACRO(DO_SIAD,(VAL),
+        "IAD |= VAL",
+        "IAD |= VAL",
+        sys_siad(thread,VAL); LOG_GLOBAL_REG_WRITE(REG_IAD,GLOBAL_REG_READ(REG_IAD) | (VAL));,
+        (A_EXCEPTION_SWI)
+)
+
+
+#endif
+
+DEF_MACRO(fBREAK,(),
+	"Enter Debug mode",
+	"Enter Debug mode",
+        {isdb_brkpt_insn(thread->processor_ptr,thread->threadId);},
+	()
+)
+
+DEF_MACRO(fGP_DOCHKPAGECROSS,(BASE,SUM),
+	"",
+	"",
+	if (!(insn->extension_valid)) {
+        fDOCHKPAGECROSS(BASE,SUM);
+    },
+	(A_EA_PAGECROSS)
+)
+
+DEF_MACRO(fDOCHKPAGECROSS,(BASE,SUM),
+	"",
+	"",
+	if (thread->bq_on) {
+		thread->mem_access[insn->slot].check_page_crosses = 1;
+		thread->mem_access[insn->slot].page_cross_base = BASE;
+		thread->mem_access[insn->slot].page_cross_sum = SUM;
+	},
+	(A_EA_PAGECROSS)
+)
+
+DEF_MACRO(fTIMING_AIA,(OLDVA,NEWVA),
+	"",
+	"",
+  if(thread->bq_on){
+	thread->mem_access[insn->slot].post_updated = 1;
+	thread->mem_access[insn->slot].post_updated_va = NEWVA;
+	insn->is_aia = 1;
+	},
+	(A_PM_ANY)
+)
+
+
+DEF_MACRO(fPAUSE,(IMM),
+	"Pause for IMM cycles",
+	"Pause for IMM cycles",
+        {sys_pause(thread, insn->slot, IMM);},
+	()
+)
+
+
+#if 1
+/* The way it should be */
+DEF_MACRO(fTRAP,(TRAPTYPE,IMM),
+	"SSR.CAUSE = IMM; TRAP # TRAPTYPE",
+	"SSR.CAUSE = IMM; TRAP # TRAPTYPE",
+    warn("Trap NPC=%x ",fREAD_NPC());
+	warn("Trap exception, PCYCLE=%lld TYPE=%d NPC=%x IMM=0x%x",thread->processor_ptr->pstats[pcycles],TRAPTYPE,fREAD_NPC(),IMM);
+	register_trap_exception(thread,fREAD_NPC(),TRAPTYPE,IMM);,
+	(A_EXCEPTION_SWI)
+)
+#else
+DEF_MACRO(fTRAP,(TRAPTYPE,IMM),
+        "SSR.CAUSE = IMM; TRAP # TRAPTYPE",
+        "SSR.CAUSE = IMM; TRAP # TRAPTYPE",
+        if ((TRAPTYPE == 0) && (IMM == 0) &&
+              (!thread->sandbox_execution) &&
+              (!thread->processor_ptr->options->disable_angelswi)) {
+                sim_handle_trap(thread->system_ptr,thread->processor_ptr,thread->threadId,0);
+                /*thread->status |= EXEC_STATUS_SWI;*/
+        } else if ((TRAPTYPE == 0) && (IMM == 0xdb) &&
+               (!thread->sandbox_execution) &&
+               (!thread->processor_ptr->options->disable_angelswi)) {
+                sim_handle_debug(thread->system_ptr,thread->processor_ptr,thread->threadId,0xdb);
+        } else {
+          warn("Trap exception, TYPE=%d NPC=%x IMM=0x%x",TRAPTYPE,fREAD_NPC(),IMM);
+          register_trap_exception(thread,fREAD_NPC(),TRAPTYPE,IMM);
+        },
+        (A_EXCEPTION_SWI)
+)
+#endif
+
+DEF_MACRO(fINTERNAL_CLEAR_SAMEPAGE,
+	(),
+	"",
+	"",
+	/* force re-xlate at next fetch, refresh of in_user_mode, etc */
+	/* Permissions change too... */
+	sys_utlb_invalidate(thread->processor_ptr,thread),
+	/* NOTHING */
+)
+
+DEF_MACRO(fCLEAR_RTE_EX,(),
+      "SSR.SSR_EX = 0",
+      "SSR.SSR_EX = 0",
+      {
+        fLOG_REG_FIELD(SSR,SSR_EX,0);
+  	    fINTERNAL_CLEAR_SAMEPAGE();
+      },
+      ()
+)
+
+DEF_MACRO(fTLB_LOCK_AVAILABLE,(),
+	"SYSCFG.TLBLOCK == 0",
+	"SYSCFG.TLBLOCK == 0",
+	(fREAD_GLOBAL_REG_FIELD(SYSCONF,SYSCFG_TLBLOCK) == 0),
+	()
+)
+
+DEF_MACRO(fK0_LOCK_AVAILABLE,(),
+	"SYSCFG.K0LOCK == 0",
+	"SYSCFG.K0LOCK == 0",
+	(fREAD_GLOBAL_REG_FIELD(SYSCONF,SYSCFG_K0LOCK) == 0),
+	()
+)
+
+DEF_MACRO(fSET_TLB_LOCK,(),
+      "if (can_aquire_tlb_lock) {SYSCFG.TLBLOCK = 1;} else {sleep_until_available;}",
+      "if (can_aquire_tlb_lock) {SYSCFG.TLBLOCK = 1;} else {sleep_until_available;}",
+      {
+      if (fTLB_LOCK_AVAILABLE()) {
+        fLOG_GLOBAL_REG_FIELD(SYSCONF,SYSCFG_TLBLOCK,1);
+      } else {
+        sys_waiting_for_tlb_lock(thread);
+      }
+      },
+      (A_IMPLICIT_READS_SYSCFG_TLBLOCK,A_IMPLICIT_WRITES_SYSCFG_TLBLOCK)
+)
+
+DEF_MACRO(fSET_K0_LOCK,(),
+      "if (can_aquire_k0_lock) {SYSCFG.K0LOCK = 1;} else {sleep_until_available;}",
+      "if (can_aquire_k0_lock) {SYSCFG.K0LOCK = 1;} else {sleep_until_available;}",
+      {
+      if (fK0_LOCK_AVAILABLE() && sys_k0lock_queue_ready(thread)) {
+	warn("k0lock: T%d: PC=0x%x: PCycle=%lld",thread->threadId,thread->Regs[REG_PC],thread->processor_ptr->pstats[pcycles]);
+        fLOG_GLOBAL_REG_FIELD(SYSCONF,SYSCFG_K0LOCK,1);
+      } else {
+      warn("k0lock_waiting: T%d: PC=0x%x: PCycle=%lld",thread->threadId,thread->Regs[REG_PC],thread->processor_ptr->pstats[pcycles]);
+        sys_waiting_for_k0_lock(thread);
+      }
+      },
+      (A_IMPLICIT_READS_SYSCFG_K0LOCK,A_IMPLICIT_WRITES_SYSCFG_K0LOCK)
+)
+
+DEF_MACRO(fCLEAR_TLB_LOCK,(),
+      "SYSCFG.TLBLOCK = 0",
+      "SYSCFG.TLBLOCK = 0",
+      {
+		int i;
+		fLOG_GLOBAL_REG_FIELD(SYSCONF,SYSCFG_TLBLOCK,0);
+		for (i = 0; i < RUNNABLE_THREADS_MAX; i++) {
+			if(( (thread->processor_ptr->arch_proc_options->thread_enable_mask>>i) & 0x1)) {
+				thread->processor_ptr->thread[i]->cu_tlb_lock_waiting = 0;
+			}
+		}
+      },
+      (A_IMPLICIT_READS_SYSCFG_TLBLOCK,A_IMPLICIT_WRITES_SYSCFG_TLBLOCK)
+)
+
+DEF_MACRO(fCLEAR_K0_LOCK,(),
+      "SYSCFG.K0LOCK = 0",
+      "SYSCFG.K0LOCK = 0",
+      do {
+      warn("k0unlock: T%d: PC=0x%x: Pcycle=%lld",thread->threadId,thread->Regs[REG_PC], thread->processor_ptr->pstats[pcycles]);
+      sys_initiate_clear_k0_lock(thread);
+      } while (0),
+      (A_IMPLICIT_READS_SYSCFG_K0LOCK,A_IMPLICIT_WRITES_SYSCFG_K0LOCK)
+)
+
+DEF_MACRO(fWRITE_REG_FIELD,(REG,FIELD,VAL),
+	"REG.FIELD = VAL",
+	"REG.FIELD = VAL",
+	fINSERT_BITS(thread->Regs[REG_##REG],
+            reg_field_info[FIELD].width,
+            reg_field_info[FIELD].offset,VAL),
+)
+
+DEF_MACRO(fALIGN_REG_FIELD_VALUE,(FIELD,VAL),
+	"VAL << FIELD.OFFSET",
+	"VAL << FIELD.OFFSET",
+	((VAL)<<reg_field_info[FIELD].offset),
+	/* */
+)
+
+DEF_MACRO(fGET_REG_FIELD_MASK,(FIELD),
+	"VAL << FIELD.OFFSET",
+	"VAL << FIELD.OFFSET",
+	(((1<<reg_field_info[FIELD].width)-1)<<reg_field_info[FIELD].offset),
+	/* */
+)
+
+DEF_MACRO(fLOG_REG_FIELD,(REG,FIELD,VAL),
+	"REG.FIELD = VAL",
+	"REG.FIELD = VAL",
+	LOG_MASKED_REG_WRITE(thread,REG_##REG,
+		fALIGN_REG_FIELD_VALUE(FIELD,VAL),
+		fGET_REG_FIELD_MASK(FIELD)),
+	()
+)
+
+DEF_MACRO(fWRITE_GLOBAL_REG_FIELD,(REG,FIELD,VAL),
+	"REG.FIELD = VAL",
+	"REG.FIELD = VAL",
+	fINSERT_BITS(thread->processor_ptr->global_regs[REG_##REG],
+            reg_field_info[FIELD].width,
+            reg_field_info[FIELD].offset,VAL),
+)
+
+DEF_MACRO(fLOG_GLOBAL_REG_FIELD,(REG,FIELD,VAL),
+	"REG.FIELD = VAL",
+	"REG.FIELD = VAL",
+	LOG_MASKED_GLOBAL_REG_WRITE(REG_##REG,
+		fALIGN_REG_FIELD_VALUE(FIELD,VAL),
+		fGET_REG_FIELD_MASK(FIELD)),
+	()
+)
+
+DEF_MACRO(fREAD_REG_FIELD,(REG,FIELD),
+	"REG.FIELD",
+	"REG.FIELD",
+	fEXTRACTU_BITS(thread->Regs[REG_##REG],
+            reg_field_info[FIELD].width,
+            reg_field_info[FIELD].offset),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fREAD_GLOBAL_REG_FIELD,(REG,FIELD),
+	"REG.FIELD",
+	"REG.FIELD",
+	fEXTRACTU_BITS(thread->processor_ptr->global_regs[REG_##REG],
+            reg_field_info[FIELD].width,
+            reg_field_info[FIELD].offset),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fGET_FIELD,(VAL,FIELD),
+	"VAL.FIELD",
+	"VAL.FIELD",
+	fEXTRACTU_BITS(VAL,
+		reg_field_info[FIELD].width,
+		reg_field_info[FIELD].offset),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fSET_FIELD,(VAL,FIELD,NEWVAL),
+	"VAL.FIELD",
+	"VAL.FIELD",
+	fINSERT_BITS(VAL,
+		reg_field_info[FIELD].width,
+		reg_field_info[FIELD].offset,
+		(NEWVAL)),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fSET_RUN_MODE_NOW,(TNUM),
+	"modectl[TNUM] = 1",
+	"modectl[TNUM] = 1",
+        {thread->processor_ptr->global_regs[REG_MODECTL] |= (1<<TNUM);
+         thread->last_commit_cycle = thread->processor_ptr->pcycle_counter;
+         sys_recalc_num_running_threads(thread->processor_ptr);},
+)
+
+DEF_MACRO(fIN_DEBUG_MODE,(TNUM),
+	"in_debug_mode",
+	"in_debug_mode",
+	(thread->debug_mode || (fREAD_GLOBAL_REG_FIELD(ISDBST,ISDBST_DEBUGMODE) & 1<<TNUM)),
+	()
+)
+DEF_MACRO(fIN_DEBUG_MODE_NO_ISDB,(TNUM),
+	"in_debug_mode",
+	"in_debug_mode",
+	(thread->debug_mode),
+	()
+)
+
+
+DEF_MACRO(fIN_DEBUG_MODE_WARN,(TNUM),
+	"",
+	"",
+	{
+		if (fREAD_GLOBAL_REG_FIELD(ISDBST,ISDBST_DEBUGMODE) & 1<<TNUM)
+			warn("In ISDB debug mode, but TB told me to step normally");
+	},
+	()
+)
+
+DEF_MACRO(fCLEAR_RUN_MODE,(TNUM),
+	"modectl[TNUM] = 0",
+	"modectl[TNUM] = 0",
+	{fLOG_GLOBAL_REG_FIELD(MODECTL,MODECTL_E,
+    fREAD_GLOBAL_REG_FIELD(MODECTL,MODECTL_E) & ~(1<<(TNUM)))},
+	/* NOTHING */
+)
+
+DEF_MACRO(fCLEAR_RUN_MODE_NOW,(TNUM),
+	"modectl[TNUM] = 0",
+	"modectl[TNUM] = 0",
+	do {
+		fWRITE_GLOBAL_REG_FIELD(MODECTL,MODECTL_E,
+			fREAD_GLOBAL_REG_FIELD(MODECTL,MODECTL_E) & ~(1<<(TNUM)));
+		sys_recalc_num_running_threads(thread->processor_ptr);
+	} while (0),
+	/* NOTHING */
+)
+
+DEF_MACRO(fGET_RUN_MODE,(TNUM),
+	"modectl[TNUM]",
+	"modectl[TNUM]",
+        ((thread->processor_ptr->global_regs[REG_MODECTL]>>TNUM)&0x1),
+)
+
+DEF_MACRO(fSET_WAIT_MODE,(TNUM),
+	"modectl[(TNUM+16)] = 1",
+	"modectl[(TNUM+16)] = 1",
+	{fLOG_GLOBAL_REG_FIELD(MODECTL,MODECTL_W,
+    fREAD_GLOBAL_REG_FIELD(MODECTL,MODECTL_W) | 1<<(TNUM))},
+	/* NOTHING */
+)
+
+DEF_MACRO(fCLEAR_WAIT_MODE,(TNUM),
+	"modectl[(TNUM+16)] = 0",
+	"modectl[(TNUM+16)] = 0",
+        {thread->processor_ptr->global_regs[REG_MODECTL] &= ~(1<<(TNUM+16));
+         thread->last_commit_cycle = thread->processor_ptr->pcycle_counter;
+         sys_recalc_num_running_threads(thread->processor_ptr);},
+)
+
+DEF_MACRO(fGET_WAIT_MODE,(TNUM),
+	"modectl[(TNUM+16)]",
+	"modectl[(TNUM+16)]",
+        ((thread->processor_ptr->global_regs[REG_MODECTL]>>(TNUM+16))&0x1),
+)
+
+
+DEF_MACRO(fRESET_THREAD,(T,NUM),
+	"reset_thread(NUM)",
+	"reset_thread(NUM)",
+        register_reset_interrupt(T,NUM),
+)
+
+DEF_MACRO(fREAD_CURRENT_EVB,(),
+	"EVB",
+	"EVB",
+	(GLOBAL_REG_READ(REG_EVB)),
+	/* nothing */
+)
+
+DEF_MACRO(fREAD_ELR,(),
+	"ELR",
+	"ELR",
+	READ_RREG(REG_ELR),
+	(A_IMPLICIT_READS_ELR)
+)
+
+DEF_MACRO(fPOW2_HELP_ROUNDUP,(VAL),
+	"helper for pow2_roundup",
+	"helper for pow2_roundup",
+	((VAL) | ((VAL) >> 1) | ((VAL) >> 2) | ((VAL) >> 4) | ((VAL) >> 8) | ((VAL) >> 16)),
+	()
+)
+
+DEF_MACRO(fPOW2_ROUNDUP,(VAL),
+	"pow2_roundup(VAL)",
+	"pow2_roundup(VAL)",
+	fPOW2_HELP_ROUNDUP((VAL)-1)+1,
+	()
+)
+
+DEF_MACRO(fTLB_IDXMASK,(INDEX),
+	"INDEX % TLBSIZE",
+	"INDEX % TLBSIZE",
+	((INDEX) & (fPOW2_ROUNDUP(fCAST4u(thread->processor_ptr->arch_proc_options->jtlb_size)) - 1)),
+	()
+)
+
+DEF_MACRO(fTLB_NONPOW2WRAP,(INDEX),
+	"INDEX % TLBSIZE",
+	"INDEX % TLBSIZE",
+	(((INDEX) >= thread->processor_ptr->arch_proc_options->jtlb_size) ? ((INDEX) - thread->processor_ptr->arch_proc_options->jtlb_size) : (INDEX)),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fTLBW,(INDEX,VALUE),
+	"TLB[INDEX] = VALUE",
+	"TLB[INDEX] = VALUE",
+	do {size4u_t __myidx = fTLB_NONPOW2WRAP(fTLB_IDXMASK(INDEX));
+         TLB_REG_WRITE(__myidx,VALUE);
+         fHIDE(CALLBACK(thread->processor_ptr->options->tlbw_callback,thread->system_ptr,thread->processor_ptr,thread->threadId,__myidx);)
+         fHIDE(sys_tlb_write(thread,__myidx,VALUE);)} while (0),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fTLB_ENTRY_OVERLAP,(VALUE),
+	"CHECK_TLB_OVERLAP(VALUE)",
+	"CHECK_TLB_OVERLAP(VALUE)",
+     fHIDE( (sys_check_overlap(thread,VALUE)!=-2) ),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fTLB_ENTRY_OVERLAP_IDX,(VALUE),
+	"GET_OVERLAPPING_IDX(VALUE)",
+	"GET_OVERLAPPING_IDX(VALUE)",
+     fHIDE(sys_check_overlap(thread,VALUE)),
+	/* ATTRIBS */
+)
+
+
+DEF_MACRO(fTLBR,(INDEX),
+	"TLB[INDEX]",
+	"TLB[INDEX]",
+        TLB_REG_READ(fTLB_NONPOW2WRAP(fTLB_IDXMASK(INDEX))),
+	/* ATTRIBS */
+)
+
+DEF_MACRO(fTLBP,(TLBHI),
+	"search_TLB(TLBHI)",
+	"search_TLB(TLBHI)",
+        tlb_lookup(thread,((TLBHI)>>12),((TLBHI)<<12),1),
+	/* attribs */
+)
+
+
+
+DEF_MACRO(READ_SGP0,(),
+	"SGP0",
+	"SGP0",
+	READ_RREG(REG_SGP),
+	(A_IMPLICIT_READS_SGP0)
+)
+
+DEF_MACRO(READ_SGP1,(),
+	"SGP1",
+	"SGP1",
+	READ_RREG(REG_SGP+1),
+	(A_IMPLICIT_READS_SGP1)
+)
+
+DEF_MACRO(READ_SGP10,(),
+	"SGP",
+	"SGP",
+	READ_RREG_PAIR(REG_SGP),
+	(A_IMPLICIT_READS_SGP0,A_IMPLICIT_READS_SGP1)
+)
+
+DEF_MACRO(READ_UGP,(),
+	"UGP",
+	"UGP",
+	READ_RREG(REG_UGP),
+)
+
+DEF_MACRO(WRITE_SGP0,(VAL),
+	"SGP0 = VAL",
+	"SGP0 = VAL",
+        WRITE_RREG(REG_SGP,VAL),
+	(A_IMPLICIT_WRITES_SGP0)
+)
+
+DEF_MACRO(WRITE_SGP1,(VAL),
+	"SGP1 = VAL",
+	"SGP1 = VAL",
+        WRITE_RREG(REG_SGP+1,VAL),
+	(A_IMPLICIT_WRITES_SGP1)
+)
+
+DEF_MACRO(WRITE_SGP10,(VAL),
+	"SGP = VAL",
+	"SGP = VAL",
+        WRITE_RREG_PAIR(REG_SGP,VAL),
+	(A_IMPLICIT_WRITES_SGP0,A_IMPLICIT_WRITES_SGP1)
+)
+
+DEF_MACRO(WRITE_UGP,(VAL),
+	"UGP = VAL",
+	"UGP = VAL",
+        WRITE_RREG(REG_UGP,VAL),
+)
+
+DEF_MACRO(fSTART,(REG),
+	"start(REG)",
+	"start(REG)",
+	fLOG_GLOBAL_REG_FIELD(MODECTL,MODECTL_E, fREAD_GLOBAL_REG_FIELD(MODECTL,MODECTL_E) | (((REG & ((1<<RUNNABLE_THREADS_MAX)-1))) & THREAD_EN_MASK(thread->processor_ptr))),
+	()
+)
+
+DEF_MACRO(fRESUME,(REG),
+	"resume(REG)",
+	"resume(REG)",
+	fLOG_GLOBAL_REG_FIELD(MODECTL,MODECTL_W,
+		fREAD_GLOBAL_REG_FIELD(MODECTL,MODECTL_W) & (~(REG))),
+	()
+)
+
+DEF_MACRO(fGET_TNUM,(),
+	"TNUM",
+	"TNUM",
+	thread->threadId,
+	()
+)
+
+/********************************************/
+/* Cache Management                         */
+/********************************************/
+
+DEF_MACRO(fBARRIER,(),
+	"memory_barrier",
+	"memory_barrier",
+     {
+        sys_barrier(thread, insn->slot);
+     },
+	()
+)
+
+DEF_MACRO(fSYNCH,(),
+	"memory_synch",
+	"memory_synch",
+  {
+      sys_sync(thread, insn->slot);
+  },
+	()
+)
+
+DEF_MACRO(fISYNC,(),
+	"instruction_sync",
+	"instruction_sync",
+  {
+      sys_isync(thread, insn->slot);
+  },
+	()
+)
+
+
+DEF_MACRO(fICFETCH,(REG),
+	"icache_fetch(REG)",
+	"icache_fetch(REG)",
+	/* Unimplemented for now in uarch... cache_nru_icache_fetch(thread->processor_ptr, thread->threadId, (REG)) */,
+	()
+)
+
+DEF_MACRO(fDCFETCH,(REG),
+	"dcache_fetch(REG)",
+	"dcache_fetch(REG)",
+	sys_dcfetch(thread, (REG), insn->slot),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fICINVIDX,(REG),
+	"icache_inv_idx(REG)",
+	"icache_inv_idx(REG)",
+  {
+  	arch_internal_flush(thread->processor_ptr,0,0xffffffff);
+  }
+  ,
+	()
+)
+
+DEF_MACRO(fICINVA,(REG),
+	"icache_inv_addr(REG)",
+	"icache_inv_addr(REG)",
+	{
+	arch_internal_flush(thread->processor_ptr, 0, 0xffffffff);
+   	sys_icinva(thread, (REG),insn->slot);
+	},
+	(A_ICINVA)
+)
+
+DEF_MACRO(fICKILL,(),
+	"icache_inv_all()",
+	"icache_inv_all()",
+        arch_internal_flush(thread->processor_ptr, 0, 0xffffffff);
+        cache_kill_icache(thread->processor_ptr); ,
+	()
+)
+
+DEF_MACRO(fDCKILL,(),
+	"dcache_inv_all()",
+	"dcache_inv_all()",
+        cache_kill_dcache(thread->processor_ptr); ,
+	()
+)
+
+DEF_MACRO(fL2KILL,(),
+	"l2cache_inv_all()",
+	"l2cache_inv_all()",
+        cache_kill_l2cache(thread->processor_ptr); ,
+	(A_IMPLICIT_READS_SYSCFG_GCA,A_IMPLICIT_WRITES_SYSCFG_GCA)
+)
+
+DEF_MACRO(fL2UNLOCK,(),
+	"l2cache_global_unlock()",
+	"l2cache_global_unlock()",
+	sys_l2gunlock(thread),
+	(A_IMPLICIT_READS_SYSCFG_GCA,A_IMPLICIT_WRITES_SYSCFG_GCA)
+)
+
+DEF_MACRO(fL2CLEAN,(),
+	"l2cache_global_clean()",
+	"l2cache_global_clean()",
+	sys_l2gclean(thread),
+	(A_IMPLICIT_READS_SYSCFG_GCA,A_IMPLICIT_WRITES_SYSCFG_GCA)
+)
+
+DEF_MACRO(fL2CLEANINV,(),
+	"l2cache_global_clean_inv()",
+	"l2cache_global_clean_inv()",
+	sys_l2gcleaninv(thread),
+	(A_IMPLICIT_READS_SYSCFG_GCA,A_IMPLICIT_WRITES_SYSCFG_GCA)
+)
+
+DEF_MACRO(fL2CLEANPA,(REG),
+	"l2cache_global_clean_range(REG)",
+	"l2cache_global_clean_range(REG)",
+	sys_l2gclean_pa(thread,REG),
+	(A_IMPLICIT_READS_SYSCFG_GCA,A_IMPLICIT_WRITES_SYSCFG_GCA)
+)
+
+DEF_MACRO(fL2CLEANINVPA,(REG),
+	"l2cache_global_clean_inv_range(REG)",
+	"l2cache_global_clean_inv_range(REG)",
+	sys_l2gcleaninv_pa(thread,REG),
+	(A_IMPLICIT_READS_SYSCFG_GCA,A_IMPLICIT_WRITES_SYSCFG_GCA)
+)
+
+
+DEF_MACRO(fL2CLEANINVIDX,(REG),
+	"l2cache_clean_invalidate_idx(REG)",
+	"l2cache_clean_invalidate_idx(REG)",
+	sys_l2cleaninvidx(thread, (REG)),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fL2CLEANIDX,(REG),
+	"l2cache_clean_idx(REG)",
+	"l2cache_clean_idx(REG)",
+	sys_l2cleanidx(thread, (REG)),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fL2INVIDX,(REG),
+	"l2cache_inv_idx(REG)",
+	"l2cache_inv_idx(REG)",
+	sys_l2invidx(thread, (REG)),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fDCTAGR,(INDEX,DST,DSTREGNO),
+	"dcache_tag_read(INDEX)",
+	"dcache_tag_read(INDEX)",
+	({DST=sys_dctagr(thread, INDEX, insn->slot,DSTREGNO);})/* FIXME */,
+	()
+)
+
+DEF_MACRO(fDCTAGW,(INDEX,PART2),
+	"dcache_tag_write(INDEX,PART2)",
+	"dcache_tag_write(INDEX,PART2)",
+	(sys_dctagw(thread, INDEX, PART2, insn->slot)),
+	()
+)
+DEF_MACRO(fICTAGR,(INDEX,DST,REGNO),
+	"icache_tag_read(INDEX)",
+	"icache_tag_read(INDEX)",
+	({DST=sys_ictagr(thread, INDEX, insn->slot,REGNO);}),
+	()
+)
+
+DEF_MACRO(fICDATAR,(INDEX, DST),
+	"icache_data_read(INDEX)",
+	"icache_data_read(INDEX)",
+	({DST=sys_icdatar(thread, INDEX, insn->slot);}),
+	()
+)
+
+DEF_MACRO(fICTAGW,(INDEX,PART2),
+	"icache_tag_write(INDEX,PART2)",
+	"icache_tag_write(INDEX,PART2)",
+	(sys_ictagw(thread, INDEX, PART2, insn->slot)),
+	()
+)
+DEF_MACRO(fICDATAW,(INDEX,DATA),
+	"icache_data_write(INDEX,DATA)",
+	"icache_data_write(INDEX,DATA)",
+	({ fHIDE(); }),
+	()
+)
+
+DEF_MACRO(fL2FETCH,(ADDR,HEIGHT,WIDTH,STRIDE,FLAGS),
+	"l2fetch(ADDR,INFO)",
+	"l2fetch(ADDR,INFO)",
+	sys_l2fetch(thread, ADDR,HEIGHT,WIDTH,STRIDE,FLAGS, insn->slot),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST,A_L2FETCH)
+)
+
+DEF_MACRO(fL2TAGR,(INDEX, DST, DSTREG),
+	"l2cache_tag_read(INDEX)",
+	"l2cache_tag_read(INDEX)",
+	({DST=sys_l2tagr(thread, INDEX, insn->slot, DSTREG);}),
+	()
+)
+
+DEF_MACRO(fL2LOCKA,(VA,DST,PREGDST),
+	"DST=l2locka(VA)",
+	"DST=l2locka(VA)",
+	do {DST=sys_l2locka(thread, VA, insn->slot, PREGDST); } while (0),
+	()
+)
+
+DEF_MACRO(fL2UNLOCKA,(VA),
+	"l2unlocka(VA)",
+	"l2unlocka(VA)",
+	sys_l2unlocka(thread, VA, insn->slot),
+	()
+)
+
+DEF_MACRO(fL2TAGW,(INDEX,PART2),
+	"l2cache_tag_write(INDEX,PART2)",
+	"l2cache_tag_write(INDEX,PART2)",
+	({sys_l2tagw(thread, INDEX, PART2, insn->slot);}),
+	()
+)
+
+DEF_MACRO(fDCCLEANIDX,(REG),
+	"dcache_clean_idx(REG)",
+	"dcache_clean_idx(REG)",
+	sys_dccleanidx(thread, (REG)),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fDCCLEANA,(REG),
+	"dcache_clean_addr(REG)",
+	"dcache_clean_addr(REG)",
+	sys_dccleana(thread, (REG)),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fDCCLEANINVIDX,(REG),
+	"dcache_cleaninv_idx(REG)",
+	"dcache_cleaninv_idx(REG)",
+	sys_dccleaninvidx(thread, (REG)),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fDCCLEANINVA,(REG),
+	"dcache_cleaninv_addr(REG)",
+	"dcache_cleaninv_addr(REG)",
+	sys_dccleaninva(thread, (REG), insn->slot),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST,A_DCCLEANINVA)
+)
+
+DEF_MACRO(fDCZEROA,(REG),
+	"dcache_zero_addr(REG)",
+	"dcache_zero_addr(REG)",
+	sys_dczeroa(thread, (REG)),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fDCINVIDX,(REG),
+	"dcache_inv_idx(REG)",
+	"dcache_inv_idx(REG)",
+	sys_dcinvidx(thread, (REG)),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fDCINVA,(REG),
+	"dcache_inv_addr(REG)",
+	"dcache_inv_addr(REG)",
+	sys_dcinva(thread, (REG)),
+	(A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+
+DEF_MACRO(fCHECKFORPRIV,(),
+	"priv_check();",
+	"priv_check();",
+	{sys_check_privs(thread); if (EXCEPTION_DETECTED) return; },
+	()
+)
+
+DEF_MACRO(fCHECKFORGUEST,(),
+	"priv_check();",
+	"priv_check();",
+	{sys_check_guest(thread); if (EXCEPTION_DETECTED) return; },
+	()
+)
+
+DEF_MACRO(fILLEGAL,(),
+	"illegal()",
+	"illegal()",
+	do {sys_illegal(thread); if (EXCEPTION_DETECTED) return; } while (0),
+	()
+)
+
+#ifdef NEW_INTERRUPTS
+
+DEF_MACRO(fTAKEN_INTERRUPT_EDGECLEAR,(proc,intnum),
+	"clear_ipend(intnum)",
+	"If the interrupt is edge triggered, clear the interrupt from IPEND due to being taken",
+	{ fWRITE_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IPEND,
+		fREAD_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IPEND) & ~(INT_NUMTOMASK(intnum))); },
+	()
+)
+
+DEF_MACRO(fSET_IAD,(thread,intnum),
+	"set_iad(intnum)",
+	"Set IAD bit corresponding to intnum",
+	{ sys_siad(thread,INT_NUMTOMASK(intnum));
+	  fWRITE_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IAD,
+		fREAD_GLOBAL_REG_FIELD(IPENDAD,IPENDAD_IAD) | INT_NUMTOMASK(intnum)); },
+	()
+)
+#else
+
+DEF_MACRO(fTAKEN_INTERRUPT_EDGECLEAR,(proc,intnum),
+        "clear_ipend(intnum)",
+        "If the interrupt is edge triggered, clear the interrupt from IPEND due to being taken",
+        { proc->global_regs[REG_IPEND] &= ~(INT_NUMTOMASK(intnum) & proc->global_regs[REG_IEL]); },
+        ()
+)
+
+DEF_MACRO(fSET_IAD,(thread,intnum),
+        "set_iad(intnum)",
+        "Set IAD bit corresponding to intnum",
+        { sys_siad(thread,INT_NUMTOMASK(intnum)); thread->processor_ptr->global_regs[REG_IAD] |= INT_NUMTOMASK(intnum); },
+        ()
+)
+
+#endif
+
+DEF_MACRO(fBRANCH_SPECULATED_RIGHT,(JC,SD,DOTNEWVAL),
+	"branch_speculated_right(JC,SD,DOTNEWVAL)",
+	"branch_speculated_right(JC,SD,DOTNEWVAL)",
+	(((JC) ^ (SD) ^ (DOTNEWVAL&1)) & 0x1),
+	()
+)
+
+
+
+DEF_MACRO(fBRANCH_SPECULATE_STALL,(DOTNEWVAL, JUMP_COND, SPEC_DIR, HINTBITNUM, STRBITNUM),
+	"",
+	"Check the .new predicate and stall if wrongly speculated.",
+	{
+sys_speculate_branch_stall(thread, insn->slot, JUMP_COND(JUMP_PRED_SET),
+										   SPEC_DIR,
+										   DOTNEWVAL,
+										   HINTBITNUM,
+										   STRBITNUM,
+										   0,
+										   thread->last_pkt->pkt_has_dual_jump,
+										   insn->is_2nd_jump,
+										   (thread->fetch_access.vaddr + insn->encoding_offset*4));
+  },
+	(A_BIMODAL_BRANCH)
+)
+
+
+DEF_MACRO(CACHE_MODIFY, (A, B),
+  "cache_modify()",
+  "Assigns B to A if not sandbox execution",
+  A=B
+  ,
+  ()
+)
+
+DEF_MACRO(SIM_BUSACCESS, (A,B,C,D,E,F,G,H,I),
+  "sim_busaccess_macro()",
+  "Sim bus access qualified with cache modify",
+   ({
+	sim_busaccess_t p = B->busaccess;
+	if (!p) p = sim_busaccess;
+	p(A,B,C,D,E,F,G,H,I);
+   })
+  ,
+  ()
+)
+
+DEF_MACRO(SIM_BUSACCESS_ACK, (A,B,C,D,E,F,G,H,I),
+  "sim_busaccess_macro()",
+  "Sim bus access with a return value and qualified with cache modify",
+	/* EJP: implicit "status" is probably bad. */
+  do {
+	sim_busaccess_t p = B->busaccess;
+	if (!p) p = sim_busaccess;
+	status = p(A,B,C,D,E,F,G,H,I);
+  } while (0)
+  ,
+  ()
+)
+
+DEF_MACRO(SET_PMU_EVENT_STATE, (_THREAD_, _PMU_EVENT_NUM_),
+  "set_pmu_state()",
+  "Sets the bit for an event in the PMU array in the thread",
+  (_THREAD_->processor_ptr->pmu_event_state[_PMU_EVENT_NUM_] = 1)
+  ,
+  ()
+)
+
+DEF_MACRO(CLEAR_PMU_EVENT_STATE, (_THREAD_, _PMU_EVENT_NUM_),
+  "clear_pmu_state()",
+  "Clears the bit for an event in the PMU array in the thread",
+  (_THREAD_->processor_ptr->pmu_event_state[_PMU_EVENT_NUM_] = 0)
+  ,
+  ()
+)
+
+DEF_MACRO(fNOP_EXECUTED,
+	,
+	"do nothing",
+	"some magic triggers for cache drive",
+	{ sys_nop_executed(thread);	},
+	()
+)
+
+DEF_MACRO(fSETMREG,
+	  (IDX, VAL),
+	  "fSETMREG(IDX,VAL)",   /* short desc */
+	  "SET MREG Vector IDX", /* long desc */
+          (thread->Mregs[IDX] = (VAL)),
+	  ()
+)
+
+DEF_MACRO(fGETMREG,
+	  (IDX),
+	  "fGETMREG(IDX)",   /* short desc */
+	  "GET MREG Vector IDX", /* long desc */
+          (thread->Mregs[IDX]),
+	  ()
+)
+
+
+
+
+
+
+
+
+DEF_MACRO(fXORBITS,(SUM,VAR,VEC),
+      "SUM = xor bitsin(VEC)",
+      "XOR all bits together in a 32bit register",
+      {
+        for (SUM=0,VAR=0;VAR<32;VAR++) {
+           SUM ^= VEC & 1;
+           VEC = VEC >> 1;
+        }
+      },
+)
+
+DEF_MACRO(fTIMING,(A),
+	"",
+	"",
+	if (UNLIKELY(thread->timing_on)) {
+		A;
+	},
+	()
+)
+
+DEF_MACRO(IV1DEAD,(),
+	"",
+	"",
+	,
+	() /*A_NOTE_NOISTARIV1*/
+)
+
+DEF_MACRO(FAKE,(),
+	"",
+	"",
+	,
+	(A_FAKEINSN)
+)
+
+DEF_MACRO(fIN_MONITOR_MODE,(),
+	"in_monitor_mode()",
+	"in_monitor_mode()",
+	sys_in_monitor_mode(thread),
+	(A_IMPLICIT_READS_SSR)
+)
+
+DEF_MACRO(fIN_USER_MODE,(),
+	"in_user_mode()",
+	"in_user_mode()",
+	sys_in_user_mode(thread),
+	(A_IMPLICIT_READS_SSR)
+)
+
+DEF_MACRO(fIN_GUEST_MODE,(),
+	"in_guest_mode()",
+	"in_guest_mode()",
+	sys_in_guest_mode(thread),
+	(A_IMPLICIT_READS_SSR)
+)
+
+DEF_MACRO(fGRE_ENABLED,(),
+	"CCR.GRE",
+	"CCR.GRE",
+	fREAD_REG_FIELD(CCR,CCR_GRE),
+	(A_IMPLICIT_READS_CCR)
+)
+
+DEF_MACRO(fGTE_ENABLED,(),
+	"CCR.GRE",
+	"CCR.GRE",
+	fREAD_REG_FIELD(CCR,CCR_GRE),
+	(A_IMPLICIT_READS_CCR)
+)
+
+DEF_MACRO(fTRAP1_VIRTINSN,(IMM),
+	"can_handle_trap1_virtinsn(IMM)",
+	"can_handle_trap1_virtinsn(IMM)",
+	((fIN_GUEST_MODE())
+		&& (fGRE_ENABLED())
+		&& (	((IMM) == 1)
+			|| ((IMM) == 3)
+			|| ((IMM) == 4)
+			|| ((IMM) == 6))),
+	()
+)
+
+DEF_MACRO(fTRAP0_TO_GUEST,(),
+	"can_handle_trap0_to_guest(IMM)",
+	"can_handle_trap0_to_guest(IMM)",
+	((!fIN_MONITOR_MODE())
+		&& (fGTE_ENABLED())),
+	()
+)
+
+DEF_MACRO(fVIRTINSN_RTE,(IMM,REG),
+	"VMRTE",
+	"VMRTE",
+	do {
+		thread->trap1_info = TRAP1_VIRTINSN_RTE;
+		fLOG_REG_FIELD(SSR,SSR_SS,fREAD_REG_FIELD(GSR,GSR_SS));
+		fLOG_REG_FIELD(CCR,CCR_GIE,fREAD_REG_FIELD(GSR,GSR_IE));
+		fLOG_REG_FIELD(SSR,SSR_GM,!fREAD_REG_FIELD(GSR,GSR_UM));
+		fBRANCH((fREAD_GELR() & -4),COF_TYPE_RTE);
+		fINTERNAL_CLEAR_SAMEPAGE();
+	} while (0),
+	(A_IMPLICIT_READS_GSR,A_IMPLICIT_WRITES_PC,A_IMPLICIT_WRITES_CCR,A_IMPLICIT_WRITES_SSR)
+)
+
+DEF_MACRO(fVIRTINSN_SETIE,(IMM,REG),
+	"VMSETIE",
+	"VMSETIE",
+	do { 	fLOG_REG_FIELD(CCR,CCR_GIE,(REG) & 1);
+		REG = fREAD_REG_FIELD(CCR,CCR_GIE);
+		thread->trap1_info = TRAP1_VIRTINSN_SETIE;
+	} while (0),
+	(A_IMPLICIT_READS_CCR,A_IMPLICIT_WRITES_CCR)
+)
+
+DEF_MACRO(fVIRTINSN_GETIE,(IMM,REG),
+	"VMGETIE",
+	"VMGETIE",
+	{ 	thread->trap1_info = TRAP1_VIRTINSN_GETIE;
+		REG = fREAD_REG_FIELD(CCR,CCR_GIE);
+	},
+
+	(A_IMPLICIT_READS_CCR)
+)
+
+DEF_MACRO(fVIRTINSN_SPSWAP,(IMM,REG),
+	"VMSPSWAP",
+	"VMSPSWAP",
+	do { if (fREAD_REG_FIELD(GSR,GSR_UM)) {
+		size4u_t TEMP = REG;
+		REG = fREAD_GOSP();
+		fWRITE_GOSP(TEMP);
+		thread->trap1_info = TRAP1_VIRTINSN_SPSWAP;
+	} } while (0),
+	(A_IMPLICIT_READS_GSR,A_IMPLICIT_READS_GOSP,A_IMPLICIT_WRITES_GOSP)
+)
+
+DEF_MACRO(fGUESTTRAP,(TRAPTYPE,IMM),
+	"GSR.CAUSE = IMM; TRAP # TRAPTYPE",
+	"GSR.CAUSE = IMM; TRAP # TRAPTYPE",
+	do {
+		if (TRAPTYPE == 0) {
+			CALLBACK(thread->processor_ptr->options->trap0_callback,
+					 thread->system_ptr, thread->processor_ptr,
+					 thread->threadId, IMM);
+		}
+		WRITE_RREG(REG_GELR,fREAD_NPC());
+		fLOG_REG_FIELD(GSR,GSR_UM,!fREAD_REG_FIELD(SSR,SSR_GM));
+		fLOG_REG_FIELD(GSR,GSR_SS,fREAD_REG_FIELD(SSR,SSR_SS));
+		fLOG_REG_FIELD(GSR,GSR_IE,fREAD_REG_FIELD(CCR,CCR_GIE));
+		fLOG_REG_FIELD(GSR,GSR_CAUSE,IMM);
+		fLOG_REG_FIELD(SSR,SSR_GM,1);
+		fLOG_REG_FIELD(SSR,SSR_SS,0);
+		fLOG_REG_FIELD(CCR,CCR_GIE,0);
+		fBRANCH(fREAD_GEVB() + ((EXCEPT_TYPE_TRAP##TRAPTYPE)<<2),COF_TYPE_TRAP);
+	} while (0),
+	()
+)
+
+DEF_MACRO(fPREDUSE_TIMING,(),
+	"PREDUSE_TIMING",
+	"PREDUSE_TIMING",
+	,
+	(A_PREDUSE_BSB)
+)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 18/67] Hexagon arch import - instruction encoding
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (16 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 17/67] Hexagon arch import - macro definitions Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 19/67] Hexagon instruction class definitions Taylor Simpson
                   ` (49 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Imported from the Hexagon architecture library
    Instruction encoding bit patterns for every instruction

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/imported/encode.def         |  125 ++
 target/hexagon/imported/encode_pp.def      | 2283 ++++++++++++++++++++++++++++
 target/hexagon/imported/encode_subinsn.def |  150 ++
 3 files changed, 2558 insertions(+)
 create mode 100644 target/hexagon/imported/encode.def
 create mode 100644 target/hexagon/imported/encode_pp.def
 create mode 100644 target/hexagon/imported/encode_subinsn.def

diff --git a/target/hexagon/imported/encode.def b/target/hexagon/imported/encode.def
new file mode 100644
index 0000000..ae23301
--- /dev/null
+++ b/target/hexagon/imported/encode.def
@@ -0,0 +1,125 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This just includes all encoding files
+ */
+
+#ifndef DEF_FIELD32
+#define __SELF_DEF_FIELD32
+#define DEF_FIELD32(...) /* nothing */
+#endif
+
+#ifndef DEF_CLASS32
+#define __SELF_DEF_CLASS32
+#define DEF_CLASS32(...) /* nothing */
+#endif
+
+#ifndef DEF_ANTICLASS32
+#define __SELF_DEF_ANTICLASS32
+#define DEF_ANTICLASS32(...) /* nothing */
+#endif
+
+#ifndef LEGACY_DEF_ENC32
+#define __SELF_DEF_LEGACY_DEF_ENC32
+#define LEGACY_DEF_ENC32(...) /* nothing */
+#endif
+
+#ifndef DEF_FIELDROW_DESC32
+#define __SELF_DEF_FIELDROW_DESC32
+#define DEF_FIELDROW_DESC32(...) /* nothing */
+#endif
+
+#ifndef DEF_ENC32
+#define __SELF_DEF_ENC32
+#define DEF_ENC32(...) /* nothing */
+#endif
+
+#ifndef DEF_PACKED32
+#define __SELF_DEF_PACKED32
+#define DEF_PACKED32(...) /* nothing */
+#endif
+
+#ifndef DEF_ENC_SUBINSN
+#define __SELF_DEF_ENC_SUBINSN
+#define DEF_ENC_SUBINSN(...) /* nothing */
+#endif
+
+#ifndef DEF_EXT_ENC
+#define __SELF_DEF_EXT_ENC
+#define DEF_EXT_ENC(...) /* nothing */
+#endif
+
+#ifndef DEF_EXT_SPACE
+#define __SELF_DEF_EXT_SPACE
+#define DEF_EXT_SPACE(...) /* nothing */
+#endif
+
+#include "encode_pp.def"
+#include "encode_subinsn.def"
+
+#ifdef __SELF_DEF_FIELD32
+#undef __SELF_DEF_FIELD32
+#undef DEF_FIELD32
+#endif
+
+#ifdef __SELF_DEF_CLASS32
+#undef __SELF_DEF_CLASS32
+#undef DEF_CLASS32
+#endif
+
+#ifdef __SELF_DEF_ANTICLASS32
+#undef __SELF_DEF_ANTICLASS32
+#undef DEF_ANTICLASS32
+#endif
+
+#ifdef __SELF_DEF_LEGACY_DEF_ENC32
+#undef __SELF_DEF_LEGACY_DEF_ENC32
+#undef LEGACY_DEF_ENC32
+#endif
+
+#ifdef __SELF_DEF_FIELDROW_DESC32
+#undef __SELF_DEF_FIELDROW_DESC32
+#undef DEF_FIELDROW_DESC32
+#endif
+
+#ifdef __SELF_DEF_ENC32
+#undef __SELF_DEF_ENC32
+#undef DEF_ENC32
+#endif
+
+#ifdef __SELF_DEF_EXT_SPACE
+#undef __SELF_DEF_EXT_SPACE
+#undef DEF_EXT_SPACE
+#endif
+
+
+#ifdef __SELF_DEF_PACKED32
+#undef __SELF_DEF_PACKED32
+#undef DEF_PACKED32
+#endif
+
+#ifdef __SELF_DEF_ENC_SUBINSN
+#undef __SELF_DEF_ENC_SUBINSN
+#undef DEF_ENC_SUBINSN
+#endif
+
+#ifdef __SELF_DEF_EXT_ENC
+#undef __SELF_DEF_EXT_ENC
+#undef DEF_EXT_ENC
+#endif
+
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
new file mode 100644
index 0000000..41a183f
--- /dev/null
+++ b/target/hexagon/imported/encode_pp.def
@@ -0,0 +1,2283 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Encodings for 32 bit instructions
+ *
+ */
+
+
+
+
+DEF_CLASS32("---- ---- -------- PP------ --------",ALL_PP)
+DEF_FIELD32("---- ---- -------- !!------ --------",Parse,"Packet/Loop parse bits")
+DEF_FIELD32("!!!! ---- -------- PP------ --------",ICLASS,"Instruction Class")
+
+#define FRAME_EXPLICIT 1
+
+
+#define ICLASS_EXTENDER   "0000"
+#define ICLASS_CJ         "0001"
+#define ICLASS_NCJ        "0010"
+#define ICLASS_V4LDST     "0011"
+#define ICLASS_V2LDST     "0100"
+#define ICLASS_J          "0101"
+#define ICLASS_CR         "0110"
+#define ICLASS_ALU2op     "0111"
+#define ICLASS_S2op       "1000"
+#define ICLASS_LD         "1001"
+#define ICLASS_ST         "1010"
+#define ICLASS_ADDI       "1011"
+#define ICLASS_S3op       "1100"
+#define ICLASS_ALU64      "1101"
+#define ICLASS_M          "1110"
+#define ICLASS_ALU3op     "1111"
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*     V4 Immediate Payload    */
+/*                             */
+/*                             */
+/*******************************/
+
+DEF_CLASS32(ICLASS_EXTENDER" ---- -------- PP------ --------",EXTENDER)
+DEF_ENC32(A4_ext, ICLASS_EXTENDER "iiii iiiiiiii PPiiiiii iiiiiiii")
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*     V2 PREDICATED LD/ST     */
+/*                             */
+/*                             */
+/*******************************/
+
+DEF_CLASS32(ICLASS_V2LDST" ---- -------- PP------ --------",V2LDST)
+DEF_CLASS32(ICLASS_V2LDST" ---1 -------- PP------ --------",V2LD)
+DEF_CLASS32(ICLASS_V2LDST" ---0 -------- PP------ --------",V2ST)
+DEF_CLASS32(ICLASS_V2LDST" 0--1 -------- PP------ --------",PLD)
+DEF_CLASS32(ICLASS_V2LDST" 0--0 -------- PP------ --------",PST)
+DEF_CLASS32(ICLASS_V2LDST" 1--1 -------- PP------ --------",GPLD)
+DEF_CLASS32(ICLASS_V2LDST" 1--0 -------- PP------ --------",GPST)
+
+DEF_FIELD32(ICLASS_V2LDST" 0!-- -------- PP------ --------",PMEM_Sense,"Sense")
+DEF_FIELD32(ICLASS_V2LDST" 0-!- -------- PP------ --------",PMEM_PredNew,"PredNew")
+DEF_FIELD32(ICLASS_V2LDST" ---1 !!------ PP------ --------",PMEML_Type,"Type")
+DEF_FIELD32(ICLASS_V2LDST" ---1 --!----- PP------ --------",PMEML_UN,"Unsigned")
+DEF_FIELD32(ICLASS_V2LDST" ---0 !!!----- PP------ --------",PMEMS_Type,"Type")
+
+#define STD_PLD_IOENC(TAG,OPC) \
+DEF_ENC32(L2_pload##TAG##t_io,   ICLASS_V2LDST" 0001 "OPC"  sssss  PP0ttiii  iiiddddd")\
+DEF_ENC32(L2_pload##TAG##f_io,   ICLASS_V2LDST" 0101 "OPC"  sssss  PP0ttiii  iiiddddd")\
+DEF_ENC32(L2_pload##TAG##tnew_io,ICLASS_V2LDST" 0011 "OPC"  sssss  PP0ttiii  iiiddddd")\
+DEF_ENC32(L2_pload##TAG##fnew_io,ICLASS_V2LDST" 0111 "OPC"  sssss  PP0ttiii  iiiddddd")
+
+STD_PLD_IOENC(rb,  "000")
+STD_PLD_IOENC(rub, "001")
+STD_PLD_IOENC(rh,  "010")
+STD_PLD_IOENC(ruh, "011")
+STD_PLD_IOENC(ri,  "100")
+STD_PLD_IOENC(rd,  "110") /* note dest reg field LSB=0, 1 is reserved */
+
+
+
+#define STD_PST_IOENC(TAG,OPC,SRC) \
+DEF_ENC32(S2_pstore##TAG##t_io,   ICLASS_V2LDST" 0000 "OPC"  sssss  PPi"SRC"  iiiii0vv")\
+DEF_ENC32(S2_pstore##TAG##f_io,   ICLASS_V2LDST" 0100 "OPC"  sssss  PPi"SRC"  iiiii0vv")\
+DEF_ENC32(S4_pstore##TAG##tnew_io,ICLASS_V2LDST" 0010 "OPC"  sssss  PPi"SRC"  iiiii0vv")\
+DEF_ENC32(S4_pstore##TAG##fnew_io,ICLASS_V2LDST" 0110 "OPC"  sssss  PPi"SRC"  iiiii0vv")
+
+STD_PST_IOENC(rb,    "000","ttttt")
+STD_PST_IOENC(rh,    "010","ttttt")
+STD_PST_IOENC(rf,    "011","ttttt")
+STD_PST_IOENC(ri,    "100","ttttt")
+STD_PST_IOENC(rd,    "110","ttttt")
+STD_PST_IOENC(rbnew, "101","00ttt")
+STD_PST_IOENC(rhnew, "101","01ttt")
+STD_PST_IOENC(rinew, "101","10ttt")
+
+
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*     V2 GP-RELATIVE LD/ST    */
+/*                             */
+/*                             */
+/*******************************/
+#define STD_LD_GP(TAG,OPC) \
+DEF_ENC32(L2_load##TAG##gp,   ICLASS_V2LDST" 1ii1 "OPC"  iiiii  PPiiiiii  iiiddddd")
+
+STD_LD_GP(rb,  "000")
+STD_LD_GP(rub, "001")
+STD_LD_GP(rh,  "010")
+STD_LD_GP(ruh, "011")
+STD_LD_GP(ri,  "100")
+STD_LD_GP(rd,  "110") /* note dest reg field LSB=0, 1 is reserved */
+
+#define STD_ST_GP(TAG,OPC,SRC) \
+DEF_ENC32(S2_store##TAG##gp,  ICLASS_V2LDST" 1ii0 "OPC"  iiiii  PPi"SRC"  iiiiiiii")
+
+STD_ST_GP(rb,   "000","ttttt")
+STD_ST_GP(rh,   "010","ttttt")
+STD_ST_GP(rf,   "011","ttttt")
+STD_ST_GP(ri,   "100","ttttt")
+STD_ST_GP(rd,   "110","ttttt")
+STD_ST_GP(rbnew,"101","00ttt")
+STD_ST_GP(rhnew,"101","01ttt")
+STD_ST_GP(rinew,"101","10ttt")
+
+
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*     V4LDST                  */
+/*                             */
+/*                             */
+/*******************************/
+
+
+DEF_CLASS32(ICLASS_V4LDST" ---- -------- PP------ --------",V4LDST)
+DEF_CLASS32(ICLASS_V4LDST" 0--- -------- PP------ --------",Pred_RplusR)
+DEF_CLASS32(ICLASS_V4LDST" 100- -------- PP------ --------",Pred_StoreImmed)
+DEF_CLASS32(ICLASS_V4LDST" 101- -------- PP------ --------",RplusR)
+DEF_CLASS32(ICLASS_V4LDST" 110- -------- PP------ --------",StoreImmed)
+DEF_CLASS32(ICLASS_V4LDST" 111- -------- PP------ --------",MemOp)
+
+
+
+
+/*******************************/
+/*    Pred (R+R)               */
+/*******************************/
+
+#define STD_PLD_RRENC(TAG,OPC) \
+DEF_ENC32(L4_pload##TAG##t_rr,   ICLASS_V4LDST" 00 00 "OPC"  sssss  PPittttt  ivvddddd")\
+DEF_ENC32(L4_pload##TAG##f_rr,   ICLASS_V4LDST" 00 01 "OPC"  sssss  PPittttt  ivvddddd")\
+DEF_ENC32(L4_pload##TAG##tnew_rr,ICLASS_V4LDST" 00 10 "OPC"  sssss  PPittttt  ivvddddd")\
+DEF_ENC32(L4_pload##TAG##fnew_rr,ICLASS_V4LDST" 00 11 "OPC"  sssss  PPittttt  ivvddddd")
+
+STD_PLD_RRENC(rb,  "000")
+STD_PLD_RRENC(rub, "001")
+STD_PLD_RRENC(rh,  "010")
+STD_PLD_RRENC(ruh, "011")
+STD_PLD_RRENC(ri,  "100")
+STD_PLD_RRENC(rd,  "110")
+
+#define STD_PST_RRENC(TAG,OPC,SRC) \
+DEF_ENC32(S4_pstore##TAG##t_rr,   ICLASS_V4LDST" 01 00 "OPC"  sssss  PPiuuuuu  ivv"SRC)\
+DEF_ENC32(S4_pstore##TAG##f_rr,   ICLASS_V4LDST" 01 01 "OPC"  sssss  PPiuuuuu  ivv"SRC)\
+DEF_ENC32(S4_pstore##TAG##tnew_rr,ICLASS_V4LDST" 01 10 "OPC"  sssss  PPiuuuuu  ivv"SRC)\
+DEF_ENC32(S4_pstore##TAG##fnew_rr,ICLASS_V4LDST" 01 11 "OPC"  sssss  PPiuuuuu  ivv"SRC)
+
+STD_PST_RRENC(rb,    "000","ttttt")
+STD_PST_RRENC(rh,    "010","ttttt")
+STD_PST_RRENC(rf,    "011","ttttt")
+STD_PST_RRENC(ri,    "100","ttttt")
+STD_PST_RRENC(rd,    "110","ttttt")
+STD_PST_RRENC(rbnew, "101","00ttt")
+STD_PST_RRENC(rhnew, "101","01ttt")
+STD_PST_RRENC(rinew, "101","10ttt")
+
+
+
+/*******************************/
+/*     Pred Store immediates   */
+/*******************************/
+
+#define V4_PSTI(TAG,OPC) \
+DEF_ENC32(S4_storei##TAG##t_io,    ICLASS_V4LDST" 100 00  "OPC"  sssss  PPIiiiii  ivvIIIII")\
+DEF_ENC32(S4_storei##TAG##f_io,    ICLASS_V4LDST" 100 01  "OPC"  sssss  PPIiiiii  ivvIIIII")\
+DEF_ENC32(S4_storei##TAG##tnew_io, ICLASS_V4LDST" 100 10  "OPC"  sssss  PPIiiiii  ivvIIIII")\
+DEF_ENC32(S4_storei##TAG##fnew_io, ICLASS_V4LDST" 100 11  "OPC"  sssss  PPIiiiii  ivvIIIII")
+
+V4_PSTI(rb, "00")
+V4_PSTI(rh, "01")
+V4_PSTI(ri, "10")
+
+
+
+/*******************************/
+/*    (R+R)                    */
+/*******************************/
+
+#define STD_LD_RRENC(TAG,OPC) \
+DEF_ENC32(L4_load##TAG##_rr,     ICLASS_V4LDST" 1010 "OPC"  sssss  PPittttt  i--ddddd")
+
+STD_LD_RRENC(rb,  "000")
+STD_LD_RRENC(rub, "001")
+STD_LD_RRENC(rh,  "010")
+STD_LD_RRENC(ruh, "011")
+STD_LD_RRENC(ri,  "100")
+STD_LD_RRENC(rd,  "110")
+
+#define STD_ST_RRENC(TAG,OPC,SRC) \
+DEF_ENC32(S4_store##TAG##_rr,     ICLASS_V4LDST" 1011 "OPC"  sssss  PPiuuuuu  i--"SRC)
+
+STD_ST_RRENC(rb,    "000","ttttt")
+STD_ST_RRENC(rh,    "010","ttttt")
+STD_ST_RRENC(rf,    "011","ttttt")
+STD_ST_RRENC(ri,    "100","ttttt")
+STD_ST_RRENC(rd,    "110","ttttt")
+STD_ST_RRENC(rbnew, "101","00ttt")
+STD_ST_RRENC(rhnew, "101","01ttt")
+STD_ST_RRENC(rinew, "101","10ttt")
+
+
+
+
+/*******************************/
+/*     Store immediates        */
+/*******************************/
+
+#define V4_STI(TAG,OPC) \
+DEF_ENC32(S4_storei##TAG##_io,     ICLASS_V4LDST" 110 -- "OPC"  sssss  PPIiiiii  iIIIIIII")
+
+
+V4_STI(rb, "00")
+V4_STI(rh, "01")
+V4_STI(ri, "10")
+
+
+/*******************************/
+/*     Memops                 */
+/*******************************/
+
+#define MEMOPENC(TAG,OPC) \
+DEF_ENC32(L4_add_##TAG##_io,         ICLASS_V4LDST" 111 0- " OPC "sssss  PP0iiiii  i00ttttt")\
+DEF_ENC32(L4_sub_##TAG##_io,         ICLASS_V4LDST" 111 0- " OPC "sssss  PP0iiiii  i01ttttt")\
+DEF_ENC32(L4_and_##TAG##_io,         ICLASS_V4LDST" 111 0- " OPC "sssss  PP0iiiii  i10ttttt")\
+DEF_ENC32(L4_or_##TAG##_io,          ICLASS_V4LDST" 111 0- " OPC "sssss  PP0iiiii  i11ttttt")\
+\
+DEF_ENC32(L4_iadd_##TAG##_io,        ICLASS_V4LDST" 111 1- " OPC "sssss  PP0iiiii  i00IIIII")\
+DEF_ENC32(L4_isub_##TAG##_io,        ICLASS_V4LDST" 111 1- " OPC "sssss  PP0iiiii  i01IIIII")\
+DEF_ENC32(L4_iand_##TAG##_io,        ICLASS_V4LDST" 111 1- " OPC "sssss  PP0iiiii  i10IIIII")\
+DEF_ENC32(L4_ior_##TAG##_io,         ICLASS_V4LDST" 111 1- " OPC "sssss  PP0iiiii  i11IIIII")
+
+
+
+MEMOPENC(memopw,"10")
+MEMOPENC(memoph,"01")
+MEMOPENC(memopb,"00")
+
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*           LOAD              */
+/*                             */
+/*                             */
+/*******************************/
+DEF_CLASS32(ICLASS_LD" ---- -------- PP------ --------",LD)
+
+
+DEF_CLASS32(ICLASS_LD" 0--- -------- PP------ --------",LD_ADDR_ROFFSET)
+DEF_CLASS32(ICLASS_LD" 100- -------- PP----0- --------",LD_ADDR_POST_CIRC_IMMED)
+DEF_CLASS32(ICLASS_LD" 101- -------- PP00---- --------",LD_ADDR_POST_IMMED)
+DEF_CLASS32(ICLASS_LD" 101- -------- PP01---- --------",LD_ADDR_ABS_UPDATE_V4)
+DEF_CLASS32(ICLASS_LD" 101- -------- PP1----- --------",LD_ADDR_POST_IMMED_PRED_V2)
+DEF_CLASS32(ICLASS_LD" 110- -------- PP-0---- 0-------",LD_ADDR_POST_REG)
+DEF_CLASS32(ICLASS_LD" 110- -------- PP-1---- --------",LD_ADDR_ABS_PLUS_REG_V4)
+DEF_CLASS32(ICLASS_LD" 100- -------- PP----1- --------",LD_ADDR_POST_CREG_V2)
+DEF_CLASS32(ICLASS_LD" 111- -------- PP------ 0-------",LD_ADDR_POST_BREV_REG)
+DEF_CLASS32(ICLASS_LD" 111- -------- PP------ 1-------",LD_ADDR_PRED_ABS_V4)
+
+DEF_FIELD32(ICLASS_LD" !!!- -------- PP------ --------",LD_Amode,"Amode")
+DEF_FIELD32(ICLASS_LD" ---! !!------ PP------ --------",LD_Type,"Type")
+DEF_FIELD32(ICLASS_LD" ---- --!----- PP------ --------",LD_UN,"Unsigned")
+
+#define STD_LD_ENC(TAG,OPC) \
+DEF_ENC32(L2_load##TAG##_io,   ICLASS_LD" 0 ii "OPC"  sssss  PPiiiiii  iiiddddd")\
+DEF_ENC32(L2_load##TAG##_pci,  ICLASS_LD" 1 00 "OPC"  xxxxx  PPu0--0i  iiiddddd")\
+DEF_ENC32(L2_load##TAG##_pi,   ICLASS_LD" 1 01 "OPC"  xxxxx  PP00---i  iiiddddd")\
+DEF_ENC32(L4_load##TAG##_ap,   ICLASS_LD" 1 01 "OPC"  eeeee  PP01IIII  -IIddddd")\
+DEF_ENC32(L2_load##TAG##_pr,   ICLASS_LD" 1 10 "OPC"  xxxxx  PPu0----  0--ddddd")\
+DEF_ENC32(L4_load##TAG##_ur,   ICLASS_LD" 1 10 "OPC"  ttttt  PPi1IIII  iIIddddd")\
+DEF_ENC32(L2_load##TAG##_pcr,  ICLASS_LD" 1 00 "OPC"  xxxxx  PPu0--1-  0--ddddd")\
+DEF_ENC32(L2_load##TAG##_pbr,  ICLASS_LD" 1 11 "OPC"  xxxxx  PPu0----  0--ddddd")
+
+
+#define STD_LDX_ENC(TAG,OPC) \
+DEF_ENC32(L2_load##TAG##_io,   ICLASS_LD" 0 ii "OPC"  sssss  PPiiiiii  iiiyyyyy")\
+DEF_ENC32(L2_load##TAG##_pci,  ICLASS_LD" 1 00 "OPC"  xxxxx  PPu0--0i  iiiyyyyy")\
+DEF_ENC32(L2_load##TAG##_pi,   ICLASS_LD" 1 01 "OPC"  xxxxx  PP00---i  iiiyyyyy")\
+DEF_ENC32(L4_load##TAG##_ap,   ICLASS_LD" 1 01 "OPC"  eeeee  PP01IIII  -IIyyyyy")\
+DEF_ENC32(L2_load##TAG##_pr,   ICLASS_LD" 1 10 "OPC"  xxxxx  PPu0----  0--yyyyy")\
+DEF_ENC32(L4_load##TAG##_ur,   ICLASS_LD" 1 10 "OPC"  ttttt  PPi1IIII  iIIyyyyy")\
+DEF_ENC32(L2_load##TAG##_pcr,  ICLASS_LD" 1 00 "OPC"  xxxxx  PPu0--1-  0--yyyyy")\
+DEF_ENC32(L2_load##TAG##_pbr,  ICLASS_LD" 1 11 "OPC"  xxxxx  PPu0----  0--yyyyy")
+
+
+#define STD_PLD_ENC(TAG,OPC) \
+DEF_ENC32(L2_pload##TAG##t_pi,    ICLASS_LD" 1 01 "OPC"  xxxxx  PP100tti  iiiddddd")\
+DEF_ENC32(L2_pload##TAG##f_pi,    ICLASS_LD" 1 01 "OPC"  xxxxx  PP101tti  iiiddddd")\
+DEF_ENC32(L2_pload##TAG##tnew_pi, ICLASS_LD" 1 01 "OPC"  xxxxx  PP110tti  iiiddddd")\
+DEF_ENC32(L2_pload##TAG##fnew_pi, ICLASS_LD" 1 01 "OPC"  xxxxx  PP111tti  iiiddddd")\
+DEF_ENC32(L4_pload##TAG##t_abs,   ICLASS_LD" 1 11 "OPC"  iiiii  PP100tti  1--ddddd")\
+DEF_ENC32(L4_pload##TAG##f_abs,   ICLASS_LD" 1 11 "OPC"  iiiii  PP101tti  1--ddddd")\
+DEF_ENC32(L4_pload##TAG##tnew_abs,ICLASS_LD" 1 11 "OPC"  iiiii  PP110tti  1--ddddd")\
+DEF_ENC32(L4_pload##TAG##fnew_abs,ICLASS_LD" 1 11 "OPC"  iiiii  PP111tti  1--ddddd")
+
+
+/*               0 000  misc: dealloc,loadw_locked,dcfetch      */
+STD_LD_ENC(bzw4,"0 101")
+STD_LD_ENC(bzw2,"0 011")
+
+STD_LD_ENC(bsw4,"0 111")
+STD_LD_ENC(bsw2,"0 001")
+
+STD_LDX_ENC(alignh,"0 010")
+STD_LDX_ENC(alignb,"0 100")
+
+STD_LD_ENC(rb,  "1 000")
+STD_LD_ENC(rub, "1 001")
+STD_LD_ENC(rh,  "1 010")
+STD_LD_ENC(ruh, "1 011")
+STD_LD_ENC(ri,  "1 100")
+STD_LD_ENC(rd,  "1 110") /* note dest reg field LSB=0, 1 is reserved */
+
+STD_PLD_ENC(rb,  "1 000")
+STD_PLD_ENC(rub, "1 001")
+STD_PLD_ENC(rh,  "1 010")
+STD_PLD_ENC(ruh, "1 011")
+STD_PLD_ENC(ri,  "1 100")
+STD_PLD_ENC(rd,  "1 110") /* note dest reg field LSB=0, 1 is reserved */
+
+
+DEF_CLASS32(    ICLASS_LD" 0--0 000----- PP------ --------",LD_MISC)
+DEF_ANTICLASS32(ICLASS_LD" 0--0 000----- PP------ --------",LD_ADDR_ROFFSET)
+DEF_ANTICLASS32(ICLASS_LD" 1000 000----- PP------ --------",LD_ADDR_POST_CIRC_IMMED)
+DEF_ANTICLASS32(ICLASS_LD" 1010 000----- PP------ --------",LD_ADDR_POST_IMMED)
+DEF_ANTICLASS32(ICLASS_LD" 1100 000----- PP------ --------",LD_ADDR_POST_REG)
+DEF_ANTICLASS32(ICLASS_LD" 1110 000----- PP------ --------",LD_ADDR_POST_REG)
+
+DEF_ENC32(L2_deallocframe,    ICLASS_LD" 000 0 000 sssss PP0----- ---ddddd")
+DEF_ENC32(L4_return,          ICLASS_LD" 011 0 000 sssss PP0000-- ---ddddd")
+DEF_ENC32(L4_return_t,        ICLASS_LD" 011 0 000 sssss PP0100vv ---ddddd")
+DEF_ENC32(L4_return_f,        ICLASS_LD" 011 0 000 sssss PP1100vv ---ddddd")
+DEF_ENC32(L4_return_tnew_pt,  ICLASS_LD" 011 0 000 sssss PP0110vv ---ddddd")
+DEF_ENC32(L4_return_fnew_pt,  ICLASS_LD" 011 0 000 sssss PP1110vv ---ddddd")
+DEF_ENC32(L4_return_tnew_pnt, ICLASS_LD" 011 0 000 sssss PP0010vv ---ddddd")
+DEF_ENC32(L4_return_fnew_pnt, ICLASS_LD" 011 0 000 sssss PP1010vv ---ddddd")
+
+DEF_ENC32(L2_loadw_locked,ICLASS_LD" 001 0 000 sssss PP00---- -00ddddd")
+DEF_ENC32(L4_loadw_phys,  ICLASS_LD" 001 0 000 sssss PP1ttttt -00ddddd")
+
+
+
+
+
+
+DEF_ENC32(L4_loadd_locked,ICLASS_LD" 001 0 000 sssss PP01---- -00ddddd")
+DEF_EXT_SPACE(EXTRACTW,   ICLASS_LD" 001 0 000 iiiii PP0iiiii -01iiiii")
+DEF_ENC32(Y2_dcfetchbo,   ICLASS_LD" 010 0 000 sssss PP0--iii iiiiiiii")
+
+
+
+
+
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*           STORE             */
+/*                             */
+/*                             */
+/*******************************/
+
+DEF_CLASS32(ICLASS_ST" ---- -------- PP------ --------",ST)
+
+DEF_FIELD32(ICLASS_ST" !!!- -------- PP------ --------",ST_Amode,"Amode")
+DEF_FIELD32(ICLASS_ST" ---! !!------ PP------ --------",ST_Type,"Type")
+DEF_FIELD32(ICLASS_ST" ---- --!----- PP------ --------",ST_UN,"Unsigned")
+
+DEF_CLASS32(ICLASS_ST" 0--1 -------- PP------ --------",ST_ADDR_ROFFSET)
+DEF_CLASS32(ICLASS_ST" 1001 -------- PP------ ------0-",ST_ADDR_POST_CIRC_IMMED)
+DEF_CLASS32(ICLASS_ST" 1011 -------- PP0----- 0-----0-",ST_ADDR_POST_IMMED)
+DEF_CLASS32(ICLASS_ST" 1011 -------- PP0----- 1-------",ST_ADDR_ABS_UPDATE_V4)
+DEF_CLASS32(ICLASS_ST" 1011 -------- PP1----- --------",ST_ADDR_POST_IMMED_PRED_V2)
+DEF_CLASS32(ICLASS_ST" 1111 -------- PP------ 1-------",ST_ADDR_PRED_ABS_V4)
+DEF_CLASS32(ICLASS_ST" 1101 -------- PP------ 0-------",ST_ADDR_POST_REG)
+DEF_CLASS32(ICLASS_ST" 1101 -------- PP------ 1-------",ST_ADDR_ABS_PLUS_REG_V4)
+DEF_CLASS32(ICLASS_ST" 1001 -------- PP------ ------1-",ST_ADDR_POST_CREG_V2)
+DEF_CLASS32(ICLASS_ST" 1111 -------- PP------ 0-------",ST_ADDR_POST_BREV_REG)
+DEF_CLASS32(ICLASS_ST" 0--0 1------- PP------ --------",ST_MISC_STORELIKE)
+DEF_CLASS32(ICLASS_ST" 1--0 0------- PP------ --------",ST_MISC_BUSOP)
+DEF_CLASS32(ICLASS_ST" 0--0 0------- PP------ --------",ST_MISC_CACHEOP)
+
+
+#define STD_ST_ENC(TAG,OPC,SRC) \
+DEF_ENC32(S2_store##TAG##_io,   ICLASS_ST" 0 ii "OPC"  sssss  PPi"SRC"  iiiiiiii")\
+DEF_ENC32(S2_store##TAG##_pci,  ICLASS_ST" 1 00 "OPC"  xxxxx  PPu"SRC"  0iiii-0-")\
+DEF_ENC32(S2_store##TAG##_pi,   ICLASS_ST" 1 01 "OPC"  xxxxx  PP0"SRC"  0iiii-0-")\
+DEF_ENC32(S4_store##TAG##_ap,   ICLASS_ST" 1 01 "OPC"  eeeee  PP0"SRC"  1-IIIIII")\
+DEF_ENC32(S2_store##TAG##_pr,   ICLASS_ST" 1 10 "OPC"  xxxxx  PPu"SRC"  0-------")\
+DEF_ENC32(S4_store##TAG##_ur,   ICLASS_ST" 1 10 "OPC"  uuuuu  PPi"SRC"  1iIIIIII")\
+DEF_ENC32(S2_store##TAG##_pcr,  ICLASS_ST" 1 00 "OPC"  xxxxx  PPu"SRC"  0-----1-")\
+DEF_ENC32(S2_store##TAG##_pbr,  ICLASS_ST" 1 11 "OPC"  xxxxx  PPu"SRC"  0-------")
+
+
+#define STD_PST_ENC(TAG,OPC,SRC) \
+DEF_ENC32(S2_pstore##TAG##t_pi,    ICLASS_ST" 1 01 "OPC"  xxxxx  PP1"SRC"  0iiii0vv")\
+DEF_ENC32(S2_pstore##TAG##f_pi,    ICLASS_ST" 1 01 "OPC"  xxxxx  PP1"SRC"  0iiii1vv")\
+DEF_ENC32(S2_pstore##TAG##tnew_pi, ICLASS_ST" 1 01 "OPC"  xxxxx  PP1"SRC"  1iiii0vv")\
+DEF_ENC32(S2_pstore##TAG##fnew_pi, ICLASS_ST" 1 01 "OPC"  xxxxx  PP1"SRC"  1iiii1vv")\
+DEF_ENC32(S4_pstore##TAG##t_abs,   ICLASS_ST" 1 11 "OPC"  ---ii  PP0"SRC"  1iiii0vv")\
+DEF_ENC32(S4_pstore##TAG##f_abs,   ICLASS_ST" 1 11 "OPC"  ---ii  PP0"SRC"  1iiii1vv")\
+DEF_ENC32(S4_pstore##TAG##tnew_abs,ICLASS_ST" 1 11 "OPC"  ---ii  PP1"SRC"  1iiii0vv")\
+DEF_ENC32(S4_pstore##TAG##fnew_abs,ICLASS_ST" 1 11 "OPC"  ---ii  PP1"SRC"  1iiii1vv")
+
+
+/*                 0 0--  Store Misc */
+/*                 0 1xx  Available */
+STD_ST_ENC(rb,    "1 000","ttttt")
+STD_ST_ENC(rh,    "1 010","ttttt")
+STD_ST_ENC(rf,    "1 011","ttttt")
+STD_ST_ENC(ri,    "1 100","ttttt")
+STD_ST_ENC(rd,    "1 110","ttttt")
+STD_ST_ENC(rbnew, "1 101","00ttt")
+STD_ST_ENC(rhnew, "1 101","01ttt")
+STD_ST_ENC(rinew, "1 101","10ttt")
+
+STD_PST_ENC(rb,    "1 000","ttttt")
+STD_PST_ENC(rh,    "1 010","ttttt")
+STD_PST_ENC(rf,    "1 011","ttttt")
+STD_PST_ENC(ri,    "1 100","ttttt")
+STD_PST_ENC(rd,    "1 110","ttttt")
+STD_PST_ENC(rbnew, "1 101","00ttt")
+STD_PST_ENC(rhnew, "1 101","01ttt")
+STD_PST_ENC(rinew, "1 101","10ttt")
+
+
+
+/* User */
+/*                                   xx - st_misc */
+/*                                                */
+/*                               x bus/cache     */
+/*                                    x store/cache     */
+#ifdef FRAME_EXPLICIT
+DEF_ENC32(S2_allocframe,   ICLASS_ST" 000 01 00xxxxx PP000iii iiiiiiii")
+#else
+DEF_ENC32(S2_allocframe,   ICLASS_ST" 000 01 0011101 PP000iii iiiiiiii")
+#endif
+DEF_ENC32(S2_storew_locked,ICLASS_ST" 000 01 01sssss PP-ttttt ------dd")
+DEF_ENC32(S4_stored_locked,ICLASS_ST" 000 01 11sssss PP0ttttt ------dd")
+DEF_ENC32(Y5_l2locka,      ICLASS_ST" 000 01 11sssss PP1----- ------dd")
+DEF_ENC32(Y2_dczeroa,      ICLASS_ST" 000 01 10sssss PP0----- --------")
+
+
+DEF_ENC32(Y2_barrier,      ICLASS_ST" 100 00 00----- PP------ 000-----")
+DEF_ENC32(Y2_syncht,       ICLASS_ST" 100 00 10----- PP------ --------")
+DEF_ENC32(Y2_l2kill,       ICLASS_ST" 100 00 01----- PP-000-- --------")
+DEF_ENC32(Y5_l2gunlock,    ICLASS_ST" 100 00 01----- PP-010-- --------")
+DEF_ENC32(Y5_l2gclean,     ICLASS_ST" 100 00 01----- PP-100-- --------")
+DEF_ENC32(Y5_l2gcleaninv,  ICLASS_ST" 100 00 01----- PP-110-- --------")
+DEF_ENC32(Y2_l2cleaninvidx,ICLASS_ST" 100 00 11sssss PP------ --------")
+
+
+
+DEF_ENC32(Y2_dccleana,     ICLASS_ST" 000 00 00sssss PP------ --------")
+DEF_ENC32(Y2_dcinva,       ICLASS_ST" 000 00 01sssss PP------ --------")
+DEF_ENC32(Y2_dccleaninva,  ICLASS_ST" 000 00 10sssss PP------ --------")
+
+/* Super */
+DEF_ENC32(Y2_dckill,       ICLASS_ST" 001 00 00----- PP------ --------")
+DEF_ENC32(Y2_dccleanidx,   ICLASS_ST" 001 00 01sssss PP------ --------")
+DEF_ENC32(Y2_dcinvidx,     ICLASS_ST" 001 00 10sssss PP------ --------")
+DEF_ENC32(Y2_dccleaninvidx,ICLASS_ST" 001 00 11sssss PP------ --------")
+
+DEF_ENC32(Y2_dctagw       ,ICLASS_ST" 010 00 00sssss PP-ttttt --------")
+DEF_ENC32(Y2_dctagr       ,ICLASS_ST" 010 00 01sssss PP------ ---ddddd")
+
+DEF_ENC32(Y4_l2tagw       ,ICLASS_ST" 010 00 10sssss PP0ttttt --------")
+DEF_ENC32(Y4_l2tagr       ,ICLASS_ST" 010 00 11sssss PP------ ---ddddd")
+
+DEF_ENC32(Y4_l2fetch,      ICLASS_ST" 011 00 00sssss PP-ttttt 000-----")
+DEF_ENC32(Y5_l2cleanidx,   ICLASS_ST" 011 00 01sssss PP------ --------")
+DEF_ENC32(Y5_l2invidx,     ICLASS_ST" 011 00 10sssss PP------ --------")
+DEF_ENC32(Y5_l2unlocka,    ICLASS_ST" 011 00 11sssss PP------ --------")
+DEF_ENC32(Y5_l2fetch,      ICLASS_ST" 011 01 00sssss PP-ttttt --------")
+
+DEF_ENC32(Y6_l2gcleanpa,   ICLASS_ST" 011 01 01----- PP-ttttt --------")
+DEF_ENC32(Y6_l2gcleaninvpa,ICLASS_ST" 011 01 10----- PP-ttttt --------")
+
+
+
+
+
+
+
+
+
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*           JUMP              */
+/*                             */
+/*                             */
+/*******************************/
+
+DEF_CLASS32(ICLASS_J" ---- -------- PP------ --------",J)
+DEF_CLASS32(ICLASS_J" 0--- -------- PP------ --------",JUMPR_MISC)
+DEF_CLASS32(ICLASS_J" 10-- -------- PP------ --------",UCJUMP)
+DEF_CLASS32(ICLASS_J" 110- -------- PP------ --------",CJUMP)
+DEF_FIELD32(ICLASS_J" 110- -------- PP--!--- --------",J_DN,"Dot-new")
+DEF_FIELD32(ICLASS_J" 110- -------- PP-!---- --------",J_PT,"Predict-taken")
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_J" 0000 -------- PP------ --------","[#0] PC=(Rs), R31=return")
+DEF_ENC32(J2_callr,     ICLASS_J" 0000  101sssss  PP------  --------")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 0001 -------- PP------ --------","[#1] if (Pu) PC=(Rs), R31=return")
+DEF_ENC32(J2_callrt,    ICLASS_J" 0001  000sssss  PP----uu  --------")
+DEF_ENC32(J2_callrf,    ICLASS_J" 0001  001sssss  PP----uu  --------")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 0010 -------- PP------ --------","[#2] PC=(Rs); ")
+DEF_ENC32(J2_jumpr,      ICLASS_J" 0010  100sssss  PP------  --------")
+DEF_ENC32(J4_hintjumpr,  ICLASS_J" 0010  101sssss  PP------  --------")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 0011 -------- PP------ --------","[#3] if (Pu) PC=(Rs) ")
+DEF_ENC32(J2_jumprt,   ICLASS_J" 0011  010sssss  PP-00-uu  --------")
+DEF_ENC32(J2_jumprf,   ICLASS_J" 0011  011sssss  PP-00-uu  --------")
+DEF_ENC32(J2_jumprtpt,    ICLASS_J" 0011  010sssss  PP-10-uu  --------")
+DEF_ENC32(J2_jumprfpt,    ICLASS_J" 0011  011sssss  PP-10-uu  --------")
+DEF_ENC32(J2_jumprtnew,   ICLASS_J" 0011  010sssss  PP-01-uu  --------")
+DEF_ENC32(J2_jumprfnew,   ICLASS_J" 0011  011sssss  PP-01-uu  --------")
+DEF_ENC32(J2_jumprtnewpt, ICLASS_J" 0011  010sssss  PP-11-uu  --------")
+DEF_ENC32(J2_jumprfnewpt, ICLASS_J" 0011  011sssss  PP-11-uu  --------")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 0100 -------- PP------ --------","[#4] (#u8) ")
+DEF_ENC32(J2_trap0,     ICLASS_J" 0100  00------  PP-iiiii  ---iii--")
+DEF_ENC32(J2_trap1,     ICLASS_J" 0100  10-xxxxx  PP-iiiii  ---iii--")
+DEF_ENC32(J2_pause,     ICLASS_J" 0100  01------  PP-iiiii  ---iii--")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 0101 -------- PP------ --------","[#5] Rd=(Rs) ")
+DEF_ENC32(Y2_icdatar,   ICLASS_J" 0101  101sssss  PP------  ---ddddd")
+DEF_ENC32(Y2_ictagr,    ICLASS_J" 0101  111sssss  PP------  ---ddddd")
+DEF_ENC32(Y2_ictagw,    ICLASS_J" 0101  110sssss  PP0ttttt  --------")
+DEF_ENC32(Y2_icdataw,   ICLASS_J" 0101  110sssss  PP1ttttt  --------")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 0110 -------- PP------ --------","[#6] icop(Rs) ")
+DEF_ENC32(Y2_icinva,    ICLASS_J" 0110  110sssss  PP000---  --------")
+DEF_ENC32(Y2_icinvidx,  ICLASS_J" 0110  110sssss  PP001---  --------")
+DEF_ENC32(Y2_ickill,    ICLASS_J" 0110  110-----  PP010---  --------")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 0111 -------- PP------ --------","[#7] () ")
+DEF_ENC32(Y2_isync,     ICLASS_J" 0111  11000000  PP0---00  00000010")
+DEF_ENC32(J2_rte,       ICLASS_J" 0111  111-----  PP00----  000-----")
+
+/* JUMP */
+DEF_FIELDROW_DESC32(ICLASS_J" 100- -------- PP------ --------","[#8,9] PC=(#r22)")
+DEF_ENC32(J2_jump,      ICLASS_J" 100i  iiiiiiii  PPiiiiii  iiiiiii-")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 101- -------- PP------ --------","[#10,11] PC=(#r22), R31=return")
+DEF_ENC32(J2_call,      ICLASS_J" 101i  iiiiiiii  PPiiiiii  iiiiiii0")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 1100 -------- PP------ --------","[#12] if (Pu) PC=(#r15)")
+DEF_ENC32(J2_jumpt,  ICLASS_J" 1100  ii0iiiii  PPi00-uu  iiiiiii-")
+DEF_ENC32(J2_jumpf,  ICLASS_J" 1100  ii1iiiii  PPi00-uu  iiiiiii-")
+DEF_ENC32(J2_jumptpt,   ICLASS_J" 1100  ii0iiiii  PPi10-uu  iiiiiii-")
+DEF_ENC32(J2_jumpfpt,   ICLASS_J" 1100  ii1iiiii  PPi10-uu  iiiiiii-")
+DEF_ENC32(J2_jumptnew,  ICLASS_J" 1100  ii0iiiii  PPi01-uu  iiiiiii-")
+DEF_ENC32(J2_jumpfnew,  ICLASS_J" 1100  ii1iiiii  PPi01-uu  iiiiiii-")
+DEF_ENC32(J2_jumptnewpt,ICLASS_J" 1100  ii0iiiii  PPi11-uu  iiiiiii-")
+DEF_ENC32(J2_jumpfnewpt,ICLASS_J" 1100  ii1iiiii  PPi11-uu  iiiiiii-")
+
+DEF_FIELDROW_DESC32(ICLASS_J" 1101 -------- PP------ --------","[#13] if (Pu) PC=(#r15), R31=return")
+DEF_ENC32(J2_callt,     ICLASS_J" 1101  ii0iiiii  PPi-0-uu  iiiiiii-")
+DEF_ENC32(J2_callf,     ICLASS_J" 1101  ii1iiiii  PPi-0-uu  iiiiiii-")
+
+
+
+
+
+
+
+/*******************************/
+/*                             */
+/*        V4                   */
+/*   COMPOUND COMPARE-JUMPS    */
+/*                             */
+/*                             */
+/*******************************/
+
+
+/* EJP: this has to match what we have in htmldocs.py... so I will call it CJ, we can change it */
+DEF_CLASS32(ICLASS_CJ" 0--- -------- PP------ --------",CJ)
+
+DEF_FIELDROW_DESC32(ICLASS_CJ" 00-- -------- -------- --------","[#0-3]  pd=cmp.xx(R,#u5) ; if ([!]p0.new) jump:[h] #s9:2 ")
+DEF_FIELDROW_DESC32(ICLASS_CJ" 010- -------- -------- --------","[#4,5]  pd=cmp.eq(R,R) ; if ([!]p0.new) jump:[h] #s9:2 ")
+DEF_FIELDROW_DESC32(ICLASS_CJ" 0110 -------- -------- --------","[#6]    Rd=#u6 ; jump #s9:2 ")
+DEF_FIELDROW_DESC32(ICLASS_CJ" 0111 -------- -------- --------","[#7]    Rd=Rs ; jump #s9:2 ")
+
+
+#define CMPJMPI_ENC(TAG,OPC) \
+DEF_ENC32(TAG##i_tp0_jump_t,      ICLASS_CJ" 00 0 "OPC"  0iissss  PP1IIIII  iiiiiii-") \
+DEF_ENC32(TAG##i_fp0_jump_t,      ICLASS_CJ" 00 0 "OPC"  1iissss  PP1IIIII  iiiiiii-") \
+DEF_ENC32(TAG##i_tp0_jump_nt,     ICLASS_CJ" 00 0 "OPC"  0iissss  PP0IIIII  iiiiiii-") \
+DEF_ENC32(TAG##i_fp0_jump_nt,     ICLASS_CJ" 00 0 "OPC"  1iissss  PP0IIIII  iiiiiii-") \
+\
+DEF_ENC32(TAG##i_tp1_jump_t,      ICLASS_CJ" 00 1 "OPC"  0iissss  PP1IIIII  iiiiiii-") \
+DEF_ENC32(TAG##i_fp1_jump_t,      ICLASS_CJ" 00 1 "OPC"  1iissss  PP1IIIII  iiiiiii-") \
+DEF_ENC32(TAG##i_tp1_jump_nt,     ICLASS_CJ" 00 1 "OPC"  0iissss  PP0IIIII  iiiiiii-") \
+DEF_ENC32(TAG##i_fp1_jump_nt,     ICLASS_CJ" 00 1 "OPC"  1iissss  PP0IIIII  iiiiiii-")
+
+CMPJMPI_ENC(J4_cmpeq,"00")
+CMPJMPI_ENC(J4_cmpgt,"01")
+CMPJMPI_ENC(J4_cmpgtu,"10")
+
+
+#define CMPJMP1I_ENC(TAG,OPC) \
+DEF_ENC32(TAG##_tp0_jump_t,      ICLASS_CJ" 00 0  11  0iissss  PP1---"OPC"  iiiiiii-") \
+DEF_ENC32(TAG##_fp0_jump_t,      ICLASS_CJ" 00 0  11  1iissss  PP1---"OPC"  iiiiiii-") \
+DEF_ENC32(TAG##_tp0_jump_nt,     ICLASS_CJ" 00 0  11  0iissss  PP0---"OPC"  iiiiiii-") \
+DEF_ENC32(TAG##_fp0_jump_nt,     ICLASS_CJ" 00 0  11  1iissss  PP0---"OPC"  iiiiiii-") \
+\
+DEF_ENC32(TAG##_tp1_jump_t,      ICLASS_CJ" 00 1  11  0iissss  PP1---"OPC"  iiiiiii-") \
+DEF_ENC32(TAG##_fp1_jump_t,      ICLASS_CJ" 00 1  11  1iissss  PP1---"OPC"  iiiiiii-") \
+DEF_ENC32(TAG##_tp1_jump_nt,     ICLASS_CJ" 00 1  11  0iissss  PP0---"OPC"  iiiiiii-") \
+DEF_ENC32(TAG##_fp1_jump_nt,     ICLASS_CJ" 00 1  11  1iissss  PP0---"OPC"  iiiiiii-")
+
+CMPJMP1I_ENC(J4_cmpeqn1,"00")
+CMPJMP1I_ENC(J4_cmpgtn1,"01")
+CMPJMP1I_ENC(J4_tstbit0,"11")
+
+
+
+#define CMPJMPR_ENC(TAG,OPC) \
+DEF_ENC32(TAG##_tp0_jump_t,       ICLASS_CJ" 01 0 "OPC"  0iissss  PP10tttt  iiiiiii-") \
+DEF_ENC32(TAG##_fp0_jump_t,       ICLASS_CJ" 01 0 "OPC"  1iissss  PP10tttt  iiiiiii-") \
+DEF_ENC32(TAG##_tp0_jump_nt,      ICLASS_CJ" 01 0 "OPC"  0iissss  PP00tttt  iiiiiii-") \
+DEF_ENC32(TAG##_fp0_jump_nt,      ICLASS_CJ" 01 0 "OPC"  1iissss  PP00tttt  iiiiiii-") \
+\
+DEF_ENC32(TAG##_tp1_jump_t,       ICLASS_CJ" 01 0 "OPC"  0iissss  PP11tttt  iiiiiii-") \
+DEF_ENC32(TAG##_fp1_jump_t,       ICLASS_CJ" 01 0 "OPC"  1iissss  PP11tttt  iiiiiii-") \
+DEF_ENC32(TAG##_tp1_jump_nt,      ICLASS_CJ" 01 0 "OPC"  0iissss  PP01tttt  iiiiiii-") \
+DEF_ENC32(TAG##_fp1_jump_nt,      ICLASS_CJ" 01 0 "OPC"  1iissss  PP01tttt  iiiiiii-")
+
+CMPJMPR_ENC(J4_cmpeq,"00")
+CMPJMPR_ENC(J4_cmpgt,"01")
+CMPJMPR_ENC(J4_cmpgtu,"10")
+
+
+DEF_ENC32(J4_jumpseti,            ICLASS_CJ" 0110  --iidddd  PPIIIIII  iiiiiii-")
+DEF_ENC32(J4_jumpsetr,            ICLASS_CJ" 0111  --iissss  PP--dddd  iiiiiii-")
+
+
+DEF_EXT_SPACE(EXT_CJ,             ICLASS_CJ"1 iii  iiiiiiii  PPiiiiii  iiiiiiii")
+
+
+
+DEF_CLASS32(ICLASS_NCJ" 0--- -------- PP------ --------",NCJ)
+DEF_FIELDROW_DESC32(ICLASS_NCJ" 00-- -------- -------- --------","[#0-3] if (cmp.xx(R.new,R)) jump:[h] #s9:2 ")
+DEF_FIELDROW_DESC32(ICLASS_NCJ" 01-- -------- -------- --------","[#4-7] if (cmp.xx(R.new,#U5)) jump:[h] #s9:2 ")
+
+#define OPRJMP_ENC(TAG,OPC) \
+DEF_ENC32(TAG##_t_jumpnv_t,       ICLASS_NCJ" 00 "OPC"  0ii-sss  PP1ttttt  iiiiiii-") \
+DEF_ENC32(TAG##_f_jumpnv_t,       ICLASS_NCJ" 00 "OPC"  1ii-sss  PP1ttttt  iiiiiii-") \
+DEF_ENC32(TAG##_t_jumpnv_nt,      ICLASS_NCJ" 00 "OPC"  0ii-sss  PP0ttttt  iiiiiii-") \
+DEF_ENC32(TAG##_f_jumpnv_nt,      ICLASS_NCJ" 00 "OPC"  1ii-sss  PP0ttttt  iiiiiii-")
+
+OPRJMP_ENC(J4_cmpeq,   "000")
+OPRJMP_ENC(J4_cmpgt,   "001")
+OPRJMP_ENC(J4_cmpgtu,  "010")
+OPRJMP_ENC(J4_cmplt,   "011")
+OPRJMP_ENC(J4_cmpltu,  "100")
+
+
+#define OPIJMP_ENC(TAG,OPC) \
+DEF_ENC32(TAG##_t_jumpnv_t,       ICLASS_NCJ" 01 "OPC"  0ii-sss  PP1IIIII  iiiiiii-") \
+DEF_ENC32(TAG##_f_jumpnv_t,       ICLASS_NCJ" 01 "OPC"  1ii-sss  PP1IIIII  iiiiiii-") \
+DEF_ENC32(TAG##_t_jumpnv_nt,      ICLASS_NCJ" 01 "OPC"  0ii-sss  PP0IIIII  iiiiiii-") \
+DEF_ENC32(TAG##_f_jumpnv_nt,      ICLASS_NCJ" 01 "OPC"  1ii-sss  PP0IIIII  iiiiiii-")
+
+OPIJMP_ENC(J4_cmpeqi,  "000")
+OPIJMP_ENC(J4_cmpgti,  "001")
+OPIJMP_ENC(J4_cmpgtui, "010")
+
+
+#define OPI1JMP_ENC(TAG,OPC) \
+DEF_ENC32(TAG##_t_jumpnv_t,       ICLASS_NCJ" 01 "OPC"  0ii-sss  PP1-----  iiiiiii-") \
+DEF_ENC32(TAG##_f_jumpnv_t,       ICLASS_NCJ" 01 "OPC"  1ii-sss  PP1-----  iiiiiii-") \
+DEF_ENC32(TAG##_t_jumpnv_nt,      ICLASS_NCJ" 01 "OPC"  0ii-sss  PP0-----  iiiiiii-") \
+DEF_ENC32(TAG##_f_jumpnv_nt,      ICLASS_NCJ" 01 "OPC"  1ii-sss  PP0-----  iiiiiii-")
+
+OPI1JMP_ENC(J4_cmpeqn1,  "100")
+OPI1JMP_ENC(J4_cmpgtn1,  "101")
+OPI1JMP_ENC(J4_tstbit0,  "011")
+
+
+DEF_EXT_SPACE(EXT_NCJ,             ICLASS_NCJ"1 iii  iiiiiiii  PPiiiiii  iiiiiiii")
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*           CR                */
+/*                             */
+/*                             */
+/*******************************/
+
+
+
+DEF_CLASS32(ICLASS_CR" ---- -------- PP------ --------",CR)
+DEF_CLASS32(ICLASS_CR" -0-- -------- PP------ --------",CRUSER)
+DEF_CLASS32(ICLASS_CR" -1-- -------- PP------ --------",CRSUPER)
+
+DEF_FIELD32(ICLASS_CR" -!-- -------- PP------ --------",CR_sm,"Supervisor mode only")
+
+/* User CR ops */
+
+DEF_FIELDROW_DESC32(    ICLASS_CR" 0000  --------  PP------  --------","[#0] (Rs,#r8)")
+DEF_ENC32(J2_loop0r,    ICLASS_CR" 0000  000sssss  PP-iiiii  ---ii---")
+DEF_ENC32(J2_loop1r,    ICLASS_CR" 0000  001sssss  PP-iiiii  ---ii---")
+DEF_ENC32(J2_ploop1sr,  ICLASS_CR" 0000  101sssss  PP-iiiii  ---ii---")
+DEF_ENC32(J2_ploop2sr,  ICLASS_CR" 0000  110sssss  PP-iiiii  ---ii---")
+DEF_ENC32(J2_ploop3sr,  ICLASS_CR" 0000  111sssss  PP-iiiii  ---ii---")
+
+DEF_FIELDROW_DESC32(     ICLASS_CR" 0001  --------  PP------  --------","[#1] (Rs,#r13)")
+DEF_ENC32(J2_jumprz,     ICLASS_CR" 0001  00isssss  PPi0iiii  iiiiiii-")
+DEF_ENC32(J2_jumprzpt,   ICLASS_CR" 0001  00isssss  PPi1iiii  iiiiiii-")
+DEF_ENC32(J2_jumprnz,    ICLASS_CR" 0001  10isssss  PPi0iiii  iiiiiii-")
+DEF_ENC32(J2_jumprnzpt,  ICLASS_CR" 0001  10isssss  PPi1iiii  iiiiiii-")
+
+DEF_ENC32(J2_jumprgtez,  ICLASS_CR" 0001  01isssss  PPi0iiii  iiiiiii-")
+DEF_ENC32(J2_jumprgtezpt,ICLASS_CR" 0001  01isssss  PPi1iiii  iiiiiii-")
+DEF_ENC32(J2_jumprltez,  ICLASS_CR" 0001  11isssss  PPi0iiii  iiiiiii-")
+DEF_ENC32(J2_jumprltezpt,ICLASS_CR" 0001  11isssss  PPi1iiii  iiiiiii-")
+
+DEF_FIELDROW_DESC32(    ICLASS_CR" 0010  --------  PP------  --------","[#2] Cd=Rs ")
+DEF_ENC32(A2_tfrrcr,    ICLASS_CR" 0010  001sssss  PP------  ---ddddd")
+DEF_ENC32(G4_tfrgrcr,   ICLASS_CR" 0010  000sssss  PP------  ---ddddd")
+DEF_ENC32(Y4_trace,     ICLASS_CR" 0010  010sssss  PP------  000-----")
+DEF_ENC32(Y6_diag,      ICLASS_CR" 0010  010sssss  PP------  001-----")
+DEF_ENC32(Y6_diag0,     ICLASS_CR" 0010  010sssss  PP-ttttt  010-----")
+DEF_ENC32(Y6_diag1,     ICLASS_CR" 0010  010sssss  PP-ttttt  011-----")
+
+DEF_FIELDROW_DESC32(    ICLASS_CR" 0011  --------  PP------  --------","[#3] Cdd=Rss ")
+DEF_ENC32(A4_tfrpcp,    ICLASS_CR" 0011  001sssss  PP------  ---ddddd")
+DEF_ENC32(G4_tfrgpcp,   ICLASS_CR" 0011  000sssss  PP------  ---ddddd")
+
+DEF_FIELDROW_DESC32(    ICLASS_CR" 1000  --------  PP------  --------","[#8] Rdd=Css ")
+DEF_ENC32(A4_tfrcpp,    ICLASS_CR" 1000  000sssss  PP------  ---ddddd")
+DEF_ENC32(G4_tfrgcpp,   ICLASS_CR" 1000  001sssss  PP------  ---ddddd")
+
+DEF_FIELDROW_DESC32(    ICLASS_CR" 1001  --------  PP------  --------","[#9] (#r8,#U10)")
+DEF_ENC32(J2_ploop1si,  ICLASS_CR" 1001  101IIIII  PP-iiiii  IIIii-II")
+DEF_ENC32(J2_ploop2si,  ICLASS_CR" 1001  110IIIII  PP-iiiii  IIIii-II")
+DEF_ENC32(J2_ploop3si,  ICLASS_CR" 1001  111IIIII  PP-iiiii  IIIii-II")
+DEF_ENC32(J2_loop0i,    ICLASS_CR" 1001  000IIIII  PP-iiiii  IIIii-II")
+DEF_ENC32(J2_loop1i,    ICLASS_CR" 1001  001IIIII  PP-iiiii  IIIii-II")
+
+DEF_FIELDROW_DESC32(    ICLASS_CR" 1010  --------  PP------  --------","[#10] Rd=Cs ")
+DEF_ENC32(A2_tfrcrr,    ICLASS_CR" 1010  000sssss  PP------  ---ddddd")
+DEF_ENC32(G4_tfrgcrr,   ICLASS_CR" 1010  001sssss  PP------  ---ddddd")
+DEF_ENC32(C4_addipc,    ICLASS_CR" 1010  01001001  PP-iiiii  i--ddddd")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_CR" 1011  --------  PP0-----  --------","[#11] Pd=(Ps,Pt,Pu)")
+DEF_ENC32(C2_and,       ICLASS_CR" 1011  0000--ss  PP0---tt  ------dd")
+DEF_ENC32(C2_or,        ICLASS_CR" 1011  0010--ss  PP0---tt  ------dd")
+DEF_ENC32(C2_xor,       ICLASS_CR" 1011  0100--ss  PP0---tt  ------dd")
+DEF_ENC32(C2_andn,      ICLASS_CR" 1011  0110--ss  PP0---tt  ------dd")
+DEF_ENC32(C2_any8,      ICLASS_CR" 1011  1000--ss  PP0-----  ------dd")
+DEF_ENC32(C2_all8,      ICLASS_CR" 1011  1010--ss  PP0-----  ------dd")
+DEF_ENC32(C2_not,       ICLASS_CR" 1011  1100--ss  PP0-----  ------dd")
+DEF_ENC32(C2_orn,       ICLASS_CR" 1011  1110--ss  PP0---tt  ------dd")
+
+DEF_ENC32(C4_and_and,   ICLASS_CR" 1011  0001--ss  PP0---tt  uu----dd")
+DEF_ENC32(C4_and_or,    ICLASS_CR" 1011  0011--ss  PP0---tt  uu----dd")
+DEF_ENC32(C4_or_and,    ICLASS_CR" 1011  0101--ss  PP0---tt  uu----dd")
+DEF_ENC32(C4_or_or,     ICLASS_CR" 1011  0111--ss  PP0---tt  uu----dd")
+DEF_ENC32(C4_and_andn,  ICLASS_CR" 1011  1001--ss  PP0---tt  uu----dd")
+DEF_ENC32(C4_and_orn,   ICLASS_CR" 1011  1011--ss  PP0---tt  uu----dd")
+DEF_ENC32(C4_or_andn,   ICLASS_CR" 1011  1101--ss  PP0---tt  uu----dd")
+DEF_ENC32(C4_or_orn,    ICLASS_CR" 1011  1111--ss  PP0---tt  uu----dd")
+
+DEF_ENC32(C4_fastcorner9,	ICLASS_CR"1011 0000--ss  PP1---tt 1--1--dd")
+DEF_ENC32(C4_fastcorner9_not,	ICLASS_CR"1011 0001--ss  PP1---tt 1--1--dd")
+
+
+
+/* Supervisor CR ops */
+/* Interrupts */
+DEF_FIELDROW_DESC32(   ICLASS_CR" 0100 -------- PP------  --------","[#4] (Rs,Pt)")
+DEF_ENC32(Y2_swi,      ICLASS_CR" 0100 000sssss PP------ 000-----")
+DEF_ENC32(Y2_cswi,     ICLASS_CR" 0100 000sssss PP------ 001-----")
+DEF_ENC32(Y2_iassignw, ICLASS_CR" 0100 000sssss PP------ 010-----")
+DEF_ENC32(Y2_ciad,     ICLASS_CR" 0100 000sssss PP------ 011-----")
+DEF_ENC32(Y2_setimask, ICLASS_CR" 0100 100sssss PP----tt 000-----")
+DEF_ENC32(Y2_setprio,  ICLASS_CR" 0100 100sssss PP----tt 001-----")
+DEF_ENC32(Y4_siad,     ICLASS_CR" 0100 100sssss PP------ 011-----")
+
+DEF_ENC32(Y2_wait,     ICLASS_CR" 0100 010sssss PP------ 000-----")
+DEF_ENC32(Y2_resume,   ICLASS_CR" 0100 010sssss PP------ 001-----")
+DEF_ENC32(Y2_stop,     ICLASS_CR" 0100 011sssss PP------ 000-----")
+DEF_ENC32(Y2_start,    ICLASS_CR" 0100 011sssss PP------ 001-----")
+DEF_ENC32(Y4_nmi,      ICLASS_CR" 0100 011sssss PP------ 010-----")
+
+DEF_FIELDROW_DESC32(   ICLASS_CR" 0101 -------- PP------  --------","[#5] Rx ")
+DEF_ENC32(Y2_crswap0,  ICLASS_CR" 0101 000xxxxx PP------ --------")
+DEF_ENC32(Y4_crswap1,  ICLASS_CR" 0101 001xxxxx PP------ --------")
+
+DEF_FIELDROW_DESC32(   ICLASS_CR" 0110 -------- PP------  --------","[#6] Rd=(Rs)")
+DEF_ENC32(Y2_getimask, ICLASS_CR" 0110 000sssss PP------ ---ddddd")
+DEF_ENC32(Y2_iassignr, ICLASS_CR" 0110 011sssss PP------ ---ddddd")
+
+DEF_FIELDROW_DESC32(   ICLASS_CR" 0111 -------- PP------  --------","[#7] cr=Rs ")
+DEF_ENC32(Y2_tfrsrcr,  ICLASS_CR" 0111 00-sssss PP------ -ddddddd")
+#ifdef PTWALK
+DEF_ENC32(Y6_tfrarcr,  ICLASS_CR" 0111 01-sssss PP------ --dddddd")
+#endif
+
+DEF_FIELDROW_DESC32(   ICLASS_CR" 1100 -------- PP------  --------","[#12] ")
+DEF_ENC32(Y2_break,    ICLASS_CR" 1100 001----- PP------ 000-----")
+DEF_ENC32(Y2_tlblock,  ICLASS_CR" 1100 001----- PP------ 001-----")
+DEF_ENC32(Y2_tlbunlock,ICLASS_CR" 1100 001----- PP------ 010-----")
+DEF_ENC32(Y2_k0lock,   ICLASS_CR" 1100 001----- PP------ 011-----")
+DEF_ENC32(Y2_k0unlock, ICLASS_CR" 1100 001----- PP------ 100-----")
+DEF_ENC32(Y2_tlbp,     ICLASS_CR" 1100 100sssss PP------ ---ddddd")
+DEF_ENC32(Y5_tlboc,    ICLASS_CR" 1100 111sssss PP------ ---ddddd")
+DEF_ENC32(Y5_tlbasidi, ICLASS_CR" 1100 101sssss PP------ --------")
+DEF_ENC32(Y2_tlbr,     ICLASS_CR" 1100 010sssss PP------ ---ddddd")
+DEF_ENC32(Y2_tlbw,     ICLASS_CR" 1100 000sssss PP0ttttt --------")
+DEF_ENC32(Y5_ctlbw,    ICLASS_CR" 1100 110sssss PP0ttttt ---ddddd")
+
+DEF_FIELDROW_DESC32(   ICLASS_CR" 1101 -------- PP------  --------","[#13] Rxx ")
+DEF_ENC32(Y4_crswap10, ICLASS_CR" 1101 10-xxxxx PP------ ---00000")
+DEF_ENC32(Y4_tfrspcp,  ICLASS_CR" 1101 00-sssss PP------ -ddddddd")
+#ifdef PTWALK
+DEF_ENC32(Y6_tfrapcp,  ICLASS_CR" 1101 01-sssss PP------ --dddddd")
+#endif
+
+DEF_FIELDROW_DESC32(   ICLASS_CR" 1110 -------- PP------  --------","[#14] Rd=cr ")
+DEF_ENC32(Y2_tfrscrr,  ICLASS_CR" 1110 1sssssss PP------ ---ddddd")
+#ifdef PTWALK
+DEF_ENC32(Y6_tfracrr,  ICLASS_CR" 1110 0-ssssss PP------ ---ddddd")
+#endif
+
+DEF_FIELDROW_DESC32(   ICLASS_CR" 1111 -------- PP------  --------","[#15] Rdd=Sss ")
+DEF_ENC32(Y4_tfrscpp,  ICLASS_CR" 1111 0sssssss PP------ ---ddddd")
+#ifdef PTWALK
+DEF_ENC32(Y6_tfracpp,  ICLASS_CR" 1111 1-ssssss PP------ ---ddddd")
+#endif
+
+
+
+
+
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*           M                 */
+/*                             */
+/*                             */
+/*******************************/
+
+
+DEF_CLASS32(ICLASS_M" ---- -------- PP------ --------",M)
+DEF_FIELD32(ICLASS_M" !!!! -------- PP------ --------",M_RegType,"Register Type")
+DEF_FIELD32(ICLASS_M" ---- !!!----- PP------ --------",M_MajOp,"Major Opcode")
+DEF_FIELD32(ICLASS_M" ---- -------- PP------ !!!-----",M_MinOp,"Minor Opcode")
+
+
+
+#define SP_MPY(TAG,REGTYPE,DSTCHARS,SAT,RND,UNS)\
+DEF_ENC32(TAG##_ll_s0, ICLASS_M  REGTYPE "0"  UNS RND"sssss  PP-ttttt "SAT"00"   DSTCHARS)\
+DEF_ENC32(TAG##_lh_s0, ICLASS_M  REGTYPE "0"  UNS RND"sssss  PP-ttttt "SAT"01"   DSTCHARS)\
+DEF_ENC32(TAG##_hl_s0, ICLASS_M  REGTYPE "0"  UNS RND"sssss  PP-ttttt "SAT"10"   DSTCHARS)\
+DEF_ENC32(TAG##_hh_s0, ICLASS_M  REGTYPE "0"  UNS RND"sssss  PP-ttttt "SAT"11"   DSTCHARS)\
+DEF_ENC32(TAG##_ll_s1, ICLASS_M  REGTYPE "1"  UNS RND"sssss  PP-ttttt "SAT"00"   DSTCHARS)\
+DEF_ENC32(TAG##_lh_s1, ICLASS_M  REGTYPE "1"  UNS RND"sssss  PP-ttttt "SAT"01"   DSTCHARS)\
+DEF_ENC32(TAG##_hl_s1, ICLASS_M  REGTYPE "1"  UNS RND"sssss  PP-ttttt "SAT"10"   DSTCHARS)\
+DEF_ENC32(TAG##_hh_s1, ICLASS_M  REGTYPE "1"  UNS RND"sssss  PP-ttttt "SAT"11"   DSTCHARS)
+
+/* Double precision                   */
+#define MPY_ENC(TAG,REGTYPE,DSTCHARS,SAT,RNDNAC,UNS,SHFT,VMIN2)\
+DEF_ENC32(TAG, ICLASS_M REGTYPE SHFT UNS RNDNAC"sssss  PP0ttttt "SAT VMIN2 DSTCHARS)
+
+#define MPYI_ENC(TAG,REGTYPE,DSTCHARS,RNDNAC,UNS,SHFT)\
+DEF_ENC32(TAG, ICLASS_M REGTYPE SHFT UNS RNDNAC"sssss  PP0iiiii iii" DSTCHARS)
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 0000 -------- PP------ --------","[#0] Rd=(Rs,#u8)")
+MPYI_ENC(M2_mpysip,          "0000","ddddd","-","-","0"     )
+MPYI_ENC(M2_mpysin,          "0000","ddddd","-","-","1"     )
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 0001 -------- PP------ --------","[#1] Rx=(Rs,#u8)")
+MPYI_ENC(M2_macsip,          "0001","xxxxx","-","-","0"     )
+MPYI_ENC(M2_macsin,          "0001","xxxxx","-","-","1"     )
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 0010 -------- PP------ --------","[#2] Rx=(Rs,#s8)")
+MPYI_ENC(M2_accii,           "0010","xxxxx","-","-","0"     )
+MPYI_ENC(M2_naccii,          "0010","xxxxx","-","-","1"     )
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 0011 -------- PP------ --------","[#3] Ry=(Ru,(Rs,Ry)) ")
+DEF_ENC32(M4_mpyrr_addr,ICLASS_M" 0011 000sssss PP-yyyyy ---uuuuu")
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 0100 -------- PP------ --------","[#4] Rdd=(Rs,Rt)")
+DEF_FIELD32(ICLASS_M"         0100 -------- PP------ --!-----",Ma_tH,"Rt is High") /*Rt high */
+DEF_FIELD32(ICLASS_M"         0100 -------- PP------ -!------",Ma_sH,"Rs is High") /* Rs high */
+SP_MPY(M2_mpyd,              "0100","ddddd","-","0","0")
+SP_MPY(M2_mpyd_rnd,          "0100","ddddd","-","1","0")
+SP_MPY(M2_mpyud,             "0100","ddddd","-","0","1")
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 0101 -------- PP------ --------","[#5] Rdd=(Rs,Rt)")
+MPY_ENC(M2_dpmpyss_s0,       "0101","ddddd","0","0","0","0","00")
+MPY_ENC(M2_dpmpyuu_s0,       "0101","ddddd","0","0","1","0","00")
+MPY_ENC(M2_vmpy2s_s0,        "0101","ddddd","1","0","0","0","01")
+MPY_ENC(M2_vmpy2s_s1,        "0101","ddddd","1","0","0","1","01")
+MPY_ENC(M2_cmpyi_s0,         "0101","ddddd","0","0","0","0","01")
+MPY_ENC(M2_cmpyr_s0,         "0101","ddddd","0","0","0","0","10")
+MPY_ENC(M2_cmpys_s0,         "0101","ddddd","1","0","0","0","10")
+MPY_ENC(M2_cmpys_s1,         "0101","ddddd","1","0","0","1","10")
+MPY_ENC(M2_cmpysc_s0,        "0101","ddddd","1","0","1","0","10")
+MPY_ENC(M2_cmpysc_s1,        "0101","ddddd","1","0","1","1","10")
+MPY_ENC(M2_vmpy2su_s0,       "0101","ddddd","1","0","0","0","11")
+MPY_ENC(M2_vmpy2su_s1,       "0101","ddddd","1","0","0","1","11")
+MPY_ENC(M4_pmpyw,            "0101","ddddd","1","0","1","0","11")
+MPY_ENC(M4_vpmpyh,           "0101","ddddd","1","0","1","1","11")
+MPY_ENC(M5_vmpybuu,          "0101","ddddd","0","0","0","1","01")
+MPY_ENC(M5_vmpybsu,          "0101","ddddd","0","0","1","0","01")
+
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 0110 -------- PP------ --------","[#6] Rxx=(Rs,Rt)")
+DEF_FIELD32(ICLASS_M"         0110 -------- PP------ --!-----",Mb_tH,"Rt is High") /*Rt high */
+DEF_FIELD32(ICLASS_M"         0110 -------- PP------ -!------",Mb_sH,"Rs is High") /* Rs high */
+SP_MPY(M2_mpyd_acc,          "0110","xxxxx","0","0","0")
+SP_MPY(M2_mpyud_acc,         "0110","xxxxx","0","0","1")
+SP_MPY(M2_mpyd_nac,          "0110","xxxxx","0","1","0")
+SP_MPY(M2_mpyud_nac,         "0110","xxxxx","0","1","1")
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 0111 -------- PP------ --------","[#7] Rxx=(Rs,Rt)")
+MPY_ENC(M2_dpmpyss_acc_s0,   "0111","xxxxx","0","0","0","0","00")
+MPY_ENC(M2_dpmpyss_nac_s0,   "0111","xxxxx","0","1","0","0","00")
+MPY_ENC(M2_dpmpyuu_acc_s0,   "0111","xxxxx","0","0","1","0","00")
+MPY_ENC(M2_dpmpyuu_nac_s0,   "0111","xxxxx","0","1","1","0","00")
+MPY_ENC(M2_vmac2s_s0,        "0111","xxxxx","1","0","0","0","01")
+MPY_ENC(M2_vmac2s_s1,        "0111","xxxxx","1","0","0","1","01")
+MPY_ENC(M2_cmaci_s0,         "0111","xxxxx","0","0","0","0","01")
+MPY_ENC(M2_cmacr_s0,         "0111","xxxxx","0","0","0","0","10")
+MPY_ENC(M2_cmacs_s0,         "0111","xxxxx","1","0","0","0","10")
+MPY_ENC(M2_cmacs_s1,         "0111","xxxxx","1","0","0","1","10")
+MPY_ENC(M2_cmacsc_s0,        "0111","xxxxx","1","0","1","0","10")
+MPY_ENC(M2_cmacsc_s1,        "0111","xxxxx","1","0","1","1","10")
+MPY_ENC(M2_vmac2,            "0111","xxxxx","0","1","0","0","01")
+MPY_ENC(M2_cnacs_s0,         "0111","xxxxx","1","0","0","0","11")
+MPY_ENC(M2_cnacs_s1,         "0111","xxxxx","1","0","0","1","11")
+MPY_ENC(M2_cnacsc_s0,        "0111","xxxxx","1","0","1","0","11")
+MPY_ENC(M2_cnacsc_s1,        "0111","xxxxx","1","0","1","1","11")
+MPY_ENC(M2_vmac2su_s0,       "0111","xxxxx","1","1","1","0","01")
+MPY_ENC(M2_vmac2su_s1,       "0111","xxxxx","1","1","1","1","01")
+MPY_ENC(M4_pmpyw_acc,        "0111","xxxxx","1","1","0","0","11")
+MPY_ENC(M4_vpmpyh_acc,       "0111","xxxxx","1","1","0","1","11")
+MPY_ENC(M5_vmacbuu,          "0111","xxxxx","0","0","0","1","01")
+MPY_ENC(M5_vmacbsu,          "0111","xxxxx","0","0","1","1","01")
+
+
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 1000 -------- PP------ --------","[#8] Rdd=(Rss,Rtt)")
+MPY_ENC(M2_vrcmpyi_s0,       "1000","ddddd","0","0","0","0","00")
+MPY_ENC(M2_vdmpys_s0,        "1000","ddddd","1","0","0","0","00")
+MPY_ENC(M2_vdmpys_s1,        "1000","ddddd","1","0","0","1","00")
+MPY_ENC(M2_vrcmpyi_s0c,      "1000","ddddd","0","0","1","0","00")
+MPY_ENC(M2_vabsdiffw,        "1000","ddddd","0","1","0","0","00")
+MPY_ENC(M6_vabsdiffub,       "1000","ddddd","0","1","0","1","00")
+MPY_ENC(M2_vabsdiffh,        "1000","ddddd","0","1","1","0","00")
+MPY_ENC(M6_vabsdiffb,        "1000","ddddd","0","1","1","1","00")
+MPY_ENC(M2_vrcmpys_s1_h,     "1000","ddddd","1","1","0","1","00")
+MPY_ENC(M2_vrcmpys_s1_l,     "1000","ddddd","1","1","1","1","00")
+MPY_ENC(M2_vrcmpyr_s0c,      "1000","ddddd","0","1","1","0","01")
+MPY_ENC(M2_vrcmpyr_s0,       "1000","ddddd","0","0","0","0","01")
+MPY_ENC(A2_vraddub,          "1000","ddddd","0","0","1","0","01")
+MPY_ENC(M2_mmpyl_s0,         "1000","ddddd","1","0","0","0","01")
+MPY_ENC(M2_mmpyl_s1,         "1000","ddddd","1","0","0","1","01")
+MPY_ENC(M2_mmpyl_rs0,        "1000","ddddd","1","1","0","0","01")
+MPY_ENC(M2_mmpyl_rs1,        "1000","ddddd","1","1","0","1","01")
+MPY_ENC(M2_mmpyul_s0,        "1000","ddddd","1","0","1","0","01")
+MPY_ENC(M2_mmpyul_s1,        "1000","ddddd","1","0","1","1","01")
+MPY_ENC(M2_mmpyul_rs0,       "1000","ddddd","1","1","1","0","01")
+MPY_ENC(M2_mmpyul_rs1,       "1000","ddddd","1","1","1","1","01")
+MPY_ENC(M2_vrmpy_s0,         "1000","ddddd","0","0","0","0","10")
+MPY_ENC(A2_vrsadub,          "1000","ddddd","0","0","1","0","10")
+MPY_ENC(M2_vmpy2es_s0,       "1000","ddddd","1","0","0","0","10")
+MPY_ENC(M2_vmpy2es_s1,       "1000","ddddd","1","0","0","1","10")
+MPY_ENC(M2_vcmpy_s0_sat_i,   "1000","ddddd","1","0","1","0","10")
+MPY_ENC(M2_vcmpy_s0_sat_r,   "1000","ddddd","1","1","0","0","10")
+MPY_ENC(M2_vcmpy_s1_sat_i,   "1000","ddddd","1","0","1","1","10")
+MPY_ENC(M2_vcmpy_s1_sat_r,   "1000","ddddd","1","1","0","1","10")
+
+MPY_ENC(M2_mmpyh_s0,         "1000","ddddd","1","0","0","0","11")
+MPY_ENC(M2_mmpyh_s1,         "1000","ddddd","1","0","0","1","11")
+MPY_ENC(M2_mmpyh_rs0,        "1000","ddddd","1","1","0","0","11")
+MPY_ENC(M2_mmpyh_rs1,        "1000","ddddd","1","1","0","1","11")
+MPY_ENC(M2_mmpyuh_s0,        "1000","ddddd","1","0","1","0","11")
+MPY_ENC(M2_mmpyuh_s1,        "1000","ddddd","1","0","1","1","11")
+MPY_ENC(M2_mmpyuh_rs0,       "1000","ddddd","1","1","1","0","11")
+MPY_ENC(M2_mmpyuh_rs1,       "1000","ddddd","1","1","1","1","11")
+
+MPY_ENC(M4_vrmpyeh_s0,       "1000","ddddd","1","0","1","0","00")
+MPY_ENC(M4_vrmpyeh_s1,       "1000","ddddd","1","0","1","1","00")
+MPY_ENC(M4_vrmpyoh_s0,       "1000","ddddd","0","1","0","0","10")
+MPY_ENC(M4_vrmpyoh_s1,       "1000","ddddd","0","1","0","1","10")
+MPY_ENC(M5_vrmpybuu,         "1000","ddddd","0","0","0","1","01")
+MPY_ENC(M5_vrmpybsu,         "1000","ddddd","0","0","1","1","01")
+MPY_ENC(M5_vdmpybsu,         "1000","ddddd","0","1","0","1","01")
+
+MPY_ENC(F2_dfadd,            "1000","ddddd","0","0","0","0","11")
+MPY_ENC(F2_dfsub,            "1000","ddddd","0","0","0","1","11")
+MPY_ENC(F2_dfmpyfix,         "1000","ddddd","0","0","1","0","11")
+MPY_ENC(F2_dfmin,            "1000","ddddd","0","0","1","1","11")
+MPY_ENC(F2_dfmax,            "1000","ddddd","0","1","0","0","11")
+MPY_ENC(F2_dfmpyll,          "1000","ddddd","0","1","0","1","11")
+#ifdef ADD_DP_OPS
+MPY_ENC(F2_dfdivcheat,       "1000","ddddd","0","0","0","1","00")
+
+MPY_ENC(F2_dffixupn,         "1000","ddddd","0","1","0","1","11")
+MPY_ENC(F2_dffixupd,         "1000","ddddd","0","1","1","0","11")
+MPY_ENC(F2_dfrecipa,         "1000","ddddd","0","1","1","1","ee")
+#endif
+
+MPY_ENC(M7_dcmpyrw,       	"1000","ddddd","0","0","0","1","10")
+MPY_ENC(M7_dcmpyrwc,         "1000","ddddd","0","0","1","1","10")
+MPY_ENC(M7_dcmpyiw,       	"1000","ddddd","0","1","1","0","10")
+MPY_ENC(M7_dcmpyiwc,         "1000","ddddd","0","1","1","1","10")
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 1001 -------- PP------ --------","[#9] Rd=(Rss,Rtt)")
+MPY_ENC(M2_vdmpyrs_s0,       "1001","ddddd","0","0","0","0","00")
+MPY_ENC(M2_vdmpyrs_s1,       "1001","ddddd","0","0","0","1","00")
+
+MPY_ENC(M7_wcmpyrw,      	 "1001","ddddd","0","0","1","0","00")
+MPY_ENC(M7_wcmpyrw_rnd,      "1001","ddddd","0","0","1","1","00")
+MPY_ENC(M7_wcmpyiw,       	 "1001","ddddd","0","1","0","0","00")
+MPY_ENC(M7_wcmpyiw_rnd,      "1001","ddddd","0","1","0","1","00")
+
+MPY_ENC(M7_wcmpyrwc,      	 "1001","ddddd","0","1","1","0","00")
+MPY_ENC(M7_wcmpyrwc_rnd,     "1001","ddddd","0","1","1","1","00")
+MPY_ENC(M7_wcmpyiwc,       	 "1001","ddddd","1","0","0","0","00")
+MPY_ENC(M7_wcmpyiwc_rnd,     "1001","ddddd","1","0","0","1","00")
+
+
+
+MPY_ENC(M2_vradduh,          "1001","ddddd","-","-","-","0","01")
+MPY_ENC(M2_vrcmpys_s1rp_h,   "1001","ddddd","1","1","-","1","10")
+MPY_ENC(M2_vrcmpys_s1rp_l,   "1001","ddddd","1","1","-","1","11")
+MPY_ENC(M2_vraddh,           "1001","ddddd","1","1","-","0","11")
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 1010 -------- PP------ --------","[#10] Rxx=(Rss,Rtt)")
+MPY_ENC(M2_vrcmaci_s0,       "1010","xxxxx","0","0","0","0","00")
+MPY_ENC(M2_vdmacs_s0,        "1010","xxxxx","1","0","0","0","00")
+MPY_ENC(M2_vdmacs_s1,        "1010","xxxxx","1","0","0","1","00")
+MPY_ENC(M2_vrcmaci_s0c,      "1010","xxxxx","0","0","1","0","00")
+MPY_ENC(M2_vcmac_s0_sat_i,   "1010","xxxxx","1","0","1","0","00")
+MPY_ENC(M2_vcmac_s0_sat_r,   "1010","xxxxx","1","1","0","0","00")
+MPY_ENC(M2_vrcmpys_acc_s1_h, "1010","xxxxx","1","1","0","1","00")
+MPY_ENC(M2_vrcmpys_acc_s1_l, "1010","xxxxx","1","1","1","1","00")
+MPY_ENC(M2_vrcmacr_s0,       "1010","xxxxx","0","0","0","0","01")
+MPY_ENC(A2_vraddub_acc,      "1010","xxxxx","0","0","1","0","01")
+MPY_ENC(M2_mmacls_s0,        "1010","xxxxx","1","0","0","0","01")
+MPY_ENC(M2_mmacls_s1,        "1010","xxxxx","1","0","0","1","01")
+MPY_ENC(M2_mmacls_rs0,       "1010","xxxxx","1","1","0","0","01")
+MPY_ENC(M2_mmacls_rs1,       "1010","xxxxx","1","1","0","1","01")
+MPY_ENC(M2_mmaculs_s0,       "1010","xxxxx","1","0","1","0","01")
+MPY_ENC(M2_mmaculs_s1,       "1010","xxxxx","1","0","1","1","01")
+MPY_ENC(M2_mmaculs_rs0,      "1010","xxxxx","1","1","1","0","01")
+MPY_ENC(M2_mmaculs_rs1,      "1010","xxxxx","1","1","1","1","01")
+MPY_ENC(M2_vrcmacr_s0c,      "1010","xxxxx","0","1","1","0","01")
+MPY_ENC(M2_vrmac_s0,         "1010","xxxxx","0","0","0","0","10")
+MPY_ENC(A2_vrsadub_acc,      "1010","xxxxx","0","0","1","0","10")
+MPY_ENC(M2_vmac2es_s0,       "1010","xxxxx","1","0","0","0","10")
+MPY_ENC(M2_vmac2es_s1,       "1010","xxxxx","1","0","0","1","10")
+MPY_ENC(M2_vmac2es,          "1010","xxxxx","0","1","0","0","10")
+MPY_ENC(M2_mmachs_s0,        "1010","xxxxx","1","0","0","0","11")
+MPY_ENC(M2_mmachs_s1,        "1010","xxxxx","1","0","0","1","11")
+MPY_ENC(M2_mmachs_rs0,       "1010","xxxxx","1","1","0","0","11")
+MPY_ENC(M2_mmachs_rs1,       "1010","xxxxx","1","1","0","1","11")
+MPY_ENC(M2_mmacuhs_s0,       "1010","xxxxx","1","0","1","0","11")
+MPY_ENC(M2_mmacuhs_s1,       "1010","xxxxx","1","0","1","1","11")
+MPY_ENC(M2_mmacuhs_rs0,      "1010","xxxxx","1","1","1","0","11")
+MPY_ENC(M2_mmacuhs_rs1,      "1010","xxxxx","1","1","1","1","11")
+MPY_ENC(M4_vrmpyeh_acc_s0,   "1010","xxxxx","1","1","0","0","10")
+MPY_ENC(M4_vrmpyeh_acc_s1,   "1010","xxxxx","1","1","0","1","10")
+MPY_ENC(M4_vrmpyoh_acc_s0,   "1010","xxxxx","1","1","1","0","10")
+MPY_ENC(M4_vrmpyoh_acc_s1,   "1010","xxxxx","1","1","1","1","10")
+MPY_ENC(M5_vrmacbuu,         "1010","xxxxx","0","0","0","1","01")
+MPY_ENC(M5_vrmacbsu,         "1010","xxxxx","0","0","1","1","01")
+MPY_ENC(M5_vdmacbsu,         "1010","xxxxx","0","1","0","0","01")
+
+MPY_ENC(F2_dfmpylh,          "1010","xxxxx","0","0","0","0","11")
+MPY_ENC(F2_dfmpyhh,          "1010","xxxxx","0","0","0","1","11")
+#ifdef ADD_DP_OPS
+MPY_ENC(F2_dfmpyhh,          "1010","xxxxx","0","0","1","0","11")
+MPY_ENC(F2_dffma,            "1010","xxxxx","0","0","0","0","11")
+MPY_ENC(F2_dffms,            "1010","xxxxx","0","0","0","1","11")
+
+MPY_ENC(F2_dffma_lib,        "1010","xxxxx","0","0","1","0","11")
+MPY_ENC(F2_dffms_lib,        "1010","xxxxx","0","0","1","1","11")
+MPY_ENC(F2_dffma_sc,         "1010","xxxxx","0","1","1","1","uu")
+#endif
+
+
+MPY_ENC(M7_dcmpyrw_acc,       	"1010","xxxxx","0","0","0","1","10")
+MPY_ENC(M7_dcmpyrwc_acc,         "1010","xxxxx","0","0","1","1","10")
+MPY_ENC(M7_dcmpyiw_acc,       	"1010","xxxxx","0","1","1","0","10")
+MPY_ENC(M7_dcmpyiwc_acc,         "1010","xxxxx","1","0","1","0","10")
+
+
+
+
+MPY_ENC(A5_ACS,              "1010","xxxxx","0","1","0","1","ee")
+MPY_ENC(A6_vminub_RdP,       "1010","ddddd","0","1","1","1","ee")
+/*
+*/
+
+DEF_FIELDROW_DESC32(ICLASS_M" 1011 -------- PP------ --------","[#11] Reserved")
+MPY_ENC(F2_sfadd,            "1011","ddddd","0","0","0","0","00")
+MPY_ENC(F2_sfsub,            "1011","ddddd","0","0","0","0","01")
+MPY_ENC(F2_sfmax,            "1011","ddddd","0","0","0","1","00")
+MPY_ENC(F2_sfmin,            "1011","ddddd","0","0","0","1","01")
+MPY_ENC(F2_sfmpy,            "1011","ddddd","0","0","1","0","00")
+MPY_ENC(F2_sfdivcheat,       "1011","ddddd","0","0","1","0","11")
+MPY_ENC(F2_sffixupn,         "1011","ddddd","0","0","1","1","00")
+MPY_ENC(F2_sffixupd,         "1011","ddddd","0","0","1","1","01")
+MPY_ENC(F2_sfrecipa,         "1011","ddddd","1","1","1","1","ee")
+
+DEF_FIELDROW_DESC32(ICLASS_M" 1100 -------- PP------ --------","[#12] Rd=(Rs,Rt)")
+DEF_FIELD32(ICLASS_M"         1100 -------- PP------ --!-----",Mc_tH,"Rt is High") /*Rt high */
+DEF_FIELD32(ICLASS_M"         1100 -------- PP------ -!------",Mc_sH,"Rs is High") /* Rs high */
+SP_MPY(M2_mpy,               "1100","ddddd","0","0","0")
+SP_MPY(M2_mpy_sat,           "1100","ddddd","1","0","0")
+SP_MPY(M2_mpy_rnd,           "1100","ddddd","0","1","0")
+SP_MPY(M2_mpy_sat_rnd,       "1100","ddddd","1","1","0")
+SP_MPY(M2_mpyu,              "1100","ddddd","0","0","1")
+
+DEF_FIELDROW_DESC32(ICLASS_M" 1101 -------- PP------ --------","[#13] Rd=(Rs,Rt)")
+/* EJP: same as mpyi MPY_ENC(M2_mpyui,            "1101","ddddd","0","0","1","0","00") */
+MPY_ENC(M2_mpyi,             "1101","ddddd","0","0","0","0","00")
+MPY_ENC(M2_mpy_up,           "1101","ddddd","0","0","0","0","01")
+MPY_ENC(M2_mpyu_up,          "1101","ddddd","0","0","1","0","01")
+MPY_ENC(M2_dpmpyss_rnd_s0,   "1101","ddddd","0","1","0","0","01")
+MPY_ENC(M2_cmpyrs_s0,        "1101","ddddd","1","1","0","0","10")
+MPY_ENC(M2_cmpyrs_s1,        "1101","ddddd","1","1","0","1","10")
+MPY_ENC(M2_cmpyrsc_s0,       "1101","ddddd","1","1","1","0","10")
+MPY_ENC(M2_cmpyrsc_s1,       "1101","ddddd","1","1","1","1","10")
+MPY_ENC(M2_vmpy2s_s0pack,    "1101","ddddd","1","1","0","0","11")
+MPY_ENC(M2_vmpy2s_s1pack,    "1101","ddddd","1","1","0","1","11")
+MPY_ENC(M2_hmmpyh_rs1,       "1101","ddddd","1","1","0","1","00")
+MPY_ENC(M2_hmmpyl_rs1,       "1101","ddddd","1","1","1","1","00")
+
+MPY_ENC(M2_hmmpyh_s1,        "1101","ddddd","0","1","0","1","00")
+MPY_ENC(M2_hmmpyl_s1,        "1101","ddddd","0","1","0","1","01")
+MPY_ENC(M2_mpy_up_s1,        "1101","ddddd","0","1","0","1","10")
+MPY_ENC(M2_mpy_up_s1_sat,    "1101","ddddd","0","1","1","1","00")
+MPY_ENC(M2_mpysu_up,         "1101","ddddd","0","1","1","0","01")
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 1110 -------- PP------ --------","[#14] Rx=(Rs,Rt)")
+DEF_FIELD32(ICLASS_M"         1110 -------- PP------ --!-----",Md_tH,"Rt is High") /*Rt high */
+DEF_FIELD32(ICLASS_M"         1110 -------- PP------ -!------",Md_sH,"Rs is High") /* Rs high */
+SP_MPY(M2_mpyu_acc,          "1110","xxxxx","0","0","1")
+SP_MPY(M2_mpy_acc,           "1110","xxxxx","0","0","0")
+SP_MPY(M2_mpy_acc_sat,       "1110","xxxxx","1","0","0")
+SP_MPY(M2_mpyu_nac,          "1110","xxxxx","0","1","1")
+SP_MPY(M2_mpy_nac,           "1110","xxxxx","0","1","0")
+SP_MPY(M2_mpy_nac_sat,       "1110","xxxxx","1","1","0")
+
+
+DEF_FIELDROW_DESC32(ICLASS_M" 1111 -------- PP------ --------","[#15] Rx=(Rs,Rt)")
+MPY_ENC(M2_maci,             "1111","xxxxx","0","0","0","0","00")
+MPY_ENC(M2_mnaci,            "1111","xxxxx","0","0","0","1","00")
+MPY_ENC(M2_acci,             "1111","xxxxx","0","0","0","0","01")
+MPY_ENC(M2_nacci,            "1111","xxxxx","0","0","0","1","01")
+MPY_ENC(M2_xor_xacc,         "1111","xxxxx","0","0","0","1","11")
+MPY_ENC(M2_subacc,           "1111","xxxxx","0","0","0","0","11")
+
+MPY_ENC(M4_mac_up_s1_sat,    "1111","xxxxx","0","1","1","0","00")
+MPY_ENC(M4_nac_up_s1_sat,    "1111","xxxxx","0","1","1","0","01")
+
+MPY_ENC(M4_and_and,          "1111","xxxxx","0","0","1","0","00")
+MPY_ENC(M4_and_or,           "1111","xxxxx","0","0","1","0","01")
+MPY_ENC(M4_and_xor,          "1111","xxxxx","0","0","1","0","10")
+MPY_ENC(M4_or_and,           "1111","xxxxx","0","0","1","0","11")
+MPY_ENC(M4_or_or,            "1111","xxxxx","0","0","1","1","00")
+MPY_ENC(M4_or_xor,           "1111","xxxxx","0","0","1","1","01")
+MPY_ENC(M4_xor_and,          "1111","xxxxx","0","0","1","1","10")
+MPY_ENC(M4_xor_or,           "1111","xxxxx","0","0","1","1","11")
+
+MPY_ENC(M4_or_andn,          "1111","xxxxx","0","1","0","0","00")
+MPY_ENC(M4_and_andn,         "1111","xxxxx","0","1","0","0","01")
+MPY_ENC(M4_xor_andn,         "1111","xxxxx","0","1","0","0","10")
+
+MPY_ENC(F2_sffma,            "1111","xxxxx","1","0","0","0","00")
+MPY_ENC(F2_sffms,            "1111","xxxxx","1","0","0","0","01")
+
+MPY_ENC(F2_sffma_lib,        "1111","xxxxx","1","0","0","0","10")
+MPY_ENC(F2_sffms_lib,        "1111","xxxxx","1","0","0","0","11")
+
+MPY_ENC(F2_sffma_sc,         "1111","xxxxx","1","1","1","0","uu")
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*           ALU32_2op         */
+/*                             */
+/*                             */
+/*******************************/
+DEF_CLASS32(ICLASS_ADDI" ---- -------- PP------ --------",ALU32_ADDI)
+
+DEF_CLASS32(ICLASS_ALU2op" ---- -------- PP------ --------",ALU32_2op)
+DEF_FIELD32(ICLASS_ALU2op" !--- -------- PP------ --------",A2_Rs,"No Rs read")
+DEF_FIELD32(ICLASS_ALU2op" -!!! -------- PP------ --------",A2_MajOp,"Major Opcode")
+DEF_FIELD32(ICLASS_ALU2op" ---- !!!----- PP------ --------",A2_MinOp,"Minor Opcode")
+
+DEF_FIELD32(ICLASS_ALU3op" -!!! -------- PP------ --------",A3_MajOp,"Major Opcode")
+DEF_FIELD32(ICLASS_ALU3op" ---- !!!----- PP------ --------",A3_MinOp,"Minor Opcode")
+DEF_CLASS32(ICLASS_ALU3op" ---- -------- PP------ --------",ALU32_3op)
+DEF_FIELD32(ICLASS_ALU3op" !--- -------- PP------ --------",A3_P,"Predicated")
+DEF_FIELD32(ICLASS_ALU3op" ---- -------- PP!----- --------",A3_DN,"Dot-new")
+DEF_FIELD32(ICLASS_ALU3op" ---- -------- PP------ !-------",A3_PS,"Predicate sense")
+
+
+/*************************/
+/* Our good friend addi  */
+/*************************/
+DEF_ENC32(A2_addi,    ICLASS_ADDI"  iiii iiisssss PPiiiiii iiiddddd")
+
+
+/*******************************/
+/* Standard ALU32 insns        */
+/*******************************/
+
+#define ALU32_IRR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS)\
+DEF_ENC32(TAG, ICLASS_ALU2op" "MAJ4"  "MIN3"sssss  PP"SMOD1"iiiii "VMIN3 DSTCHARS)
+
+#define ALU32_RR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS)\
+DEF_ENC32(TAG, ICLASS_ALU2op" "MAJ4"  "MIN3"sssss  PP"SMOD1"----- "VMIN3 DSTCHARS)
+
+#define CONDA32_RR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS)\
+DEF_ENC32(TAG##t,   ICLASS_ALU2op" "MAJ4"  "MIN3"sssss  PP"SMOD1"-00uu "VMIN3 DSTCHARS)\
+DEF_ENC32(TAG##f,   ICLASS_ALU2op" "MAJ4"  "MIN3"sssss  PP"SMOD1"-10uu "VMIN3 DSTCHARS)\
+DEF_ENC32(TAG##tnew,ICLASS_ALU2op" "MAJ4"  "MIN3"sssss  PP"SMOD1"-01uu "VMIN3 DSTCHARS)\
+DEF_ENC32(TAG##fnew,ICLASS_ALU2op" "MAJ4"  "MIN3"sssss  PP"SMOD1"-11uu "VMIN3 DSTCHARS)
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 0000 -------- PP------ --------","[#0] (Pu) Rd=(Rs)")
+DEF_FIELD32(            ICLASS_ALU2op" 0000 -------- PP!----- --------",A32a_C,"Conditional")
+DEF_FIELD32(            ICLASS_ALU2op" 0000 -------- PP--!--- --------",A32a_S,"Predicate sense")
+DEF_FIELD32(            ICLASS_ALU2op" 0000 -------- PP---!-- --------",A32a_dn,"Dot-new")
+
+ALU32_RR_ENC(A2_aslh,                 "0000","000","0","---","ddddd")
+ALU32_RR_ENC(A2_asrh,                 "0000","001","0","---","ddddd")
+ALU32_RR_ENC(A2_tfr,                  "0000","011","0","---","ddddd")
+ALU32_RR_ENC(A2_sxtb,                 "0000","101","0","---","ddddd")
+ALU32_RR_ENC(A2_zxth,                 "0000","110","0","---","ddddd")
+ALU32_RR_ENC(A2_sxth,                 "0000","111","0","---","ddddd")
+
+CONDA32_RR_ENC(A4_paslh,               "0000","000","1","---","ddddd")
+CONDA32_RR_ENC(A4_pasrh,               "0000","001","1","---","ddddd")
+CONDA32_RR_ENC(A4_pzxtb,               "0000","100","1","---","ddddd")
+CONDA32_RR_ENC(A4_psxtb,               "0000","101","1","---","ddddd")
+CONDA32_RR_ENC(A4_pzxth,               "0000","110","1","---","ddddd")
+CONDA32_RR_ENC(A4_psxth,               "0000","111","1","---","ddddd")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 0001 -------- PP------ --------","[#1] Rx=(#u16)")
+DEF_ENC32(A2_tfril,     ICLASS_ALU2op" 0001 ii1xxxxx PPiiiiii iiiiiiii")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 0010 -------- PP------ --------","[#2] Rx=(#u16)")
+DEF_ENC32(A2_tfrih,     ICLASS_ALU2op" 0010 ii1xxxxx PPiiiiii iiiiiiii")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 0011 -------- PP------ --------","[#3] Rd=(Pu,Rs,#s8)")
+DEF_ENC32(C2_muxir,     ICLASS_ALU2op" 0011 0uusssss PP0iiiii iiiddddd")
+DEF_ENC32(C2_muxri,     ICLASS_ALU2op" 0011 1uusssss PP0iiiii iiiddddd")
+
+DEF_ENC32(A4_combineri, ICLASS_ALU2op" 0011 -00sssss PP1iiiii iiiddddd") /* Rdd = (Rs,#s8) */
+DEF_ENC32(A4_combineir, ICLASS_ALU2op" 0011 -01sssss PP1iiiii iiiddddd") /* Rdd = (Rs,#s8) */
+DEF_ENC32(A4_rcmpeqi,   ICLASS_ALU2op" 0011 -10sssss PP1iiiii iiiddddd") /* Rd = (Rs,#s8) */
+DEF_ENC32(A4_rcmpneqi,  ICLASS_ALU2op" 0011 -11sssss PP1iiiii iiiddddd") /* Rd = (Rs,#s8) */
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 0100 -------- PP------ --------","[#4] (Pu) Rd=(Rs,#s8)")
+DEF_FIELD32(            ICLASS_ALU2op" 0100 -------- PP!----- --------",A32a_DN,"Dot-new")
+DEF_FIELD32(            ICLASS_ALU2op" 0100 !------- PP------ --------",A32a_PS,"Predicate sense")
+DEF_ENC32(A2_paddit,    ICLASS_ALU2op" 0100 0uusssss PP0iiiii iiiddddd")
+DEF_ENC32(A2_padditnew, ICLASS_ALU2op" 0100 0uusssss PP1iiiii iiiddddd")
+DEF_ENC32(A2_paddif,    ICLASS_ALU2op" 0100 1uusssss PP0iiiii iiiddddd")
+DEF_ENC32(A2_paddifnew, ICLASS_ALU2op" 0100 1uusssss PP1iiiii iiiddddd")
+
+
+DEF_FIELDROW_DESC32(     ICLASS_ALU2op" 0101 -------- PP------ --------","[#5] Pd=(Rs,#s10)")
+DEF_ENC32(C2_cmpeqi,     ICLASS_ALU2op" 0101 00isssss PPiiiiii iii000dd")
+DEF_ENC32(C2_cmpgti,     ICLASS_ALU2op" 0101 01isssss PPiiiiii iii000dd")
+DEF_ENC32(C2_cmpgtui,    ICLASS_ALU2op" 0101 100sssss PPiiiiii iii000dd")
+
+DEF_ENC32(C4_cmpneqi,    ICLASS_ALU2op" 0101 00isssss PPiiiiii iii100dd")
+DEF_ENC32(C4_cmpltei,    ICLASS_ALU2op" 0101 01isssss PPiiiiii iii100dd")
+DEF_ENC32(C4_cmplteui,   ICLASS_ALU2op" 0101 100sssss PPiiiiii iii100dd")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 0110 -------- PP------ --------","[#6] Rd=(Rs,#s10)")
+ALU32_IRR_ENC(A2_andir,               "0110","00i","i","iii","ddddd")
+ALU32_IRR_ENC(A2_subri,               "0110","01i","i","iii","ddddd")
+ALU32_IRR_ENC(A2_orir,                "0110","10i","i","iii","ddddd")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 0111 -------- PP------ --------","[#7] Reserved")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 1000 -------- PP------ --------","[#8] Rd=#s16")
+DEF_ENC32(A2_tfrsi,     ICLASS_ALU2op" 1000 ii-iiiii PPiiiiii iiiddddd")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 1001 -------- PP------ --------","[#9] Reserved")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 101- -------- PP------ --------","[#10,#11] Rd=(Pu,#s8,#S8)")
+DEF_ENC32(C2_muxii,     ICLASS_ALU2op" 101u uIIIIIII PPIiiiii iiiddddd")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 1100 -------- PP------ --------","[#12] Rdd=(#s8,#S8)")
+DEF_ENC32(A2_combineii, ICLASS_ALU2op" 1100 0IIIIIII PPIiiiii iiiddddd")
+DEF_ENC32(A4_combineii, ICLASS_ALU2op" 1100 1--IIIII PPIiiiii iiiddddd")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 1101 -------- PP------ --------","[#13] Reserved")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 1110 -------- PP------ --------","[#14] (Pu) Rd=#s12")
+DEF_FIELD32(            ICLASS_ALU2op" 1110 ---0---- PP!----- --------",A32c_DN,"Dot-new")
+DEF_FIELD32(            ICLASS_ALU2op" 1110 !--0---- PP------ --------",A32c_PS,"Predicate sense")
+DEF_ENC32(C2_cmovenewit,ICLASS_ALU2op" 1110 0uu0iiii PP1iiiii iiiddddd")
+DEF_ENC32(C2_cmovenewif,ICLASS_ALU2op" 1110 1uu0iiii PP1iiiii iiiddddd")
+DEF_ENC32(C2_cmoveit,   ICLASS_ALU2op" 1110 0uu0iiii PP0iiiii iiiddddd")
+DEF_ENC32(C2_cmoveif,   ICLASS_ALU2op" 1110 1uu0iiii PP0iiiii iiiddddd")
+
+
+DEF_FIELDROW_DESC32(    ICLASS_ALU2op" 1111 -------- PP------ --------","[#15] nop")
+DEF_ENC32(A2_nop,       ICLASS_ALU2op" 1111 -------- PP------ --------")
+
+
+
+
+
+
+
+
+
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*    ALU32_3op                */
+/*                             */
+/*                             */
+/*******************************/
+
+
+#define V2A32_RRR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS)\
+DEF_ENC32(TAG, ICLASS_ALU3op" "MAJ4"  "MIN3"sssss  PP"SMOD1"ttttt "VMIN3 DSTCHARS)
+
+#define V2A32_RR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS)\
+DEF_ENC32(TAG, ICLASS_ALU3op" "MAJ4"  "MIN3"sssss  PP"SMOD1"----- "VMIN3 DSTCHARS)
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  0000 -------- PP------ --------","[#0] Reserved")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  0001 -------- PP------ --------","[#1] Rd=(Rs,Rt)")
+V2A32_RRR_ENC(A2_and,              "0001","000","-","---","ddddd")
+V2A32_RRR_ENC(A2_or,               "0001","001","-","---","ddddd")
+V2A32_RRR_ENC(A2_xor,              "0001","011","-","---","ddddd")
+V2A32_RRR_ENC(A4_andn,             "0001","100","-","---","ddddd")
+V2A32_RRR_ENC(A4_orn,              "0001","101","-","---","ddddd")
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  0010 -------- PP------ --------","[#2] Pd=(Rs,Rt)")
+V2A32_RRR_ENC(C2_cmpeq,            "0010","-00","-","---","000dd")
+V2A32_RRR_ENC(C2_cmpgt,            "0010","-10","-","---","000dd")
+V2A32_RRR_ENC(C2_cmpgtu,           "0010","-11","-","---","000dd")
+
+V2A32_RRR_ENC(C4_cmpneq,           "0010","-00","-","---","100dd")
+V2A32_RRR_ENC(C4_cmplte,           "0010","-10","-","---","100dd")
+V2A32_RRR_ENC(C4_cmplteu,          "0010","-11","-","---","100dd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  0011 -------- PP------ --------","[#3] Rd=(Rs,Rt)")
+V2A32_RRR_ENC(A2_add,              "0011","000","-","---","ddddd")
+V2A32_RRR_ENC(A2_sub,              "0011","001","-","---","ddddd")
+V2A32_RRR_ENC(A2_combine_hh,       "0011","100","-","---","ddddd")
+V2A32_RRR_ENC(A2_combine_hl,       "0011","101","-","---","ddddd")
+V2A32_RRR_ENC(A2_combine_lh,       "0011","110","-","---","ddddd")
+V2A32_RRR_ENC(A2_combine_ll,       "0011","111","-","---","ddddd")
+V2A32_RRR_ENC(A4_rcmpeq,           "0011","010","-","---","ddddd")
+V2A32_RRR_ENC(A4_rcmpneq,          "0011","011","-","---","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  0100 -------- PP------ --------","[#4] Rd=(Pu,Rs,Rt)")
+V2A32_RRR_ENC(C2_mux,              "0100","---","-","-uu","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  0101 -------- PP------ --------","[#5] Rdd=(Rs,Rt)")
+V2A32_RRR_ENC(A2_combinew,         "0101","0--","-","---","ddddd")
+V2A32_RRR_ENC(S2_packhl,           "0101","1--","-","---","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  0110 -------- PP------ --------","[#6] Rd=(Rs,Rt)")
+V2A32_RRR_ENC(A2_svaddh,           "0110","000","-","---","ddddd")
+V2A32_RRR_ENC(A2_svaddhs,          "0110","001","-","---","ddddd")
+V2A32_RRR_ENC(A2_svadduhs,         "0110","011","-","---","ddddd")
+V2A32_RRR_ENC(A2_svsubh,           "0110","100","-","---","ddddd")
+V2A32_RRR_ENC(A2_svsubhs,          "0110","101","-","---","ddddd")
+V2A32_RRR_ENC(A2_svsubuhs,         "0110","111","-","---","ddddd")
+V2A32_RRR_ENC(A2_addsat,           "0110","010","-","---","ddddd")
+V2A32_RRR_ENC(A2_subsat,           "0110","110","-","---","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  0111 -------- PP------ --------","[#7] Rd=(Rs,Rt)")
+V2A32_RRR_ENC(A2_svavgh,           "0111","-00","-","---","ddddd")
+V2A32_RRR_ENC(A2_svavghs,          "0111","-01","-","---","ddddd")
+V2A32_RRR_ENC(A2_svnavgh,          "0111","-11","-","---","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  1000 -------- PP------ --------","[#8] Reserved")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  1001 -------- PP------ --------","[#9] (Pu) Rd=(Rs,Rt)")
+V2A32_RRR_ENC(A2_pandt,            "1001","-00","0","0uu","ddddd")
+V2A32_RRR_ENC(A2_pandtnew,         "1001","-00","1","0uu","ddddd")
+V2A32_RRR_ENC(A2_pandf,            "1001","-00","0","1uu","ddddd")
+V2A32_RRR_ENC(A2_pandfnew,         "1001","-00","1","1uu","ddddd")
+V2A32_RRR_ENC(A2_port,             "1001","-01","0","0uu","ddddd")
+V2A32_RRR_ENC(A2_portnew,          "1001","-01","1","0uu","ddddd")
+V2A32_RRR_ENC(A2_porf,             "1001","-01","0","1uu","ddddd")
+V2A32_RRR_ENC(A2_porfnew,          "1001","-01","1","1uu","ddddd")
+V2A32_RRR_ENC(A2_pxort,            "1001","-11","0","0uu","ddddd")
+V2A32_RRR_ENC(A2_pxortnew,         "1001","-11","1","0uu","ddddd")
+V2A32_RRR_ENC(A2_pxorf,            "1001","-11","0","1uu","ddddd")
+V2A32_RRR_ENC(A2_pxorfnew,         "1001","-11","1","1uu","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  1010 -------- PP------ --------","[#10] Reserved")
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  1011 -------- PP------ --------","[#11] (Pu) Rd=(Rs,Rt)")
+V2A32_RRR_ENC(A2_paddt,            "1011","0-0","0","0uu","ddddd")
+V2A32_RRR_ENC(A2_paddtnew,         "1011","0-0","1","0uu","ddddd")
+V2A32_RRR_ENC(A2_paddf,            "1011","0-0","0","1uu","ddddd")
+V2A32_RRR_ENC(A2_paddfnew,         "1011","0-0","1","1uu","ddddd")
+V2A32_RRR_ENC(A2_psubt,            "1011","0-1","0","0uu","ddddd")
+V2A32_RRR_ENC(A2_psubtnew,         "1011","0-1","1","0uu","ddddd")
+V2A32_RRR_ENC(A2_psubf,            "1011","0-1","0","1uu","ddddd")
+V2A32_RRR_ENC(A2_psubfnew,         "1011","0-1","1","1uu","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  1100 -------- PP------ --------","[#12] Reserved")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  1101 -------- PP------ --------","[#13] (Pu) Rdd=(Rs,Rt)")
+V2A32_RRR_ENC(C2_ccombinewnewt,    "1101","000","1","0uu","ddddd")
+V2A32_RRR_ENC(C2_ccombinewnewf,    "1101","000","1","1uu","ddddd")
+V2A32_RRR_ENC(C2_ccombinewt,       "1101","000","0","0uu","ddddd")
+V2A32_RRR_ENC(C2_ccombinewf,       "1101","000","0","1uu","ddddd")
+
+
+
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU3op"  1110 -------- PP------ --------","[#14] Reserved")
+
+
+
+
+
+
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*    S                        */
+/*                             */
+/*                             */
+/*******************************/
+
+DEF_CLASS32(ICLASS_S2op" ---- -------- PP------ --------",S_2op)
+DEF_FIELD32(ICLASS_S2op" !!!! -------- PP------ --------",STYPEB_RegType,"Register Type")
+DEF_FIELD32(ICLASS_S2op" ---- !!------ PP------ --------",S2_MajOp,"Major Opcode")
+DEF_FIELD32(ICLASS_S2op" ---- -------- PP------ !!!-----",S2_MinOp,"Minor Opcode")
+
+DEF_CLASS32(ICLASS_S3op" ---- -------- PP------ --------",S_3op)
+DEF_FIELD32(ICLASS_S3op" !!!! -------- PP------ --------",STYPEA_RegType,"Register Type")
+DEF_FIELD32(ICLASS_S3op" ---- !!------ PP------ --------",S3_Maj,"Major Opcode")
+DEF_FIELD32(ICLASS_S3op" ---- -------- PP------ !!------",S3_Min,"Minor Opcode")
+
+
+#define SH_RRR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S3op" "MAJ4"  "MIN3"sssss  PP"SMOD1"ttttt "VMIN3 DSTCHARS)
+
+#define SH_RRRiENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S3op" "MAJ4"  "MIN3"iiiii  PP"SMOD1"ttttt "VMIN3 DSTCHARS)
+
+#define SH_RRR_ENCX(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S3op" "MAJ4"  "MIN3"sssss  PP"SMOD1"xxxxx "VMIN3 DSTCHARS)
+
+#define SH3_RR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S3op" "MAJ4"  "MIN3"sssss  PP"SMOD1"----- "VMIN3 DSTCHARS)
+
+#define SH_PPP_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S3op" "MAJ4"  "MIN3"---ss  PP"SMOD1"---tt "VMIN3 DSTCHARS)
+
+#define SH2_RR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S2op" "MAJ4"  "MIN3"sssss  PP"SMOD1"----- "VMIN3 DSTCHARS)
+
+#define SH2_PPP_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S2op" "MAJ4"  "MIN3"---ss  PP"SMOD1"---tt "VMIN3 DSTCHARS)
+
+#define SH_RRI4_ENC(TAG,MAJ4,MIN3,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S2op" "MAJ4" "MIN3 "sssss PP00iiii " VMIN3 DSTCHARS)
+
+#define SH_RRI5_ENC(TAG,MAJ4,MIN3,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S2op" "MAJ4" "MIN3 "sssss PP0iiiii " VMIN3 DSTCHARS)
+
+#define SH_RRI6_ENC(TAG,MAJ4,MIN3,VMIN3,DSTCHARS) \
+DEF_ENC32(TAG,ICLASS_S2op" "MAJ4" "MIN3 "sssss PPiiiiii " VMIN3 DSTCHARS)
+
+#define RSHIFTTYPES(TAGEND,MAJ4,MIN3,SMOD1,DMOD1,DSTCHARS) \
+SH_RRR_ENC(S2_asr_r_##TAGEND,MAJ4,MIN3,SMOD1,"00"DMOD1,DSTCHARS) \
+SH_RRR_ENC(S2_lsr_r_##TAGEND,MAJ4,MIN3,SMOD1,"01"DMOD1,DSTCHARS) \
+SH_RRR_ENC(S2_asl_r_##TAGEND,MAJ4,MIN3,SMOD1,"10"DMOD1,DSTCHARS) \
+SH_RRR_ENC(S2_lsl_r_##TAGEND,MAJ4,MIN3,SMOD1,"11"DMOD1,DSTCHARS)
+
+
+#define I5SHIFTTYPES(TAGEND,MAJ4,MIN3,SMOD1,DSTCHARS) \
+SH_RRI5_ENC(S2_asr_i_##TAGEND,MAJ4,MIN3,SMOD1 "00",DSTCHARS) \
+SH_RRI5_ENC(S2_lsr_i_##TAGEND,MAJ4,MIN3,SMOD1 "01",DSTCHARS) \
+SH_RRI5_ENC(S2_asl_i_##TAGEND,MAJ4,MIN3,SMOD1 "10",DSTCHARS) \
+SH_RRI5_ENC(S6_rol_i_##TAGEND,MAJ4,MIN3,SMOD1 "11",DSTCHARS)
+
+#define I5SHIFTTYPES_NOROL(TAGEND,MAJ4,MIN3,SMOD1,DSTCHARS) \
+SH_RRI5_ENC(S2_asr_i_##TAGEND,MAJ4,MIN3,SMOD1 "00",DSTCHARS) \
+SH_RRI5_ENC(S2_lsr_i_##TAGEND,MAJ4,MIN3,SMOD1 "01",DSTCHARS) \
+SH_RRI5_ENC(S2_asl_i_##TAGEND,MAJ4,MIN3,SMOD1 "10",DSTCHARS)
+
+#define I5SHIFTTYPES_NOASR(TAGEND,MAJ4,MIN3,SMOD1,DSTCHARS) \
+SH_RRI5_ENC(S2_lsr_i_##TAGEND,MAJ4,MIN3,SMOD1 "01",DSTCHARS) \
+SH_RRI5_ENC(S2_asl_i_##TAGEND,MAJ4,MIN3,SMOD1 "10",DSTCHARS) \
+SH_RRI5_ENC(S6_rol_i_##TAGEND,MAJ4,MIN3,SMOD1 "11",DSTCHARS)
+
+#define I4SHIFTTYPES(TAGEND,MAJ4,MIN3,SMOD1,DSTCHARS) \
+SH_RRI4_ENC(S2_asr_i_##TAGEND,MAJ4,MIN3,SMOD1 "00",DSTCHARS) \
+SH_RRI4_ENC(S2_lsr_i_##TAGEND,MAJ4,MIN3,SMOD1 "01",DSTCHARS) \
+SH_RRI4_ENC(S2_asl_i_##TAGEND,MAJ4,MIN3,SMOD1 "10",DSTCHARS)
+
+#define I5ASHIFTTYPES(TAGEND,MAJ4,MIN3,SMOD1,DSTCHARS) \
+SH_RRI5_ENC(S2_asl_i_##TAGEND,MAJ4,MIN3,SMOD1 "10",DSTCHARS)
+
+#define I4ASHIFTTYPES(TAGEND,MAJ4,MIN3,SMOD1,DSTCHARS) \
+SH_RRI4_ENC(S2_asl_i_##TAGEND,MAJ4,MIN3,SMOD1 "10",DSTCHARS)
+
+#define I6SHIFTTYPES(TAGEND,MAJ4,MIN3,SMOD1,DSTCHARS) \
+SH_RRI6_ENC(S2_asr_i_##TAGEND,MAJ4,MIN3,SMOD1 "00",DSTCHARS) \
+SH_RRI6_ENC(S2_lsr_i_##TAGEND,MAJ4,MIN3,SMOD1 "01",DSTCHARS) \
+SH_RRI6_ENC(S2_asl_i_##TAGEND,MAJ4,MIN3,SMOD1 "10",DSTCHARS) \
+SH_RRI6_ENC(S6_rol_i_##TAGEND,MAJ4,MIN3,SMOD1 "11",DSTCHARS) \
+
+#define I6SHIFTTYPES_NOASR(TAGEND,MAJ4,MIN3,SMOD1,DSTCHARS) \
+SH_RRI6_ENC(S2_lsr_i_##TAGEND,MAJ4,MIN3,SMOD1 "01",DSTCHARS) \
+SH_RRI6_ENC(S2_asl_i_##TAGEND,MAJ4,MIN3,SMOD1 "10",DSTCHARS) \
+SH_RRI6_ENC(S6_rol_i_##TAGEND,MAJ4,MIN3,SMOD1 "11",DSTCHARS)
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op" 0000 -------- PP------ --------","[#0] Rdd=(Rss,#u6)")
+/* EJP: there is actually quite a bit of space here, look at the reserved bits */
+I6SHIFTTYPES(p,                 "0000","000","0","ddddd")
+I5SHIFTTYPES_NOROL(vw,          "0000","010","0","ddddd")
+I4SHIFTTYPES(vh,                "0000","100","0","ddddd")
+
+
+
+/* False assume an immediate */
+SH2_RR_ENC(S2_vsathub_nopack, "0000","000","-","1 00","ddddd")
+SH2_RR_ENC(S2_vsatwuh_nopack, "0000","000","-","1 01","ddddd")
+SH2_RR_ENC(S2_vsatwh_nopack,  "0000","000","-","1 10","ddddd")
+SH2_RR_ENC(S2_vsathb_nopack,  "0000","000","-","1 11","ddddd")
+
+SH_RRI4_ENC(S5_vasrhrnd,      "0000","001",    "0 00","ddddd")
+
+SH2_RR_ENC(A2_vabsh,          "0000","010","-","1 00","ddddd")
+SH2_RR_ENC(A2_vabshsat,       "0000","010","-","1 01","ddddd")
+SH2_RR_ENC(A2_vabsw,          "0000","010","-","1 10","ddddd")
+SH2_RR_ENC(A2_vabswsat,       "0000","010","-","1 11","ddddd")
+
+SH2_RR_ENC(A2_notp,           "0000","100","-","1 00","ddddd")
+SH2_RR_ENC(A2_negp,           "0000","100","-","1 01","ddddd")
+SH2_RR_ENC(A2_absp,           "0000","100","-","1 10","ddddd")
+SH2_RR_ENC(A2_vconj,          "0000","100","-","1 11","ddddd")
+
+SH2_RR_ENC(S2_deinterleave,   "0000","110","-","1 00","ddddd")
+SH2_RR_ENC(S2_interleave,     "0000","110","-","1 01","ddddd")
+SH2_RR_ENC(S2_brevp,          "0000","110","-","1 10","ddddd")
+SH_RRI6_ENC(S2_asr_i_p_rnd,   "0000","110",    "1 11","ddddd")
+
+SH2_RR_ENC(F2_conv_df2d,      "0000","111","0","0 00","ddddd")
+SH2_RR_ENC(F2_conv_df2ud,     "0000","111","0","0 01","ddddd")
+SH2_RR_ENC(F2_conv_ud2df,     "0000","111","0","0 10","ddddd")
+SH2_RR_ENC(F2_conv_d2df,      "0000","111","0","0 11","ddddd")
+#ifdef ADD_DP_OPS
+SH2_RR_ENC(F2_dffixupr,       "0000","111","0","1 00","ddddd")
+SH2_RR_ENC(F2_dfsqrtcheat,    "0000","111","0","1 01","ddddd")
+#endif
+SH2_RR_ENC(F2_conv_df2d_chop, "0000","111","0","1 10","ddddd")
+SH2_RR_ENC(F2_conv_df2ud_chop,"0000","111","0","1 11","ddddd")
+#ifdef ADD_DP_OPS
+SH2_RR_ENC(F2_dfinvsqrta,     "0000","111","1","0 ee","ddddd")
+#endif
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"    0001 -------- PP------ --------","[#1] Rdd=(Rss,#u6,#U6)")
+DEF_ENC32(S2_extractup,ICLASS_S2op" 0001 IIIsssss PPiiiiii IIIddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op" 0010 -------- PP------ --------","[#2] Rxx=(Rss,#u6)")
+I6SHIFTTYPES(p_nac,             "0010","00-","0","xxxxx")
+I6SHIFTTYPES(p_acc,             "0010","00-","1","xxxxx")
+I6SHIFTTYPES(p_and,             "0010","01-","0","xxxxx")
+I6SHIFTTYPES(p_or,              "0010","01-","1","xxxxx")
+I6SHIFTTYPES_NOASR(p_xacc,      "0010","10-","0","xxxxx")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"  0011 -------- PP------ --------","[#3] Rxx=(Rss,#u6,#U6)")
+DEF_ENC32(S2_insertp,ICLASS_S2op" 0011 IIIsssss PPiiiiii IIIxxxxx")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"  0100 -------- PP------ --------","[#4] Rdd=(Rs)")
+SH2_RR_ENC(S2_vsxtbh,            "0100","00-","-","00-","ddddd")
+SH2_RR_ENC(S2_vzxtbh,            "0100","00-","-","01-","ddddd")
+SH2_RR_ENC(S2_vsxthw,            "0100","00-","-","10-","ddddd")
+SH2_RR_ENC(S2_vzxthw,            "0100","00-","-","11-","ddddd")
+SH2_RR_ENC(A2_sxtw,              "0100","01-","-","00-","ddddd")
+SH2_RR_ENC(S2_vsplatrh,          "0100","01-","-","01-","ddddd")
+SH2_RR_ENC(S6_vsplatrbp,         "0100","01-","-","10-","ddddd")
+
+SH2_RR_ENC(F2_conv_sf2df,        "0100","1--","-","000","ddddd")
+SH2_RR_ENC(F2_conv_uw2df,        "0100","1--","-","001","ddddd")
+SH2_RR_ENC(F2_conv_w2df,         "0100","1--","-","010","ddddd")
+SH2_RR_ENC(F2_conv_sf2ud,        "0100","1--","-","011","ddddd")
+SH2_RR_ENC(F2_conv_sf2d,         "0100","1--","-","100","ddddd")
+SH2_RR_ENC(F2_conv_sf2ud_chop,   "0100","1--","-","101","ddddd")
+SH2_RR_ENC(F2_conv_sf2d_chop,    "0100","1--","-","110","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"   0101 -------- PP------ --------","[#5] Pd=(Rs,#u6)")
+DEF_ENC32(S2_tstbit_i,ICLASS_S2op" 0101 000sssss PP0iiiii ------dd")
+DEF_ENC32(C2_tfrrp,   ICLASS_S2op" 0101 010sssss PP------ ------dd")
+DEF_ENC32(C2_bitsclri,ICLASS_S2op" 0101 100sssss PPiiiiii ------dd")
+DEF_ENC32(S4_ntstbit_i,ICLASS_S2op"0101 001sssss PP0iiiii ------dd")
+DEF_ENC32(C4_nbitsclri,ICLASS_S2op"0101 101sssss PPiiiiii ------dd")
+DEF_ENC32(F2_sfclass,  ICLASS_S2op"0101 111sssss PP0iiiii ------dd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"   0110 -------- PP------ --------","[#6] Rdd=(Pt)")
+DEF_ENC32(C2_mask, ICLASS_S2op"    0110   --- -----  PP----tt --- ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"    0111 -------- PP------ --------","[#7] Rx=(Rs,#u4,#S6)")
+DEF_ENC32(S2_tableidxb,ICLASS_S2op" 0111 00isssss PPIIIIII iiixxxxx")
+DEF_ENC32(S2_tableidxh,ICLASS_S2op" 0111 01isssss PPIIIIII iiixxxxx")
+DEF_ENC32(S2_tableidxw,ICLASS_S2op" 0111 10isssss PPIIIIII iiixxxxx")
+DEF_ENC32(S2_tableidxd,ICLASS_S2op" 0111 11isssss PPIIIIII iiixxxxx")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"   1000 -------- PP------ --------","[#8] Rd=(Rss,#u6)")
+SH2_RR_ENC(S2_vsathub,            "1000","000","-","000","ddddd")
+SH2_RR_ENC(S2_vsatwh,             "1000","000","-","010","ddddd")
+SH2_RR_ENC(S2_vsatwuh,            "1000","000","-","100","ddddd")
+SH2_RR_ENC(S2_vsathb,             "1000","000","-","110","ddddd")
+SH2_RR_ENC(S2_clbp,               "1000","010","-","000","ddddd")
+SH2_RR_ENC(S2_cl0p,               "1000","010","-","010","ddddd")
+SH2_RR_ENC(S2_cl1p,               "1000","010","-","100","ddddd")
+SH2_RR_ENC(S2_ct0p,               "1000","111","-","010","ddddd")
+SH2_RR_ENC(S2_ct1p,               "1000","111","-","100","ddddd")
+SH2_RR_ENC(S2_vtrunohb,           "1000","100","-","000","ddddd")
+SH2_RR_ENC(S2_vtrunehb,           "1000","100","-","010","ddddd")
+SH2_RR_ENC(S2_vrndpackwh,         "1000","100","-","100","ddddd")
+SH2_RR_ENC(S2_vrndpackwhs,        "1000","100","-","110","ddddd")
+SH2_RR_ENC(A2_sat,                "1000","110","-","000","ddddd")
+SH2_RR_ENC(A2_roundsat,           "1000","110","-","001","ddddd")
+SH_RRI5_ENC(S2_asr_i_svw_trun,    "1000","110",    "010","ddddd")
+SH_RRI5_ENC(A4_bitspliti,         "1000","110",    "100","ddddd")
+
+SH_RRI5_ENC(A7_clip,         	  "1000","110",    "101","ddddd")
+SH_RRI5_ENC(A7_vclip,         	  "1000","110",    "110","ddddd")
+
+
+SH2_RR_ENC(S4_clbpnorm,           "1000","011","-","000","ddddd")
+SH_RRI6_ENC(S4_clbpaddi,          "1000","011",    "010","ddddd")
+SH2_RR_ENC(S5_popcountp,          "1000","011","-","011","ddddd")
+
+SH_RRI4_ENC(S5_asrhub_rnd_sat,    "1000","011",    "100","ddddd")
+SH_RRI4_ENC(S5_asrhub_sat,        "1000","011",    "101","ddddd")
+
+SH2_RR_ENC(F2_conv_df2sf,         "1000","000","-","001","ddddd")
+SH2_RR_ENC(F2_conv_ud2sf,         "1000","001","-","001","ddddd")
+SH2_RR_ENC(F2_conv_d2sf,          "1000","010","-","001","ddddd")
+SH2_RR_ENC(F2_conv_df2uw,         "1000","011","-","001","ddddd")
+SH2_RR_ENC(F2_conv_df2w,          "1000","100","-","001","ddddd")
+SH2_RR_ENC(F2_conv_df2uw_chop,    "1000","101","-","001","ddddd")
+SH2_RR_ENC(F2_conv_df2w_chop,     "1000","111","-","001","ddddd")
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"   1001 -------- PP------ --------","[#9] Rd=(Ps,Pt)")
+DEF_ENC32(C2_vitpack, ICLASS_S2op" 1001   -00 ---ss  PP----tt --- ddddd")
+DEF_ENC32(C2_tfrpr,   ICLASS_S2op" 1001   -1- ---ss  PP------ --- ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"   1010 -------- PP------ --------","[#10] Rdd=(Rss,#u6,#U6)")
+DEF_ENC32(S4_extractp,ICLASS_S2op" 1010 IIIsssss PPiiiiii IIIddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"   1011 -------- PP------ --------","[#11] Rd=(Rs)")
+SH2_RR_ENC(F2_conv_uw2sf,         "1011","001","-","000","ddddd")
+SH2_RR_ENC(F2_conv_w2sf,          "1011","010","-","000","ddddd")
+SH2_RR_ENC(F2_conv_sf2uw,         "1011","011","-","000","ddddd")
+SH2_RR_ENC(F2_conv_sf2w,          "1011","100","-","000","ddddd")
+SH2_RR_ENC(F2_conv_sf2uw_chop,    "1011","011","-","001","ddddd")
+SH2_RR_ENC(F2_conv_sf2w_chop,     "1011","100","-","001","ddddd")
+SH2_RR_ENC(F2_sffixupr,           "1011","101","-","000","ddddd")
+SH2_RR_ENC(F2_sfsqrtcheat,        "1011","110","-","111","ddddd")
+SH2_RR_ENC(F2_sfinvsqrta,         "1011","111","-","0ee","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"      1100 -------- PP------ --------","[#12] Rd=(Rs,#u6)")
+I5SHIFTTYPES(r,                      "1100","000",            "0  ","ddddd")
+SH_RRI5_ENC(S2_asl_i_r_sat,          "1100","010",            "010","ddddd")
+SH_RRI5_ENC(S2_asr_i_r_rnd,          "1100","010",            "000","ddddd")
+
+SH2_RR_ENC(S2_svsathb,               "1100","10-","-",        "00-","ddddd")
+SH2_RR_ENC(S2_svsathub,              "1100","10-","-",        "01-","ddddd")
+
+SH_RRI5_ENC(A4_cround_ri,            "1100","111",            "00-","ddddd")
+SH_RRI6_ENC(A7_croundd_ri,           "1100","111",            "01-","ddddd")
+SH_RRI5_ENC(A4_round_ri,             "1100","111",            "10-","ddddd")
+SH_RRI5_ENC(A4_round_ri_sat,         "1100","111",            "11-","ddddd")
+
+DEF_ENC32(S2_setbit_i,   ICLASS_S2op" 1100   110sssss PP0iiiii 000ddddd")
+DEF_ENC32(S2_clrbit_i,   ICLASS_S2op" 1100   110sssss PP0iiiii 001ddddd")
+DEF_ENC32(S2_togglebit_i,ICLASS_S2op" 1100   110sssss PP0iiiii 010ddddd")
+DEF_ENC32(S4_lsli       ,ICLASS_S2op" 1100   110sssss PPiiiiii 011ddddd")
+
+DEF_ENC32(S4_clbaddi    ,ICLASS_S2op" 1100   001sssss PPiiiiii 000ddddd")
+
+
+
+/* False read #u6 */
+SH2_RR_ENC(S2_clb,                   "1100","000","-","1 00","ddddd")
+SH2_RR_ENC(S2_cl0,                   "1100","000","-","1 01","ddddd")
+SH2_RR_ENC(S2_cl1,                   "1100","000","-","1 10","ddddd")
+SH2_RR_ENC(S2_clbnorm,               "1100","000","-","1 11","ddddd")
+SH2_RR_ENC(S2_ct0,                   "1100","010","-","1 00","ddddd")
+SH2_RR_ENC(S2_ct1,                   "1100","010","-","1 01","ddddd")
+SH2_RR_ENC(S2_brev,                  "1100","010","-","1 10","ddddd")
+SH2_RR_ENC(S2_vsplatrb,              "1100","010","-","1 11","ddddd")
+SH2_RR_ENC(A2_abs,                   "1100","100","-","1 00","ddddd")
+SH2_RR_ENC(A2_abssat,                "1100","100","-","1 01","ddddd")
+SH2_RR_ENC(A2_negsat,                "1100","100","-","1 10","ddddd")
+SH2_RR_ENC(A2_swiz,                  "1100","100","-","1 11","ddddd")
+SH2_RR_ENC(A2_sath,                  "1100","110","-","1 00","ddddd")
+SH2_RR_ENC(A2_satuh,                 "1100","110","-","1 01","ddddd")
+SH2_RR_ENC(A2_satub,                 "1100","110","-","1 10","ddddd")
+SH2_RR_ENC(A2_satb,                  "1100","110","-","1 11","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"     1101 -------- PP------ --------","[#13] Rd=(Rs,#u6,#U6)")
+DEF_ENC32(S2_extractu,  ICLASS_S2op" 1101 0IIsssss PP0iiiii IIIddddd")
+DEF_ENC32(S4_extract,   ICLASS_S2op" 1101 1IIsssss PP0iiiii IIIddddd")
+DEF_ENC32(S2_mask,    ICLASS_S2op"   1101 0II----- PP1iiiii IIIddddd")
+
+
+
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"     1110 -------- PP------ --------","[#14] Rx=(Rs,#u6)")
+I5SHIFTTYPES(r_nac,       "1110","00-","0","xxxxx")
+I5SHIFTTYPES(r_acc,       "1110","00-","1","xxxxx")
+I5SHIFTTYPES(r_and,       "1110","01-","0","xxxxx")
+I5SHIFTTYPES(r_or,        "1110","01-","1","xxxxx")
+I5SHIFTTYPES_NOASR(r_xacc,"1110","10-","0","xxxxx")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S2op"     1111 -------- PP------ --------","[#15] Rs=(Rs,#u6,#U6)")
+DEF_ENC32(S2_insert,    ICLASS_S2op" 1111 0IIsssss PP0iiiii IIIxxxxx")
+
+
+
+
+
+/*************************/
+/* S_3_operand           */
+/*************************/
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 0000 -------- PP------ --------","[#0] Rdd=(Rss,Rtt,#u3)")
+SH_RRR_ENC(S2_valignib,         "0000","0--","-","iii","ddddd")
+SH_RRR_ENC(S2_vspliceib,        "0000","1--","-","iii","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 0001 -------- PP------ --------","[#1] Rdd=(Rss,Rtt)")
+SH_RRR_ENC(S2_extractup_rp,     "0001","00-","-","00-","ddddd")
+SH_RRR_ENC(S2_shuffeb,          "0001","00-","-","01-","ddddd")
+SH_RRR_ENC(S2_shuffob,          "0001","00-","-","10-","ddddd")
+SH_RRR_ENC(S2_shuffeh,          "0001","00-","-","11-","ddddd")
+
+SH_RRR_ENC(S2_shuffoh,          "0001","10-","-","000","ddddd")
+SH_RRR_ENC(S2_vtrunewh,         "0001","10-","-","010","ddddd")
+SH_RRR_ENC(S6_vtrunehb_ppp,		"0001","10-","-","011","ddddd")
+SH_RRR_ENC(S2_vtrunowh,         "0001","10-","-","100","ddddd")
+SH_RRR_ENC(S6_vtrunohb_ppp,		"0001","10-","-","101","ddddd")
+SH_RRR_ENC(S2_lfsp,             "0001","10-","-","110","ddddd")
+
+SH_RRR_ENC(S4_vxaddsubw,        "0001","01-","-","000","ddddd")
+SH_RRR_ENC(A5_vaddhubs,         "0001","01-","-","001","ddddd")
+SH_RRR_ENC(S4_vxsubaddw,        "0001","01-","-","010","ddddd")
+SH_RRR_ENC(S4_vxaddsubh,        "0001","01-","-","100","ddddd")
+SH_RRR_ENC(S4_vxsubaddh,        "0001","01-","-","110","ddddd")
+
+SH_RRR_ENC(S4_vxaddsubhr,       "0001","11-","-","00-","ddddd")
+SH_RRR_ENC(S4_vxsubaddhr,       "0001","11-","-","01-","ddddd")
+SH_RRR_ENC(S4_extractp_rp,      "0001","11-","-","10-","ddddd")
+SH_RRR_ENC(S2_cabacdecbin,      "0001","11-","-","11-","ddddd") /* implicit P0 write */
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 0010 -------- PP------ --------","[#2] Rdd=(Rss,Rtt,Pu)")
+SH_RRR_ENC(S2_valignrb,         "0010","0--","-","-uu","ddddd")
+SH_RRR_ENC(S2_vsplicerb,        "0010","100","-","-uu","ddddd")
+SH_RRR_ENC(S2_cabacencbin,      "0010","101","-","-uu","ddddd")
+SH_RRR_ENC(A4_addp_c,	    	"0010","110","-","-xx","ddddd")
+SH_RRR_ENC(A4_subp_c,		"0010","111","-","-xx","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 0011 -------- PP------ --------","[#3] Rdd=(Rss,Rt)")
+RSHIFTTYPES(vw,                 "0011","00-","-","-","ddddd")
+RSHIFTTYPES(vh,                 "0011","01-","-","-","ddddd")
+RSHIFTTYPES(p,                  "0011","10-","-","-","ddddd")
+SH_RRR_ENC(S2_vcrotate,         "0011","11-","-","00-","ddddd")
+SH_RRR_ENC(S2_vcnegh,           "0011","11-","-","01-","ddddd")
+SH_RRR_ENC(S4_vrcrotate,        "0011","11-","i","11i","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 0100 -------- PP------ --------","[#4] Rd=(Rs,Rt,#u3)")
+DEF_ENC32(S2_addasl_rrri, ICLASS_S3op" 0100   000 sssss PP0ttttt iiiddddd")
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 0101 -------- PP------ --------","[#5] Rd=(Rss,Rt)")
+SH_RRR_ENC(S2_asr_r_svw_trun,   "0101","---","-","010","ddddd")
+SH_RRR_ENC(M4_cmpyi_wh,         "0101","---","-","100","ddddd")
+SH_RRR_ENC(M4_cmpyr_wh,         "0101","---","-","110","ddddd")
+SH_RRR_ENC(M4_cmpyi_whc,        "0101","---","-","101","ddddd")
+SH_RRR_ENC(M4_cmpyr_whc,        "0101","---","-","111","ddddd")
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 0110 -------- PP------ --------","[#6] Rd=(Rs,Rt)")
+SH_RRR_ENC(S2_asr_r_r_sat,      "0110","00-","-","00-","ddddd") \
+SH_RRR_ENC(S2_asl_r_r_sat,      "0110","00-","-","10-","ddddd")
+
+RSHIFTTYPES(r,                  "0110","01-","-","-","ddddd")
+
+SH_RRR_ENC(S2_setbit_r,         "0110","10-","-","00-","ddddd")
+SH_RRR_ENC(S2_clrbit_r,         "0110","10-","-","01-","ddddd")
+SH_RRR_ENC(S2_togglebit_r,      "0110","10-","-","10-","ddddd")
+SH_RRRiENC(S4_lsli,             "0110","10-","-","11i","ddddd")
+
+SH_RRR_ENC(A4_cround_rr,        "0110","11-","-","00-","ddddd")
+SH_RRR_ENC(A7_croundd_rr,       "0110","11-","-","01-","ddddd")
+SH_RRR_ENC(A4_round_rr,         "0110","11-","-","10-","ddddd")
+SH_RRR_ENC(A4_round_rr_sat,     "0110","11-","-","11-","ddddd")
+
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 0111 -------- PP------ --------","[#7] Pd=(Rs,Rt)")
+SH_RRR_ENC(S2_tstbit_r,         "0111","000","-","---","---dd")
+SH_RRR_ENC(C2_bitsset,          "0111","010","-","---","---dd")
+SH_RRR_ENC(C2_bitsclr,          "0111","100","-","---","---dd")
+SH_RRR_ENC(A4_cmpheq,           "0111","110","-","011","---dd")
+SH_RRR_ENC(A4_cmphgt,           "0111","110","-","100","---dd")
+SH_RRR_ENC(A4_cmphgtu,          "0111","110","-","101","---dd")
+SH_RRR_ENC(A4_cmpbeq,           "0111","110","-","110","---dd")
+SH_RRR_ENC(A4_cmpbgtu,          "0111","110","-","111","---dd")
+SH_RRR_ENC(A4_cmpbgt,           "0111","110","-","010","---dd")
+SH_RRR_ENC(S4_ntstbit_r,        "0111","001","-","---","---dd")
+SH_RRR_ENC(C4_nbitsset,         "0111","011","-","---","---dd")
+SH_RRR_ENC(C4_nbitsclr,         "0111","101","-","---","---dd")
+
+SH_RRR_ENC(F2_sfcmpge,          "0111","111","-","000","---dd")
+SH_RRR_ENC(F2_sfcmpuo,          "0111","111","-","001","---dd")
+SH_RRR_ENC(F2_sfcmpeq,          "0111","111","-","011","---dd")
+SH_RRR_ENC(F2_sfcmpgt,          "0111","111","-","100","---dd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 1000 -------- PP------ --------","[#8] Rx=(Rs,Rtt)")
+SH_RRR_ENC(S2_insert_rp,        "1000","---","-","---","xxxxx")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 1001 -------- PP------ --------","[#9] Rd=(Rs,Rtt)")
+SH_RRR_ENC(S2_extractu_rp,      "1001","00-","-","00-","ddddd")
+SH_RRR_ENC(S4_extract_rp,       "1001","00-","-","01-","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 1010 -------- PP------ --------","[#10] Rxx=(Rss,Rtt)")
+SH_RRR_ENC(S2_insertp_rp,       "1010","0--","0","---","xxxxx")
+SH_RRR_ENC(M4_xor_xacc,         "1010","10-","0","000","xxxxx")
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 1011 -------- PP------ --------","[#11] Rxx=(Rss,Rt)")
+RSHIFTTYPES(p_or,               "1011","000","-","-","xxxxx")
+RSHIFTTYPES(p_and,              "1011","010","-","-","xxxxx")
+RSHIFTTYPES(p_nac,              "1011","100","-","-","xxxxx")
+RSHIFTTYPES(p_acc,              "1011","110","-","-","xxxxx")
+RSHIFTTYPES(p_xor,              "1011","011","-","-","xxxxx")
+
+SH_RRR_ENCX(A4_vrmaxh,		"1011","001","0","001","uuuuu")
+SH_RRR_ENCX(A4_vrmaxuh,		"1011","001","1","001","uuuuu")
+SH_RRR_ENCX(A4_vrmaxw,		"1011","001","0","010","uuuuu")
+SH_RRR_ENCX(A4_vrmaxuw,		"1011","001","1","010","uuuuu")
+
+SH_RRR_ENCX(A4_vrminh,		"1011","001","0","101","uuuuu")
+SH_RRR_ENCX(A4_vrminuh,		"1011","001","1","101","uuuuu")
+SH_RRR_ENCX(A4_vrminw,		"1011","001","0","110","uuuuu")
+SH_RRR_ENCX(A4_vrminuw,		"1011","001","1","110","uuuuu")
+
+SH_RRR_ENC(S2_vrcnegh,		"1011","001","1","111","xxxxx")
+
+SH_RRR_ENC(S4_vrcrotate_acc,	"1011","101","i","--i","xxxxx")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 1100 -------- PP------ --------","[#12] Rx=(Rs,Rt)")
+RSHIFTTYPES(r_or,               "1100","00-","-","-","xxxxx")
+RSHIFTTYPES(r_and,              "1100","01-","-","-","xxxxx")
+RSHIFTTYPES(r_nac,              "1100","10-","-","-","xxxxx")
+RSHIFTTYPES(r_acc,              "1100","11-","-","-","xxxxx")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 1101 -------- PP------ --------","[#13] Reserved")
+DEF_FIELDROW_DESC32(ICLASS_S3op" 1110 -------- PP------ --------","[#14] Reserved")
+
+
+DEF_FIELDROW_DESC32(ICLASS_S3op" 1111 -------- PP------ --------","[#14] User Instruction")
+DEF_ENC32(S6_userinsn,ICLASS_S3op" 1111 iiisssss PPittttt iiixxxxx")
+
+
+
+
+
+
+
+
+
+
+
+
+
+/*******************************/
+/*                             */
+/*                             */
+/*           ALU64             */
+/*                             */
+/*                             */
+/*******************************/
+DEF_CLASS32(ICLASS_ALU64" ---- -------- PP------ --------",ALU64)
+DEF_FIELD32(ICLASS_ALU64" !!!! -------- PP------ --------",ALU64_RegType,"Register Type")
+DEF_FIELD32(ICLASS_ALU64" 0--- !!!----- PP------ --------",A_MajOp,"Major Opcode")
+DEF_FIELD32(ICLASS_ALU64" 0--- -------- PP------ !!!-----",A_MinOp,"Minor Opcode")
+DEF_FIELD32(ICLASS_ALU64" 11-- -------- PP------ ---!!!!!",A_MajOp,"Major Opcode")
+
+
+
+#define ALU64_RRR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS)\
+DEF_ENC32(TAG, ICLASS_ALU64" "MAJ4"  "MIN3"sssss  PP"SMOD1"ttttt "VMIN3 DSTCHARS)
+
+#define LEGACY_ALU64_RRR_ENC(TAG,MAJ4,MIN3,SMOD1,VMIN3,DSTCHARS)\
+LEGACY_DEF_ENC32(TAG, ICLASS_ALU64" "MAJ4"  "MIN3"sssss  PP"SMOD1"ttttt "VMIN3 DSTCHARS)
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 0000 -------- PP------ --------","[#0] Rd=(Rss,Rtt)")
+ALU64_RRR_ENC(S2_parityp,        "0000","---","-","---","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 0001 -------- PP------ --------","[#1] Rdd=(Pu,Rss,Rtt)")
+ALU64_RRR_ENC(C2_vmux,           "0001","---","-","-uu","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 0010 -------- PP------ --------","[#2] Pd=(Rss,Rtt)")
+ALU64_RRR_ENC(A2_vcmpweq,        "0010","0--","0","000","---dd")
+ALU64_RRR_ENC(A2_vcmpwgt,        "0010","0--","0","001","---dd")
+ALU64_RRR_ENC(A2_vcmpwgtu,       "0010","0--","0","010","---dd")
+ALU64_RRR_ENC(A2_vcmpheq,        "0010","0--","0","011","---dd")
+ALU64_RRR_ENC(A2_vcmphgt,        "0010","0--","0","100","---dd")
+ALU64_RRR_ENC(A2_vcmphgtu,       "0010","0--","0","101","---dd")
+ALU64_RRR_ENC(A2_vcmpbeq,        "0010","0--","0","110","---dd")
+ALU64_RRR_ENC(A2_vcmpbgtu,       "0010","0--","0","111","---dd")
+
+ALU64_RRR_ENC(A4_vcmpbeq_any,    "0010","0--","1","000","---dd")
+ALU64_RRR_ENC(A6_vcmpbeq_notany, "0010","0--","1","001","---dd")
+ALU64_RRR_ENC(A4_vcmpbgt,        "0010","0--","1","010","---dd")
+ALU64_RRR_ENC(A4_tlbmatch,       "0010","0--","1","011","---dd")
+ALU64_RRR_ENC(A4_boundscheck_lo, "0010","0--","1","100","---dd")
+ALU64_RRR_ENC(A4_boundscheck_hi, "0010","0--","1","101","---dd")
+
+ALU64_RRR_ENC(C2_cmpeqp,         "0010","100","-","000","---dd")
+ALU64_RRR_ENC(C2_cmpgtp,         "0010","100","-","010","---dd")
+ALU64_RRR_ENC(C2_cmpgtup,        "0010","100","-","100","---dd")
+
+ALU64_RRR_ENC(F2_dfcmpeq,        "0010","111","-","000","---dd")
+ALU64_RRR_ENC(F2_dfcmpgt,        "0010","111","-","001","---dd")
+ALU64_RRR_ENC(F2_dfcmpge,        "0010","111","-","010","---dd")
+ALU64_RRR_ENC(F2_dfcmpuo,        "0010","111","-","011","---dd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 0011 -------- PP------ --------","[#3] Rdd=(Rss,Rtt)")
+ALU64_RRR_ENC(A2_vaddub,         "0011","000","-","000","ddddd")
+ALU64_RRR_ENC(A2_vaddubs,        "0011","000","-","001","ddddd")
+ALU64_RRR_ENC(A2_vaddh,          "0011","000","-","010","ddddd")
+ALU64_RRR_ENC(A2_vaddhs,         "0011","000","-","011","ddddd")
+ALU64_RRR_ENC(A2_vadduhs,        "0011","000","-","100","ddddd")
+ALU64_RRR_ENC(A2_vaddw,          "0011","000","-","101","ddddd")
+ALU64_RRR_ENC(A2_vaddws,         "0011","000","-","110","ddddd")
+ALU64_RRR_ENC(A2_addp,           "0011","000","-","111","ddddd")
+
+ALU64_RRR_ENC(A2_vsubub,         "0011","001","-","000","ddddd")
+ALU64_RRR_ENC(A2_vsububs,        "0011","001","-","001","ddddd")
+ALU64_RRR_ENC(A2_vsubh,          "0011","001","-","010","ddddd")
+ALU64_RRR_ENC(A2_vsubhs,         "0011","001","-","011","ddddd")
+ALU64_RRR_ENC(A2_vsubuhs,        "0011","001","-","100","ddddd")
+ALU64_RRR_ENC(A2_vsubw,          "0011","001","-","101","ddddd")
+ALU64_RRR_ENC(A2_vsubws,         "0011","001","-","110","ddddd")
+ALU64_RRR_ENC(A2_subp,           "0011","001","-","111","ddddd")
+
+ALU64_RRR_ENC(A2_vavgub,         "0011","010","-","000","ddddd")
+ALU64_RRR_ENC(A2_vavgubr,        "0011","010","-","001","ddddd")
+ALU64_RRR_ENC(A2_vavgh,          "0011","010","-","010","ddddd")
+ALU64_RRR_ENC(A2_vavghr,         "0011","010","-","011","ddddd")
+ALU64_RRR_ENC(A2_vavghcr,        "0011","010","-","100","ddddd")
+ALU64_RRR_ENC(A2_vavguh,         "0011","010","-","101","ddddd")
+ALU64_RRR_ENC(A2_vavguhr,        "0011","010","-","11-","ddddd")
+
+ALU64_RRR_ENC(A2_vavgw,          "0011","011","-","000","ddddd")
+ALU64_RRR_ENC(A2_vavgwr,         "0011","011","-","001","ddddd")
+ALU64_RRR_ENC(A2_vavgwcr,        "0011","011","-","010","ddddd")
+ALU64_RRR_ENC(A2_vavguw,         "0011","011","-","011","ddddd")
+ALU64_RRR_ENC(A2_vavguwr,        "0011","011","-","100","ddddd")
+ALU64_RRR_ENC(A2_addpsat,        "0011","011","-","101","ddddd")
+ALU64_RRR_ENC(A2_addspl,         "0011","011","-","110","ddddd")
+ALU64_RRR_ENC(A2_addsph,         "0011","011","-","111","ddddd")
+
+ALU64_RRR_ENC(A2_vnavgh,         "0011","100","-","000","ddddd")
+ALU64_RRR_ENC(A2_vnavghr,        "0011","100","-","001","ddddd")
+ALU64_RRR_ENC(A2_vnavghcr,       "0011","100","-","010","ddddd")
+ALU64_RRR_ENC(A2_vnavgw,         "0011","100","-","011","ddddd")
+ALU64_RRR_ENC(A2_vnavgwr,        "0011","100","-","10-","ddddd")
+ALU64_RRR_ENC(A2_vnavgwcr,       "0011","100","-","11-","ddddd")
+
+ALU64_RRR_ENC(A2_vminub,         "0011","101","-","000","ddddd")
+ALU64_RRR_ENC(A2_vminh,          "0011","101","-","001","ddddd")
+ALU64_RRR_ENC(A2_vminuh,         "0011","101","-","010","ddddd")
+ALU64_RRR_ENC(A2_vminw,          "0011","101","-","011","ddddd")
+ALU64_RRR_ENC(A2_vminuw,         "0011","101","-","100","ddddd")
+ALU64_RRR_ENC(A2_vmaxuw,         "0011","101","-","101","ddddd")	/* Doh! We did not put max with other max insns in v3 */
+ALU64_RRR_ENC(A2_minp,           "0011","101","-","110","ddddd")
+ALU64_RRR_ENC(A2_minup,          "0011","101","-","111","ddddd")
+
+ALU64_RRR_ENC(A2_vmaxub,         "0011","110","-","000","ddddd")
+ALU64_RRR_ENC(A2_vmaxh,          "0011","110","-","001","ddddd")
+ALU64_RRR_ENC(A2_vmaxuh,         "0011","110","-","010","ddddd")
+ALU64_RRR_ENC(A2_vmaxw,          "0011","110","-","011","ddddd")
+ALU64_RRR_ENC(A2_maxp,           "0011","110","-","100","ddddd")
+ALU64_RRR_ENC(A2_maxup,          "0011","110","-","101","ddddd")
+ALU64_RRR_ENC(A2_vmaxb,          "0011","110","-","110","ddddd")
+ALU64_RRR_ENC(A2_vminb,          "0011","110","-","111","ddddd")	/* EJP: Because vmaxuw out of place */
+
+ALU64_RRR_ENC(A2_andp,           "0011","111","-","000","ddddd")
+ALU64_RRR_ENC(A2_orp,            "0011","111","-","010","ddddd")
+ALU64_RRR_ENC(A2_xorp,           "0011","111","-","100","ddddd")
+ALU64_RRR_ENC(A4_andnp,          "0011","111","-","001","ddddd")
+ALU64_RRR_ENC(A4_ornp,           "0011","111","-","011","ddddd")
+
+ALU64_RRR_ENC(A4_modwrapu,       "0011","111","-","111","ddddd")
+
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 0100 -------- PP------ --------","[#4] Rdd=(Rs,Rt)")
+LEGACY_ALU64_RRR_ENC(S2_packhl,  "0100","--0","-","---","ddddd")
+ALU64_RRR_ENC(A4_bitsplit,       "0100","--1","-","---","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 0101 -------- PP------ --------","[#5] Rd=(Rs,Rt)")
+ALU64_RRR_ENC(A2_addh_l16_ll,    "0101","000","-","00-","ddddd")
+ALU64_RRR_ENC(A2_addh_l16_hl,    "0101","000","-","01-","ddddd")
+ALU64_RRR_ENC(A2_addh_l16_sat_ll,"0101","000","-","10-","ddddd")
+ALU64_RRR_ENC(A2_addh_l16_sat_hl,"0101","000","-","11-","ddddd")
+
+ALU64_RRR_ENC(A2_subh_l16_ll,    "0101","001","-","00-","ddddd")
+ALU64_RRR_ENC(A2_subh_l16_hl,    "0101","001","-","01-","ddddd")
+ALU64_RRR_ENC(A2_subh_l16_sat_ll,"0101","001","-","10-","ddddd")
+ALU64_RRR_ENC(A2_subh_l16_sat_hl,"0101","001","-","11-","ddddd")
+
+ALU64_RRR_ENC(A2_addh_h16_ll,    "0101","010","-","000","ddddd")
+ALU64_RRR_ENC(A2_addh_h16_lh,    "0101","010","-","001","ddddd")
+ALU64_RRR_ENC(A2_addh_h16_hl,    "0101","010","-","010","ddddd")
+ALU64_RRR_ENC(A2_addh_h16_hh,    "0101","010","-","011","ddddd")
+ALU64_RRR_ENC(A2_addh_h16_sat_ll,"0101","010","-","100","ddddd")
+ALU64_RRR_ENC(A2_addh_h16_sat_lh,"0101","010","-","101","ddddd")
+ALU64_RRR_ENC(A2_addh_h16_sat_hl,"0101","010","-","110","ddddd")
+ALU64_RRR_ENC(A2_addh_h16_sat_hh,"0101","010","-","111","ddddd")
+
+ALU64_RRR_ENC(A2_subh_h16_ll,    "0101","011","-","000","ddddd")
+ALU64_RRR_ENC(A2_subh_h16_lh,    "0101","011","-","001","ddddd")
+ALU64_RRR_ENC(A2_subh_h16_hl,    "0101","011","-","010","ddddd")
+ALU64_RRR_ENC(A2_subh_h16_hh,    "0101","011","-","011","ddddd")
+ALU64_RRR_ENC(A2_subh_h16_sat_ll,"0101","011","-","100","ddddd")
+ALU64_RRR_ENC(A2_subh_h16_sat_lh,"0101","011","-","101","ddddd")
+ALU64_RRR_ENC(A2_subh_h16_sat_hl,"0101","011","-","110","ddddd")
+ALU64_RRR_ENC(A2_subh_h16_sat_hh,"0101","011","-","111","ddddd")
+
+LEGACY_ALU64_RRR_ENC(A2_addsat,  "0101","100","-","0--","ddddd")
+LEGACY_ALU64_RRR_ENC(A2_subsat,  "0101","100","-","1--","ddddd")
+
+ALU64_RRR_ENC(A2_min,            "0101","101","-","0--","ddddd")
+ALU64_RRR_ENC(A2_minu,           "0101","101","-","1--","ddddd")
+
+ALU64_RRR_ENC(A2_max,            "0101","110","-","0--","ddddd")
+ALU64_RRR_ENC(A2_maxu,           "0101","110","-","1--","ddddd")
+
+ALU64_RRR_ENC(S4_parity,         "0101","111","-","---","ddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 0110 -------- PP------ --------","[#6] Rd=#u10 ")
+DEF_ENC32(F2_sfimm_p,     ICLASS_ALU64" 0110   00i ----- PPiiiiii iiiddddd")
+DEF_ENC32(F2_sfimm_n,     ICLASS_ALU64" 0110   01i ----- PPiiiiii iiiddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 0111 -------- PP------ --------","[#7] Rd=(Rs,Rt,#u6)")
+DEF_ENC32(M4_mpyrr_addi,  ICLASS_ALU64" 0111   0ii sssss PPittttt iiiddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 1000 -------- PP------ --------","[#8] Rd=(Rs,#u6,#U6)")
+DEF_ENC32(M4_mpyri_addi,  ICLASS_ALU64" 1000   Iii sssss PPiddddd iiiIIIII")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 1001 -------- PP------ --------","[#9] Rdd=#u10 ")
+DEF_ENC32(F2_dfimm_p,     ICLASS_ALU64" 1001   00i ----- PPiiiiii iiiddddd")
+DEF_ENC32(F2_dfimm_n,     ICLASS_ALU64" 1001   01i ----- PPiiiiii iiiddddd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 1010 -------- PP------ --------","[#10] Rx=(Rs,Rx,#s10)")
+DEF_ENC32(S4_or_andix,    ICLASS_ALU64" 1010   01i xxxxx PPiiiiii iiiuuuuu")
+DEF_ENC32(S4_or_andi,     ICLASS_ALU64" 1010   00i sssss PPiiiiii iiixxxxx")
+DEF_ENC32(S4_or_ori,      ICLASS_ALU64" 1010   10i sssss PPiiiiii iiixxxxx")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 1011 -------- PP------ --------","[#11] Rd=(Rs,Rd,#s6)")
+DEF_ENC32(S4_addaddi,     ICLASS_ALU64" 1011   0ii sssss PPiddddd iiiuuuuu")
+DEF_ENC32(S4_subaddi,     ICLASS_ALU64" 1011   1ii sssss PPiddddd iiiuuuuu")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64"     1100 -------- PP------ --------","[#12] Pd=(Rss,#s8)")
+DEF_ENC32(A4_vcmpbeqi,   ICLASS_ALU64"1100 000sssss PP-iiiii iii00-dd")
+DEF_ENC32(A4_vcmpbgti,   ICLASS_ALU64"1100 001sssss PP-iiiii iii00-dd")
+DEF_ENC32(A4_vcmpbgtui,  ICLASS_ALU64"1100 010sssss PP-0iiii iii00-dd")
+DEF_ENC32(A4_vcmpheqi,   ICLASS_ALU64"1100 000sssss PP-iiiii iii01-dd")
+DEF_ENC32(A4_vcmphgti,   ICLASS_ALU64"1100 001sssss PP-iiiii iii01-dd")
+DEF_ENC32(A4_vcmphgtui,  ICLASS_ALU64"1100 010sssss PP-0iiii iii01-dd")
+DEF_ENC32(A4_vcmpweqi,   ICLASS_ALU64"1100 000sssss PP-iiiii iii10-dd")
+DEF_ENC32(A4_vcmpwgti,   ICLASS_ALU64"1100 001sssss PP-iiiii iii10-dd")
+DEF_ENC32(A4_vcmpwgtui,  ICLASS_ALU64"1100 010sssss PP-0iiii iii10-dd")
+
+DEF_ENC32(F2_dfclass,    ICLASS_ALU64"1100 100sssss PP-000ii iii10-dd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 1101 -------- PP------ --------","[#13] Pd=(Rs,#s8)")
+DEF_ENC32(A4_cmpbeqi,    ICLASS_ALU64"1101 -00sssss PP-iiiii iii00-dd")
+DEF_ENC32(A4_cmpbgti,    ICLASS_ALU64"1101 -01sssss PP-iiiii iii00-dd")
+DEF_ENC32(A4_cmpbgtui,   ICLASS_ALU64"1101 -10sssss PP-0iiii iii00-dd")
+DEF_ENC32(A4_cmpheqi,    ICLASS_ALU64"1101 -00sssss PP-iiiii iii01-dd")
+DEF_ENC32(A4_cmphgti,    ICLASS_ALU64"1101 -01sssss PP-iiiii iii01-dd")
+DEF_ENC32(A4_cmphgtui,   ICLASS_ALU64"1101 -10sssss PP-0iiii iii01-dd")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 1110 -------- PP------ --------","[#14] Rx=(#u9,op(Rx,#u5))")
+
+#define OP_OPI_RI(TAG,OPB)\
+DEF_ENC32(S4_andi_##TAG##_ri,ICLASS_ALU64" 1110 iiixxxxx PPiIIIII iii"OPB"i00-")\
+DEF_ENC32(S4_ori_##TAG##_ri, ICLASS_ALU64" 1110 iiixxxxx PPiIIIII iii"OPB"i01-")\
+DEF_ENC32(S4_addi_##TAG##_ri,ICLASS_ALU64" 1110 iiixxxxx PPiIIIII iii"OPB"i10-")\
+DEF_ENC32(S4_subi_##TAG##_ri,ICLASS_ALU64" 1110 iiixxxxx PPiIIIII iii"OPB"i11-")
+
+OP_OPI_RI(asl,"0")
+OP_OPI_RI(lsr,"1")
+
+
+DEF_FIELDROW_DESC32(ICLASS_ALU64" 1111 -------- PP------ --------","[#15] Rd=(Rs,Ru,#u6:2)")
+DEF_ENC32(M4_mpyri_addr_u2, ICLASS_ALU64" 1111   0ii sssss PPiddddd iiiuuuuu")
+DEF_ENC32(M4_mpyri_addr,    ICLASS_ALU64" 1111   1ii sssss PPiddddd iiiuuuuu")
+
+
+
+#undef FRAME_EXPLICIT
diff --git a/target/hexagon/imported/encode_subinsn.def b/target/hexagon/imported/encode_subinsn.def
new file mode 100644
index 0000000..ebb8119
--- /dev/null
+++ b/target/hexagon/imported/encode_subinsn.def
@@ -0,0 +1,150 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+
+/* DEF_ENC_SUBINSN(TAG, CLASS, ENCSTR) */
+
+
+
+
+/*********************/
+/* Ld1-type subinsns */
+/*********************/
+DEF_ENC_SUBINSN(SL1_loadri_io,   SUBINSN_L1, "0iiiissssdddd")
+DEF_ENC_SUBINSN(SL1_loadrub_io,  SUBINSN_L1, "1iiiissssdddd")
+
+/*********************/
+/* St1-type subinsns */
+/*********************/
+DEF_ENC_SUBINSN(SS1_storew_io,  SUBINSN_S1, "0ii iisssstttt")
+DEF_ENC_SUBINSN(SS1_storeb_io,  SUBINSN_S1, "1ii iisssstttt")
+
+
+/*********************/
+/* Ld2-type subinsns */
+/*********************/
+DEF_ENC_SUBINSN(SL2_loadrh_io,   SUBINSN_L2, "00i iissssdddd")
+DEF_ENC_SUBINSN(SL2_loadruh_io,  SUBINSN_L2, "01i iissssdddd")
+DEF_ENC_SUBINSN(SL2_loadrb_io,   SUBINSN_L2, "10i iissssdddd")
+DEF_ENC_SUBINSN(SL2_loadri_sp,   SUBINSN_L2, "111 0iiiiidddd")
+DEF_ENC_SUBINSN(SL2_loadrd_sp,   SUBINSN_L2, "111 10iiiiiddd")
+
+DEF_ENC_SUBINSN(SL2_deallocframe,SUBINSN_L2, "111 1100---0--")
+
+DEF_ENC_SUBINSN(SL2_return,      SUBINSN_L2, "111 1101---0--")
+DEF_ENC_SUBINSN(SL2_return_t,    SUBINSN_L2, "111 1101---100")
+DEF_ENC_SUBINSN(SL2_return_f,    SUBINSN_L2, "111 1101---101")
+DEF_ENC_SUBINSN(SL2_return_tnew, SUBINSN_L2, "111 1101---110")
+DEF_ENC_SUBINSN(SL2_return_fnew, SUBINSN_L2, "111 1101---111")
+
+DEF_ENC_SUBINSN(SL2_jumpr31,     SUBINSN_L2, "111 1111---0--")
+DEF_ENC_SUBINSN(SL2_jumpr31_t,   SUBINSN_L2, "111 1111---100")
+DEF_ENC_SUBINSN(SL2_jumpr31_f,   SUBINSN_L2, "111 1111---101")
+DEF_ENC_SUBINSN(SL2_jumpr31_tnew,SUBINSN_L2, "111 1111---110")
+DEF_ENC_SUBINSN(SL2_jumpr31_fnew,SUBINSN_L2, "111 1111---111")
+
+
+/*********************/
+/* St2-type subinsns */
+/*********************/
+DEF_ENC_SUBINSN(SS2_storeh_io,   SUBINSN_S2, "00i iisssstttt")
+DEF_ENC_SUBINSN(SS2_storew_sp,   SUBINSN_S2, "010 0iiiiitttt")
+DEF_ENC_SUBINSN(SS2_stored_sp,   SUBINSN_S2, "010 1iiiiiittt")
+
+DEF_ENC_SUBINSN(SS2_storewi0,    SUBINSN_S2, "100 00ssssiiii")
+DEF_ENC_SUBINSN(SS2_storewi1,    SUBINSN_S2, "100 01ssssiiii")
+DEF_ENC_SUBINSN(SS2_storebi0,    SUBINSN_S2, "100 10ssssiiii")
+DEF_ENC_SUBINSN(SS2_storebi1,    SUBINSN_S2, "100 11ssssiiii")
+
+DEF_ENC_SUBINSN(SS2_allocframe,  SUBINSN_S2, "111 0iiiii----")
+
+
+
+/*******************/
+/* A-type subinsns */
+/*******************/
+DEF_ENC_SUBINSN(SA1_addi,       SUBINSN_A, "00i iiiiiixxxx")
+DEF_ENC_SUBINSN(SA1_seti,       SUBINSN_A, "010 iiiiiidddd")
+DEF_ENC_SUBINSN(SA1_addsp,      SUBINSN_A, "011 iiiiiidddd")
+
+DEF_ENC_SUBINSN(SA1_tfr,        SUBINSN_A, "100 00ssssdddd")
+DEF_ENC_SUBINSN(SA1_inc,        SUBINSN_A, "100 01ssssdddd")
+DEF_ENC_SUBINSN(SA1_and1,       SUBINSN_A, "100 10ssssdddd")
+DEF_ENC_SUBINSN(SA1_dec,        SUBINSN_A, "100 11ssssdddd")
+
+DEF_ENC_SUBINSN(SA1_sxth,       SUBINSN_A, "101 00ssssdddd")
+DEF_ENC_SUBINSN(SA1_sxtb,       SUBINSN_A, "101 01ssssdddd")
+DEF_ENC_SUBINSN(SA1_zxth,       SUBINSN_A, "101 10ssssdddd")
+DEF_ENC_SUBINSN(SA1_zxtb,       SUBINSN_A, "101 11ssssdddd")
+
+
+DEF_ENC_SUBINSN(SA1_addrx,      SUBINSN_A, "110 00ssssxxxx")
+DEF_ENC_SUBINSN(SA1_cmpeqi,     SUBINSN_A, "110 01ssss--ii")
+DEF_ENC_SUBINSN(SA1_setin1,     SUBINSN_A, "110 1--0--dddd")
+DEF_ENC_SUBINSN(SA1_clrtnew,    SUBINSN_A, "110 1--100dddd")
+DEF_ENC_SUBINSN(SA1_clrfnew,    SUBINSN_A, "110 1--101dddd")
+DEF_ENC_SUBINSN(SA1_clrt,       SUBINSN_A, "110 1--110dddd")
+DEF_ENC_SUBINSN(SA1_clrf,       SUBINSN_A, "110 1--111dddd")
+
+
+DEF_ENC_SUBINSN(SA1_combine0i,  SUBINSN_A, "111 -0-ii00ddd")
+DEF_ENC_SUBINSN(SA1_combine1i,  SUBINSN_A, "111 -0-ii01ddd")
+DEF_ENC_SUBINSN(SA1_combine2i,  SUBINSN_A, "111 -0-ii10ddd")
+DEF_ENC_SUBINSN(SA1_combine3i,  SUBINSN_A, "111 -0-ii11ddd")
+DEF_ENC_SUBINSN(SA1_combinezr,  SUBINSN_A, "111 -1ssss0ddd")
+DEF_ENC_SUBINSN(SA1_combinerz,  SUBINSN_A, "111 -1ssss1ddd")
+
+
+
+
+/* maybe R=cmpeq ? */
+
+
+/* Add a group of NCJ: if (R.new==#0) jump:hint #r9 */
+/* Add a group of NCJ: if (R.new!=#0) jump:hint #r9 */
+/* NCJ goes with LD1, LD2 */
+
+
+
+
+DEF_FIELD32("---! !!!! !!!!!!!! EE------ --------",SUBFIELD_B_SLOT1,"B: Slot1 Instruction")
+DEF_FIELD32("---- ---- -------- EE-!!!!! !!!!!!!!",SUBFIELD_A_SLOT0,"A: Slot0 Instruction")
+
+
+/* DEF_PACKED32(TAG, CLASSA, CLASSB, ENCSTR) */
+
+DEF_PACKED32(P2_PACKED_L1_L1, SUBINSN_L1, SUBINSN_L1, "000B BBBB BBBB BBBB EE0A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_L1_L2, SUBINSN_L2, SUBINSN_L1, "000B BBBB BBBB BBBB EE1A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_L2_L2, SUBINSN_L2, SUBINSN_L2, "001B BBBB BBBB BBBB EE0A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_A_A,   SUBINSN_A,  SUBINSN_A,  "001B BBBB BBBB BBBB EE1A AAAA AAAA AAAA")
+
+DEF_PACKED32(P2_PACKED_L1_A,  SUBINSN_L1, SUBINSN_A,  "010B BBBB BBBB BBBB EE0A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_L2_A,  SUBINSN_L2, SUBINSN_A,  "010B BBBB BBBB BBBB EE1A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_S1_A,  SUBINSN_S1, SUBINSN_A,  "011B BBBB BBBB BBBB EE0A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_S2_A,  SUBINSN_S2, SUBINSN_A,  "011B BBBB BBBB BBBB EE1A AAAA AAAA AAAA")
+
+DEF_PACKED32(P2_PACKED_S1_L1, SUBINSN_S1, SUBINSN_L1, "100B BBBB BBBB BBBB EE0A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_S1_L2, SUBINSN_S1, SUBINSN_L2, "100B BBBB BBBB BBBB EE1A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_S1_S1, SUBINSN_S1, SUBINSN_S1, "101B BBBB BBBB BBBB EE0A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_S1_S2, SUBINSN_S2, SUBINSN_S1, "101B BBBB BBBB BBBB EE1A AAAA AAAA AAAA")
+
+DEF_PACKED32(P2_PACKED_S2_L1, SUBINSN_S2, SUBINSN_L1, "110B BBBB BBBB BBBB EE0A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_S2_L2, SUBINSN_S2, SUBINSN_L2, "110B BBBB BBBB BBBB EE1A AAAA AAAA AAAA")
+DEF_PACKED32(P2_PACKED_S2_S2, SUBINSN_S2, SUBINSN_S2, "111B BBBB BBBB BBBB EE0A AAAA AAAA AAAA")
+
+DEF_PACKED32(P2_PACKED_RESERVED, SUBINSN_INVALID, SUBINSN_INVALID, "111B BBBB BBBB BBBB EE1A AAAA AAAA AAAA")
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 19/67] Hexagon instruction class definitions
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (17 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 18/67] Hexagon arch import - instruction encoding Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 20/67] Hexagon instruction utility functions Taylor Simpson
                   ` (48 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

    Imported from the Hexagon architecture library

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/imported/iclass.def | 52 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 target/hexagon/imported/iclass.def

diff --git a/target/hexagon/imported/iclass.def b/target/hexagon/imported/iclass.def
new file mode 100644
index 0000000..4ef725f
--- /dev/null
+++ b/target/hexagon/imported/iclass.def
@@ -0,0 +1,52 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* DEF_*(TYPE,SLOTS,UNITS) */
+DEF_PP_ICLASS32(EXTENDER,0123,LDST|SUNIT|MUNIT) /* 0 */
+DEF_PP_ICLASS32(CJ,0123,CTRLFLOW) /* 1 */
+DEF_PP_ICLASS32(NCJ,01,LDST|CTRLFLOW) /* 2 */
+DEF_PP_ICLASS32(V4LDST,01,LDST) /* 3 */
+DEF_PP_ICLASS32(V2LDST,01,LDST) /* 4 */
+DEF_PP_ICLASS32(J,0123,CTRLFLOW)  /* 5 */
+DEF_PP_ICLASS32(CR,3,SUNIT)     /* 6 */
+DEF_PP_ICLASS32(ALU32_2op,0123,LDST|SUNIT|MUNIT) /* 7 */
+DEF_PP_ICLASS32(S_2op,23,SUNIT|MUNIT)               /* 8 */
+DEF_PP_ICLASS32(LD,01,LDST)                    /* 9 */
+DEF_PP_ICLASS32(ST,01,LDST)                        /* 10 */
+DEF_PP_ICLASS32(ALU32_ADDI,0123,LDST|SUNIT|MUNIT) /* 11 */
+DEF_PP_ICLASS32(S_3op,23,SUNIT|MUNIT)               /* 12 */
+DEF_PP_ICLASS32(ALU64,23,SUNIT|MUNIT)             /* 13 */
+DEF_PP_ICLASS32(M,23,SUNIT|MUNIT)                 /* 14 */
+DEF_PP_ICLASS32(ALU32_3op,0123,LDST|SUNIT|MUNIT) /* 15 */
+
+DEF_EE_ICLASS32(EE0,01,INVALID) /* 0 */
+DEF_EE_ICLASS32(EE1,01,INVALID) /* 1 */
+DEF_EE_ICLASS32(EE2,01,INVALID) /* 2 */
+DEF_EE_ICLASS32(EE3,01,INVALID) /* 3 */
+DEF_EE_ICLASS32(EE4,01,INVALID) /* 4 */
+DEF_EE_ICLASS32(EE5,01,INVALID) /* 5 */
+DEF_EE_ICLASS32(EE6,01,INVALID) /* 6 */
+DEF_EE_ICLASS32(EE7,01,INVALID) /* 7 */
+DEF_EE_ICLASS32(EE8,01,INVALID) /* 8 */
+DEF_EE_ICLASS32(EE9,01,INVALID) /* 9 */
+DEF_EE_ICLASS32(EEA,01,INVALID) /* 10 */
+DEF_EE_ICLASS32(EEB,01,INVALID) /* 11 */
+DEF_EE_ICLASS32(EEC,01,INVALID) /* 12 */
+DEF_EE_ICLASS32(EED,01,INVALID) /* 13 */
+DEF_EE_ICLASS32(EEE,01,INVALID) /* 14 */
+DEF_EE_ICLASS32(EEF,01,INVALID) /* 15 */
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 20/67] Hexagon instruction utility functions
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (18 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 19/67] Hexagon instruction class definitions Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-04-09 18:53   ` Brian Cain
  2020-02-28 16:43 ` [RFC PATCH v2 21/67] Hexagon generator phase 1 - C preprocessor for semantics Taylor Simpson
                   ` (47 subsequent siblings)
  67 siblings, 1 reply; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Utility functions called by various instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/arch.h     |  62 ++++
 target/hexagon/conv_emu.h |  50 +++
 target/hexagon/fma_emu.h  |  30 ++
 target/hexagon/arch.c     | 663 +++++++++++++++++++++++++++++++++
 target/hexagon/conv_emu.c | 369 +++++++++++++++++++
 target/hexagon/fma_emu.c  | 916 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 2090 insertions(+)
 create mode 100644 target/hexagon/arch.h
 create mode 100644 target/hexagon/conv_emu.h
 create mode 100644 target/hexagon/fma_emu.h
 create mode 100644 target/hexagon/arch.c
 create mode 100644 target/hexagon/conv_emu.c
 create mode 100644 target/hexagon/fma_emu.c

diff --git a/target/hexagon/arch.h b/target/hexagon/arch.h
new file mode 100644
index 0000000..716bbdd
--- /dev/null
+++ b/target/hexagon/arch.h
@@ -0,0 +1,62 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_ARCH_H
+#define HEXAGON_ARCH_H
+
+#include "cpu.h"
+#include "hex_arch_types.h"
+
+extern size1u_t rLPS_table_64x4[64][4];
+extern size1u_t AC_next_state_MPS_64[64];
+extern size1u_t AC_next_state_LPS_64[64];
+
+extern size4u_t fbrevaddr(size4u_t pointer);
+extern size4u_t count_ones_2(size2u_t src);
+extern size4u_t count_ones_4(size4u_t src);
+extern size4u_t count_ones_8(size8u_t src);
+extern size4u_t count_leading_ones_8(size8u_t src);
+extern size4u_t count_leading_ones_4(size4u_t src);
+extern size4u_t count_leading_ones_2(size2u_t src);
+extern size4u_t count_leading_ones_1(size1u_t src);
+extern size8u_t reverse_bits_8(size8u_t src);
+extern size4u_t reverse_bits_4(size4u_t src);
+extern size4u_t reverse_bits_2(size2u_t src);
+extern size4u_t reverse_bits_1(size1u_t src);
+extern size8u_t exchange(size8u_t bits, size4u_t cntrl);
+extern size8u_t interleave(size4u_t odd, size4u_t even);
+extern size8u_t deinterleave(size8u_t src);
+extern size4u_t carry_from_add64(size8u_t a, size8u_t b, size4u_t c);
+extern size4s_t conv_round(size4s_t a, int n);
+extern size16s_t cast8s_to_16s(size8s_t a);
+extern size8s_t cast16s_to_8s(size16s_t a);
+extern size4s_t cast16s_to_4s(size16s_t a);
+extern size16s_t add128(size16s_t a, size16s_t b);
+extern size16s_t sub128(size16s_t a, size16s_t b);
+extern size16s_t shiftr128(size16s_t a, size4u_t n);
+extern size16s_t shiftl128(size16s_t a, size4u_t n);
+extern size16s_t and128(size16s_t a, size16s_t b);
+extern void arch_fpop_start(CPUHexagonState *env);
+extern void arch_fpop_end(CPUHexagonState *env);
+extern void arch_raise_fpflag(unsigned int flags);
+extern int arch_sf_recip_common(size4s_t *Rs, size4s_t *Rt, size4s_t *Rd,
+                                int *adjust);
+extern int arch_sf_invsqrt_common(size4s_t *Rs, size4s_t *Rd, int *adjust);
+extern int arch_recip_lookup(int index);
+extern int arch_invsqrt_lookup(int index);
+
+#endif
diff --git a/target/hexagon/conv_emu.h b/target/hexagon/conv_emu.h
new file mode 100644
index 0000000..50d9d2c
--- /dev/null
+++ b/target/hexagon/conv_emu.h
@@ -0,0 +1,50 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_CONV_EMU_H
+#define HEXAGON_CONV_EMU_H
+
+#include "hex_arch_types.h"
+
+extern size8u_t conv_sf_to_8u(float in);
+extern size4u_t conv_sf_to_4u(float in);
+extern size8s_t conv_sf_to_8s(float in);
+extern size4s_t conv_sf_to_4s(float in);
+
+extern size8u_t conv_df_to_8u(double in);
+extern size4u_t conv_df_to_4u(double in);
+extern size8s_t conv_df_to_8s(double in);
+extern size4s_t conv_df_to_4s(double in);
+
+extern double conv_8u_to_df(size8u_t in);
+extern double conv_4u_to_df(size4u_t in);
+extern double conv_8s_to_df(size8s_t in);
+extern double conv_4s_to_df(size4s_t in);
+
+extern float conv_8u_to_sf(size8u_t in);
+extern float conv_4u_to_sf(size4u_t in);
+extern float conv_8s_to_sf(size8s_t in);
+extern float conv_4s_to_sf(size4s_t in);
+
+extern float conv_df_to_sf(double in);
+
+static inline double conv_sf_to_df(float in)
+{
+    return in;
+}
+
+#endif
diff --git a/target/hexagon/fma_emu.h b/target/hexagon/fma_emu.h
new file mode 100644
index 0000000..3a54f25
--- /dev/null
+++ b/target/hexagon/fma_emu.h
@@ -0,0 +1,30 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_FMA_EMU_H
+#define HEXAGON_FMA_EMU_H
+
+extern float internal_fmafx(float a_in, float b_in, float c_in, int scale);
+extern float internal_fmaf(float a_in, float b_in, float c_in);
+extern double internal_fma(double a_in, double b_in, double c_in);
+extern double internal_fmax(double a_in, double b_in, double c_in, int scale);
+extern float internal_mpyf(float a_in, float b_in);
+extern double internal_mpy(double a_in, double b_in);
+extern double internal_mpyhh(double a_in, double b_in,
+                             unsigned long long int accumulated);
+
+#endif
diff --git a/target/hexagon/arch.c b/target/hexagon/arch.c
new file mode 100644
index 0000000..9831e2c
--- /dev/null
+++ b/target/hexagon/arch.c
@@ -0,0 +1,663 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include <math.h>
+#include "fma_emu.h"
+#include "arch.h"
+#include "macros.h"
+
+/*
+ * These three tables are used by the cabacdecbin instruction
+ */
+size1u_t rLPS_table_64x4[64][4] = {
+    {128, 176, 208, 240},
+    {128, 167, 197, 227},
+    {128, 158, 187, 216},
+    {123, 150, 178, 205},
+    {116, 142, 169, 195},
+    {111, 135, 160, 185},
+    {105, 128, 152, 175},
+    {100, 122, 144, 166},
+    {95, 116, 137, 158},
+    {90, 110, 130, 150},
+    {85, 104, 123, 142},
+    {81, 99, 117, 135},
+    {77, 94, 111, 128},
+    {73, 89, 105, 122},
+    {69, 85, 100, 116},
+    {66, 80, 95, 110},
+    {62, 76, 90, 104},
+    {59, 72, 86, 99},
+    {56, 69, 81, 94},
+    {53, 65, 77, 89},
+    {51, 62, 73, 85},
+    {48, 59, 69, 80},
+    {46, 56, 66, 76},
+    {43, 53, 63, 72},
+    {41, 50, 59, 69},
+    {39, 48, 56, 65},
+    {37, 45, 54, 62},
+    {35, 43, 51, 59},
+    {33, 41, 48, 56},
+    {32, 39, 46, 53},
+    {30, 37, 43, 50},
+    {29, 35, 41, 48},
+    {27, 33, 39, 45},
+    {26, 31, 37, 43},
+    {24, 30, 35, 41},
+    {23, 28, 33, 39},
+    {22, 27, 32, 37},
+    {21, 26, 30, 35},
+    {20, 24, 29, 33},
+    {19, 23, 27, 31},
+    {18, 22, 26, 30},
+    {17, 21, 25, 28},
+    {16, 20, 23, 27},
+    {15, 19, 22, 25},
+    {14, 18, 21, 24},
+    {14, 17, 20, 23},
+    {13, 16, 19, 22},
+    {12, 15, 18, 21},
+    {12, 14, 17, 20},
+    {11, 14, 16, 19},
+    {11, 13, 15, 18},
+    {10, 12, 15, 17},
+    {10, 12, 14, 16},
+    {9, 11, 13, 15},
+    {9, 11, 12, 14},
+    {8, 10, 12, 14},
+    {8, 9, 11, 13},
+    {7, 9, 11, 12},
+    {7, 9, 10, 12},
+    {7, 8, 10, 11},
+    {6, 8, 9, 11},
+    {6, 7, 9, 10},
+    {6, 7, 8, 9},
+    {2, 2, 2, 2}
+};
+
+size1u_t AC_next_state_MPS_64[64] = {
+    1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+    11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+    21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+    31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
+    41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
+    51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
+    61, 62, 62, 63
+};
+
+
+size1u_t AC_next_state_LPS_64[64] = {
+    0, 0, 1, 2, 2, 4, 4, 5, 6, 7,
+    8, 9, 9, 11, 11, 12, 13, 13, 15, 15,
+    16, 16, 18, 18, 19, 19, 21, 21, 22, 22,
+    23, 24, 24, 25, 26, 26, 27, 27, 28, 29,
+    29, 30, 30, 30, 31, 32, 32, 33, 33, 33,
+    34, 34, 35, 35, 35, 36, 36, 36, 37, 37,
+    37, 38, 38, 63
+};
+
+size4u_t fbrevaddr(size4u_t pointer)
+{
+    size4u_t offset = pointer & 0xffff;
+    size4u_t brevoffset = 0;
+    int i;
+
+    for (i = 0; i < 16; i++) {
+        fSETBIT(i, brevoffset, (fGETBIT((15 - i), offset)));
+    }
+    return (pointer & 0xffff0000) | brevoffset;
+}
+
+/*
+ * Counting bits set, Brian Kernighan's way
+ * Brian Kernighan's method goes through as many iterations as there are
+ * set bits. So if we have a 32-bit word with only the high bit set,
+ * then it will only go once through the loop.
+ */
+size4u_t count_ones_2(size2u_t src)
+{
+    int ret;
+    for (ret = 0; src; ret++) {
+        src &= src - 1;    /* clear the least significant bit set */
+    }
+    return ret;
+}
+
+ size4u_t count_ones_4(size4u_t src)
+{
+    int ret;
+    for (ret = 0; src; ret++) {
+        src &= src - 1;    /* clear the least significant bit set */
+    }
+    return ret;
+}
+
+size4u_t count_ones_8(size8u_t src)
+{
+    int ret;
+    for (ret = 0; src; ret++) {
+        src &= src - 1;    /* clear the least significant bit set */
+    }
+    return ret;
+}
+
+size4u_t count_leading_ones_8(size8u_t src)
+{
+    int ret;
+    for (ret = 0; src & 0x8000000000000000LL; src <<= 1) {
+        ret++;
+    }
+    return ret;
+}
+
+
+size4u_t count_leading_ones_4(size4u_t src)
+{
+    int ret;
+    for (ret = 0; src & 0x80000000; src <<= 1) {
+        ret++;
+    }
+    return ret;
+}
+
+size4u_t count_leading_ones_2(size2u_t src)
+{
+    int ret;
+    for (ret = 0; src & 0x8000; src <<= 1) {
+        ret++;
+    }
+    return ret;
+}
+
+size4u_t count_leading_ones_1(size1u_t src)
+{
+    int ret;
+    for (ret = 0; src & 0x80; src <<= 1) {
+        ret++;
+    }
+    return ret;
+}
+
+#define BITS_MASK_8 0x5555555555555555ULL
+#define PAIR_MASK_8 0x3333333333333333ULL
+#define NYBL_MASK_8 0x0f0f0f0f0f0f0f0fULL
+#define BYTE_MASK_8 0x00ff00ff00ff00ffULL
+#define HALF_MASK_8 0x0000ffff0000ffffULL
+#define WORD_MASK_8 0x00000000ffffffffULL
+
+size8u_t reverse_bits_8(size8u_t src)
+{
+    src = ((src >> 1) & BITS_MASK_8) | ((src & BITS_MASK_8) << 1);
+    src = ((src >> 2) & PAIR_MASK_8) | ((src & PAIR_MASK_8) << 2);
+    src = ((src >> 4) & NYBL_MASK_8) | ((src & NYBL_MASK_8) << 4);
+    src = ((src >> 8) & BYTE_MASK_8) | ((src & BYTE_MASK_8) << 8);
+    src = ((src >> 16) & HALF_MASK_8) | ((src & HALF_MASK_8) << 16);
+    src = ((src >> 32) & WORD_MASK_8) | ((src & WORD_MASK_8) << 32);
+    return src;
+}
+
+
+#define BITS_MASK_4 0x55555555ULL
+#define PAIR_MASK_4 0x33333333ULL
+#define NYBL_MASK_4 0x0f0f0f0fULL
+#define BYTE_MASK_4 0x00ff00ffULL
+#define HALF_MASK_4 0x0000ffffULL
+
+size4u_t reverse_bits_4(size4u_t src)
+{
+    src = ((src >> 1) & BITS_MASK_4) | ((src & BITS_MASK_4) << 1);
+    src = ((src >> 2) & PAIR_MASK_4) | ((src & PAIR_MASK_4) << 2);
+    src = ((src >> 4) & NYBL_MASK_4) | ((src & NYBL_MASK_4) << 4);
+    src = ((src >> 8) & BYTE_MASK_4) | ((src & BYTE_MASK_4) << 8);
+    src = ((src >> 16) & HALF_MASK_4) | ((src & HALF_MASK_4) << 16);
+    return src;
+}
+
+#define BITS_MASK_2 0x5555ULL
+#define PAIR_MASK_2 0x3333ULL
+#define NYBL_MASK_2 0x0f0fULL
+#define BYTE_MASK_2 0x00ffULL
+#define HALF_MASK_2 0x0000ULL
+
+size4u_t reverse_bits_2(size2u_t src)
+{
+    src = ((src >> 1) & BITS_MASK_2) | ((src & BITS_MASK_2) << 1);
+    src = ((src >> 2) & PAIR_MASK_2) | ((src & PAIR_MASK_2) << 2);
+    src = ((src >> 4) & NYBL_MASK_2) | ((src & NYBL_MASK_2) << 4);
+    src = ((src >> 8) & BYTE_MASK_2) | ((src & BYTE_MASK_2) << 8);
+    return src;
+}
+
+#define BITS_MASK_1 0x55ULL
+#define PAIR_MASK_1 0x33ULL
+#define NYBL_MASK_1 0x0fULL
+#define BYTE_MASK_1 0x00ULL
+#define HALF_MASK_1 0x00ULL
+
+size4u_t reverse_bits_1(size1u_t src)
+{
+    src = ((src >> 1) & BITS_MASK_1) | ((src & BITS_MASK_1) << 1);
+    src = ((src >> 2) & PAIR_MASK_1) | ((src & PAIR_MASK_1) << 2);
+    src = ((src >> 4) & NYBL_MASK_1) | ((src & NYBL_MASK_1) << 4);
+    return src;
+}
+
+size8u_t exchange(size8u_t bits, size4u_t cntrl)
+{
+    int i;
+    size4u_t mask = 1 << 31;
+    size8u_t mask0 = 0x1LL << 62;
+    size8u_t mask1 = 0x2LL << 62;
+    size8u_t outbits = 0;
+    size8u_t b0, b1;
+
+    for (i = 0; i < 32; i++) {
+        b0 = (bits & mask0) >> (2 * i);
+        b1 = (bits & mask1) >> (2 * i);
+        if (cntrl & mask) {
+            outbits |= (b1 >> 1) | (b0 << 1);
+        } else {
+            outbits |= (b1 | b0);
+        }
+        outbits <<= 2;
+        mask >>= 1;
+        mask0 >>= 2;
+        mask1 >>= 2;
+    }
+    return outbits;
+}
+
+size8u_t interleave(size4u_t odd, size4u_t even)
+{
+    /* Convert to long long */
+    size8u_t myodd = odd;
+    size8u_t myeven = even;
+    /* First, spread bits out */
+    myodd = (myodd | (myodd << 16)) & HALF_MASK_8;
+    myeven = (myeven | (myeven << 16)) & HALF_MASK_8;
+    myodd = (myodd | (myodd << 8)) & BYTE_MASK_8;
+    myeven = (myeven | (myeven << 8)) & BYTE_MASK_8;
+    myodd = (myodd | (myodd << 4)) & NYBL_MASK_8;
+    myeven = (myeven | (myeven << 4)) & NYBL_MASK_8;
+    myodd = (myodd | (myodd << 2)) & PAIR_MASK_8;
+    myeven = (myeven | (myeven << 2)) & PAIR_MASK_8;
+    myodd = (myodd | (myodd << 1)) & BITS_MASK_8;
+    myeven = (myeven | (myeven << 1)) & BITS_MASK_8;
+    /* Now OR together */
+    return myeven | (myodd << 1);
+}
+
+size8u_t deinterleave(size8u_t src)
+{
+    /* Get odd and even bits */
+    size8u_t myodd = ((src >> 1) & BITS_MASK_8);
+    size8u_t myeven = (src & BITS_MASK_8);
+
+    /* Unspread bits */
+    myeven = (myeven | (myeven >> 1)) & PAIR_MASK_8;
+    myodd = (myodd | (myodd >> 1)) & PAIR_MASK_8;
+    myeven = (myeven | (myeven >> 2)) & NYBL_MASK_8;
+    myodd = (myodd | (myodd >> 2)) & NYBL_MASK_8;
+    myeven = (myeven | (myeven >> 4)) & BYTE_MASK_8;
+    myodd = (myodd | (myodd >> 4)) & BYTE_MASK_8;
+    myeven = (myeven | (myeven >> 8)) & HALF_MASK_8;
+    myodd = (myodd | (myodd >> 8)) & HALF_MASK_8;
+    myeven = (myeven | (myeven >> 16)) & WORD_MASK_8;
+    myodd = (myodd | (myodd >> 16)) & WORD_MASK_8;
+
+    /* Return odd bits in upper half */
+    return myeven | (myodd << 32);
+}
+
+size4u_t carry_from_add64(size8u_t a, size8u_t b, size4u_t c)
+{
+    size8u_t tmpa, tmpb, tmpc;
+    tmpa = fGETUWORD(0, a);
+    tmpb = fGETUWORD(0, b);
+    tmpc = tmpa + tmpb + c;
+    tmpa = fGETUWORD(1, a);
+    tmpb = fGETUWORD(1, b);
+    tmpc = tmpa + tmpb + fGETUWORD(1, tmpc);
+    tmpc = fGETUWORD(1, tmpc);
+    return tmpc;
+}
+
+size4s_t conv_round(size4s_t a, int n)
+{
+    size8s_t val;
+
+    if (n == 0) {
+        val = a;
+    } else if ((a & ((1 << (n - 1)) - 1)) == 0) {    /* N-1..0 all zero? */
+        /* Add LSB from int part */
+        val = ((fSE32_64(a)) + (size8s_t) (((size4u_t) ((1 << n) & a)) >> 1));
+    } else {
+        val = ((fSE32_64(a)) + (1 << (n - 1)));
+    }
+
+    val = val >> n;
+    return (size4s_t)val;
+}
+
+size16s_t cast8s_to_16s(size8s_t a)
+{
+    size16s_t result = {.hi = 0, .lo = 0};
+    result.lo = a;
+    if (a < 0) {
+        result.hi = -1;
+    }
+    return result;
+}
+
+size8s_t cast16s_to_8s(size16s_t a)
+{
+    return a.lo;
+}
+
+size4s_t cast16s_to_4s(size16s_t a)
+{
+    return (size4s_t)a.lo;
+}
+
+size16s_t add128(size16s_t a, size16s_t b)
+{
+    size16s_t result = {.hi = 0, .lo = 0};
+    result.lo = a.lo + b.lo;
+    result.hi = a.hi + b.hi;
+
+    if (result.lo < b.lo) {
+        result.hi++;
+    }
+
+    return result;
+}
+
+size16s_t sub128(size16s_t a, size16s_t b)
+{
+    size16s_t result = {.hi = 0, .lo = 0};
+    result.lo = a.lo - b.lo;
+    result.hi = a.hi - b.hi;
+    if (result.lo > a.lo) {
+        result.hi--;
+    }
+
+    return result;
+}
+
+size16s_t shiftr128(size16s_t a, size4u_t n)
+{
+    size16s_t result;
+    result.lo = (a.lo >> n) | (a.hi << (64 - n));
+    result.hi = a.hi >> n;
+    return result;
+}
+
+size16s_t shiftl128(size16s_t a, size4u_t n)
+{
+    size16s_t result;
+    result.lo = a.lo << n;
+    result.hi = (a.hi << n) | (a.lo >> (64 - n));
+    return result;
+}
+
+size16s_t and128(size16s_t a, size16s_t b)
+{
+    size16s_t result;
+    result.lo = a.lo & b.lo;
+    result.hi = a.hi & b.hi;
+    return result;
+}
+
+/* Floating Point Stuff */
+
+static const int roundingmodes[] = {
+    FE_TONEAREST,
+    FE_TOWARDZERO,
+    FE_DOWNWARD,
+    FE_UPWARD
+};
+
+void arch_fpop_start(CPUHexagonState *env)
+{
+    fegetenv(&env->fenv);
+    feclearexcept(FE_ALL_EXCEPT);
+    fesetround(roundingmodes[fREAD_REG_FIELD(USR, USR_FPRND)]);
+}
+
+#define NOTHING             /* Don't do anything */
+
+#define TEST_FLAG(LIBCF, MYF, MYE) \
+    do { \
+        if (fetestexcept(LIBCF)) { \
+            if (GET_USR_FIELD(USR_##MYF) == 0) { \
+                SET_USR_FIELD(USR_##MYF, 1); \
+                if (GET_USR_FIELD(USR_##MYE)) { \
+                    NOTHING \
+                } \
+            } \
+        } \
+    } while (0)
+
+void arch_fpop_end(CPUHexagonState *env)
+{
+    if (fetestexcept(FE_ALL_EXCEPT)) {
+        TEST_FLAG(FE_INEXACT, FPINPF, FPINPE);
+        TEST_FLAG(FE_DIVBYZERO, FPDBZF, FPDBZE);
+        TEST_FLAG(FE_INVALID, FPINVF, FPINVE);
+        TEST_FLAG(FE_OVERFLOW, FPOVFF, FPOVFE);
+        TEST_FLAG(FE_UNDERFLOW, FPUNFF, FPUNFE);
+    }
+    fesetenv(&env->fenv);
+}
+
+#undef TEST_FLAG
+
+
+void arch_raise_fpflag(unsigned int flags)
+{
+    feraiseexcept(flags);
+}
+
+int arch_sf_recip_common(size4s_t *Rs, size4s_t *Rt, size4s_t *Rd, int *adjust)
+{
+    int n_class;
+    int d_class;
+    int n_exp;
+    int d_exp;
+    int ret = 0;
+    size4s_t RsV, RtV, RdV;
+    int PeV = 0;
+    RsV = *Rs;
+    RtV = *Rt;
+    n_class = fpclassify(fFLOAT(RsV));
+    d_class = fpclassify(fFLOAT(RtV));
+    if ((n_class == FP_NAN) && (d_class == FP_NAN)) {
+        if (fGETBIT(22, RsV & RtV) == 0) {
+            fRAISEFLAGS(FE_INVALID);
+        }
+        RdV = RsV = RtV = fSFNANVAL();
+    } else if (n_class == FP_NAN) {
+        if (fGETBIT(22, RsV) == 0) {
+            fRAISEFLAGS(FE_INVALID);
+        }
+        RdV = RsV = RtV = fSFNANVAL();
+    } else if (d_class == FP_NAN) {
+        /* EJP: or put NaN in num/den fixup? */
+        if (fGETBIT(22, RtV) == 0) {
+            fRAISEFLAGS(FE_INVALID);
+        }
+        RdV = RsV = RtV = fSFNANVAL();
+    } else if ((n_class == FP_INFINITE) && (d_class == FP_INFINITE)) {
+        /* EJP: or put Inf in num fixup? */
+        RdV = RsV = RtV = fSFNANVAL();
+        fRAISEFLAGS(FE_INVALID);
+    } else if ((n_class == FP_ZERO) && (d_class == FP_ZERO)) {
+        /* EJP: or put zero in num fixup? */
+        RdV = RsV = RtV = fSFNANVAL();
+        fRAISEFLAGS(FE_INVALID);
+    } else if (d_class == FP_ZERO) {
+        /* EJP: or put Inf in num fixup? */
+        RsV = fSFINFVAL(RsV ^ RtV);
+        RtV = fSFONEVAL(0);
+        RdV = fSFONEVAL(0);
+        if (n_class != FP_INFINITE) {
+            fRAISEFLAGS(FE_DIVBYZERO);
+        }
+    } else if (d_class == FP_INFINITE) {
+        RsV = 0x80000000 & (RsV ^ RtV);
+        RtV = fSFONEVAL(0);
+        RdV = fSFONEVAL(0);
+    } else if (n_class == FP_ZERO) {
+        /* EJP: Does this just work itself out? */
+        /* EJP: No, 0/Inf causes problems. */
+        RsV = 0x80000000 & (RsV ^ RtV);
+        RtV = fSFONEVAL(0);
+        RdV = fSFONEVAL(0);
+    } else if (n_class == FP_INFINITE) {
+        /* EJP: Does this just work itself out? */
+        RsV = fSFINFVAL(RsV ^ RtV);
+        RtV = fSFONEVAL(0);
+        RdV = fSFONEVAL(0);
+    } else {
+        PeV = 0x00;
+        /* Basic checks passed */
+        n_exp = fSF_GETEXP(RsV);
+        d_exp = fSF_GETEXP(RtV);
+        if ((n_exp - d_exp + fSF_BIAS()) <= fSF_MANTBITS()) {
+            /* Near quotient underflow / inexact Q */
+            PeV = 0x80;
+            RtV = fSF_MUL_POW2(RtV, -64);
+            RsV = fSF_MUL_POW2(RsV, 64);
+        } else if ((n_exp - d_exp + fSF_BIAS()) > (fSF_MAXEXP() - 24)) {
+            /* Near quotient overflow */
+            PeV = 0x40;
+            RtV = fSF_MUL_POW2(RtV, 32);
+            RsV = fSF_MUL_POW2(RsV, -32);
+        } else if (n_exp <= fSF_MANTBITS() + 2) {
+            RtV = fSF_MUL_POW2(RtV, 64);
+            RsV = fSF_MUL_POW2(RsV, 64);
+        } else if (d_exp <= 1) {
+            RtV = fSF_MUL_POW2(RtV, 32);
+            RsV = fSF_MUL_POW2(RsV, 32);
+        } else if (d_exp > 252) {
+            RtV = fSF_MUL_POW2(RtV, -32);
+            RsV = fSF_MUL_POW2(RsV, -32);
+        }
+        RdV = 0;
+        ret = 1;
+    }
+    *Rs = RsV;
+    *Rt = RtV;
+    *Rd = RdV;
+    *adjust = PeV;
+    return ret;
+}
+
+int arch_sf_invsqrt_common(size4s_t *Rs, size4s_t *Rd, int *adjust)
+{
+    int r_class;
+    size4s_t RsV, RdV;
+    int PeV = 0;
+    int r_exp;
+    int ret = 0;
+    RsV = *Rs;
+    r_class = fpclassify(fFLOAT(RsV));
+    if (r_class == FP_NAN) {
+        if (fGETBIT(22, RsV) == 0) {
+            fRAISEFLAGS(FE_INVALID);
+        }
+        RdV = RsV = fSFNANVAL();
+    } else if (fFLOAT(RsV) < 0.0) {
+        /* Negative nonzero values are NaN */
+        fRAISEFLAGS(FE_INVALID);
+        RsV = fSFNANVAL();
+        RdV = fSFNANVAL();
+    } else if (r_class == FP_INFINITE) {
+        /* EJP: or put Inf in num fixup? */
+        RsV = fSFINFVAL(-1);
+        RdV = fSFINFVAL(-1);
+    } else if (r_class == FP_ZERO) {
+        /* EJP: or put zero in num fixup? */
+        RsV = RsV;
+        RdV = fSFONEVAL(0);
+    } else {
+        PeV = 0x00;
+        /* Basic checks passed */
+        r_exp = fSF_GETEXP(RsV);
+        if (r_exp <= 24) {
+            RsV = fSF_MUL_POW2(RsV, 64);
+            PeV = 0xe0;
+        }
+        RdV = 0;
+        ret = 1;
+    }
+    *Rs = RsV;
+    *Rd = RdV;
+    *adjust = PeV;
+    return ret;
+}
+
+int arch_recip_lookup(int index)
+{
+    index &= 0x7f;
+    unsigned const int roundrom[128] = {
+        0x0fe, 0x0fa, 0x0f6, 0x0f2, 0x0ef, 0x0eb, 0x0e7, 0x0e4,
+        0x0e0, 0x0dd, 0x0d9, 0x0d6, 0x0d2, 0x0cf, 0x0cc, 0x0c9,
+        0x0c6, 0x0c2, 0x0bf, 0x0bc, 0x0b9, 0x0b6, 0x0b3, 0x0b1,
+        0x0ae, 0x0ab, 0x0a8, 0x0a5, 0x0a3, 0x0a0, 0x09d, 0x09b,
+        0x098, 0x096, 0x093, 0x091, 0x08e, 0x08c, 0x08a, 0x087,
+        0x085, 0x083, 0x080, 0x07e, 0x07c, 0x07a, 0x078, 0x075,
+        0x073, 0x071, 0x06f, 0x06d, 0x06b, 0x069, 0x067, 0x065,
+        0x063, 0x061, 0x05f, 0x05e, 0x05c, 0x05a, 0x058, 0x056,
+        0x054, 0x053, 0x051, 0x04f, 0x04e, 0x04c, 0x04a, 0x049,
+        0x047, 0x045, 0x044, 0x042, 0x040, 0x03f, 0x03d, 0x03c,
+        0x03a, 0x039, 0x037, 0x036, 0x034, 0x033, 0x032, 0x030,
+        0x02f, 0x02d, 0x02c, 0x02b, 0x029, 0x028, 0x027, 0x025,
+        0x024, 0x023, 0x021, 0x020, 0x01f, 0x01e, 0x01c, 0x01b,
+        0x01a, 0x019, 0x017, 0x016, 0x015, 0x014, 0x013, 0x012,
+        0x011, 0x00f, 0x00e, 0x00d, 0x00c, 0x00b, 0x00a, 0x009,
+        0x008, 0x007, 0x006, 0x005, 0x004, 0x003, 0x002, 0x000,
+    };
+    return roundrom[index];
+};
+
+int arch_invsqrt_lookup(int index)
+{
+    index &= 0x7f;
+    unsigned const int roundrom[128] = {
+        0x069, 0x066, 0x063, 0x061, 0x05e, 0x05b, 0x059, 0x057,
+        0x054, 0x052, 0x050, 0x04d, 0x04b, 0x049, 0x047, 0x045,
+        0x043, 0x041, 0x03f, 0x03d, 0x03b, 0x039, 0x037, 0x036,
+        0x034, 0x032, 0x030, 0x02f, 0x02d, 0x02c, 0x02a, 0x028,
+        0x027, 0x025, 0x024, 0x022, 0x021, 0x01f, 0x01e, 0x01d,
+        0x01b, 0x01a, 0x019, 0x017, 0x016, 0x015, 0x014, 0x012,
+        0x011, 0x010, 0x00f, 0x00d, 0x00c, 0x00b, 0x00a, 0x009,
+        0x008, 0x007, 0x006, 0x005, 0x004, 0x003, 0x002, 0x001,
+        0x0fe, 0x0fa, 0x0f6, 0x0f3, 0x0ef, 0x0eb, 0x0e8, 0x0e4,
+        0x0e1, 0x0de, 0x0db, 0x0d7, 0x0d4, 0x0d1, 0x0ce, 0x0cb,
+        0x0c9, 0x0c6, 0x0c3, 0x0c0, 0x0be, 0x0bb, 0x0b8, 0x0b6,
+        0x0b3, 0x0b1, 0x0af, 0x0ac, 0x0aa, 0x0a8, 0x0a5, 0x0a3,
+        0x0a1, 0x09f, 0x09d, 0x09b, 0x099, 0x097, 0x095, 0x093,
+        0x091, 0x08f, 0x08d, 0x08b, 0x089, 0x087, 0x086, 0x084,
+        0x082, 0x080, 0x07f, 0x07d, 0x07b, 0x07a, 0x078, 0x077,
+        0x075, 0x074, 0x072, 0x071, 0x06f, 0x06e, 0x06c, 0x06b,
+    };
+    return roundrom[index];
+};
+
diff --git a/target/hexagon/conv_emu.c b/target/hexagon/conv_emu.c
new file mode 100644
index 0000000..b31ff32
--- /dev/null
+++ b/target/hexagon/conv_emu.c
@@ -0,0 +1,369 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <math.h>
+#include "qemu/osdep.h"
+#include "hex_arch_types.h"
+#include "macros.h"
+#include "conv_emu.h"
+
+#define isz(X) (fpclassify(X) == FP_ZERO)
+#define DF_BIAS 1023
+#define SF_BIAS 127
+
+#define LL_MAX_POS 0x7fffffffffffffffULL
+#define MAX_POS 0x7fffffffU
+
+#ifdef VCPP
+/*
+ * Visual C isn't GNU C and doesn't have __builtin_clzll
+ */
+
+static int __builtin_clzll(unsigned long long int input)
+{
+    int total = 0;
+    if (input == 0) {
+        return 64;
+    }
+    total += ((input >> (total + 32)) != 0) ? 32 : 0;
+    total += ((input >> (total + 16)) != 0) ? 16 : 0;
+    total += ((input >> (total +  8)) != 0) ?  8 : 0;
+    total += ((input >> (total +  4)) != 0) ?  4 : 0;
+    total += ((input >> (total +  2)) != 0) ?  2 : 0;
+    total += ((input >> (total +  1)) != 0) ?  1 : 0;
+    return 63 - total;
+}
+#endif
+
+typedef union {
+    double f;
+    size8u_t i;
+    struct {
+        size8u_t mant:52;
+        size8u_t exp:11;
+        size8u_t sign:1;
+    } x;
+} df_t;
+
+
+typedef union {
+    float f;
+    size4u_t i;
+    struct {
+        size4u_t mant:23;
+        size4u_t exp:8;
+        size4u_t sign:1;
+    } x;
+} sf_t;
+
+
+#define MAKE_CONV_8U_TO_XF_N(FLOATID, BIGFLOATID, RETTYPE) \
+static RETTYPE conv_8u_to_##FLOATID##_n(size8u_t in, int negate) \
+{ \
+    FLOATID##_t x; \
+    size8u_t tmp, truncbits, shamt; \
+    int leading_zeros; \
+    if (in == 0) { \
+        return 0.0; \
+    } \
+    leading_zeros = __builtin_clzll(in); \
+    tmp = in << (leading_zeros); \
+    tmp <<= 1; \
+    shamt = 64 - f##BIGFLOATID##_MANTBITS(); \
+    truncbits = tmp & ((1ULL << (shamt)) - 1); \
+    tmp >>= shamt; \
+    if (truncbits != 0) { \
+        feraiseexcept(FE_INEXACT); \
+        switch (fegetround()) { \
+        case FE_TOWARDZERO: \
+            break; \
+        case FE_DOWNWARD: \
+            if (negate) { \
+                tmp += 1; \
+            } \
+            break; \
+        case FE_UPWARD: \
+            if (!negate) { \
+                tmp += 1; \
+            } \
+            break; \
+        default: \
+            if ((truncbits & ((1ULL << (shamt - 1)) - 1)) == 0) { \
+                tmp += (tmp & 1); \
+            } else { \
+                tmp += ((truncbits >> (shamt - 1)) & 1); \
+            } \
+            break; \
+        } \
+    } \
+    if (((tmp << shamt) >> shamt) != tmp) { \
+        leading_zeros--; \
+    } \
+    x.x.mant = tmp; \
+    x.x.exp = BIGFLOATID##_BIAS + f##BIGFLOATID##_MANTBITS() - \
+              leading_zeros + shamt - 1; \
+    x.x.sign = negate; \
+    return x.f; \
+}
+
+MAKE_CONV_8U_TO_XF_N(df, DF, double)
+MAKE_CONV_8U_TO_XF_N(sf, SF, float)
+
+double conv_8u_to_df(size8u_t in)
+{
+    return conv_8u_to_df_n(in, 0);
+}
+
+double conv_8s_to_df(size8s_t in)
+{
+    if (in == 0x8000000000000000) {
+        return -0x1p63;
+    }
+    if (in < 0) {
+        return conv_8u_to_df_n(-in, 1);
+    } else {
+        return conv_8u_to_df_n(in, 0);
+    }
+}
+
+double conv_4u_to_df(size4u_t in)
+{
+    return conv_8u_to_df((size8u_t) in);
+}
+
+double conv_4s_to_df(size4s_t in)
+{
+    return conv_8s_to_df(in);
+}
+
+float conv_8u_to_sf(size8u_t in)
+{
+    return conv_8u_to_sf_n(in, 0);
+}
+
+float conv_8s_to_sf(size8s_t in)
+{
+    if (in == 0x8000000000000000) {
+        return -0x1p63;
+    }
+    if (in < 0) {
+        return conv_8u_to_sf_n(-in, 1);
+    } else {
+        return conv_8u_to_sf_n(in, 0);
+    }
+}
+
+float conv_4u_to_sf(size4u_t in)
+{
+    return conv_8u_to_sf(in);
+}
+
+float conv_4s_to_sf(size4s_t in)
+{
+    return conv_8s_to_sf(in);
+}
+
+
+static size8u_t conv_df_to_8u_n(double in, int will_negate)
+{
+    df_t x;
+    int fracshift, endshift;
+    size8u_t tmp, truncbits;
+    x.f = in;
+    if (isinf(in)) {
+        feraiseexcept(FE_INVALID);
+        if (in > 0.0) {
+            return ~0ULL;
+        } else {
+            return 0ULL;
+        }
+    }
+    if (isnan(in)) {
+        feraiseexcept(FE_INVALID);
+        return ~0ULL;
+    }
+    if (isz(in)) {
+        return 0;
+    }
+    if (x.x.sign) {
+        feraiseexcept(FE_INVALID);
+        return 0;
+    }
+    if (in < 0.5) {
+        /* Near zero, captures large fracshifts, denorms, etc */
+        feraiseexcept(FE_INEXACT);
+        switch (fegetround()) {
+        case FE_DOWNWARD:
+            if (will_negate) {
+                return 1;
+            } else {
+                return 0;
+            }
+        case FE_UPWARD:
+            if (!will_negate) {
+                return 1;
+            } else {
+                return 0;
+            }
+        default:
+            return 0;    /* nearest or towards zero */
+        }
+    }
+    if ((x.x.exp - DF_BIAS) >= 64) {
+        /* way too big */
+        feraiseexcept(FE_INVALID);
+        return ~0ULL;
+    }
+    fracshift = fMAX(0, (fDF_MANTBITS() - (x.x.exp - DF_BIAS)));
+    endshift = fMAX(0, ((x.x.exp - DF_BIAS - fDF_MANTBITS())));
+    tmp = x.x.mant | (1ULL << fDF_MANTBITS());
+    truncbits = tmp & ((1ULL << fracshift) - 1);
+    tmp >>= fracshift;
+    if (truncbits) {
+        /* Apply Rounding */
+        feraiseexcept(FE_INEXACT);
+        switch (fegetround()) {
+        case FE_TOWARDZERO:
+            break;
+        case FE_DOWNWARD:
+            if (will_negate) {
+                tmp += 1;
+            }
+            break;
+        case FE_UPWARD:
+            if (!will_negate) {
+                tmp += 1;
+            }
+            break;
+        default:
+            if ((truncbits & ((1ULL << (fracshift - 1)) - 1)) == 0) {
+                /* Exactly .5 */
+                tmp += (tmp & 1);
+            } else {
+                tmp += ((truncbits >> (fracshift - 1)) & 1);
+            }
+        }
+    }
+    /*
+     * If we added one and it carried all the way out,
+     * check to see if overflow
+     */
+    if ((tmp & ((1ULL << (fDF_MANTBITS() + 1)) - 1)) == 0) {
+        if ((x.x.exp - DF_BIAS) == 63) {
+            feclearexcept(FE_INEXACT);
+            feraiseexcept(FE_INVALID);
+            return ~0ULL;
+        }
+    }
+    tmp <<= endshift;
+    return tmp;
+}
+
+static size4u_t conv_df_to_4u_n(double in, int will_negate)
+{
+    size8u_t tmp;
+    tmp = conv_df_to_8u_n(in, will_negate);
+    if (tmp > 0x00000000ffffffffULL) {
+        feclearexcept(FE_INEXACT);
+        feraiseexcept(FE_INVALID);
+        return ~0U;
+    }
+    return (size4u_t)tmp;
+}
+
+size8u_t conv_df_to_8u(double in)
+{
+    return conv_df_to_8u_n(in, 0);
+}
+
+size4u_t conv_df_to_4u(double in)
+{
+    return conv_df_to_4u_n(in, 0);
+}
+
+size8s_t conv_df_to_8s(double in)
+{
+    size8u_t tmp;
+    df_t x;
+    x.f = in;
+    if (isnan(in)) {
+        feraiseexcept(FE_INVALID);
+        return -1;
+    }
+    if (x.x.sign) {
+        tmp = conv_df_to_8u_n(-in, 1);
+    } else {
+        tmp = conv_df_to_8u_n(in, 0);
+    }
+    if (tmp > (LL_MAX_POS + x.x.sign)) {
+        feclearexcept(FE_INEXACT);
+        feraiseexcept(FE_INVALID);
+        tmp = (LL_MAX_POS + x.x.sign);
+    }
+    if (x.x.sign) {
+        return -tmp;
+    } else {
+        return tmp;
+    }
+}
+
+size4s_t conv_df_to_4s(double in)
+{
+    size8u_t tmp;
+    df_t x;
+    x.f = in;
+    if (isnan(in)) {
+        feraiseexcept(FE_INVALID);
+        return -1;
+    }
+    if (x.x.sign) {
+        tmp = conv_df_to_8u_n(-in, 1);
+    } else {
+        tmp = conv_df_to_8u_n(in, 0);
+    }
+    if (tmp > (MAX_POS + x.x.sign)) {
+        feclearexcept(FE_INEXACT);
+        feraiseexcept(FE_INVALID);
+        tmp = (MAX_POS + x.x.sign);
+    }
+    if (x.x.sign) {
+        return -tmp;
+    } else {
+        return tmp;
+    }
+}
+
+size8u_t conv_sf_to_8u(float in)
+{
+    return conv_df_to_8u(in);
+}
+
+size4u_t conv_sf_to_4u(float in)
+{
+    return conv_df_to_4u(in);
+}
+
+size8s_t conv_sf_to_8s(float in)
+{
+    return conv_df_to_8s(in);
+}
+
+size4s_t conv_sf_to_4s(float in)
+{
+    return conv_df_to_4s(in);
+}
+
diff --git a/target/hexagon/fma_emu.c b/target/hexagon/fma_emu.c
new file mode 100644
index 0000000..01cca83
--- /dev/null
+++ b/target/hexagon/fma_emu.c
@@ -0,0 +1,916 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <math.h>
+#include "qemu/osdep.h"
+#include "macros.h"
+#include "conv_emu.h"
+#include "fma_emu.h"
+
+#define DF_INF_EXP 0x7ff
+#define DF_BIAS 1023
+
+#define SF_INF_EXP 0xff
+#define SF_BIAS 127
+
+#define HF_INF_EXP 0x1f
+#define HF_BIAS 15
+
+#define WAY_BIG_EXP 4096
+
+#define isz(X) (fpclassify(X) == FP_ZERO)
+
+#define debug_fprintf(...)    /* nothing */
+
+typedef union {
+    double f;
+    size8u_t i;
+    struct {
+        size8u_t mant:52;
+        size8u_t exp:11;
+        size8u_t sign:1;
+    } x;
+} df_t;
+
+typedef union {
+    float f;
+    size4u_t i;
+    struct {
+        size4u_t mant:23;
+        size4u_t exp:8;
+        size4u_t sign:1;
+    } x;
+} sf_t;
+
+typedef struct {
+    union {
+        size8u_t low;
+        struct {
+            size4u_t w0;
+            size4u_t w1;
+        };
+    };
+    union {
+        size8u_t high;
+        struct {
+            size4u_t w3;
+            size4u_t w2;
+        };
+    };
+} int128_t;
+
+typedef struct {
+    int128_t mant;
+    size4s_t exp;
+    size1u_t sign;
+    size1u_t guard;
+    size1u_t round;
+    size1u_t sticky;
+} xf_t;
+
+static inline void xf_debug(const char *msg, xf_t a)
+{
+    debug_fprintf(stdout, "%s %c0x%016llx_%016llx /%d/%d/%d p%d\n", msg,
+                  a.sign ? '-' : '+', a.mant.high, a.mant.low, a.guard,
+                  a.round, a.sticky, a.exp);
+}
+
+static inline void xf_init(xf_t *p)
+{
+    p->mant.low = 0;
+    p->mant.high = 0;
+    p->exp = 0;
+    p->sign = 0;
+    p->guard = 0;
+    p->round = 0;
+    p->sticky = 0;
+}
+
+static inline size8u_t df_getmant(df_t a)
+{
+    int class = fpclassify(a.f);
+    switch (class) {
+    case FP_NORMAL:
+    return a.x.mant | 1ULL << 52;
+    case FP_ZERO:
+        return 0;
+    case FP_SUBNORMAL:
+        return a.x.mant;
+    default:
+        return -1;
+    };
+}
+
+static inline size4s_t df_getexp(df_t a)
+{
+    int class = fpclassify(a.f);
+    switch (class) {
+    case FP_NORMAL:
+        return a.x.exp;
+    case FP_SUBNORMAL:
+        return a.x.exp + 1;
+    default:
+        return -1;
+    };
+}
+
+static inline size8u_t sf_getmant(sf_t a)
+{
+    int class = fpclassify(a.f);
+    switch (class) {
+    case FP_NORMAL:
+        return a.x.mant | 1ULL << 23;
+    case FP_ZERO:
+        return 0;
+    case FP_SUBNORMAL:
+        return a.x.mant | 0ULL;
+    default:
+        return -1;
+    };
+}
+
+static inline size4s_t sf_getexp(sf_t a)
+{
+    int class = fpclassify(a.f);
+    switch (class) {
+    case FP_NORMAL:
+        return a.x.exp;
+    case FP_SUBNORMAL:
+        return a.x.exp + 1;
+    default:
+        return -1;
+    };
+}
+
+static inline int128_t int128_mul_6464(size8u_t ai, size8u_t bi)
+{
+    int128_t ret;
+    int128_t a, b;
+    size8u_t pp0, pp1a, pp1b, pp1s, pp2;
+
+    debug_fprintf(stdout, "ai/bi: 0x%016llx/0x%016llx\n", ai, bi);
+
+    a.high = b.high = 0;
+    a.low = ai;
+    b.low = bi;
+    pp0 = (size8u_t)a.w0 * (size8u_t)b.w0;
+    pp1a = (size8u_t)a.w1 * (size8u_t)b.w0;
+    pp1b = (size8u_t)b.w1 * (size8u_t)a.w0;
+    pp2 = (size8u_t)a.w1 * (size8u_t)b.w1;
+
+    debug_fprintf(stdout,
+                  "pp2/1b/1a/0: 0x%016llx/0x%016llx/0x%016llx/0x%016llx\n",
+                  pp2, pp1b, pp1a, pp0);
+
+    pp1s = pp1a + pp1b;
+    if ((pp1s < pp1a) || (pp1s < pp1b)) {
+        pp2 += (1ULL << 32);
+    }
+    ret.low = pp0 + (pp1s << 32);
+    if ((ret.low < pp0) || (ret.low < (pp1s << 32))) {
+        pp2 += 1;
+    }
+    ret.high = pp2 + (pp1s >> 32);
+
+    debug_fprintf(stdout,
+                  "pp1s/rethi/retlo: 0x%016llx/0x%016llx/0x%016llx\n",
+                  pp1s, ret.high, ret.low);
+
+    return ret;
+}
+
+static inline int128_t int128_shl(int128_t a, size4u_t amt)
+{
+    int128_t ret;
+    if (amt == 0) {
+        return a;
+    }
+    if (amt > 128) {
+        ret.high = 0;
+        ret.low = 0;
+        return ret;
+    }
+    if (amt >= 64) {
+        amt -= 64;
+        a.high = a.low;
+        a.low = 0;
+    }
+    ret.high = a.high << amt;
+    ret.high |= (a.low >> (64 - amt));
+    ret.low = a.low << amt;
+    return ret;
+}
+
+static inline int128_t int128_shr(int128_t a, size4u_t amt)
+{
+    int128_t ret;
+    if (amt == 0) {
+        return a;
+    }
+    if (amt > 128) {
+        ret.high = 0;
+        ret.low = 0;
+        return ret;
+    }
+    if (amt >= 64) {
+        amt -= 64;
+        a.low = a.high;
+        a.high = 0;
+    }
+    ret.low = a.low >> amt;
+    ret.low |= (a.high << (64 - amt));
+    ret.high = a.high >> amt;
+    return ret;
+}
+
+static inline int128_t int128_add(int128_t a, int128_t b)
+{
+    int128_t ret;
+    ret.low = a.low + b.low;
+    if ((ret.low < a.low) || (ret.low < b.low)) {
+        /* carry into high part */
+        a.high += 1;
+    }
+    ret.high = a.high + b.high;
+    return ret;
+}
+
+static inline int128_t int128_sub(int128_t a, int128_t b, int borrow)
+{
+    int128_t ret;
+    ret.low = a.low - b.low;
+    if (ret.low > a.low) {
+        /* borrow into high part */
+        a.high -= 1;
+    }
+    ret.high = a.high - b.high;
+    if (borrow == 0) {
+        return ret;
+    } else {
+        a.high = 0;
+        a.low = 1;
+        return int128_sub(ret, a, 0);
+    }
+}
+
+static inline int int128_gt(int128_t a, int128_t b)
+{
+    if (a.high == b.high) {
+        return a.low > b.low;
+    }
+    return a.high > b.high;
+}
+
+static inline xf_t xf_norm_left(xf_t a)
+{
+    a.exp--;
+    a.mant = int128_shl(a.mant, 1);
+    a.mant.low |= a.guard;
+    a.guard = a.round;
+    a.round = a.sticky;
+    return a;
+}
+
+static inline xf_t xf_norm_right(xf_t a, int amt)
+{
+    if (amt > 130) {
+        a.sticky |=
+            a.round | a.guard | (a.mant.low != 0) | (a.mant.high != 0);
+        a.guard = a.round = a.mant.high = a.mant.low = 0;
+        a.exp += amt;
+        return a;
+
+    }
+    while (amt >= 64) {
+        a.sticky |= a.round | a.guard | (a.mant.low != 0);
+        a.guard = (a.mant.low >> 63) & 1;
+        a.round = (a.mant.low >> 62) & 1;
+        a.mant.low = a.mant.high;
+        a.mant.high = 0;
+        a.exp += 64;
+        amt -= 64;
+    }
+    while (amt > 0) {
+        a.exp++;
+        a.sticky |= a.round;
+        a.round = a.guard;
+        a.guard = a.mant.low & 1;
+        a.mant = int128_shr(a.mant, 1);
+        amt--;
+    }
+    return a;
+}
+
+
+/*
+ * On the add/sub, we need to be able to shift out lots of bits, but need a
+ * sticky bit for what was shifted out, I think.
+ */
+static xf_t xf_add(xf_t a, xf_t b);
+
+static inline xf_t xf_sub(xf_t a, xf_t b, int negate)
+{
+    xf_t ret;
+    xf_init(&ret);
+    int borrow;
+
+    xf_debug("-->Sub/a: ", a);
+    xf_debug("-->Sub/b: ", b);
+
+    if (a.sign != b.sign) {
+        b.sign = !b.sign;
+        return xf_add(a, b);
+    }
+    if (b.exp > a.exp) {
+        /* small - big == - (big - small) */
+        return xf_sub(b, a, !negate);
+    }
+    if ((b.exp == a.exp) && (int128_gt(b.mant, a.mant))) {
+        /* small - big == - (big - small) */
+        return xf_sub(b, a, !negate);
+    }
+
+    xf_debug("OK: Sub/a: ", a);
+    xf_debug("OK: Sub/b: ", b);
+
+    while (a.exp > b.exp) {
+        /* Try to normalize exponents: shrink a exponent and grow mantissa */
+        if (a.mant.high & (1ULL << 62)) {
+            /* Can't grow a any more */
+            break;
+        } else {
+            a = xf_norm_left(a);
+        }
+    }
+
+    xf_debug("norm_l: Sub/a: ", a);
+    xf_debug("norm_l: Sub/b: ", b);
+
+    while (a.exp > b.exp) {
+        /* Try to normalize exponents: grow b exponent and shrink mantissa */
+        /* Keep around shifted out bits... we might need those later */
+        b = xf_norm_right(b, a.exp - b.exp);
+    }
+
+    xf_debug("norm_r: Sub/a: ", a);
+    xf_debug("norm_r: Sub/b: ", b);
+
+    if ((int128_gt(b.mant, a.mant))) {
+        xf_debug("retry: Sub/a: ", a);
+        xf_debug("retry: Sub/b: ", b);
+        return xf_sub(b, a, !negate);
+    }
+
+    /* OK, now things should be normalized! */
+    ret.sign = a.sign;
+    ret.exp = a.exp;
+    assert(!int128_gt(b.mant, a.mant));
+    borrow = (b.round << 2) | (b.guard << 1) | b.sticky;
+    ret.mant = int128_sub(a.mant, b.mant, (borrow != 0));
+    borrow = 0 - borrow;
+    ret.guard = (borrow >> 2) & 1;
+    ret.round = (borrow >> 1) & 1;
+    ret.sticky = (borrow >> 0) & 1;
+    if (negate) {
+        ret.sign = !ret.sign;
+    }
+    return ret;
+}
+
+static xf_t xf_add(xf_t a, xf_t b)
+{
+    xf_t ret;
+    xf_init(&ret);
+    xf_debug("-->Add/a: ", a);
+    xf_debug("-->Add/b: ", b);
+    if (a.sign != b.sign) {
+        b.sign = !b.sign;
+        return xf_sub(a, b, 0);
+    }
+    if (b.exp > a.exp) {
+        /* small + big ==  (big + small) */
+        return xf_add(b, a);
+    }
+    if ((b.exp == a.exp) && int128_gt(b.mant, a.mant)) {
+        /* small + big ==  (big + small) */
+        return xf_add(b, a);
+    }
+
+    xf_debug("OK? Add/a: ", a);
+    xf_debug("OK? Add/b: ", b);
+
+    while (a.exp > b.exp) {
+        /* Try to normalize exponents: shrink a exponent and grow mantissa */
+        if (a.mant.high & (1ULL << 62)) {
+            /* Can't grow a any more */
+            break;
+        } else {
+            a = xf_norm_left(a);
+        }
+    }
+
+    xf_debug("norm_l: Add/a: ", a);
+    xf_debug("norm_l: Add/b: ", b);
+
+    while (a.exp > b.exp) {
+        /* Try to normalize exponents: grow b exponent and shrink mantissa */
+        /* Keep around shifted out bits... we might need those later */
+        b = xf_norm_right(b, a.exp - b.exp);
+    }
+
+    xf_debug("norm_r: Add/a: ", a);
+    xf_debug("norm_r: Add/b: ", b);
+
+    /* OK, now things should be normalized! */
+    if (int128_gt(b.mant, a.mant)) {
+        xf_debug("retry: Add/a: ", a);
+        xf_debug("retry: Add/b: ", b);
+        return xf_add(b, a);
+    };
+    ret.sign = a.sign;
+    ret.exp = a.exp;
+    assert(!int128_gt(b.mant, a.mant));
+    ret.mant = int128_add(a.mant, b.mant);
+    ret.guard = b.guard;
+    ret.round = b.round;
+    ret.sticky = b.sticky;
+    return ret;
+}
+
+/* Return an infinity with the same sign as a */
+static inline df_t infinite_df_t(xf_t a)
+{
+    df_t ret;
+    ret.x.sign = a.sign;
+    ret.x.exp = DF_INF_EXP;
+    ret.x.mant = 0ULL;
+    return ret;
+}
+
+/* Return a maximum finite value with the same sign as a */
+static inline df_t maxfinite_df_t(xf_t a)
+{
+    df_t ret;
+    ret.x.sign = a.sign;
+    ret.x.exp = DF_INF_EXP - 1;
+    ret.x.mant = 0x000fffffffffffffULL;
+    return ret;
+}
+
+static inline df_t f2df_t(double in)
+{
+    df_t ret;
+    ret.f = in;
+    return ret;
+}
+
+/* Return an infinity with the same sign as a */
+static inline sf_t infinite_sf_t(xf_t a)
+{
+    sf_t ret;
+    ret.x.sign = a.sign;
+    ret.x.exp = SF_INF_EXP;
+    ret.x.mant = 0ULL;
+    return ret;
+}
+
+/* Return a maximum finite value with the same sign as a */
+static inline sf_t maxfinite_sf_t(xf_t a)
+{
+    sf_t ret;
+    ret.x.sign = a.sign;
+    ret.x.exp = SF_INF_EXP - 1;
+    ret.x.mant = 0x007fffffUL;
+    return ret;
+}
+
+static inline sf_t f2sf_t(float in)
+{
+    sf_t ret;
+    ret.f = in;
+    return ret;
+}
+
+#define GEN_XF_ROUND(TYPE, MANTBITS, INF_EXP) \
+static inline TYPE xf_round_##TYPE(xf_t a) \
+{ \
+    TYPE ret; \
+    ret.i = 0; \
+    ret.x.sign = a.sign; \
+    if ((a.mant.high == 0) && (a.mant.low == 0) \
+        && ((a.guard | a.round | a.sticky) == 0)) { \
+        /* result zero */ \
+        switch (fegetround()) { \
+        case FE_DOWNWARD: \
+            return f2##TYPE(-0.0); \
+        default: \
+            return f2##TYPE(0.0); \
+        } \
+    } \
+    /* Normalize right */ \
+    /* We want MANTBITS bits of mantissa plus the leading one. */ \
+    /* That means that we want MANTBITS+1 bits, or 0x000000000000FF_FFFF */ \
+    /* So we need to normalize right while the high word is non-zero and \
+    * while the low word is nonzero when masked with 0xffe0_0000_0000_0000 */ \
+    xf_debug("input: ", a); \
+    while ((a.mant.high != 0) || ((a.mant.low >> (MANTBITS + 1)) != 0)) { \
+        a = xf_norm_right(a, 1); \
+    } \
+    xf_debug("norm_right: ", a); \
+    /* \
+     * OK, now normalize left \
+     * We want to normalize left until we have a leading one in bit 24 \
+     * Theoretically, we only need to shift a maximum of one to the left if we \
+     * shifted out lots of bits from B, or if we had no shift / 1 shift sticky \
+     * shoudl be 0  \
+     */ \
+    while ((a.mant.low & (1ULL << MANTBITS)) == 0) { \
+        a = xf_norm_left(a); \
+    } \
+    xf_debug("norm_left: ", a); \
+    /* \
+     * OK, now we might need to denormalize because of potential underflow. \
+     * We need to do this before rounding, and rounding might make us normal \
+     * again \
+     */ \
+    while (a.exp <= 0) { \
+        a = xf_norm_right(a, 1 - a.exp); \
+        /* \
+         * Do we have underflow? \
+         * That's when we get an inexact answer because we ran out of bits \
+         * in a denormal. \
+         */ \
+        if (a.guard || a.round || a.sticky) { \
+            feraiseexcept(FE_UNDERFLOW); \
+        } \
+    } \
+    xf_debug("norm_denorm: ", a); \
+    /* OK, we're relatively canonical... now we need to round */ \
+    if (a.guard || a.round || a.sticky) { \
+        feraiseexcept(FE_INEXACT); \
+        switch (fegetround()) { \
+        case FE_TOWARDZERO: \
+            /* Chop and we're done */ \
+            break; \
+        case FE_UPWARD: \
+            if (a.sign == 0) { \
+                a.mant.low += 1; \
+            } \
+            break; \
+        case FE_DOWNWARD: \
+            if (a.sign != 0) { \
+                a.mant.low += 1; \
+            } \
+            break; \
+        default: \
+            if (a.round || a.sticky) { \
+                /* round up if guard is 1, down if guard is zero */ \
+                a.mant.low += a.guard; \
+            } else if (a.guard) { \
+                /* exactly .5, round up if odd */ \
+                a.mant.low += (a.mant.low & 1); \
+            } \
+            break; \
+        } \
+    } \
+    xf_debug("post_round: ", a); \
+    /* \
+     * OK, now we might have carried all the way up. \
+     * So we might need to shr once \
+     * at least we know that the lsb should be zero if we rounded and \
+     * got a carry out... \
+     */ \
+    if ((a.mant.low >> (MANTBITS + 1)) != 0) { \
+        a = xf_norm_right(a, 1); \
+    } \
+    xf_debug("once_norm_right: ", a); \
+    /* Overflow? */ \
+    if (a.exp >= INF_EXP) { \
+        /* Yep, inf result */ \
+        xf_debug("inf: ", a); \
+        feraiseexcept(FE_OVERFLOW); \
+        feraiseexcept(FE_INEXACT); \
+        switch (fegetround()) { \
+        case FE_TOWARDZERO: \
+            return maxfinite_##TYPE(a); \
+        case FE_UPWARD: \
+            if (a.sign == 0) { \
+                return infinite_##TYPE(a); \
+            } else { \
+                return maxfinite_##TYPE(a); \
+            } \
+        case FE_DOWNWARD: \
+            if (a.sign != 0) { \
+                return infinite_##TYPE(a); \
+            } else { \
+                return maxfinite_##TYPE(a); \
+            } \
+        default: \
+            return infinite_##TYPE(a); \
+        } \
+    } \
+    /* Underflow? */ \
+    if (a.mant.low & (1ULL << MANTBITS)) { \
+        /* Leading one means: No, we're normal. So, we should be done... */ \
+        xf_debug("norm: ", a); \
+        ret.x.exp = a.exp; \
+        ret.x.mant = a.mant.low; \
+        return ret; \
+    } \
+    xf_debug("denorm: ", a); \
+    if (a.exp != 1) { \
+        printf("a.exp == %d\n", a.exp); \
+    } \
+    assert(a.exp == 1); \
+    ret.x.exp = 0; \
+    ret.x.mant = a.mant.low; \
+    return ret; \
+}
+
+GEN_XF_ROUND(df_t, fDF_MANTBITS(), DF_INF_EXP)
+GEN_XF_ROUND(sf_t, fSF_MANTBITS(), SF_INF_EXP)
+
+static inline double special_fma(df_t a, df_t b, df_t c)
+{
+    df_t ret;
+    ret.i = 0;
+
+    /*
+     * If A multiplied by B is an exact infinity and C is also an infinity
+     * but with the opposite sign, FMA returns NaN and raises invalid.
+     */
+    if (fISINFPROD(a.f, b.f) && isinf(c.f)) {
+        if ((a.x.sign ^ b.x.sign) != c.x.sign) {
+            ret.i = fDFNANVAL();
+            feraiseexcept(FE_INVALID);
+            return ret.f;
+        }
+    }
+    if ((isinf(a.f) && isz(b.f)) || (isz(a.f) && isinf(b.f))) {
+        ret.i = fDFNANVAL();
+        feraiseexcept(FE_INVALID);
+        return ret.f;
+    }
+    /*
+     * If none of the above checks are true and C is a NaN,
+     * a NaN shall be returned
+     * If A or B are NaN, a NAN shall be returned.
+     */
+    if (isnan(a.f) || isnan(b.f) || (isnan(c.f))) {
+        if (isnan(a.f) && (fGETBIT(51, a.i) == 0)) {
+            feraiseexcept(FE_INVALID);
+        }
+        if (isnan(b.f) && (fGETBIT(51, b.i) == 0)) {
+            feraiseexcept(FE_INVALID);
+        }
+        if (isnan(c.f) && (fGETBIT(51, c.i) == 0)) {
+            feraiseexcept(FE_INVALID);
+        }
+        ret.i = fDFNANVAL();
+        return ret.f;
+    }
+    /*
+     * We have checked for adding opposite-signed infinities.
+     * Other infinities return infinity with the correct sign
+     */
+    if (isinf(c.f)) {
+        ret.x.exp = DF_INF_EXP;
+        ret.x.mant = 0;
+        ret.x.sign = c.x.sign;
+        return ret.f;
+    }
+    if (isinf(a.f) || isinf(b.f)) {
+        ret.x.exp = DF_INF_EXP;
+        ret.x.mant = 0;
+        ret.x.sign = (a.x.sign ^ b.x.sign);
+        return ret.f;
+    }
+    g_assert_not_reached();
+    ret.x.exp = 0x123;
+    ret.x.mant = 0xdead;
+    return ret.f;
+}
+
+static inline float special_fmaf(sf_t a, sf_t b, sf_t c)
+{
+    df_t aa, bb, cc;
+    aa.f = a.f;
+    bb.f = b.f;
+    cc.f = c.f;
+    return special_fma(aa, bb, cc);
+}
+
+double internal_fmax(double a_in, double b_in, double c_in, int scale)
+{
+    df_t a, b, c;
+    xf_t prod;
+    xf_t acc;
+    xf_t result;
+    xf_init(&prod);
+    xf_init(&acc);
+    xf_init(&result);
+    a.f = a_in;
+    b.f = b_in;
+    c.f = c_in;
+
+    debug_fprintf(stdout,
+                  "internal_fmax: 0x%016llx * 0x%016llx + 0x%016llx sc: %d\n",
+                  fUNDOUBLE(a_in), fUNDOUBLE(b_in), fUNDOUBLE(c_in),
+                  scale);
+
+    if (isinf(a.f) || isinf(b.f) || isinf(c.f)) {
+        return special_fma(a, b, c);
+    }
+    if (isnan(a.f) || isnan(b.f) || isnan(c.f)) {
+        return special_fma(a, b, c);
+    }
+    if ((scale == 0) && (isz(a.f) || isz(b.f))) {
+        return a.f * b.f + c.f;
+    }
+
+    /* (a * 2**b) * (c * 2**d) == a*c * 2**(b+d) */
+    prod.mant = int128_mul_6464(df_getmant(a), df_getmant(b));
+
+    /*
+     * Note: extracting the mantissa into an int is multiplying by
+     * 2**52, so adjust here
+     */
+    prod.exp = df_getexp(a) + df_getexp(b) - DF_BIAS - 52;
+    prod.sign = a.x.sign ^ b.x.sign;
+    xf_debug("prod: ", prod);
+    if (!isz(c.f)) {
+        acc.mant = int128_mul_6464(df_getmant(c), 1);
+        acc.exp = df_getexp(c);
+        acc.sign = c.x.sign;
+        xf_debug("acc: ", acc);
+        result = xf_add(prod, acc);
+    } else {
+        result = prod;
+    }
+    xf_debug("sum: ", result);
+    debug_fprintf(stdout, "Scaling: %d\n", scale);
+    result.exp += scale;
+    xf_debug("post-scale: ", result);
+    return xf_round_df_t(result).f;
+}
+
+float internal_fmafx(float a_in, float b_in, float c_in, int scale)
+{
+    sf_t a, b, c;
+    xf_t prod;
+    xf_t acc;
+    xf_t result;
+    xf_init(&prod);
+    xf_init(&acc);
+    xf_init(&result);
+    a.f = a_in;
+    b.f = b_in;
+    c.f = c_in;
+
+    debug_fprintf(stdout,
+                  "internal_fmax: 0x%016x * 0x%016x + 0x%016x sc: %d\n",
+                  fUNFLOAT(a_in), fUNFLOAT(b_in), fUNFLOAT(c_in), scale);
+    if (isinf(a.f) || isinf(b.f) || isinf(c.f)) {
+        return special_fmaf(a, b, c);
+    }
+    if (isnan(a.f) || isnan(b.f) || isnan(c.f)) {
+        return special_fmaf(a, b, c);
+    }
+    if ((scale == 0) && (isz(a.f) || isz(b.f))) {
+        return a.f * b.f + c.f;
+    }
+
+    /* (a * 2**b) * (c * 2**d) == a*c * 2**(b+d) */
+    prod.mant = int128_mul_6464(sf_getmant(a), sf_getmant(b));
+
+    /*
+     * Note: extracting the mantissa into an int is multiplying by
+     * 2**23, so adjust here
+     */
+    prod.exp = sf_getexp(a) + sf_getexp(b) - SF_BIAS - 23;
+    prod.sign = a.x.sign ^ b.x.sign;
+    if (isz(a.f) || isz(b.f)) {
+        prod.exp = -2 * WAY_BIG_EXP;
+    }
+    xf_debug("prod: ", prod);
+    if ((scale > 0) && (fpclassify(c.f) == FP_SUBNORMAL)) {
+        acc.mant = int128_mul_6464(0, 0);
+        acc.exp = -WAY_BIG_EXP;
+        acc.sign = c.x.sign;
+        acc.sticky = 1;
+        xf_debug("special denorm acc: ", acc);
+        result = xf_add(prod, acc);
+    } else if (!isz(c.f)) {
+        acc.mant = int128_mul_6464(sf_getmant(c), 1);
+        acc.exp = sf_getexp(c);
+        acc.sign = c.x.sign;
+        xf_debug("acc: ", acc);
+        result = xf_add(prod, acc);
+    } else {
+        result = prod;
+    }
+    xf_debug("sum: ", result);
+    debug_fprintf(stdout, "Scaling: %d\n", scale);
+    result.exp += scale;
+    xf_debug("post-scale: ", result);
+    return xf_round_sf_t(result).f;
+}
+
+
+float internal_fmaf(float a_in, float b_in, float c_in)
+{
+    return internal_fmafx(a_in, b_in, c_in, 0);
+}
+
+double internal_fma(double a_in, double b_in, double c_in)
+{
+    return internal_fmax(a_in, b_in, c_in, 0);
+}
+
+float internal_mpyf(float a_in, float b_in)
+{
+    if (isz(a_in) || isz(b_in)) {
+        return a_in * b_in;
+    }
+    return internal_fmafx(a_in, b_in, 0.0, 0);
+}
+
+double internal_mpy(double a_in, double b_in)
+{
+    if (isz(a_in) || isz(b_in)) {
+        return a_in * b_in;
+    }
+    return internal_fmax(a_in, b_in, 0.0, 0);
+}
+
+static inline double internal_mpyhh_special(double a, double b)
+{
+    return a * b;
+}
+
+double internal_mpyhh(double a_in, double b_in,
+                      unsigned long long int accumulated)
+{
+    df_t a, b;
+    xf_t x;
+    unsigned long long int prod;
+    unsigned int sticky;
+
+    a.f = a_in;
+    b.f = b_in;
+    sticky = accumulated & 1;
+    accumulated >>= 1;
+    xf_init(&x);
+    if (isz(a_in) || isnan(a_in) || isinf(a_in)) {
+        return internal_mpyhh_special(a_in, b_in);
+    }
+    if (isz(b_in) || isnan(b_in) || isinf(b_in)) {
+        return internal_mpyhh_special(a_in, b_in);
+    }
+    x.mant = int128_mul_6464(accumulated, 1);
+    x.sticky = sticky;
+    prod = fGETUWORD(1, df_getmant(a)) * fGETUWORD(1, df_getmant(b));
+    x.mant = int128_add(x.mant, int128_mul_6464(prod, 0x100000000ULL));
+    x.exp = df_getexp(a) + df_getexp(b) - DF_BIAS - 20;
+    xf_debug("formed: ", x);
+    if (!isnormal(a_in) || !isnormal(b_in)) {
+        /* crush to inexact zero */
+        x.sticky = 1;
+        x.exp = -4096;
+        xf_debug("crushing: ", x);
+    }
+    x.sign = a.x.sign ^ b.x.sign;
+    xf_debug("with sign: ", x);
+    return xf_round_df_t(x).f;
+}
+
+float conv_df_to_sf(double in_f)
+{
+    xf_t x;
+    df_t in;
+    if (isz(in_f) || isnan(in_f) || isinf(in_f)) {
+        return in_f;
+    }
+    xf_init(&x);
+    in.f = in_f;
+    x.mant = int128_mul_6464(df_getmant(in), 1);
+    x.exp = df_getexp(in) - DF_BIAS + SF_BIAS - 52 + 23;
+    x.sign = in.x.sign;
+    xf_debug("conv to sf: x: ", x);
+    return xf_round_sf_t(x).f;
+}
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 21/67] Hexagon generator phase 1 - C preprocessor for semantics
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (19 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 20/67] Hexagon instruction utility functions Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 22/67] Hexagon generator phase 2 - qemu_def_generated.h Taylor Simpson
                   ` (46 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Run the C preprocessor across the instruction definition files and macro definitoin file to expand macros and prepare the semantics_generated.pyinc file.  The
resulting file contains one entry with the semantics for each instruction and
one line with the instruction attributes associated with each macro.

Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_semantics.c | 92 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)
 create mode 100644 target/hexagon/gen_semantics.c

diff --git a/target/hexagon/gen_semantics.c b/target/hexagon/gen_semantics.c
new file mode 100644
index 0000000..65d04fa
--- /dev/null
+++ b/target/hexagon/gen_semantics.c
@@ -0,0 +1,92 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This program generates the semantics file that is processed by
+ * the do_qemu.py script.  We use the C preporcessor to manipulate the
+ * files imported from the Hexagon architecture library.
+ */
+
+#include <stdio.h>
+#define STRINGIZE(X) #X
+
+int main(int argc, char *argv[])
+{
+    FILE *outfile;
+
+    if (argc != 2) {
+        fprintf(stderr, "Usage: gen_semantics ouptputfile\n");
+        return -1;
+    }
+    outfile = fopen(argv[1], "w");
+    if (outfile == NULL) {
+        fprintf(stderr, "Cannot open %s for writing\n", argv[1]);
+        return -1;
+    }
+
+/*
+ * Process the instruction definitions
+ *     Scalar core instructions have the following form
+ *         Q6INSN(A2_add,"Rd32=add(Rs32,Rt32)",ATTRIBS(),
+ *         "Add 32-bit registers",
+ *         { RdV=RsV+RtV;})
+ *     HVX instructions have the following form
+ *         EXTINSN(V6_vinsertwr, "Vx32.w=vinsert(Rt32)",
+ *         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX,A_CVI_LATE,A_NOTE_MPY_RESOURCE),
+ *         "Insert Word Scalar into Vector",
+ *         VxV.uw[0] = RtV;)
+ */
+#define Q6INSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
+    do { \
+        fprintf(outfile, "SEMANTICS(\"%s\",%s,\"\"\"%s\"\"\")\n", \
+                #TAG, STRINGIZE(BEH), STRINGIZE(SEM)); \
+        fprintf(outfile, "ATTRIBUTES(\"%s\",\"%s\")\n", \
+                #TAG, STRINGIZE(ATTRIBS)); \
+    } while (0);
+#define EXTINSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
+    do { \
+        fprintf(outfile, "EXT_SEMANTICS(\"%s\",\"%s\",%s,\"\"\"%s\"\"\")\n", \
+                EXTSTR, #TAG, STRINGIZE(BEH), STRINGIZE(SEM)); \
+        fprintf(outfile, "ATTRIBUTES(\"%s\",\"%s\")\n", \
+                #TAG, STRINGIZE(ATTRIBS)); \
+    } while (0);
+#include "imported/allidefs.def"
+#undef Q6INSN
+#undef EXTINSN
+
+/*
+ * Process the macro definitions
+ *     Macros definitions have the following form
+ *         DEF_MACRO(
+ *             fLSBNEW0,,
+ *             "P0.new[0]",
+ *             "Least significant bit of new P0",
+ *             predlog_read(thread,0),
+ *             (A_DOTNEW,A_IMPLICIT_READS_P0)
+ *         )
+ * The important part here is the attributes.  Whenever an instruction
+ * invokes a macro, we add the macro's attributes to the instruction.
+ */
+#define DEF_MACRO(MNAME, PARAMS, SDESC, LDESC, BEH, ATTRS) \
+    fprintf(outfile, "MACROATTRIB(\"%s\",\"\"\"%s\"\"\",\"%s\")\n", \
+            #MNAME, STRINGIZE(BEH), STRINGIZE(ATTRS));
+#include "imported/macros.def"
+#undef DEF_MACRO
+
+    fclose(outfile);
+    return 0;
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 22/67] Hexagon generator phase 2 - qemu_def_generated.h
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (20 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 21/67] Hexagon generator phase 1 - C preprocessor for semantics Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 23/67] Hexagon generator phase 2 - qemu_wrap_generated.h Taylor Simpson
                   ` (45 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

For each instruction we create
    DEF_HELPER function prototype
    TCG code to generate call to helper
    Helper definition

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/do_qemu.py | 769 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 769 insertions(+)
 create mode 100755 target/hexagon/do_qemu.py

diff --git a/target/hexagon/do_qemu.py b/target/hexagon/do_qemu.py
new file mode 100755
index 0000000..6f0e376
--- /dev/null
+++ b/target/hexagon/do_qemu.py
@@ -0,0 +1,769 @@
+#!/usr/bin/env python3
+
+##
+##  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+##
+##  This program is free software; you can redistribute it and/or modify
+##  it under the terms of the GNU General Public License as published by
+##  the Free Software Foundation; either version 2 of the License, or
+##  (at your option) any later version.
+##
+##  This program is distributed in the hope that it will be useful,
+##  but WITHOUT ANY WARRANTY; without even the implied warranty of
+##  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+##  GNU General Public License for more details.
+##
+##  You should have received a copy of the GNU General Public License
+##  along with this program; if not, see <http://www.gnu.org/licenses/>.
+##
+
+import sys
+import re
+import string
+from io import StringIO
+
+
+import operator
+from itertools import chain
+
+
+
+behdict = {}          # tag ->behavior
+semdict = {}          # tag -> semantics
+extdict = {}          # tag -> What extension an instruction belongs to (or "")
+extnames = {}         # ext name -> True
+attribdict = {}       # tag -> attributes
+macros = {}           # macro -> macro information...
+attribinfo = {}       # Register information and misc
+tags = []             # list of all tags
+
+def get_macro(macname,ext=""):
+    mackey = macname + ":" + ext
+    if ext and mackey not in macros:
+        return get_macro(macname,"")
+    return macros[mackey]
+
+# We should do this as a hash for performance,
+# but to keep order let's keep it as a list.
+def uniquify(seq):
+    seen = set()
+    seen_add = seen.add
+    return [x for x in seq if x not in seen and not seen_add(x)]
+
+regre = re.compile(
+    r"((?<!DUP)[MNORCPQXSGVZA])([stuvwxyzdefg]+)([.]?[LlHh]?)(\d+S?)")
+immre = re.compile(r"[#]([rRsSuUm])(\d+)(?:[:](\d+))?")
+reg_or_immre = \
+    re.compile(r"(((?<!DUP)[MNRCOPQXSGVZA])([stuvwxyzdefg]+)" + \
+                "([.]?[LlHh]?)(\d+S?))|([#]([rRsSuUm])(\d+)[:]?(\d+)?)")
+relimmre = re.compile(r"[#]([rR])(\d+)(?:[:](\d+))?")
+absimmre = re.compile(r"[#]([sSuUm])(\d+)(?:[:](\d+))?")
+
+finished_macros = set()
+
+def expand_macro_attribs(macro,allmac_re):
+    if macro.key not in finished_macros:
+        # Get a list of all things that might be macros
+        l = allmac_re.findall(macro.beh)
+        for submacro in l:
+            if not submacro: continue
+            if not get_macro(submacro,macro.ext):
+                raise Exception("Couldn't find macro: <%s>" % l)
+            macro.attribs |= expand_macro_attribs(
+                get_macro(submacro,macro.ext), allmac_re)
+            finished_macros.add(macro.key)
+    return macro.attribs
+
+immextre = re.compile(r'f(MUST_)?IMMEXT[(]([UuSsRr])')
+def calculate_attribs():
+    # Recurse down macros, find attributes from sub-macros
+    macroValues = list(macros.values())
+    allmacros_restr = "|".join(set([ m.re.pattern for m in macroValues ]))
+    allmacros_re = re.compile(allmacros_restr)
+    for macro in macroValues:
+        expand_macro_attribs(macro,allmacros_re)
+    # Append attributes to all instructions
+    for tag in tags:
+        for macname in allmacros_re.findall(semdict[tag]):
+            if not macname: continue
+            macro = get_macro(macname,extdict[tag])
+            attribdict[tag] |= set(macro.attribs)
+        m = immextre.search(semdict[tag])
+        if m:
+            if m.group(2).isupper():
+                attrib = 'A_EXT_UPPER_IMMED'
+            elif m.group(2).islower():
+                attrib = 'A_EXT_LOWER_IMMED'
+            else:
+                raise "Not a letter: %s (%s)" % (m.group(1),tag)
+            if not attrib in attribdict[tag]:
+                attribdict[tag].add(attrib)
+
+def SEMANTICS(tag, beh, sem):
+    #print tag,beh,sem
+    extdict[tag] = ""
+    behdict[tag] = beh
+    semdict[tag] = sem
+    attribdict[tag] = set()
+    tags.append(tag)        # dicts have no order, this is for order
+
+def ATTRIBUTES(tag, attribstring):
+    attribstring = \
+        attribstring.replace("ATTRIBS","").replace("(","").replace(")","")
+    if not attribstring:
+        return
+    attribs = attribstring.split(",")
+    for attrib in attribs:
+        attribdict[tag].add(attrib.strip())
+
+class Macro(object):
+    __slots__ = ['key','name', 'beh', 'attribs', 're','ext']
+    def __init__(self,key, name, beh, attribs,ext):
+        self.key = key
+        self.name = name
+        self.beh = beh
+        self.attribs = set(attribs)
+        self.ext = ext
+        self.re = re.compile("\\b" + name + "\\b")
+
+def MACROATTRIB(macname,beh,attribstring,ext=""):
+    attribstring = attribstring.replace("(","").replace(")","")
+    mackey = macname + ":" + ext
+    if attribstring:
+        attribs = attribstring.split(",")
+    else:
+        attribs = []
+    macros[mackey] = Macro(mackey,macname,beh,attribs,ext)
+
+# read in file.  Evaluate each line: each line calls a function above
+
+for line in open(sys.argv[1], 'rt').readlines():
+    if not line.startswith("#"):
+        eval(line.strip())
+
+
+calculate_attribs()
+
+
+attribre = re.compile(r'DEF_ATTRIB\(([A-Za-z0-9_]+), ([^,]*), ' +
+        r'"([A-Za-z0-9_\.]*)", "([A-Za-z0-9_\.]*)"\)')
+for line in open(sys.argv[2], 'rt').readlines():
+    if not attribre.match(line):
+        continue
+    (attrib_base,descr,rreg,wreg) = attribre.findall(line)[0]
+    attrib_base = 'A_' + attrib_base
+    attribinfo[attrib_base] = {'rreg':rreg, 'wreg':wreg, 'descr':descr}
+
+def compute_tag_regs(tag):
+    return uniquify(regre.findall(behdict[tag]))
+
+def compute_tag_immediates(tag):
+    return uniquify(immre.findall(behdict[tag]))
+
+##
+##  tagregs is the main data structure we'll use
+##  tagregs[tag] will contain the registers used by an instruction
+##  Within each entry, we'll use the regtype and regid fields
+##      regtype can be one of the following
+##          C                control register
+##          N                new register value
+##          P                predicate register
+##          R                GPR register
+##          M                modifier register
+##      regid can be one of the following
+##          d, e             destination register
+##          dd               destination register pair
+##          s, t, u, v, w    source register
+##          ss, tt, uu, vv   source register pair
+##          x, y             read-write register
+##          xx, yy           read-write register pair
+##
+tagregs = dict(zip(tags, list(map(compute_tag_regs, tags))))
+
+def is_pair(regid):
+    return len(regid) == 2
+
+def is_single(regid):
+    return len(regid) == 1
+
+def is_written(regid):
+    return regid[0] in "dexy"
+
+def is_writeonly(regid):
+    return regid[0] in "de"
+
+def is_read(regid):
+    return regid[0] in "stuvwxy"
+
+def is_readwrite(regid):
+    return regid[0] in "xy"
+
+def is_scalar_reg(regtype):
+    return regtype in "RPC"
+
+def is_old_val(regtype, regid, tag):
+    return regtype+regid+'V' in semdict[tag]
+
+def is_new_val(regtype, regid, tag):
+    return regtype+regid+'N' in semdict[tag]
+
+tagimms = dict(zip(tags, list(map(compute_tag_immediates, tags))))
+
+def need_slot(tag):
+    if ('A_CONDEXEC' in attribdict[tag] or
+        'A_STORE' in attribdict[tag]):
+        return 1
+    else:
+        return 0
+
+def need_part1(tag):
+    return re.compile(r"fPART1").search(semdict[tag])
+
+def need_ea(tag):
+    return re.compile(r"\bEA\b").search(semdict[tag])
+
+def imm_name(immlett):
+    return "%siV" % immlett
+
+##
+## Helpers for gen_helper_prototype
+##
+def_helper_types = {
+    'N' : 's32',
+    'O' : 's32',
+    'P' : 's32',
+    'M' : 's32',
+    'C' : 's32',
+    'R' : 's32',
+    'V' : 'ptr',
+    'Q' : 'ptr'
+}
+
+def_helper_types_pair = {
+    'R' : 's64',
+    'C' : 's64',
+    'S' : 's64',
+    'G' : 's64',
+    'V' : 'ptr',
+    'Q' : 'ptr'
+}
+
+def gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i):
+    if (is_pair(regid)):
+        f.write(", %s" % (def_helper_types_pair[regtype]))
+    elif (is_single(regid)):
+        f.write(", %s" % (def_helper_types[regtype]))
+    else:
+        print("Bad register parse: ",regtype,regid,toss,numregs)
+
+##
+## Generate the DEF_HELPER prototype for an instruction
+##     For A2_add: Rd32=add(Rs32,Rt32)
+##     We produce:
+##         #ifndef fWRAP_A2_add
+##         DEF_HELPER_3(A2_add, s32, env, s32, s32)
+##         #endif
+##
+def gen_helper_prototype(f, tag, regs, imms):
+    f.write('#ifndef fWRAP_%s\n' % tag)
+    numresults = 0
+    numscalarresults = 0
+    numscalarreadwrite = 0
+    for regtype,regid,toss,numregs in regs:
+        if (is_written(regid)):
+            numresults += 1
+            if (is_scalar_reg(regtype)):
+                numscalarresults += 1
+        if (is_readwrite(regid)):
+            if (is_scalar_reg(regtype)):
+                numscalarreadwrite += 1
+
+    if (numscalarresults > 1):
+        ## The helper is bogus when there is more than one result
+        f.write('DEF_HELPER_1(%s, void, env)\n' % tag)
+    else:
+        ## Figure out how many arguments the helper will take
+        if (numscalarresults == 0):
+            def_helper_size = len(regs)+len(imms)+numscalarreadwrite+1
+            if need_part1(tag): def_helper_size += 1
+            if need_slot(tag): def_helper_size += 1
+            f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
+            ## The return type is void
+            f.write(', void' )
+        else:
+            def_helper_size = len(regs)+len(imms)+numscalarreadwrite
+            if need_part1(tag): def_helper_size += 1
+            if need_slot(tag): def_helper_size += 1
+            f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
+
+        ## Generate the qemu DEF_HELPER type for each result
+        i=0
+        for regtype,regid,toss,numregs in regs:
+            if (is_written(regid)):
+                gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
+                i += 1
+
+        ## Put the env between the outputs and inputs
+        f.write(', env' )
+        i += 1
+
+        ## Generate the qemu type for each input operand (regs and immediates)
+        for regtype,regid,toss,numregs in regs:
+            if (is_read(regid)):
+                gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
+                i += 1
+        for immlett,bits,immshift in imms:
+            f.write(", s32")
+
+        ## Add the arguments for the instruction slot and part1 (if needed)
+        if need_slot(tag): f.write(', i32' )
+        if need_part1(tag): f.write(' , i32' )
+        f.write(')\n')
+    f.write('#endif\n')
+
+##
+## Helpers for gen_tcg_func
+##
+def gen_decl_ea_tcg(f):
+    f.write("DECL_EA;\n")
+
+def gen_free_ea_tcg(f):
+    f.write("FREE_EA;\n")
+
+def genptr_decl(f,regtype,regid,regno):
+    regN="%s%sN" % (regtype,regid)
+    macro = "DECL_%sREG_%s" % (regtype, regid)
+    f.write("%s(%s%sV, %s, %d, 0);\n" % \
+        (macro, regtype, regid, regN, regno))
+
+def genptr_decl_new(f,regtype,regid,regno):
+    regN="%s%sX" % (regtype,regid)
+    macro = "DECL_NEW_%sREG_%s" % (regtype, regid)
+    f.write("%s(%s%sN, %s, %d, 0);\n" % \
+        (macro, regtype, regid, regN, regno))
+
+def genptr_decl_opn(f, tag, regtype, regid, toss, numregs, i):
+    if (is_pair(regid)):
+        genptr_decl(f,regtype,regid,i)
+    elif (is_single(regid)):
+        if is_old_val(regtype, regid, tag):
+            genptr_decl(f,regtype,regid,i)
+        elif is_new_val(regtype, regid, tag):
+            genptr_decl_new(f,regtype,regid,i)
+        else:
+            print("Bad register parse: ",regtype,regid,toss,numregs)
+    else:
+        print("Bad register parse: ",regtype,regid,toss,numregs)
+
+def genptr_decl_imm(f,immlett):
+    if (immlett.isupper()):
+        i = 1
+    else:
+        i = 0
+    f.write("DECL_IMM(%s,%d);\n" % (imm_name(immlett),i))
+
+def genptr_free(f,regtype,regid,regno):
+    macro = "FREE_%sREG_%s" % (regtype, regid)
+    f.write("%s(%s%sV);\n" % (macro, regtype, regid))
+
+def genptr_free_new(f,regtype,regid,regno):
+    macro = "FREE_NEW_%sREG_%s" % (regtype, regid)
+    f.write("%s(%s%sN);\n" % (macro, regtype, regid))
+
+def genptr_free_opn(f,regtype,regid,i):
+    if (is_pair(regid)):
+        genptr_free(f,regtype,regid,i)
+    elif (is_single(regid)):
+        if is_old_val(regtype, regid, tag):
+            genptr_free(f,regtype,regid,i)
+        elif is_new_val(regtype, regid, tag):
+            genptr_free_new(f,regtype,regid,i)
+        else:
+            print("Bad register parse: ",regtype,regid,toss,numregs)
+    else:
+        print("Bad register parse: ",regtype,regid,toss,numregs)
+
+def genptr_free_imm(f,immlett):
+    f.write("FREE_IMM(%s);\n" % (imm_name(immlett)))
+
+def genptr_src_read(f,regtype,regid):
+    macro = "READ_%sREG_%s" % (regtype, regid)
+    f.write("%s(%s%sV, %s%sN);\n" % \
+        (macro,regtype,regid,regtype,regid))
+
+def genptr_src_read_new(f,regtype,regid):
+    macro = "READ_NEW_%sREG_%s" % (regtype, regid)
+    f.write("%s(%s%sN, %s%sX);\n" % \
+        (macro,regtype,regid,regtype,regid))
+
+def genptr_src_read_opn(f,regtype,regid):
+    if (is_pair(regid)):
+        genptr_src_read(f,regtype,regid)
+    elif (is_single(regid)):
+        if is_old_val(regtype, regid, tag):
+            genptr_src_read(f,regtype,regid)
+        elif is_new_val(regtype, regid, tag):
+            genptr_src_read_new(f,regtype,regid)
+        else:
+            print("Bad register parse: ",regtype,regid,toss,numregs)
+    else:
+        print("Bad register parse: ",regtype,regid,toss,numregs)
+
+def gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i):
+    if (i > 0): f.write(", ")
+    if (is_pair(regid)):
+        f.write("%s%sV" % (regtype,regid))
+    elif (is_single(regid)):
+        if is_old_val(regtype, regid, tag):
+            f.write("%s%sV" % (regtype,regid))
+        elif is_new_val(regtype, regid, tag):
+            f.write("%s%sN" % (regtype,regid))
+        else:
+            print("Bad register parse: ",regtype,regid,toss,numregs)
+    else:
+        print("Bad register parse: ",regtype,regid,toss,numregs)
+
+def gen_helper_decl_imm(f,immlett):
+    f.write("DECL_TCG_IMM(tcgv_%s, %s);\n" % \
+        (imm_name(immlett), imm_name(immlett)))
+
+def gen_helper_call_imm(f,immlett):
+    f.write(", tcgv_%s" % imm_name(immlett))
+
+def gen_helper_free_imm(f,immlett):
+    f.write("FREE_TCG_IMM(tcgv_%s);\n" % imm_name(immlett))
+
+def genptr_dst_write(f,regtype, regid):
+    macro = "WRITE_%sREG_%s" % (regtype, regid)
+    f.write("%s(%s%sN, %s%sV);\n" % (macro, regtype, regid, regtype, regid))
+
+def genptr_dst_write_opn(f,regtype, regid, tag):
+    if (is_pair(regid)):
+        genptr_dst_write(f, regtype, regid)
+    elif (is_single(regid)):
+        genptr_dst_write(f, regtype, regid)
+    else:
+        print("Bad register parse: ",regtype,regid,toss,numregs)
+
+##
+## Generate the TCG code to call the helper
+##     For A2_add: Rd32=add(Rs32,Rt32), { RdV=RsV+RtV;}
+##     We produce:
+##       {
+##       /* A2_add */
+##       DECL_RREG_d(RdV, RdN, 0, 0);
+##       DECL_RREG_s(RsV, RsN, 1, 0);
+##       DECL_RREG_t(RtV, RtN, 2, 0);
+##       READ_RREG_s(RsV, RsN);
+##       READ_RREG_t(RtV, RtN);
+##       fWRAP_A2_add(
+##       do {
+##       gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
+##       while (0),
+##       { RdV=RsV+RtV;});
+##       WRITE_RREG_d(RdN, RdV);
+##       FREE_RREG_d(RdV);
+##       FREE_RREG_s(RsV);
+##       FREE_RREG_t(RtV);
+##       /* A2_add */
+##       }
+##
+def gen_tcg_func(f, tag, regs, imms):
+    f.write('{\n')
+    f.write('/* %s */\n' % tag)
+    if need_ea(tag): gen_decl_ea_tcg(f)
+    i=0
+    ## Declare all the operands (regs and immediates)
+    for regtype,regid,toss,numregs in regs:
+        genptr_decl_opn(f, tag, regtype, regid, toss, numregs, i)
+        i += 1
+    for immlett,bits,immshift in imms:
+        genptr_decl_imm(f,immlett)
+
+    if 'A_PRIV' in attribdict[tag]:
+        f.write('fCHECKFORPRIV();\n')
+    if 'A_GUEST' in attribdict[tag]:
+        f.write('fCHECKFORGUEST();\n')
+    if 'A_FPOP' in attribdict[tag]:
+        f.write('fFPOP_START();\n');
+
+    ## Read all the inputs
+    for regtype,regid,toss,numregs in regs:
+        if (is_read(regid)):
+            genptr_src_read_opn(f,regtype,regid)
+
+    ## Generate the call to the helper
+    f.write("fWRAP_%s(\n" % tag)
+    f.write("do {\n")
+    for immlett,bits,immshift in imms:
+        gen_helper_decl_imm(f,immlett)
+    if need_part1(tag): f.write("PART1_WRAP(")
+    if need_slot(tag): f.write("SLOT_WRAP(")
+    f.write("gen_helper_%s(" % (tag))
+    i=0
+    ## If there is a scalar result, it is the return type
+    for regtype,regid,toss,numregs in regs:
+        if (is_written(regid)):
+            gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
+            i += 1
+    if (i > 0): f.write(", ")
+    f.write("cpu_env")
+    i=1
+    for regtype,regid,toss,numregs in regs:
+        if (is_read(regid)):
+            gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
+            i += 1
+    for immlett,bits,immshift in imms:
+        gen_helper_call_imm(f,immlett)
+
+    if need_slot(tag): f.write(", slot")
+    if need_part1(tag): f.write(", part1" )
+    f.write(")")
+    if need_slot(tag): f.write(")")
+    if need_part1(tag): f.write(")")
+    f.write(";\n")
+    for immlett,bits,immshift in imms:
+        gen_helper_free_imm(f,immlett)
+    f.write("} while (0)")
+    f.write(",\n%s);\n" % semdict[tag] )
+
+    ## Write all the outputs
+    for regtype,regid,toss,numregs in regs:
+        if (is_written(regid)):
+            genptr_dst_write_opn(f,regtype, regid, tag)
+
+    if 'A_FPOP' in attribdict[tag]:
+        f.write('fFPOP_END();\n');
+
+
+    ## Free all the operands (regs and immediates)
+    if need_ea(tag): gen_free_ea_tcg(f)
+    for regtype,regid,toss,numregs in regs:
+        genptr_free_opn(f,regtype,regid,i)
+        i += 1
+    for immlett,bits,immshift in imms:
+        genptr_free_imm(f,immlett)
+
+    f.write("/* %s */\n" % tag)
+    f.write("}")
+
+##
+## Helpers for gen_helper_definition
+##
+def gen_decl_ea(f):
+    f.write("size4u_t EA;\n")
+
+def gen_helper_return_type(f,regtype,regid,regno):
+    if regno > 1 : f.write(", ")
+    f.write("int32_t")
+
+def gen_helper_return_type_pair(f,regtype,regid,regno):
+    if regno > 1 : f.write(", ")
+    f.write("int64_t")
+
+def gen_helper_arg(f,regtype,regid,regno):
+    if regno > 0 : f.write(", " )
+    f.write("int32_t %s%sV" % (regtype,regid))
+
+def gen_helper_arg_new(f,regtype,regid,regno):
+    if regno >= 0 : f.write(", " )
+    f.write("int32_t %s%sN" % (regtype,regid))
+
+def gen_helper_arg_pair(f,regtype,regid,regno):
+    if regno >= 0 : f.write(", ")
+    f.write("int64_t %s%sV" % (regtype,regid))
+
+def gen_helper_arg_opn(f,regtype,regid,i):
+    if (is_pair(regid)):
+        gen_helper_arg_pair(f,regtype,regid,i)
+    elif (is_single(regid)):
+        if is_old_val(regtype, regid, tag):
+            gen_helper_arg(f,regtype,regid,i)
+        elif is_new_val(regtype, regid, tag):
+            gen_helper_arg_new(f,regtype,regid,i)
+        else:
+            print("Bad register parse: ",regtype,regid,toss,numregs)
+    else:
+        print("Bad register parse: ",regtype,regid,toss,numregs)
+
+def gen_helper_arg_imm(f,immlett):
+    f.write(", int32_t %s" % (imm_name(immlett)))
+
+def gen_helper_dest_decl(f,regtype,regid,regno,subfield=""):
+    f.write("int32_t %s%sV%s = 0;\n" % \
+        (regtype,regid,subfield))
+
+def gen_helper_dest_decl_pair(f,regtype,regid,regno,subfield=""):
+    f.write("int64_t %s%sV%s = 0;\n" % \
+        (regtype,regid,subfield))
+
+def gen_helper_dest_decl_opn(f,regtype,regid,i):
+    if (is_pair(regid)):
+        gen_helper_dest_decl_pair(f,regtype,regid,i)
+    elif (is_single(regid)):
+        gen_helper_dest_decl(f,regtype,regid,i)
+    else:
+        print("Bad register parse: ",regtype,regid,toss,numregs)
+
+def gen_helper_return(f,regtype,regid,regno):
+    f.write("return %s%sV;\n" % (regtype,regid))
+
+def gen_helper_return_pair(f,regtype,regid,regno):
+    f.write("return %s%sV;\n" % (regtype,regid))
+
+def gen_helper_return_opn(f, regtype, regid, i):
+    if (is_pair(regid)):
+        gen_helper_return_pair(f,regtype,regid,i)
+    elif (is_single(regid)):
+        gen_helper_return(f,regtype,regid,i)
+    else:
+        print("Bad register parse: ",regtype,regid,toss,numregs)
+
+##
+## Generate the TCG code to call the helper
+##     For A2_add: Rd32=add(Rs32,Rt32), { RdV=RsV+RtV;}
+##     We produce:
+##       #ifndef fWRAP_A2_add
+##       int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
+##       {
+##       uint32_t slot = 4; slot = slot;
+##       int32_t RdV = 0;
+##       { RdV=RsV+RtV;}
+##       COUNT_HELPER(A2_add);
+##       return RdV;
+##       }
+##       #endif
+##
+def gen_helper_definition(f, tag, regs, imms):
+    f.write('#ifndef fWRAP_%s\n' % tag)
+    numresults = 0
+    numscalarresults = 0
+    numscalarreadwrite = 0
+    for regtype,regid,toss,numregs in regs:
+        if (is_written(regid)):
+            numresults += 1
+            if (is_scalar_reg(regtype)):
+                numscalarresults += 1
+        if (is_readwrite(regid)):
+            if (is_scalar_reg(regtype)):
+                numscalarreadwrite += 1
+
+    if (numscalarresults > 1):
+        ## The helper is bogus when there is more than one result
+        f.write("void HELPER(%s)(CPUHexagonState *env) { BOGUS_HELPER(%s); }\n"
+                % (tag, tag))
+    else:
+        ## The return type of the function is the type of the destination
+        ## register
+        i=0
+        for regtype,regid,toss,numregs in regs:
+            if (is_written(regid)):
+                if (is_pair(regid)):
+                    gen_helper_return_type_pair(f,regtype,regid,i)
+                elif (is_single(regid)):
+                    gen_helper_return_type(f,regtype,regid,i)
+                else:
+                    print("Bad register parse: ",regtype,regid,toss,numregs)
+            i += 1
+
+        if (numscalarresults == 0):
+            f.write("void")
+        f.write(" HELPER(%s)(CPUHexagonState *env" % tag)
+
+        i = 1
+        ## Arguments to the helper function are the source regs and immediates
+        for regtype,regid,toss,numregs in regs:
+            if (is_read(regid)):
+                gen_helper_arg_opn(f,regtype,regid,i)
+                i += 1
+        for immlett,bits,immshift in imms:
+            gen_helper_arg_imm(f,immlett)
+            i += 1
+        if need_slot(tag):
+            if i > 0: f.write(", ")
+            f.write("uint32_t slot")
+            i += 1
+        if need_part1(tag):
+            if i > 0: f.write(", ")
+            f.write("uint32_t part1")
+        f.write(")\n{\n")
+        if (not need_slot(tag)): f.write("uint32_t slot = 4; slot = slot;\n" )
+        if need_ea(tag): gen_decl_ea(f)
+        ## Declare the return variable
+        i=0
+        for regtype,regid,toss,numregs in regs:
+            if (is_writeonly(regid)):
+                gen_helper_dest_decl_opn(f,regtype,regid,i)
+            i += 1
+
+        if 'A_FPOP' in attribdict[tag]:
+            f.write('fFPOP_START();\n');
+
+        f.write(semdict[tag])
+        f.write("\n")
+        f.write("COUNT_HELPER(%s);\n" % tag )
+
+        if 'A_FPOP' in attribdict[tag]:
+            f.write('fFPOP_END();\n');
+
+        ## Save/return the return variable
+        for regtype,regid,toss,numregs in regs:
+            if (is_written(regid)):
+                gen_helper_return_opn(f, regtype, regid, i)
+        f.write("}\n")
+        ## End of the helper definition
+    f.write('#endif\n')
+
+##
+## Bring it all together in the DEF_QEMU macro
+##
+def gen_qemu(f, tag):
+    regs = tagregs[tag]
+    imms = tagimms[tag]
+
+    f.write('DEF_QEMU(%s,%s,\n' % (tag,semdict[tag]))
+    gen_helper_prototype(f, tag, regs, imms)
+    f.write(",\n" )
+    gen_tcg_func(f, tag, regs, imms)
+    f.write(",\n" )
+    gen_helper_definition(f, tag, regs, imms)
+    f.write(")\n")
+
+##
+## Generate the qemu_def_generated.h file
+##
+f = StringIO()
+
+f.write("#ifndef DEF_QEMU\n")
+f.write("#define DEF_QEMU(TAG,SHORTCODE,HELPER,GENFN,HELPFN)   /* Nothing */\n")
+f.write("#endif\n\n")
+
+
+for tag in tags:
+    ## Skip assembler mapped instructions
+    if "A_MAPPING" in attribdict[tag]:
+        continue
+    ## Skip the fake instructions
+    if ( "A_FAKEINSN" in attribdict[tag] ) :
+        continue
+    ## Skip the priv instructions
+    if ( "A_PRIV" in attribdict[tag] ) :
+        continue
+    ## Skip the guest instructions
+    if ( "A_GUEST" in attribdict[tag] ) :
+        continue
+    ## Skip the diag instructions
+    if ( tag == "Y6_diag" ) :
+        continue
+    if ( tag == "Y6_diag0" ) :
+        continue
+    if ( tag == "Y6_diag1" ) :
+        continue
+
+    gen_qemu(f, tag)
+
+realf = open('qemu_def_generated.h','w')
+realf.write(f.getvalue())
+realf.close()
+f.close()
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 23/67] Hexagon generator phase 2 - qemu_wrap_generated.h
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (21 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 22/67] Hexagon generator phase 2 - qemu_def_generated.h Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 24/67] Hexagon generator phase 2 - opcodes_def_generated.h Taylor Simpson
                   ` (44 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Gives a default definition of fWRAP_<tag> for each instruction

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/do_qemu.py | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/target/hexagon/do_qemu.py b/target/hexagon/do_qemu.py
index 6f0e376..3f52ef3 100755
--- a/target/hexagon/do_qemu.py
+++ b/target/hexagon/do_qemu.py
@@ -767,3 +767,17 @@ realf.write(f.getvalue())
 realf.close()
 f.close()
 
+##
+## Generate the qemu_wrap_generated.h file
+##     Gives a default definition of fWRAP_<tag> for each instruction
+##
+f = StringIO()
+for tag in tags:
+    f.write( "#ifndef fWRAP_%s\n" % tag )
+    f.write( "#define fWRAP_%s(GENHLPR, SHORTCODE) GENHLPR\n" % tag )
+    f.write( "#endif\n\n" )
+realf = open('qemu_wrap_generated.h', 'wt')
+realf.write(f.getvalue())
+realf.close()
+f.close()
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 24/67] Hexagon generator phase 2 - opcodes_def_generated.h
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (22 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 23/67] Hexagon generator phase 2 - qemu_wrap_generated.h Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 25/67] Hexagon generator phase 2 - op_attribs_generated.h Taylor Simpson
                   ` (43 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Gives a list of all the opcodes

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/do_qemu.py | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/target/hexagon/do_qemu.py b/target/hexagon/do_qemu.py
index 3f52ef3..107e1e8 100755
--- a/target/hexagon/do_qemu.py
+++ b/target/hexagon/do_qemu.py
@@ -781,3 +781,15 @@ realf.write(f.getvalue())
 realf.close()
 f.close()
 
+##
+## Generate the opcodes_def_generated.h file
+##     Gives a list of all the opcodes
+##
+f = StringIO()
+for tag in tags:
+    f.write ( "OPCODE(%s),\n" % (tag) )
+realf = open('opcodes_def_generated.h', 'wt')
+realf.write(f.getvalue())
+realf.close()
+f.close()
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 25/67] Hexagon generator phase 2 - op_attribs_generated.h
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (23 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 24/67] Hexagon generator phase 2 - opcodes_def_generated.h Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 26/67] Hexagon generator phase 2 - op_regs_generated.h Taylor Simpson
                   ` (42 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Lists all the attributes associated with each instruction

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/do_qemu.py | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/target/hexagon/do_qemu.py b/target/hexagon/do_qemu.py
index 107e1e8..499f0e0 100755
--- a/target/hexagon/do_qemu.py
+++ b/target/hexagon/do_qemu.py
@@ -793,3 +793,16 @@ realf.write(f.getvalue())
 realf.close()
 f.close()
 
+##
+## Generate the op_attribs_generated.h file
+##     Lists all the attributes associated with each instruction
+##
+f = StringIO()
+for tag in tags:
+    f.write('OP_ATTRIB(%s,ATTRIBS(%s))\n' % \
+        (tag, ','.join(sorted(attribdict[tag]))))
+realf = open('op_attribs_generated.h', 'wt')
+realf.write(f.getvalue())
+realf.close()
+f.close()
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 26/67] Hexagon generator phase 2 - op_regs_generated.h
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (24 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 25/67] Hexagon generator phase 2 - op_attribs_generated.h Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 27/67] Hexagon generator phase 2 - printinsn-generated.h Taylor Simpson
                   ` (41 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Lists the register and immediate operands for each instruction

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/do_qemu.py | 86 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/target/hexagon/do_qemu.py b/target/hexagon/do_qemu.py
index 499f0e0..0c7643a 100755
--- a/target/hexagon/do_qemu.py
+++ b/target/hexagon/do_qemu.py
@@ -806,3 +806,89 @@ realf.write(f.getvalue())
 realf.close()
 f.close()
 
+##
+## Generate the op_regs_generated.h file
+##     Lists the register and immediate operands for each instruction
+##
+def calculate_regid_reg(tag):
+    def letter_inc(x): return chr(ord(x)+1)
+    ordered_implregs = [ 'SP','FP','LR' ]
+    srcdst_lett = 'X'
+    src_lett = 'S'
+    dst_lett = 'D'
+    retstr = ""
+    mapdict = {}
+    for reg in ordered_implregs:
+        reg_rd = 0
+        reg_wr = 0
+        if ('A_IMPLICIT_READS_'+reg) in attribdict[tag]: reg_rd = 1
+        if ('A_IMPLICIT_WRITES_'+reg) in attribdict[tag]: reg_wr = 1
+        if reg_rd and reg_wr:
+            retstr += srcdst_lett
+            mapdict[srcdst_lett] = reg
+            srcdst_lett = letter_inc(srcdst_lett)
+        elif reg_rd:
+            retstr += src_lett
+            mapdict[src_lett] = reg
+            src_lett = letter_inc(src_lett)
+        elif reg_wr:
+            retstr += dst_lett
+            mapdict[dst_lett] = reg
+            dst_lett = letter_inc(dst_lett)
+    return retstr,mapdict
+
+def calculate_regid_letters(tag):
+    retstr,mapdict = calculate_regid_reg(tag)
+    return retstr
+
+def strip_verif_info_in_regs(x):
+    y=x.replace('UREG.','')
+    y=y.replace('MREG.','')
+    return y.replace('GREG.','')
+
+f = StringIO()
+
+for tag in tags:
+    regs = tagregs[tag]
+    rregs = []
+    wregs = []
+    regids = ""
+    for regtype,regid,toss,numregs in regs:
+        if is_read(regid):
+            if regid[0] not in regids: regids += regid[0]
+            rregs.append(regtype+regid+numregs)
+        if is_written(regid):
+            wregs.append(regtype+regid+numregs)
+            if regid[0] not in regids: regids += regid[0]
+    for attrib in attribdict[tag]:
+        if attribinfo[attrib]['rreg']:
+            rregs.append(strip_verif_info_in_regs(attribinfo[attrib]['rreg']))
+        if attribinfo[attrib]['wreg']:
+            wregs.append(strip_verif_info_in_regs(attribinfo[attrib]['wreg']))
+    regids += calculate_regid_letters(tag)
+    f.write('REGINFO(%s,"%s",\t/*RD:*/\t"%s",\t/*WR:*/\t"%s")\n' % \
+        (tag,regids,",".join(rregs),",".join(wregs)))
+
+for tag in tags:
+    imms = tagimms[tag]
+    f.write( 'IMMINFO(%s' % tag)
+    if not imms:
+        f.write(''','u',0,0,'U',0,0''')
+    for sign,size,shamt in imms:
+        if sign == 'r': sign = 's'
+        if not shamt:
+            shamt = "0"
+        f.write(''','%s',%s,%s''' % (sign,size,shamt))
+    if len(imms) == 1:
+        if sign.isupper():
+            myu = 'u'
+        else:
+            myu = 'U'
+        f.write(''','%s',0,0''' % myu)
+    f.write(')\n')
+
+realf = open('op_regs_generated.h','w')
+realf.write(f.getvalue())
+realf.close()
+f.close()
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 27/67] Hexagon generator phase 2 - printinsn-generated.h
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (25 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 26/67] Hexagon generator phase 2 - op_regs_generated.h Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 28/67] Hexagon generator phase 3 - C preprocessor for decode tree Taylor Simpson
                   ` (40 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Data for printing (disassembling) each instruction (format string + operands)

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/do_qemu.py | 151 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/target/hexagon/do_qemu.py b/target/hexagon/do_qemu.py
index 0c7643a..32543c8 100755
--- a/target/hexagon/do_qemu.py
+++ b/target/hexagon/do_qemu.py
@@ -892,3 +892,154 @@ realf.write(f.getvalue())
 realf.close()
 f.close()
 
+##
+## Generate the printinsn_generated.h file
+##     Data for printing each instruction (format string + operands)
+##
+def regprinter(m):
+    str = m.group(1)
+    str += ":".join(["%d"]*len(m.group(2)))
+    str += m.group(3)
+    if ('S' in m.group(1)) and (len(m.group(2)) == 1):
+        str += "/%s"
+    elif ('C' in m.group(1)) and (len(m.group(2)) == 1):
+        str += "/%s"
+    return str
+
+# Regular expression that matches any operator that contains '=' character:
+opswithequal_re = '[-+^&|!<>=]?='
+# Regular expression that matches any assignment operator.
+assignment_re = '[-+^&|]?='
+
+# Out of the operators that contain the = sign, if the operator is also an
+# assignment, spaces will be added around it, unless it's enclosed within
+# parentheses, or spaces are already present.
+
+equals = re.compile(opswithequal_re)
+assign = re.compile(assignment_re)
+
+def spacify(s):
+    slen = len(s)
+    paren_count = {}
+    i = 0
+    pc = 0
+    while i < slen:
+        c = s[i]
+        if c == '(':
+            pc += 1
+        elif c == ')':
+            pc -= 1
+        paren_count[i] = pc
+        i += 1
+
+    # Iterate over all operators that contain the equal sign. If any
+    # match is also an assignment operator, add spaces around it if
+    # the parenthesis count is 0.
+    pos = 0
+    out = []
+    for m in equals.finditer(s):
+        ms = m.start()
+        me = m.end()
+        # t is the string that matched opswithequal_re.
+        t = m.string[ms:me]
+        out += s[pos:ms]
+        pos = me
+        if paren_count[ms] == 0:
+            # Check if the entire string t is an assignment.
+            am = assign.match(t)
+            if am and len(am.group(0)) == me-ms:
+                # Don't add spaces if they are already there.
+                if ms > 0 and s[ms-1] != ' ':
+                    out.append(' ')
+                out += t
+                if me < slen and s[me] != ' ':
+                    out.append(' ')
+                continue
+        # If this is not an assignment, just append it to the output
+        # string.
+        out += t
+
+    # Append the remaining part of the string.
+    out += s[pos:len(s)]
+    return ''.join(out)
+
+immext_casere = re.compile(r'IMMEXT\(([A-Za-z])')
+
+f = StringIO()
+
+for tag in tags:
+    if not behdict[tag]: continue
+    extendable_upper_imm = False
+    extendable_lower_imm = False
+    m = immext_casere.search(semdict[tag])
+    if m:
+        if m.group(1).isupper():
+            extendable_upper_imm = True
+        else:
+            extendable_lower_imm = True
+    beh = behdict[tag]
+    beh = regre.sub(regprinter,beh)
+    beh = absimmre.sub(r"#%s0x%x",beh)
+    beh = relimmre.sub(r"PC+%s%d",beh)
+    beh = spacify(beh)
+    # Print out a literal "%s" at the end, used to match empty string
+    # so C won't complain at us
+    if ("A_VECX" in attribdict[tag]): macname = "DEF_VECX_PRINTINFO"
+    else: macname = "DEF_PRINTINFO"
+    f.write('%s(%s,"%s%%s"' % (macname,tag,beh))
+    regs_or_imms = reg_or_immre.findall(behdict[tag])
+    ri = 0
+    seenregs = {}
+    for allregs,a,b,c,d,allimm,immlett,bits,immshift in regs_or_imms:
+        if a:
+            #register
+            if b in seenregs:
+                regno = seenregs[b]
+            else:
+                regno = ri
+            if len(b) == 1:
+                f.write(',REGNO(%d)' % regno)
+                if 'S' in a:
+                    f.write(',sreg2str(REGNO(%d))' % regno)
+                elif 'C' in a:
+                    f.write(',creg2str(REGNO(%d))' % regno)
+            elif len(b) == 2:
+                f.write(',REGNO(%d)+1,REGNO(%d)' % (regno,regno))
+            elif len(b) == 4:
+                f.write(',REGNO(%d)^3,REGNO(%d)^2,REGNO(%d)^1,REGNO(%d)' % \
+                        (regno,regno,regno,regno))
+            else:
+                print("Put some stuff to handle quads here")
+            if b not in seenregs:
+                seenregs[b] = ri
+                ri += 1
+        else:
+            #immediate
+            if (immlett.isupper()):
+                if extendable_upper_imm:
+                    if immlett in 'rR':
+                        f.write(',insn->extension_valid?"##":""')
+                    else:
+                        f.write(',insn->extension_valid?"#":""')
+                else:
+                    f.write(',""')
+                ii = 1
+            else:
+                if extendable_lower_imm:
+                    if immlett in 'rR':
+                        f.write(',insn->extension_valid?"##":""')
+                    else:
+                        f.write(',insn->extension_valid?"#":""')
+                else:
+                    f.write(',""')
+                ii = 0
+            f.write(',IMMNO(%d)' % ii)
+    # append empty string so there is at least one more arg
+    f.write(',"")\n')
+
+realf = open('printinsn_generated.h','w')
+realf.write(f.getvalue())
+realf.close()
+f.close()
+
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 28/67] Hexagon generator phase 3 - C preprocessor for decode tree
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (26 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 27/67] Hexagon generator phase 2 - printinsn-generated.h Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 29/67] Hexagon generater phase 4 - Decode tree Taylor Simpson
                   ` (39 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Run the C preprocessor across the instruction definition and encoding
files to expand macros and prepare the iset.py file.  The resulting
fill contains python data structures used to build the decode tree.

Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_dectree_import.c | 205 ++++++++++++++++++++++++++++++++++++
 1 file changed, 205 insertions(+)
 create mode 100644 target/hexagon/gen_dectree_import.c

diff --git a/target/hexagon/gen_dectree_import.c b/target/hexagon/gen_dectree_import.c
new file mode 100644
index 0000000..4dfd4d4
--- /dev/null
+++ b/target/hexagon/gen_dectree_import.c
@@ -0,0 +1,205 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This program generates the encodings file that is processed by
+ * the dectree.py script to produce the decoding tree.  We use the C
+ * preprocessor to manipulate the files imported from the Hexagon
+ * architecture library.
+ */
+#include <stdio.h>
+#include <string.h>
+#include "opcodes.h"
+
+#define STRINGIZE(X)    #X
+
+const char *opcode_names[] = {
+#define OPCODE(IID) STRINGIZE(IID)
+#include "opcodes_def_generated.h"
+    NULL
+#undef OPCODE
+};
+
+char *opcode_syntax[XX_LAST_OPCODE];
+
+/*
+ * Process the instruction definitions
+ *     Scalar core instructions have the following form
+ *         Q6INSN(A2_add,"Rd32=add(Rs32,Rt32)",ATTRIBS(),
+ *         "Add 32-bit registers",
+ *         { RdV=RsV+RtV;})
+ *     HVX instructions have the following form
+ *         EXTINSN(V6_vinsertwr, "Vx32.w=vinsert(Rt32)",
+ *         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX,A_CVI_LATE,A_NOTE_MPY_RESOURCE),
+ *         "Insert Word Scalar into Vector",
+ *         VxV.uw[0] = RtV;)
+ */
+void opcode_init()
+{
+#define Q6INSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
+   opcode_syntax[TAG] = BEH;
+#define EXTINSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
+   opcode_syntax[TAG] = BEH;
+#include "imported/allidefs.def"
+#undef Q6INSN
+#undef EXTINSN
+}
+
+const char *opcode_rregs[] = {
+#define REGINFO(TAG, REGINFO, RREGS, WREGS) RREGS,
+#define IMMINFO(TAG, SIGN, SIZE, SHAMT, SIGN2, SIZE2, SHAMT2)  /* nothing */
+#include "op_regs_generated.h"
+    NULL
+#undef REGINFO
+#undef IMMINFO
+};
+
+const char *opcode_wregs[] = {
+#define REGINFO(TAG, REGINFO, RREGS, WREGS) WREGS,
+#define IMMINFO(TAG, SIGN, SIZE, SHAMT, SIGN2, SIZE2, SHAMT2)  /* nothing */
+#include "op_regs_generated.h"
+    NULL
+#undef REGINFO
+#undef IMMINFO
+};
+
+opcode_encoding_t opcode_encodings[] = {
+#define DEF_ENC32(TAG, ENCSTR) \
+    [TAG] = { .encoding = ENCSTR },
+#define DEF_ENC_SUBINSN(TAG, CLASS, ENCSTR) \
+    [TAG] = { .encoding = ENCSTR, .enc_class = CLASS },
+#define DEF_EXT_ENC(TAG, CLASS, ENCSTR) \
+    [TAG] = { .encoding = ENCSTR, .enc_class = CLASS },
+#include "imported/encode.def"
+#undef DEF_ENC32
+#undef DEF_ENC_SUBINSN
+#undef DEF_EXT_ENC
+};
+
+static const char * const opcode_enc_class_names[XX_LAST_ENC_CLASS] = {
+    "NORMAL",
+    "16BIT",
+    "SUBINSN_A",
+    "SUBINSN_L1",
+    "SUBINSN_L2",
+    "SUBINSN_S1",
+    "SUBINSN_S2",
+    "EXT_noext",
+    "EXT_mmvec",
+};
+
+static const char *get_opcode_enc(int opcode)
+{
+    const char *tmp = opcode_encodings[opcode].encoding;
+    if (tmp == NULL) {
+        tmp = "MISSING ENCODING";
+    }
+    return tmp;
+}
+
+static const char *get_opcode_enc_class(int opcode)
+{
+    const char *tmp = opcode_encodings[opcode].encoding;
+    if (tmp == NULL) {
+        char *test = "V6_";        /* HVX */
+        char *name = (char *)opcode_names[opcode];
+        if (strncmp(name, test, strlen(test)) == 0) {
+            return "EXT_mmvec";
+        }
+    }
+    return opcode_enc_class_names[opcode_encodings[opcode].enc_class];
+}
+
+static void gen_iset_table(FILE *out)
+{
+    int i;
+
+    fprintf(out, "iset = {\n");
+    for (i = 0; i < XX_LAST_OPCODE; i++) {
+        fprintf(out, "\t\'%s\' : {\n", opcode_names[i]);
+        fprintf(out, "\t\t\'tag\' : \'%s\',\n", opcode_names[i]);
+        fprintf(out, "\t\t\'syntax\' : \'%s\',\n", opcode_syntax[i]);
+        fprintf(out, "\t\t\'rregs\' : \'%s\',\n", opcode_rregs[i]);
+        fprintf(out, "\t\t\'wregs\' : \'%s\',\n", opcode_wregs[i]);
+        fprintf(out, "\t\t\'enc\' : \'%s\',\n", get_opcode_enc(i));
+        fprintf(out, "\t\t\'enc_class\' : \'%s\',\n", get_opcode_enc_class(i));
+        fprintf(out, "\t},\n");
+    }
+    fprintf(out, "};\n\n");
+}
+
+static void gen_tags_list(FILE *out)
+{
+    int i;
+
+    fprintf(out, "tags = [\n");
+    for (i = 0; i < XX_LAST_OPCODE; i++) {
+        fprintf(out, "\t\'%s\',\n", opcode_names[i]);
+    }
+    fprintf(out, "];\n\n");
+}
+
+static void gen_enc_ext_spaces_table(FILE *out)
+{
+    fprintf(out, "enc_ext_spaces = {\n");
+#define DEF_EXT_SPACE(SPACEID, ENCSTR) \
+    fprintf(out, "\t\'%s\' : \'%s\',\n", #SPACEID, ENCSTR);
+#include "imported/encode.def"
+#undef DEF_EXT_SPACE
+    fprintf(out, "};\n\n");
+}
+
+static void gen_subinsn_groupings_table(FILE *out)
+{
+    fprintf(out, "subinsn_groupings = {\n");
+#define DEF_PACKED32(TAG, TYPEA, TYPEB, ENCSTR) \
+    do { \
+        fprintf(out, "\t\'%s\' : {\n", #TAG); \
+        fprintf(out, "\t\t\'name\' : \'%s\',\n", #TAG); \
+        fprintf(out, "\t\t\'class_a\' : \'%s\',\n", #TYPEA); \
+        fprintf(out, "\t\t\'class_b\' : \'%s\',\n", #TYPEB); \
+        fprintf(out, "\t\t\'enc\' : \'%s\',\n", ENCSTR); \
+        fprintf(out, "\t},\n"); \
+    } while (0);
+#include "imported/encode.def"
+#undef DEF_PACKED32
+    fprintf(out, "};\n\n");
+}
+
+int main(int argc, char *argv[])
+{
+    FILE *outfile;
+
+    if (argc != 2) {
+        fprintf(stderr, "Usage: gen_dectree_import ouptputfile\n");
+        return -1;
+    }
+    outfile = fopen(argv[1], "w");
+    if (outfile == NULL) {
+        fprintf(stderr, "Cannot open %s for writing\n", argv[1]);
+        return -1;
+    }
+
+    opcode_init();
+    gen_iset_table(outfile);
+    gen_tags_list(outfile);
+    gen_enc_ext_spaces_table(outfile);
+    gen_subinsn_groupings_table(outfile);
+
+    fclose(outfile);
+    return 0;
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 29/67] Hexagon generater phase 4 - Decode tree
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (27 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 28/67] Hexagon generator phase 3 - C preprocessor for decode tree Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 30/67] Hexagon opcode data structures Taylor Simpson
                   ` (38 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Python script that emits the decode tree in dectree_generated.h.

Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/dectree.py | 353 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 353 insertions(+)
 create mode 100755 target/hexagon/dectree.py

diff --git a/target/hexagon/dectree.py b/target/hexagon/dectree.py
new file mode 100755
index 0000000..2b9c82b
--- /dev/null
+++ b/target/hexagon/dectree.py
@@ -0,0 +1,353 @@
+#!/usr/bin/env python3
+
+##
+##  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+##
+##  This program is free software; you can redistribute it and/or modify
+##  it under the terms of the GNU General Public License as published by
+##  the Free Software Foundation; either version 2 of the License, or
+##  (at your option) any later version.
+##
+##  This program is distributed in the hope that it will be useful,
+##  but WITHOUT ANY WARRANTY; without even the implied warranty of
+##  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+##  GNU General Public License for more details.
+##
+##  You should have received a copy of the GNU General Public License
+##  along with this program; if not, see <http://www.gnu.org/licenses/>.
+##
+
+import io
+import re
+
+import sys
+sys.path.insert(0, sys.argv[1])
+import iset
+
+encs = {tag : ''.join(reversed(iset.iset[tag]['enc'].replace(' ', '')))
+    for tag in iset.tags if iset.iset[tag]['enc'] != 'MISSING ENCODING'}
+
+enc_classes = set([iset.iset[tag]['enc_class'] for tag in encs.keys()])
+subinsn_enc_classes = \
+    set([enc_class for enc_class in enc_classes \
+        if enc_class.startswith('SUBINSN_')])
+ext_enc_classes = \
+    set([enc_class for enc_class in enc_classes \
+        if enc_class not in ('NORMAL', '16BIT') and \
+           not enc_class.startswith('SUBINSN_')])
+
+try:
+    subinsn_groupings = iset.subinsn_groupings
+except AttributeError:
+    subinsn_groupings = {}
+
+for (tag, subinsn_grouping) in subinsn_groupings.items():
+    encs[tag] = ''.join(reversed(subinsn_grouping['enc'].replace(' ', '')))
+
+dectree_normal = {'leaves' : set()}
+dectree_16bit = {'leaves' : set()}
+dectree_subinsn_groupings = {'leaves' : set()}
+dectree_subinsns = {name : {'leaves' : set()} for name in subinsn_enc_classes}
+dectree_extensions = {name : {'leaves' : set()} for name in ext_enc_classes}
+
+for tag in encs.keys():
+    if tag in subinsn_groupings:
+        dectree_subinsn_groupings['leaves'].add(tag)
+        continue
+    enc_class = iset.iset[tag]['enc_class']
+    if enc_class.startswith('SUBINSN_'):
+        if len(encs[tag]) != 32:
+            encs[tag] = encs[tag] + '0' * (32 - len(encs[tag]))
+        dectree_subinsns[enc_class]['leaves'].add(tag)
+    elif  enc_class == '16BIT':
+        if len(encs[tag]) != 16:
+            raise Exception('Tag "{}" has enc_class "{}" and not an encoding ' +
+                            'width of 16 bits!'.format(tag, enc_class))
+        dectree_16bit['leaves'].add(tag)
+    else:
+        if len(encs[tag]) != 32:
+            raise Exception('Tag "{}" has enc_class "{}" and not an encoding ' +
+                            'width of 32 bits!'.format(tag, enc_class))
+        if enc_class == 'NORMAL':
+            dectree_normal['leaves'].add(tag)
+        else:
+            dectree_extensions[enc_class]['leaves'].add(tag)
+
+faketags = set()
+for (tag, enc) in iset.enc_ext_spaces.items():
+    faketags.add(tag)
+    encs[tag] = ''.join(reversed(enc.replace(' ', '')))
+    dectree_normal['leaves'].add(tag)
+
+faketags |= set(subinsn_groupings.keys())
+
+def every_bit_counts(bitset):
+    for i in range(1, len(next(iter(bitset)))):
+        if len(set([bits[:i] + bits[i+1:] for bits in bitset])) == len(bitset):
+            return False
+    return True
+
+def auto_separate(node):
+    tags = node['leaves']
+    if len(tags) <= 1:
+        return
+    enc_width = len(encs[next(iter(tags))])
+    opcode_bit_for_all = \
+        [all([encs[tag][i] in '01' \
+            for tag in tags]) for i in range(enc_width)]
+    opcode_bit_is_0_for_all = \
+        [opcode_bit_for_all[i] and all([encs[tag][i] == '0' \
+            for tag in tags]) for i in range(enc_width)]
+    opcode_bit_is_1_for_all = \
+        [opcode_bit_for_all[i] and all([encs[tag][i] == '1' \
+            for tag in tags]) for i in range(enc_width)]
+    differentiator_opcode_bit = \
+        [opcode_bit_for_all[i] and \
+         not (opcode_bit_is_0_for_all[i] or \
+         opcode_bit_is_1_for_all[i]) \
+            for i in range(enc_width)]
+    best_width = 0
+    for width in range(4, 0, -1):
+        for lsb in range(enc_width - width, -1, -1):
+            bitset = set([encs[tag][lsb:lsb+width] for tag in tags])
+            if all(differentiator_opcode_bit[lsb:lsb+width]) and \
+                (len(bitset) == len(tags) or every_bit_counts(bitset)):
+                best_width = width
+                best_lsb = lsb
+                caught_all_tags = len(bitset) == len(tags)
+                break
+        if best_width != 0:
+            break
+    if best_width == 0:
+        raise Exception('Could not find a way to differentiate the encodings ' +
+                         'of the following tags:\n{}'.format('\n'.join(tags)))
+    if caught_all_tags:
+        for width in range(1, best_width):
+            for lsb in range(enc_width - width, -1, -1):
+                bitset = set([encs[tag][lsb:lsb+width] for tag in tags])
+                if all(differentiator_opcode_bit[lsb:lsb+width]) and \
+                    len(bitset) == len(tags):
+                    best_width = width
+                    best_lsb = lsb
+                    break
+            else:
+                continue
+            break
+    node['separator_lsb'] = best_lsb
+    node['separator_width'] = best_width
+    node['children'] = []
+    for value in range(2 ** best_width):
+        child = {}
+        bits = ''.join(reversed('{:0{}b}'.format(value, best_width)))
+        child['leaves'] = \
+            set([tag for tag in tags \
+                if encs[tag][best_lsb:best_lsb+best_width] == bits])
+        node['children'].append(child)
+    for child in node['children']:
+        auto_separate(child)
+
+auto_separate(dectree_normal)
+auto_separate(dectree_16bit)
+if subinsn_groupings:
+    auto_separate(dectree_subinsn_groupings)
+for dectree_subinsn in dectree_subinsns.values():
+    auto_separate(dectree_subinsn)
+for dectree_ext in dectree_extensions.values():
+    auto_separate(dectree_ext)
+
+for tag in faketags:
+    del encs[tag]
+
+def table_name(parents, node):
+    path = parents + [node]
+    root = path[0]
+    tag = next(iter(node['leaves']))
+    if tag in subinsn_groupings:
+        enc_width = len(subinsn_groupings[tag]['enc'].replace(' ', ''))
+    else:
+        tag = next(iter(node['leaves'] - faketags))
+        enc_width = len(encs[tag])
+    determining_bits = ['_'] * enc_width
+    for (parent, child) in zip(path[:-1], path[1:]):
+        lsb = parent['separator_lsb']
+        width = parent['separator_width']
+        value = parent['children'].index(child)
+        determining_bits[lsb:lsb+width] = \
+            list(reversed('{:0{}b}'.format(value, width)))
+    if tag in subinsn_groupings:
+        name = 'DECODE_ROOT_EE'
+    else:
+        enc_class = iset.iset[tag]['enc_class']
+        if enc_class in ext_enc_classes:
+            name = 'DECODE_EXT_{}'.format(enc_class)
+        elif enc_class in subinsn_enc_classes:
+            name = 'DECODE_SUBINSN_{}'.format(enc_class)
+        else:
+            name = 'DECODE_ROOT_{}'.format(enc_width)
+    if node != root:
+        name += '_' + ''.join(reversed(determining_bits))
+    return name
+
+def print_node(f, node, parents):
+    if len(node['leaves']) <= 1:
+        return
+    name = table_name(parents, node)
+    lsb = node['separator_lsb']
+    width = node['separator_width']
+    print('DECODE_NEW_TABLE({},{},DECODE_SEPARATOR_BITS({},{}))'.\
+        format(name, 2 ** width, lsb, width), file=f)
+    for child in node['children']:
+        if len(child['leaves']) == 0:
+            print('INVALID()', file=f)
+        elif len(child['leaves']) == 1:
+            (tag,) = child['leaves']
+            if tag in subinsn_groupings:
+                class_a = subinsn_groupings[tag]['class_a']
+                class_b = subinsn_groupings[tag]['class_b']
+                enc = subinsn_groupings[tag]['enc'].replace(' ', '')
+                if 'RESERVED' in tag:
+                    print('INVALID()', file=f)
+                else:
+                    print('SUBINSNS({},{},{},"{}")'.\
+                        format(tag, class_a, class_b, enc), file=f)
+            elif tag in iset.enc_ext_spaces:
+                enc = iset.enc_ext_spaces[tag].replace(' ', '')
+                print('EXTSPACE({},"{}")'.format(tag, enc), file=f)
+            else:
+                enc = ''.join(reversed(encs[tag]))
+                print('TERMINAL({},"{}")'.format(tag, enc), file=f)
+        else:
+            print('TABLE_LINK({})'.format(table_name(parents + [node], child)),
+                  file=f)
+    print('DECODE_END_TABLE({},{},DECODE_SEPARATOR_BITS({},{}))'.\
+        format(name, 2 ** width, lsb, width), file=f)
+    print(file=f)
+    parents.append(node)
+    for child in node['children']:
+        print_node(f, child, parents)
+    parents.pop()
+
+def print_tree(f, tree):
+    print_node(f, tree, [])
+
+def print_match_info(f):
+    for tag in sorted(encs.keys(), key=iset.tags.index):
+        enc = ''.join(reversed(encs[tag]))
+        mask = int(re.sub(r'[^1]', r'0', enc.replace('0', '1')), 2)
+        match = int(re.sub(r'[^01]', r'0', enc), 2)
+        suffix = ''
+        print('DECODE{}_MATCH_INFO({},0x{:x}U,0x{:x}U)'.\
+            format(suffix, tag, mask, match), file=f)
+
+regre = re.compile(
+    r'((?<!DUP)[MNORCPQXSGVZA])([stuvwxyzdefg]+)([.]?[LlHh]?)(\d+S?)')
+immre = re.compile(r'[#]([rRsSuUm])(\d+)(?:[:](\d+))?')
+
+def ordered_unique(l):
+    return sorted(set(l), key=l.index)
+
+implicit_registers = {
+    'SP' : 29,
+    'FP' : 30,
+    'LR' : 31
+}
+
+num_registers = {
+    'R' : 32,
+    'V' : 32
+}
+
+def print_op_info(f):
+    for tag in sorted(encs.keys(), key=iset.tags.index):
+        enc = encs[tag]
+        print(file=f)
+        print('DECODE_OPINFO({},'.format(tag), file=f)
+        regs = ordered_unique(regre.findall(iset.iset[tag]['syntax']))
+        imms = ordered_unique(immre.findall(iset.iset[tag]['syntax']))
+        regno = 0
+        for reg in regs:
+            reg_type = reg[0]
+            reg_letter = reg[1][0]
+            reg_num_choices = int(reg[3].rstrip('S'))
+            reg_mapping = reg[0] + ''.join(['_' for letter in reg[1]]) + reg[3]
+            reg_enc_fields = re.findall(reg_letter + '+', enc)
+            if len(reg_enc_fields) == 0:
+                raise Exception('Tag "{}" missing register field!'.format(tag))
+            if len(reg_enc_fields) > 1:
+                raise Exception('Tag "{}" has split register field!'.\
+                    format(tag))
+            reg_enc_field = reg_enc_fields[0]
+            if 2 ** len(reg_enc_field) != reg_num_choices:
+                raise Exception('Tag "{}" has incorrect register field width!'.\
+                    format(tag))
+            print('        DECODE_REG({},{},{})'.\
+                format(regno, len(reg_enc_field), enc.index(reg_enc_field)),
+                       file=f)
+            if reg_type in num_registers and \
+                reg_num_choices != num_registers[reg_type]:
+                print('        DECODE_MAPPED_REG({},{})'.\
+                    format(regno, reg_mapping), file=f)
+            regno += 1
+        def implicit_register_key(reg):
+            return implicit_registers[reg]
+        for reg in sorted(
+            set([r for r in (iset.iset[tag]['rregs'].split(',') + \
+                iset.iset[tag]['wregs'].split(',')) \
+                    if r in implicit_registers]), key=implicit_register_key):
+            print('        DECODE_IMPL_REG({},{})'.\
+                format(regno, implicit_registers[reg]), file=f)
+            regno += 1
+        if imms and imms[0][0].isupper():
+            imms = reversed(imms)
+        for imm in imms:
+            if imm[0].isupper():
+                immno = 1
+            else:
+                immno = 0
+            imm_type = imm[0]
+            imm_width = int(imm[1])
+            imm_shift = imm[2]
+            if imm_shift:
+                imm_shift = int(imm_shift)
+            else:
+                imm_shift = 0
+            if imm_type.islower():
+                imm_letter = 'i'
+            else:
+                imm_letter = 'I'
+            remainder = imm_width
+            for m in reversed(list(re.finditer(imm_letter + '+', enc))):
+                remainder -= m.end() - m.start()
+                print('        DECODE_IMM({},{},{},{})'.\
+                    format(immno, m.end() - m.start(), m.start(), remainder),
+                        file=f)
+            if remainder != 0:
+                if imm[2]:
+                    imm[2] = ':' + imm[2]
+                raise Exception('Tag "{}" has an incorrect number of ' + \
+                    'encoding bits for immediate "{}"'.\
+                    format(tag, ''.join(imm)))
+            if imm_type.lower() in 'sr':
+                print('        DECODE_IMM_SXT({},{})'.\
+                    format(immno, imm_width), file=f)
+            if imm_type.lower() == 'n':
+                print('        DECODE_IMM_NEG({},{})'.\
+                    format(immno, imm_width), file=f)
+            if imm_shift:
+                print('        DECODE_IMM_SHIFT({},{})'.\
+                    format(immno, imm_shift), file=f)
+        print(')', file=f)
+
+if __name__ == '__main__':
+    f = io.StringIO()
+    print_tree(f, dectree_normal)
+    print_tree(f, dectree_16bit)
+    if subinsn_groupings:
+        print_tree(f, dectree_subinsn_groupings)
+    for (name, dectree_subinsn) in sorted(dectree_subinsns.items()):
+        print_tree(f, dectree_subinsn)
+    for (name, dectree_ext) in sorted(dectree_extensions.items()):
+        print_tree(f, dectree_ext)
+    print_match_info(f)
+    print_op_info(f)
+    open('dectree_generated.h', 'w').write(f.getvalue())
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 30/67] Hexagon opcode data structures
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (28 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 29/67] Hexagon generater phase 4 - Decode tree Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 31/67] Hexagon macros to interface with the generator Taylor Simpson
                   ` (37 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/opcodes.h |  67 +++++++++++++++
 target/hexagon/opcodes.c | 217 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 284 insertions(+)
 create mode 100644 target/hexagon/opcodes.h
 create mode 100644 target/hexagon/opcodes.c

diff --git a/target/hexagon/opcodes.h b/target/hexagon/opcodes.h
new file mode 100644
index 0000000..d29a1a2
--- /dev/null
+++ b/target/hexagon/opcodes.h
@@ -0,0 +1,67 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_OPCODES_H
+#define HEXAGON_OPCODES_H
+
+#include "hex_arch_types.h"
+#include "attribs.h"
+
+typedef enum {
+#define OPCODE(IID) IID
+#include "opcodes_def_generated.h"
+    XX_LAST_OPCODE
+#undef OPCODE
+} opcode_t;
+
+typedef enum {
+    NORMAL,
+    HALF,
+    SUBINSN_A,
+    SUBINSN_L1,
+    SUBINSN_L2,
+    SUBINSN_S1,
+    SUBINSN_S2,
+    EXT_noext,
+    EXT_mmvec,
+    XX_LAST_ENC_CLASS
+} enc_class_t;
+
+extern const char *opcode_names[];
+
+extern const char *opcode_reginfo[];
+extern const char *opcode_rregs[];
+extern const char *opcode_wregs[];
+
+typedef struct {
+    const char * const encoding;
+    size4u_t vals;
+    size4u_t dep_vals;
+    const enc_class_t enc_class;
+    size1u_t is_ee:1;
+} opcode_encoding_t;
+
+extern opcode_encoding_t opcode_encodings[XX_LAST_OPCODE];
+
+extern size4u_t
+    opcode_attribs[XX_LAST_OPCODE][(A_ZZ_LASTATTRIB / ATTRIB_WIDTH) + 1];
+
+extern void opcode_init(void);
+
+extern int opcode_which_immediate_is_extended(opcode_t opcode);
+
+#endif
diff --git a/target/hexagon/opcodes.c b/target/hexagon/opcodes.c
new file mode 100644
index 0000000..b9c7612
--- /dev/null
+++ b/target/hexagon/opcodes.c
@@ -0,0 +1,217 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * opcodes.c
+ *
+ * data tables generated automatically
+ * Maybe some functions too
+ */
+
+#include "qemu/osdep.h"
+#include "opcodes.h"
+#include "decode.h"
+
+#define VEC_DESCR(A, B, C) DESCR(A, B, C)
+#define DONAME(X) #X
+
+const char *opcode_names[] = {
+#define OPCODE(IID) DONAME(IID)
+#include "opcodes_def_generated.h"
+    NULL
+#undef OPCODE
+};
+
+const char *opcode_reginfo[] = {
+#define IMMINFO(TAG, SIGN, SIZE, SHAMT, SIGN2, SIZE2, SHAMT2)    /* nothing */
+#define REGINFO(TAG, REGINFO, RREGS, WREGS) REGINFO,
+#include "op_regs_generated.h"
+    NULL
+#undef REGINFO
+#undef IMMINFO
+};
+
+
+const char *opcode_rregs[] = {
+#define IMMINFO(TAG, SIGN, SIZE, SHAMT, SIGN2, SIZE2, SHAMT2)    /* nothing */
+#define REGINFO(TAG, REGINFO, RREGS, WREGS) RREGS,
+#include "op_regs_generated.h"
+    NULL
+#undef REGINFO
+#undef IMMINFO
+};
+
+
+const char *opcode_wregs[] = {
+#define IMMINFO(TAG, SIGN, SIZE, SHAMT, SIGN2, SIZE2, SHAMT2)    /* nothing */
+#define REGINFO(TAG, REGINFO, RREGS, WREGS) WREGS,
+#include "op_regs_generated.h"
+    NULL
+#undef REGINFO
+#undef IMMINFO
+};
+
+const char *opcode_short_semantics[] = {
+#define OPCODE(X)              NULL
+#include "opcodes_def_generated.h"
+#undef OPCODE
+    NULL
+};
+
+
+size4u_t
+    opcode_attribs[XX_LAST_OPCODE][(A_ZZ_LASTATTRIB / ATTRIB_WIDTH) + 1] = {0};
+
+static void init_attribs(int tag, ...)
+{
+    va_list ap;
+    int attr;
+    va_start(ap, tag);
+    while ((attr = va_arg(ap, int)) != 0) {
+        opcode_attribs[tag][attr / ATTRIB_WIDTH] |= 1 << (attr % ATTRIB_WIDTH);
+    }
+}
+
+static size4u_t str2val(const char *str)
+{
+    size4u_t ret = 0;
+    for ( ; *str; str++) {
+        switch (*str) {
+        case ' ':
+        case '\t':
+            break;
+        case 's':
+        case 't':
+        case 'u':
+        case 'v':
+        case 'w':
+        case 'd':
+        case 'e':
+        case 'x':
+        case 'y':
+        case 'i':
+        case 'I':
+        case 'P':
+        case 'E':
+        case 'o':
+        case '-':
+        case '0':
+            ret = (ret << 1) | 0;
+            break;
+        case '1':
+            ret = (ret << 1) | 1;
+            break;
+        default:
+            break;
+        }
+    }
+    return ret;
+}
+
+static size1u_t has_ee(const char *str)
+{
+    return (strchr(str, 'E') != NULL);
+}
+
+opcode_encoding_t opcode_encodings[] = {
+#define DEF_ENC32(OPCODE, ENCSTR) \
+    [OPCODE] = { .encoding = ENCSTR },
+
+#define DEF_ENC_SUBINSN(OPCODE, CLASS, ENCSTR) \
+    [OPCODE] = { .encoding = ENCSTR, .enc_class = CLASS },
+
+#define DEF_EXT_ENC(OPCODE, CLASS, ENCSTR) \
+    [OPCODE] = { .encoding = ENCSTR, .enc_class = CLASS },
+
+#include "imported/encode.def"
+
+#undef DEF_ENC32
+#undef DEF_ENC_SUBINSN
+#undef DEF_EXT_ENC
+};
+
+void opcode_init(void)
+{
+    init_attribs(0, 0);
+
+#define DEF_ENC32(OPCODE, ENCSTR) \
+    opcode_encodings[OPCODE].vals = str2val(ENCSTR); \
+    opcode_encodings[OPCODE].is_ee = has_ee(ENCSTR);
+
+#define DEF_ENC_SUBINSN(OPCODE, CLASS, ENCSTR) \
+    opcode_encodings[OPCODE].vals = str2val(ENCSTR);
+
+#define LEGACY_DEF_ENC32(OPCODE, ENCSTR) \
+    opcode_encodings[OPCODE].dep_vals = str2val(ENCSTR);
+
+#define DEF_EXT_ENC(OPCODE, CLASS, ENCSTR) \
+    opcode_encodings[OPCODE].vals = str2val(ENCSTR);
+
+#include "imported/encode.def"
+
+#undef LEGACY_DEF_ENC32
+#undef DEF_ENC32
+#undef DEF_ENC_SUBINSN
+#undef DEF_EXT_ENC
+
+#define ATTRIBS(...) , ## __VA_ARGS__, 0
+#define OP_ATTRIB(TAG, ARGS) init_attribs(TAG ARGS);
+#include "op_attribs_generated.h"
+#undef OP_ATTRIB
+#undef ATTRIBS
+
+    decode_init();
+
+#define DEF_QEMU(TAG, SHORTCODE, HELPER, GENFN, HELPFN) \
+    opcode_short_semantics[TAG] = #SHORTCODE;
+#include "qemu_def_generated.h"
+#undef DEF_QEMU
+}
+
+
+#define NEEDLE "IMMEXT("
+
+int opcode_which_immediate_is_extended(opcode_t opcode)
+{
+    const char *p;
+    if (opcode >= XX_LAST_OPCODE) {
+        g_assert_not_reached();
+        return 0;
+    }
+    if (!GET_ATTRIB(opcode, A_EXTENDABLE)) {
+        g_assert_not_reached();
+        return 0;
+    }
+    p = opcode_short_semantics[opcode];
+    p = strstr(p, NEEDLE);
+    if (p == NULL) {
+        g_assert_not_reached();
+        return 0;
+    }
+    p += strlen(NEEDLE);
+    while (isspace(*p)) {
+        p++;
+    }
+    /* lower is always imm 0, upper always imm 1. */
+    if (islower(*p)) {
+        return 0;
+    } else if (isupper(*p)) {
+        return 1;
+    } else {
+        g_assert_not_reached();
+    }
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 31/67] Hexagon macros to interface with the generator
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (29 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 30/67] Hexagon opcode data structures Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 32/67] Hexagon macros referenced in instruction semantics Taylor Simpson
                   ` (36 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Various forms of declare, read, write, free

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/macros.h | 363 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 363 insertions(+)
 create mode 100644 target/hexagon/macros.h

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
new file mode 100644
index 0000000..b8f8d9f
--- /dev/null
+++ b/target/hexagon/macros.h
@@ -0,0 +1,363 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_MACROS_H
+#define HEXAGON_MACROS_H
+
+#include "cpu.h"
+#include "hex_regs.h"
+#include "reg_fields.h"
+
+#ifdef QEMU_GENERATE
+#define DECL_REG(NAME, NUM, X, OFF) \
+    TCGv NAME = tcg_temp_local_new(); \
+    int NUM = REGNO(X) + OFF
+
+#define DECL_REG_WRITABLE(NAME, NUM, X, OFF) \
+    TCGv NAME = tcg_temp_local_new(); \
+    int NUM = REGNO(X) + OFF; \
+    do { \
+        int is_predicated = GET_ATTRIB(insn->opcode, A_CONDEXEC); \
+        if (is_predicated && !is_preloaded(ctx, NUM)) { \
+            tcg_gen_mov_tl(hex_new_value[NUM], hex_gpr[NUM]); \
+        } \
+    } while (0)
+/*
+ * For read-only temps, avoid allocating and freeing
+ */
+#define DECL_REG_READONLY(NAME, NUM, X, OFF) \
+    TCGv NAME; \
+    int NUM = REGNO(X) + OFF
+
+#define DECL_RREG_d(NAME, NUM, X, OFF) \
+    DECL_REG_WRITABLE(NAME, NUM, X, OFF)
+#define DECL_RREG_e(NAME, NUM, X, OFF) \
+    DECL_REG(NAME, NUM, X, OFF)
+#define DECL_RREG_s(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_RREG_t(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_RREG_u(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_RREG_v(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_RREG_x(NAME, NUM, X, OFF) \
+    DECL_REG_WRITABLE(NAME, NUM, X, OFF)
+#define DECL_RREG_y(NAME, NUM, X, OFF) \
+    DECL_REG_WRITABLE(NAME, NUM, X, OFF)
+
+#define DECL_PREG_d(NAME, NUM, X, OFF) \
+    DECL_REG(NAME, NUM, X, OFF)
+#define DECL_PREG_e(NAME, NUM, X, OFF) \
+    DECL_REG(NAME, NUM, X, OFF)
+#define DECL_PREG_s(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_PREG_t(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_PREG_u(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_PREG_v(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_PREG_x(NAME, NUM, X, OFF) \
+    DECL_REG(NAME, NUM, X, OFF)
+#define DECL_PREG_y(NAME, NUM, X, OFF) \
+    DECL_REG(NAME, NUM, X, OFF)
+
+#define DECL_CREG_d(NAME, NUM, X, OFF) \
+    DECL_REG(NAME, NUM, X, OFF)
+#define DECL_CREG_s(NAME, NUM, X, OFF) \
+    DECL_REG(NAME, NUM, X, OFF)
+
+#define DECL_MREG_u(NAME, NUM, X, OFF) \
+    DECL_REG(NAME, NUM, X, OFF)
+
+#define DECL_NEW_NREG_s(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_NEW_NREG_t(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+
+#define DECL_NEW_PREG_t(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_NEW_PREG_u(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+#define DECL_NEW_PREG_v(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+
+#define DECL_NEW_OREG_s(NAME, NUM, X, OFF) \
+    DECL_REG_READONLY(NAME, NUM, X, OFF)
+
+#define DECL_PAIR(NAME, NUM, X, OFF) \
+    TCGv_i64 NAME = tcg_temp_local_new_i64(); \
+    size1u_t NUM = REGNO(X) + OFF
+
+#define DECL_PAIR_WRITABLE(NAME, NUM, X, OFF) \
+    TCGv_i64 NAME = tcg_temp_local_new_i64(); \
+    size1u_t NUM = REGNO(X) + OFF; \
+    do { \
+        int is_predicated = GET_ATTRIB(insn->opcode, A_CONDEXEC); \
+        if (is_predicated) { \
+            if (!is_preloaded(ctx, NUM)) { \
+                tcg_gen_mov_tl(hex_new_value[NUM], hex_gpr[NUM]); \
+            } \
+            if (!is_preloaded(ctx, NUM + 1)) { \
+                tcg_gen_mov_tl(hex_new_value[NUM + 1], hex_gpr[NUM + 1]); \
+            } \
+        } \
+    } while (0)
+
+#define DECL_RREG_dd(NAME, NUM, X, OFF) \
+    DECL_PAIR_WRITABLE(NAME, NUM, X, OFF)
+#define DECL_RREG_ss(NAME, NUM, X, OFF) \
+    DECL_PAIR(NAME, NUM, X, OFF)
+#define DECL_RREG_tt(NAME, NUM, X, OFF) \
+    DECL_PAIR(NAME, NUM, X, OFF)
+#define DECL_RREG_xx(NAME, NUM, X, OFF) \
+    DECL_PAIR_WRITABLE(NAME, NUM, X, OFF)
+#define DECL_RREG_yy(NAME, NUM, X, OFF) \
+    DECL_PAIR_WRITABLE(NAME, NUM, X, OFF)
+
+#define DECL_CREG_dd(NAME, NUM, X, OFF) \
+    DECL_PAIR_WRITABLE(NAME, NUM, X, OFF)
+#define DECL_CREG_ss(NAME, NUM, X, OFF) \
+    DECL_PAIR(NAME, NUM, X, OFF)
+
+#define DECL_IMM(NAME, X) \
+    int NAME = IMMNO(X); \
+    do { \
+        NAME = NAME; \
+    } while (0)
+#define DECL_TCG_IMM(TCG_NAME, VAL) \
+    TCGv TCG_NAME = tcg_const_tl(VAL)
+
+#define DECL_EA \
+    TCGv EA; \
+    do { \
+        if (GET_ATTRIB(insn->opcode, A_CONDEXEC)) { \
+            EA = tcg_temp_local_new(); \
+        } else { \
+            EA = tcg_temp_new(); \
+        } \
+    } while (0)
+
+#define LOG_REG_WRITE(RNUM, VAL)\
+    do { \
+        int is_predicated = GET_ATTRIB(insn->opcode, A_CONDEXEC); \
+        gen_log_reg_write(RNUM, VAL, insn->slot, is_predicated); \
+        ctx_log_reg_write(ctx, (RNUM)); \
+    } while (0)
+
+#define LOG_PRED_WRITE(PNUM, VAL) \
+    do { \
+        gen_log_pred_write(PNUM, VAL); \
+        ctx_log_pred_write(ctx, (PNUM)); \
+    } while (0)
+
+#define FREE_REG(NAME) \
+    tcg_temp_free(NAME)
+#define FREE_REG_READONLY(NAME) \
+    /* Nothing */
+
+#define FREE_RREG_d(NAME)            FREE_REG(NAME)
+#define FREE_RREG_e(NAME)            FREE_REG(NAME)
+#define FREE_RREG_s(NAME)            FREE_REG_READONLY(NAME)
+#define FREE_RREG_t(NAME)            FREE_REG_READONLY(NAME)
+#define FREE_RREG_u(NAME)            FREE_REG_READONLY(NAME)
+#define FREE_RREG_v(NAME)            FREE_REG_READONLY(NAME)
+#define FREE_RREG_x(NAME)            FREE_REG(NAME)
+#define FREE_RREG_y(NAME)            FREE_REG(NAME)
+
+#define FREE_PREG_d(NAME)            FREE_REG(NAME)
+#define FREE_PREG_e(NAME)            FREE_REG(NAME)
+#define FREE_PREG_s(NAME)            FREE_REG_READONLY(NAME)
+#define FREE_PREG_t(NAME)            FREE_REG_READONLY(NAME)
+#define FREE_PREG_u(NAME)            FREE_REG_READONLY(NAME)
+#define FREE_PREG_v(NAME)            FREE_REG_READONLY(NAME)
+#define FREE_PREG_x(NAME)            FREE_REG(NAME)
+
+#define FREE_CREG_d(NAME)            FREE_REG(NAME)
+#define FREE_CREG_s(NAME)            FREE_REG_READONLY(NAME)
+
+#define FREE_MREG_u(NAME)            FREE_REG_READONLY(NAME)
+
+#define FREE_NEW_NREG_s(NAME)        FREE_REG(NAME)
+#define FREE_NEW_NREG_t(NAME)        FREE_REG(NAME)
+
+#define FREE_NEW_PREG_t(NAME)        FREE_REG_READONLY(NAME)
+#define FREE_NEW_PREG_u(NAME)        FREE_REG_READONLY(NAME)
+#define FREE_NEW_PREG_v(NAME)        FREE_REG_READONLY(NAME)
+
+#define FREE_NEW_OREG_s(NAME)        FREE_REG(NAME)
+
+#define FREE_REG_PAIR(NAME) \
+    tcg_temp_free_i64(NAME)
+
+#define FREE_RREG_dd(NAME)           FREE_REG_PAIR(NAME)
+#define FREE_RREG_ss(NAME)           FREE_REG_PAIR(NAME)
+#define FREE_RREG_tt(NAME)           FREE_REG_PAIR(NAME)
+#define FREE_RREG_xx(NAME)           FREE_REG_PAIR(NAME)
+#define FREE_RREG_yy(NAME)           FREE_REG_PAIR(NAME)
+
+#define FREE_CREG_dd(NAME)           FREE_REG_PAIR(NAME)
+#define FREE_CREG_ss(NAME)           FREE_REG_PAIR(NAME)
+
+#define FREE_IMM(NAME)               /* nothing */
+#define FREE_TCG_IMM(NAME)           tcg_temp_free(NAME)
+
+#define FREE_EA \
+    tcg_temp_free(EA)
+#else
+#define LOG_REG_WRITE(RNUM, VAL)\
+  log_reg_write(env, RNUM, VAL, slot)
+#define LOG_PRED_WRITE(RNUM, VAL)\
+    log_pred_write(env, RNUM, VAL)
+#endif
+
+#define SLOT_WRAP(CODE) \
+    do { \
+        TCGv slot = tcg_const_tl(insn->slot); \
+        CODE; \
+        tcg_temp_free(slot); \
+    } while (0)
+
+#define PART1_WRAP(CODE) \
+    do { \
+        TCGv part1 = tcg_const_tl(insn->part1); \
+        CODE; \
+        tcg_temp_free(part1); \
+    } while (0)
+
+#define MARK_LATE_PRED_WRITE(RNUM) /* Not modelled in qemu */
+
+#define REGNO(NUM) (insn->regno[NUM])
+#define IMMNO(NUM) (insn->immed[NUM])
+
+#ifdef QEMU_GENERATE
+#define READ_REG(dest, NUM) \
+    gen_read_reg(dest, NUM)
+#define READ_REG_READONLY(dest, NUM) \
+    do { dest = hex_gpr[NUM]; } while (0)
+
+#define READ_RREG_s(dest, NUM) \
+    READ_REG_READONLY(dest, NUM)
+#define READ_RREG_t(dest, NUM) \
+    READ_REG_READONLY(dest, NUM)
+#define READ_RREG_u(dest, NUM) \
+    READ_REG_READONLY(dest, NUM)
+#define READ_RREG_x(dest, NUM) \
+    READ_REG(dest, NUM)
+#define READ_RREG_y(dest, NUM) \
+    READ_REG(dest, NUM)
+
+#define READ_OREG_s(dest, NUM) \
+    READ_REG_READONLY(dest, NUM)
+
+#define READ_CREG_s(dest, NUM) \
+    do { \
+        if ((NUM) + HEX_REG_SA0 == HEX_REG_P3_0) { \
+            gen_read_p3_0(dest); \
+        } else { \
+            READ_REG_READONLY(dest, ((NUM) + HEX_REG_SA0)); \
+        } \
+    } while (0)
+
+#define READ_MREG_u(dest, NUM) \
+    do { \
+        READ_REG_READONLY(dest, ((NUM) + HEX_REG_M0)); \
+        dest = dest; \
+    } while (0)
+#else
+#define READ_REG(NUM) \
+    (env->gpr[(NUM)])
+#endif
+
+#ifdef QEMU_GENERATE
+#define READ_REG_PAIR(tmp, NUM) \
+    tcg_gen_concat_i32_i64(tmp, hex_gpr[NUM], hex_gpr[(NUM) + 1])
+#define READ_RREG_ss(tmp, NUM)          READ_REG_PAIR(tmp, NUM)
+#define READ_RREG_tt(tmp, NUM)          READ_REG_PAIR(tmp, NUM)
+#define READ_RREG_xx(tmp, NUM)          READ_REG_PAIR(tmp, NUM)
+#define READ_RREG_yy(tmp, NUM)          READ_REG_PAIR(tmp, NUM)
+
+#define READ_CREG_PAIR(tmp, i) \
+    READ_REG_PAIR(tmp, ((i) + HEX_REG_SA0))
+#define READ_CREG_ss(tmp, i)            READ_CREG_PAIR(tmp, i)
+#endif
+
+#ifdef QEMU_GENERATE
+#define READ_PREG(dest, NUM)             gen_read_preg(dest, (NUM))
+#define READ_PREG_READONLY(dest, NUM)    do { dest = hex_pred[NUM]; } while (0)
+
+#define READ_PREG_s(dest, NUM)           READ_PREG_READONLY(dest, NUM)
+#define READ_PREG_t(dest, NUM)           READ_PREG_READONLY(dest, NUM)
+#define READ_PREG_u(dest, NUM)           READ_PREG_READONLY(dest, NUM)
+#define READ_PREG_v(dest, NUM)           READ_PREG_READONLY(dest, NUM)
+#define READ_PREG_x(dest, NUM)           READ_PREG(dest, NUM)
+
+#define READ_NEW_PREG(pred, PNUM) \
+    do { pred = hex_new_pred_value[PNUM]; } while (0)
+#define READ_NEW_PREG_t(pred, PNUM)      READ_NEW_PREG(pred, PNUM)
+#define READ_NEW_PREG_u(pred, PNUM)      READ_NEW_PREG(pred, PNUM)
+#define READ_NEW_PREG_v(pred, PNUM)      READ_NEW_PREG(pred, PNUM)
+
+#define READ_NEW_REG(tmp, i) \
+    do { tmp = tcg_const_tl(i); } while (0)
+#define READ_NEW_NREG_s(tmp, i)          READ_NEW_REG(tmp, i)
+#define READ_NEW_NREG_t(tmp, i)          READ_NEW_REG(tmp, i)
+#define READ_NEW_OREG_s(tmp, i)          READ_NEW_REG(tmp, i)
+#else
+#define READ_PREG(NUM)                (env->pred[NUM])
+#endif
+
+
+#define WRITE_RREG(NUM, VAL)             LOG_REG_WRITE(NUM, VAL)
+#define WRITE_RREG_d(NUM, VAL)           LOG_REG_WRITE(NUM, VAL)
+#define WRITE_RREG_e(NUM, VAL)           LOG_REG_WRITE(NUM, VAL)
+#define WRITE_RREG_x(NUM, VAL)           LOG_REG_WRITE(NUM, VAL)
+#define WRITE_RREG_y(NUM, VAL)           LOG_REG_WRITE(NUM, VAL)
+
+#define WRITE_PREG(NUM, VAL)             LOG_PRED_WRITE(NUM, VAL)
+#define WRITE_PREG_d(NUM, VAL)           LOG_PRED_WRITE(NUM, VAL)
+#define WRITE_PREG_e(NUM, VAL)           LOG_PRED_WRITE(NUM, VAL)
+#define WRITE_PREG_x(NUM, VAL)           LOG_PRED_WRITE(NUM, VAL)
+
+#ifdef QEMU_GENERATE
+#define WRITE_CREG(i, tmp) \
+    do { \
+        if (i + HEX_REG_SA0 == HEX_REG_P3_0) { \
+            gen_write_p3_0(tmp); \
+        } else { \
+            WRITE_RREG((i) + HEX_REG_SA0, tmp); \
+        } \
+    } while (0)
+#define WRITE_CREG_d(NUM, VAL)           WRITE_CREG(NUM, VAL)
+
+#define WRITE_CREG_PAIR(i, tmp)          WRITE_REG_PAIR((i) + HEX_REG_SA0, tmp)
+#define WRITE_CREG_dd(NUM, VAL)          WRITE_CREG_PAIR(NUM, VAL)
+
+#define WRITE_REG_PAIR(NUM, VAL) \
+    do { \
+        int is_predicated = GET_ATTRIB(insn->opcode, A_CONDEXEC); \
+        gen_log_reg_write_pair(NUM, VAL, insn->slot, is_predicated); \
+        ctx_log_reg_write(ctx, (NUM)); \
+        ctx_log_reg_write(ctx, (NUM) + 1); \
+    } while (0)
+
+#define WRITE_RREG_dd(NUM, VAL)          WRITE_REG_PAIR(NUM, VAL)
+#define WRITE_RREG_xx(NUM, VAL)          WRITE_REG_PAIR(NUM, VAL)
+#define WRITE_RREG_yy(NUM, VAL)          WRITE_REG_PAIR(NUM, VAL)
+#endif
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 32/67] Hexagon macros referenced in instruction semantics
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (30 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 31/67] Hexagon macros to interface with the generator Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 33/67] Hexagon instruction classes Taylor Simpson
                   ` (35 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/macros.h | 1111 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1111 insertions(+)

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index b8f8d9f..2101a01 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -361,3 +361,1114 @@
 #define WRITE_RREG_yy(NUM, VAL)          WRITE_REG_PAIR(NUM, VAL)
 #endif
 
+#define PCALIGN 4
+#define PCALIGN_MASK (PCALIGN - 1)
+
+#define GET_FIELD(FIELD, REGIN) \
+    fEXTRACTU_BITS(REGIN, reg_field_info[FIELD].width, \
+                   reg_field_info[FIELD].offset)
+
+#ifdef QEMU_GENERATE
+#define GET_USR_FIELD(FIELD, DST) \
+    tcg_gen_extract_tl(DST, hex_gpr[HEX_REG_USR], \
+                       reg_field_info[FIELD].offset, \
+                       reg_field_info[FIELD].width)
+
+#define SET_USR_FIELD_FUNC(X) \
+    _Generic((X), int : gen_set_usr_fieldi, TCGv : gen_set_usr_field)
+#define SET_USR_FIELD(FIELD, VAL) \
+    SET_USR_FIELD_FUNC(VAL)(FIELD, VAL)
+#else
+#define GET_USR_FIELD(FIELD) \
+    fEXTRACTU_BITS(env->gpr[HEX_REG_USR], reg_field_info[FIELD].width, \
+                   reg_field_info[FIELD].offset)
+
+#define SET_USR_FIELD(FIELD, VAL) \
+    fINSERT_BITS(env->gpr[HEX_REG_USR], reg_field_info[FIELD].width, \
+                 reg_field_info[FIELD].offset, (VAL))
+#endif
+
+#ifdef QEMU_GENERATE
+#define MEM_LOAD1s(DST, VA)       tcg_gen_qemu_ld8s(DST, VA, ctx->mem_idx)
+#define MEM_LOAD1u(DST, VA)       tcg_gen_qemu_ld8u(DST, VA, ctx->mem_idx)
+#define MEM_LOAD2s(DST, VA)       tcg_gen_qemu_ld16s(DST, VA, ctx->mem_idx)
+#define MEM_LOAD2u(DST, VA)       tcg_gen_qemu_ld16u(DST, VA, ctx->mem_idx)
+#define MEM_LOAD4s(DST, VA)       tcg_gen_qemu_ld32s(DST, VA, ctx->mem_idx)
+#define MEM_LOAD4u(DST, VA)       tcg_gen_qemu_ld32s(DST, VA, ctx->mem_idx)
+#define MEM_LOAD8s(DST, VA)       tcg_gen_qemu_ld64(DST, VA, ctx->mem_idx)
+#define MEM_LOAD8u(DST, VA)       tcg_gen_qemu_ld64(DST, VA, ctx->mem_idx)
+
+#define MEM_STORE1_FUNC(X) \
+    _Generic((X), int : gen_store1i, TCGv_i32 : gen_store1)
+#define MEM_STORE1(VA, DATA, SLOT) \
+    MEM_STORE1_FUNC(DATA)(cpu_env, VA, DATA, ctx, SLOT)
+
+#define MEM_STORE2_FUNC(X) \
+    _Generic((X), int : gen_store2i, TCGv_i32 : gen_store2)
+#define MEM_STORE2(VA, DATA, SLOT) \
+    MEM_STORE2_FUNC(DATA)(cpu_env, VA, DATA, ctx, SLOT)
+
+#define MEM_STORE4_FUNC(X) \
+    _Generic((X), int : gen_store4i, TCGv_i32 : gen_store4)
+#define MEM_STORE4(VA, DATA, SLOT) \
+    MEM_STORE4_FUNC(DATA)(cpu_env, VA, DATA, ctx, SLOT)
+
+#define MEM_STORE8_FUNC(X) \
+    _Generic((X), int64_t : gen_store8i, TCGv_i64 : gen_store8)
+#define MEM_STORE8(VA, DATA, SLOT) \
+    MEM_STORE8_FUNC(DATA)(cpu_env, VA, DATA, ctx, SLOT)
+
+#else
+/*
+ * These should never be executed, but they are needed so the helpers will
+ * compile.  All the instructions with loads must be implemented under
+ * QEMU_GENERATE.
+ */
+static inline uint8_t mem_load1(CPUHexagonState *env, target_ulong vaddr)
+{
+    printf("ERROR: mem_load1\n");
+    g_assert_not_reached();
+}
+
+static inline uint16_t mem_load2(CPUHexagonState *env, target_ulong vadd)
+{
+    printf("ERROR: mem_load2\n");
+    g_assert_not_reached();
+}
+
+static inline uint32_t mem_load4(CPUHexagonState *env, target_ulong vaddr)
+{
+    printf("ERROR: mem_load4\n");
+    g_assert_not_reached();
+}
+
+static inline uint64_t mem_load8(CPUHexagonState *env, target_ulong vaddr)
+{
+    printf("ERROR: mem_load8\n");
+    g_assert_not_reached();
+}
+
+static inline
+uint32_t mem_load_locked4(CPUHexagonState *env, target_ulong vaddr)
+{
+    printf("ERROR: load_locked4\n");
+    g_assert_not_reached();
+    return 0;
+}
+
+static inline
+uint64_t mem_load_locked8(CPUHexagonState *env, target_ulong vaddr)
+{
+    printf("ERROR: load_locked8\n");
+    g_assert_not_reached();
+    return 0;
+}
+
+static inline
+uint8_t mem_store_conditional(CPUHexagonState *env,
+                              target_ulong vaddr, uint32_t src, int size)
+{
+    printf("ERROR: store conditional\n");
+    g_assert_not_reached();
+    return 0;
+}
+
+#define MEM_LOAD1s(VA) ((size1s_t)mem_load1(env, VA))
+#define MEM_LOAD1u(VA) ((size1u_t)mem_load1(env, VA))
+#define MEM_LOAD2s(VA) ((size2s_t)mem_load2(env, VA))
+#define MEM_LOAD2u(VA) ((size2u_t)mem_load2(env, VA))
+#define MEM_LOAD4s(VA) ((size4s_t)mem_load4(env, VA))
+#define MEM_LOAD4u(VA) ((size4u_t)mem_load4(env, VA))
+#define MEM_LOAD8s(VA) ((size8s_t)mem_load8(env, VA))
+#define MEM_LOAD8u(VA) ((size8u_t)mem_load8(env, VA))
+
+#define MEM_STORE1(VA, DATA, SLOT) log_store32(env, VA, DATA, 1, SLOT)
+#define MEM_STORE2(VA, DATA, SLOT) log_store32(env, VA, DATA, 2, SLOT)
+#define MEM_STORE4(VA, DATA, SLOT) log_store32(env, VA, DATA, 4, SLOT)
+#define MEM_STORE8(VA, DATA, SLOT) log_store64(env, VA, DATA, 8, SLOT)
+#endif
+
+
+#ifdef QEMU_GENERATE
+static inline void gen_slot_cancelled_check(TCGv check, int slot_num)
+{
+    TCGv mask = tcg_const_tl(1 << slot_num);
+    TCGv one = tcg_const_tl(1);
+    TCGv zero = tcg_const_tl(0);
+
+    tcg_gen_and_tl(mask, hex_slot_cancelled, mask);
+    tcg_gen_movcond_tl(TCG_COND_NE, check, mask, zero, one, zero);
+
+    tcg_temp_free(one);
+    tcg_temp_free(zero);
+    tcg_temp_free(mask);
+}
+
+static inline void gen_cancel(TCGv slot)
+{
+    TCGv one = tcg_const_tl(1);
+    TCGv mask = tcg_temp_new();
+    tcg_gen_shl_tl(mask, one, slot);
+    tcg_gen_or_tl(hex_slot_cancelled, hex_slot_cancelled, mask);
+    tcg_temp_free(one);
+    tcg_temp_free(mask);
+}
+
+#define CANCEL gen_cancel(slot);
+#else
+#define CANCEL cancel_slot(env, slot)
+#endif
+
+#define STORE_ZERO
+#define LOAD_CANCEL(EA) do { CANCEL; } while (0)
+
+#ifdef QEMU_GENERATE
+static inline void gen_pred_cancel(TCGv pred, int slot_num)
+ {
+    TCGv slot_mask = tcg_const_tl(1 << slot_num);
+    TCGv tmp = tcg_temp_new();
+    TCGv zero = tcg_const_tl(0);
+    TCGv one = tcg_const_tl(1);
+    tcg_gen_or_tl(slot_mask, hex_slot_cancelled, slot_mask);
+    tcg_gen_andi_tl(tmp, pred, 1);
+    tcg_gen_movcond_tl(TCG_COND_EQ, hex_slot_cancelled, tmp, zero,
+                       slot_mask, hex_slot_cancelled);
+    tcg_temp_free(slot_mask);
+    tcg_temp_free(tmp);
+    tcg_temp_free(zero);
+    tcg_temp_free(one);
+}
+#define PRED_LOAD_CANCEL(PRED, EA) \
+    gen_pred_cancel(PRED, insn->is_endloop ? 4 : insn->slot)
+
+#define PRED_STORE_CANCEL(PRED, EA) \
+    gen_pred_cancel(PRED, insn->is_endloop ? 4 : insn->slot)
+#else
+#define STORE_CANCEL(EA) { env->slot_cancelled |= (1 << slot); }
+#endif
+
+#define IS_CANCELLED(SLOT)
+#define fMAX(A, B) (((A) > (B)) ? (A) : (B))
+
+#ifdef QEMU_GENERATE
+#define fMIN(DST, A, B) tcg_gen_movcond_i32(TCG_COND_GT, DST, A, B, A, B)
+#else
+#define fMIN(A, B) (((A) < (B)) ? (A) : (B))
+#endif
+
+#define fABS(A) (((A) < 0) ? (-(A)) : (A))
+#define fINSERT_BITS(REG, WIDTH, OFFSET, INVAL) \
+    do { \
+        REG = ((REG) & ~(((fCONSTLL(1) << (WIDTH)) - 1) << (OFFSET))) | \
+           (((INVAL) & ((fCONSTLL(1) << (WIDTH)) - 1)) << (OFFSET)); \
+    } while (0)
+#define fEXTRACTU_BITS(INREG, WIDTH, OFFSET) \
+    (fZXTN(WIDTH, 32, (INREG >> OFFSET)))
+#define fEXTRACTU_BIDIR(INREG, WIDTH, OFFSET) \
+    (fZXTN(WIDTH, 32, fBIDIR_LSHIFTR((INREG), (OFFSET), 4_8)))
+#define fEXTRACTU_RANGE(INREG, HIBIT, LOWBIT) \
+    (fZXTN((HIBIT - LOWBIT + 1), 32, (INREG >> LOWBIT)))
+#define fINSERT_RANGE(INREG, HIBIT, LOWBIT, INVAL) \
+    do { \
+        int offset = LOWBIT; \
+        int width = HIBIT - LOWBIT + 1; \
+        INREG &= ~(((fCONSTLL(1) << width) - 1) << offset); \
+        INREG |= ((INVAL & ((fCONSTLL(1) << width) - 1)) << offset); \
+    } while (0)
+
+#ifdef QEMU_GENERATE
+#define f8BITSOF(RES, VAL) gen_8bitsof(RES, VAL)
+#define fLSBOLD(VAL) tcg_gen_andi_tl(LSB, (VAL), 1)
+#else
+#define f8BITSOF(VAL) ((VAL) ? 0xff : 0x00)
+#define fLSBOLD(VAL)  ((VAL) & 1)
+#endif
+
+#ifdef QEMU_GENERATE
+#define fLSBNEW(PVAL)   tcg_gen_mov_tl(LSB, (PVAL))
+#define fLSBNEW0        fLSBNEW(0)
+#define fLSBNEW1        fLSBNEW(1)
+#else
+#define fLSBNEW(PVAL)   (PVAL)
+#define fLSBNEW0        new_pred_value(env, 0)
+#define fLSBNEW1        new_pred_value(env, 1)
+#endif
+
+#ifdef QEMU_GENERATE
+static inline void gen_logical_not(TCGv dest, TCGv src)
+{
+    TCGv one = tcg_const_tl(1);
+    TCGv zero = tcg_const_tl(0);
+
+    tcg_gen_movcond_tl(TCG_COND_NE, dest, src, zero, zero, one);
+
+    tcg_temp_free(one);
+    tcg_temp_free(zero);
+}
+#define fLSBOLDNOT(VAL) \
+    do { \
+        tcg_gen_andi_tl(LSB, (VAL), 1); \
+        tcg_gen_xori_tl(LSB, LSB, 1); \
+    } while (0)
+#define fLSBNEWNOT(PNUM) \
+    gen_logical_not(LSB, (PNUM))
+#define fLSBNEW0NOT \
+    do { \
+        tcg_gen_mov_tl(LSB, hex_new_pred_value[0]); \
+        gen_logical_not(LSB, LSB); \
+    } while (0)
+#define fLSBNEW1NOT \
+    do { \
+        tcg_gen_mov_tl(LSB, hex_new_pred_value[1]); \
+        gen_logical_not(LSB, LSB); \
+    } while (0)
+#else
+#define fLSBNEWNOT(PNUM) (!fLSBNEW(PNUM))
+#define fLSBOLDNOT(VAL) (!fLSBOLD(VAL))
+#define fLSBNEW0NOT (!fLSBNEW0)
+#define fLSBNEW1NOT (!fLSBNEW1)
+#endif
+
+#define fNEWREG(RNUM) HELPER(new_value)(env, RNUM)
+
+#ifdef QEMU_GENERATE
+#define fNEWREG_ST(RNUM) gen_newreg_st(NEWREG_ST, cpu_env, RNUM)
+#else
+#define fNEWREG_ST(RNUM) HELPER(new_value)(env, RNUM)
+#endif
+
+#define fMEMZNEW(RNUM) (RNUM == 0)
+#define fMEMNZNEW(RNUM) (RNUM != 0)
+#define fVSATUVALN(N, VAL) \
+    ({ \
+        (((int)(VAL)) < 0) ? 0 : ((1LL << (N)) - 1); \
+    })
+#define fSATUVALN(N, VAL) \
+    ({ \
+        fSET_OVERFLOW(); \
+        ((VAL) < 0) ? 0 : ((1LL << (N)) - 1); \
+    })
+#define fSATVALN(N, VAL) \
+    ({ \
+        fSET_OVERFLOW(); \
+        ((VAL) < 0) ? (-(1LL << ((N) - 1))) : ((1LL << ((N) - 1)) - 1); \
+    })
+#define fVSATVALN(N, VAL) \
+    ({ \
+        ((VAL) < 0) ? (-(1LL << ((N) - 1))) : ((1LL << ((N) - 1)) - 1); \
+    })
+#define fZXTN(N, M, VAL) ((VAL) & ((1LL << (N)) - 1))
+#define fSXTN(N, M, VAL) \
+    ((fZXTN(N, M, VAL) ^ (1LL << ((N) - 1))) - (1LL << ((N) - 1)))
+#define fSATN(N, VAL) \
+    ((fSXTN(N, 64, VAL) == (VAL)) ? (VAL) : fSATVALN(N, VAL))
+#define fVSATN(N, VAL) \
+    ((fSXTN(N, 64, VAL) == (VAL)) ? (VAL) : fVSATVALN(N, VAL))
+#define fADDSAT64(DST, A, B) \
+    do { \
+        size8u_t __a = fCAST8u(A); \
+        size8u_t __b = fCAST8u(B); \
+        size8u_t __sum = __a + __b; \
+        size8u_t __xor = __a ^ __b; \
+        const size8u_t __mask = 0x8000000000000000ULL; \
+        if (__xor & __mask) { \
+            DST = __sum; \
+        } \
+        else if ((__a ^ __sum) & __mask) { \
+            if (__sum & __mask) { \
+                DST = 0x7FFFFFFFFFFFFFFFLL; \
+                fSET_OVERFLOW(); \
+            } else { \
+                DST = 0x8000000000000000LL; \
+                fSET_OVERFLOW(); \
+            } \
+        } else { \
+            DST = __sum; \
+        } \
+    } while (0)
+#define fVSATUN(N, VAL) \
+    ((fZXTN(N, 64, VAL) == (VAL)) ? (VAL) : fVSATUVALN(N, VAL))
+#define fSATUN(N, VAL) \
+    ((fZXTN(N, 64, VAL) == (VAL)) ? (VAL) : fSATUVALN(N, VAL))
+#define fSATH(VAL) (fSATN(16, VAL))
+#define fSATUH(VAL) (fSATUN(16, VAL))
+#define fVSATH(VAL) (fVSATN(16, VAL))
+#define fVSATUH(VAL) (fVSATUN(16, VAL))
+#define fSATUB(VAL) (fSATUN(8, VAL))
+#define fSATB(VAL) (fSATN(8, VAL))
+#define fVSATUB(VAL) (fVSATUN(8, VAL))
+#define fVSATB(VAL) (fVSATN(8, VAL))
+#define fIMMEXT(IMM) (IMM = IMM)
+#define fMUST_IMMEXT(IMM) fIMMEXT(IMM)
+
+#define fPCALIGN(IMM) IMM = (IMM & ~PCALIGN_MASK)
+
+#define fGET_EXTENSION (insn->extension)
+
+#ifdef QEMU_GENERATE
+static inline TCGv gen_read_ireg(TCGv tmp, TCGv val, int shift)
+{
+    /*
+     *  #define fREAD_IREG(VAL) \
+     *      (fSXTN(11, 64, (((VAL) & 0xf0000000)>>21) | ((VAL >> 17) & 0x7f)))
+     */
+    tcg_gen_sari_tl(tmp, val, 17);
+    tcg_gen_andi_tl(tmp, tmp, 0x7f);
+    tcg_gen_shli_tl(tmp, tmp, shift);
+    return tmp;
+}
+#define fREAD_IREG(VAL, SHIFT) gen_read_ireg(ireg, (VAL), (SHIFT))
+#define fREAD_R0() (READ_REG(tmp, 0))
+#define fREAD_LR() (READ_REG(tmp, HEX_REG_LR))
+#define fREAD_SSR() (READ_REG(tmp, HEX_REG_SSR))
+#else
+#define fREAD_IREG(VAL) \
+    (fSXTN(11, 64, (((VAL) & 0xf0000000) >> 21) | ((VAL >> 17) & 0x7f)))
+#define fREAD_R0() (READ_REG(0))
+#define fREAD_LR() (READ_REG(HEX_REG_LR))
+#define fREAD_SSR() (READ_REG(HEX_REG_SSR))
+#endif
+
+#define fWRITE_R0(A) WRITE_RREG(0, A)
+#define fWRITE_LR(A) WRITE_RREG(HEX_REG_LR, A)
+#define fWRITE_FP(A) WRITE_RREG(HEX_REG_FP, A)
+#define fWRITE_SP(A) WRITE_RREG(HEX_REG_SP, A)
+#define fWRITE_GOSP(A) WRITE_RREG(HEX_REG_GOSP, A)
+#define fWRITE_GP(A) WRITE_RREG(HEX_REG_GP, A)
+
+#ifdef QEMU_GENERATE
+#define fREAD_SP() (READ_REG(tmp, HEX_REG_SP))
+#define fREAD_GOSP() (READ_REG(tmp, HEX_REG_GOSP))
+#define fREAD_GELR() (READ_REG(tmp, HEX_REG_GELR))
+#define fREAD_GEVB() (READ_REG(tmp, HEX_REG_GEVB))
+#define fREAD_CSREG(N) (READ_REG(tmp, HEX_REG_CS0 + N))
+#define fREAD_LC0 (READ_REG(tmp, HEX_REG_LC0))
+#define fREAD_LC1 (READ_REG(tmp, HEX_REG_LC1))
+#define fREAD_SA0 (READ_REG(tmp, HEX_REG_SA0))
+#define fREAD_SA1 (READ_REG(tmp, HEX_REG_SA1))
+#define fREAD_FP() (READ_REG(tmp, HEX_REG_FP))
+#define fREAD_GP() (READ_REG(tmp, HEX_REG_GP))
+#define fREAD_PC() (READ_REG(tmp, HEX_REG_PC))
+#else
+#define fREAD_SP() (READ_REG(HEX_REG_SP))
+#define fREAD_GOSP() (READ_REG(HEX_REG_GOSP))
+#define fREAD_GELR() (READ_REG(HEX_REG_GELR))
+#define fREAD_GEVB() (READ_REG(HEX_REG_GEVB))
+#define fREAD_CSREG(N) (READ_REG(HEX_REG_CS0 + N))
+#define fREAD_LC0 (READ_REG(HEX_REG_LC0))
+#define fREAD_LC1 (READ_REG(HEX_REG_LC1))
+#define fREAD_SA0 (READ_REG(HEX_REG_SA0))
+#define fREAD_SA1 (READ_REG(HEX_REG_SA1))
+#define fREAD_FP() (READ_REG(HEX_REG_FP))
+#define fREAD_GP() (READ_REG(HEX_REG_GP))
+#define fREAD_PC() (READ_REG(HEX_REG_PC))
+#endif
+
+#define fREAD_NPC() (env->next_PC & (0xfffffffe))
+
+#ifdef QEMU_GENERATE
+#define fREAD_P0() (READ_PREG(tmp, 0))
+#define fREAD_P3() (READ_PREG(tmp, 3))
+#define fNOATTRIB_READ_P3() (READ_PREG(tmp, 3))
+#else
+#define fREAD_P0() (READ_PREG(0))
+#define fREAD_P3() (READ_PREG(3))
+#define fNOATTRIB_READ_P3() (READ_PREG(3))
+#endif
+
+#define fCHECK_PCALIGN(A)
+#define fCUREXT() GET_SSR_FIELD(SSR_XA)
+#define fCUREXT_WRAP(EXT_NO)
+
+#ifdef QEMU_GENERATE
+#define fWRITE_NPC(A) gen_write_new_pc(A)
+#else
+#define fWRITE_NPC(A) write_new_pc(env, A)
+#endif
+
+#define fLOOPSTATS(A)
+#define fCOF_CALLBACK(LOC, TYPE)
+#define fBRANCH(LOC, TYPE) \
+    do { \
+        fWRITE_NPC(LOC); \
+        fCOF_CALLBACK(LOC, TYPE); \
+    } while (0)
+#define fJUMPR(REGNO, TARGET, TYPE) fBRANCH(TARGET, COF_TYPE_JUMPR)
+#define fHINTJR(TARGET) { }
+#define fBP_RAS_CALL(A)
+#define fCALL(A) \
+    do { \
+        fWRITE_LR(fREAD_NPC()); \
+        fBRANCH(A, COF_TYPE_CALL); \
+    } while (0)
+#define fCALLR(A) \
+    do { \
+        fWRITE_LR(fREAD_NPC()); \
+        fBRANCH(A, COF_TYPE_CALLR); \
+    } while (0)
+#define fWRITE_LOOP_REGS0(START, COUNT) \
+    do { \
+        WRITE_RREG(HEX_REG_LC0, COUNT);  \
+        WRITE_RREG(HEX_REG_SA0, START); \
+    } while (0)
+#define fWRITE_LOOP_REGS1(START, COUNT) \
+    do { \
+        WRITE_RREG(HEX_REG_LC1, COUNT);  \
+        WRITE_RREG(HEX_REG_SA1, START);\
+    } while (0)
+#define fWRITE_LC0(VAL) WRITE_RREG(HEX_REG_LC0, VAL)
+#define fWRITE_LC1(VAL) WRITE_RREG(HEX_REG_LC1, VAL)
+
+#ifdef QEMU_GENERATE
+#define fCARRY_FROM_ADD(A, B, C) gen_carry_from_add64(tmp_i64, A, B, C)
+#else
+#define fCARRY_FROM_ADD(A, B, C) carry_from_add64(A, B, C)
+#endif
+
+#define fSETCV_ADD(A, B, CARRY) \
+    do { \
+        SET_USR_FIELD(USR_C, gen_carry_add((A), (B), ((A) + (B)))); \
+        SET_USR_FIELD(USR_V, gen_overflow_add((A), (B), ((A) + (B)))); \
+    } while (0)
+#define fSETCV_SUB(A, B, CARRY) \
+    do { \
+        SET_USR_FIELD(USR_C, gen_carry_add((A), (B), ((A) - (B)))); \
+        SET_USR_FIELD(USR_V, gen_overflow_add((A), (B), ((A) - (B)))); \
+    } while (0)
+#define fSET_OVERFLOW() SET_USR_FIELD(USR_OVF, 1)
+#define fSET_LPCFG(VAL) SET_USR_FIELD(USR_LPCFG, (VAL))
+#define fGET_LPCFG (GET_USR_FIELD(USR_LPCFG))
+#define fWRITE_P0(VAL) WRITE_PREG(0, VAL)
+#define fWRITE_P1(VAL) WRITE_PREG(1, VAL)
+#define fWRITE_P2(VAL) WRITE_PREG(2, VAL)
+#define fWRITE_P3(VAL) WRITE_PREG(3, VAL)
+#define fWRITE_P3_LATE(VAL) WRITE_PREG(3, VAL)
+#define fPART1(WORK) if (part1) { WORK; return; }
+#define fCAST4u(A) ((size4u_t)(A))
+#define fCAST4s(A) ((size4s_t)(A))
+#define fCAST8u(A) ((size8u_t)(A))
+#define fCAST8s(A) ((size8s_t)(A))
+#define fCAST2_2s(A) ((size2s_t)(A))
+#define fCAST2_2u(A) ((size2u_t)(A))
+#define fCAST4_4s(A) ((size4s_t)(A))
+#define fCAST4_4u(A) ((size4u_t)(A))
+#define fCAST4_8s(A) ((size8s_t)((size4s_t)(A)))
+#define fCAST4_8u(A) ((size8u_t)((size4u_t)(A)))
+#define fCAST8_8s(A) ((size8s_t)(A))
+#define fCAST8_8u(A) ((size8u_t)(A))
+#define fCAST2_8s(A) ((size8s_t)((size2s_t)(A)))
+#define fCAST2_8u(A) ((size8u_t)((size2u_t)(A)))
+#define fZE8_16(A) ((size2s_t)((size1u_t)(A)))
+#define fSE8_16(A) ((size2s_t)((size1s_t)(A)))
+#define fSE16_32(A) ((size4s_t)((size2s_t)(A)))
+#define fZE16_32(A) ((size4u_t)((size2u_t)(A)))
+#define fSE32_64(A) ((size8s_t)((size4s_t)(A)))
+#define fZE32_64(A) ((size8u_t)((size4u_t)(A)))
+#define fSE8_32(A) ((size4s_t)((size1s_t)(A)))
+#define fZE8_32(A) ((size4s_t)((size1u_t)(A)))
+#define fMPY8UU(A, B) (int)(fZE8_16(A) * fZE8_16(B))
+#define fMPY8US(A, B) (int)(fZE8_16(A) * fSE8_16(B))
+#define fMPY8SU(A, B) (int)(fSE8_16(A) * fZE8_16(B))
+#define fMPY8SS(A, B) (int)((short)(A) * (short)(B))
+#define fMPY16SS(A, B) fSE32_64(fSE16_32(A) * fSE16_32(B))
+#define fMPY16UU(A, B) fZE32_64(fZE16_32(A) * fZE16_32(B))
+#define fMPY16SU(A, B) fSE32_64(fSE16_32(A) * fZE16_32(B))
+#define fMPY16US(A, B) fMPY16SU(B, A)
+#define fMPY32SS(A, B) (fSE32_64(A) * fSE32_64(B))
+#define fMPY32UU(A, B) (fZE32_64(A) * fZE32_64(B))
+#define fMPY32SU(A, B) (fSE32_64(A) * fZE32_64(B))
+#define fMPY3216SS(A, B) (fSE32_64(A) * fSXTN(16, 64, B))
+#define fMPY3216SU(A, B) (fSE32_64(A) * fZXTN(16, 64, B))
+#define fROUND(A) (A + 0x8000)
+#define fCLIP(DST, SRC, U) \
+    do { \
+        size4s_t maxv = (1 << U) - 1; \
+        size4s_t minv = -(1 << U); \
+        DST = fMIN(maxv, fMAX(SRC, minv)); \
+    } while (0)
+#define fCRND(A) ((((A) & 0x3) == 0x3) ? ((A) + 1) : ((A)))
+#define fRNDN(A, N) ((((N) == 0) ? (A) : (((fSE32_64(A)) + (1 << ((N) - 1))))))
+#define fCRNDN(A, N) (conv_round(A, N))
+#define fCRNDN64(A, N) (conv_round64(A, N))
+#define fADD128(A, B) (add128(A, B))
+#define fSUB128(A, B) (sub128(A, B))
+#define fSHIFTR128(A, B) (shiftr128(A, B))
+#define fSHIFTL128(A, B) (shiftl128(A, B))
+#define fAND128(A, B) (and128(A, B))
+#define fCAST8S_16S(A) (cast8s_to_16s(A))
+#define fCAST16S_8S(A) (cast16s_to_8s(A))
+#define fCAST16S_4S(A) (cast16s_to_4s(A))
+
+#ifdef QEMU_GENERATE
+#define fEA_RI(REG, IMM) tcg_gen_addi_tl(EA, REG, IMM)
+#define fEA_RRs(REG, REG2, SCALE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        tcg_gen_shli_tl(tmp, REG2, SCALE); \
+        tcg_gen_add_tl(EA, REG, tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+#define fEA_IRs(IMM, REG, SCALE) \
+    do { \
+        tcg_gen_shli_tl(EA, REG, SCALE); \
+        tcg_gen_addi_tl(EA, EA, IMM); \
+    } while (0)
+#else
+#define fEA_RI(REG, IMM) \
+    do { \
+        EA = REG + IMM; \
+        fDOCHKPAGECROSS(REG, EA); \
+    } while (0)
+#define fEA_RRs(REG, REG2, SCALE) \
+    do { \
+        EA = REG + (REG2 << SCALE); \
+        fDOCHKPAGECROSS(REG, EA); \
+    } while (0)
+#define fEA_IRs(IMM, REG, SCALE) \
+    do { \
+        EA = IMM + (REG << SCALE); \
+        fDOCHKPAGECROSS(IMM, EA); \
+    } while (0)
+#endif
+
+#ifdef QEMU_GENERATE
+#define fEA_IMM(IMM) tcg_gen_movi_tl(EA, IMM)
+#define fEA_REG(REG) tcg_gen_mov_tl(EA, REG)
+static inline void gen_fbrev(TCGv result, TCGv src)
+{
+    TCGv result_hi = tcg_temp_new();
+    TCGv result_lo = tcg_temp_new();
+    TCGv tmp1 = tcg_temp_new();
+    TCGv tmp2 = tcg_temp_new();
+    int i;
+
+    /*
+     *  Bit reverse the low 16 bits of the address
+     */
+    tcg_gen_andi_tl(result_hi, src, 0xffff0000);
+    tcg_gen_movi_tl(result_lo, 0);
+    tcg_gen_mov_tl(tmp1, src);
+    for (i = 0; i < 16; i++) {
+        /*
+         * result_lo = (result_lo << 1) | (tmp1 & 1);
+         * tmp1 >>= 1;
+         */
+        tcg_gen_shli_tl(result_lo, result_lo, 1);
+        tcg_gen_andi_tl(tmp2, tmp1, 1);
+        tcg_gen_or_tl(result_lo, result_lo, tmp2);
+        tcg_gen_sari_tl(tmp1, tmp1, 1);
+    }
+    tcg_gen_or_tl(result, result_hi, result_lo);
+
+    tcg_temp_free(result_hi);
+    tcg_temp_free(result_lo);
+    tcg_temp_free(tmp1);
+    tcg_temp_free(tmp2);
+}
+static inline void gen_fcircadd(TCGv reg, TCGv incr, TCGv M, TCGv start_addr)
+{
+    TCGv length = tcg_temp_new();
+    TCGv new_ptr = tcg_temp_new();
+    TCGv end_addr = tcg_temp_new();
+    TCGv tmp = tcg_temp_new();
+
+    tcg_gen_andi_tl(length, M, 0x1ffff);
+    tcg_gen_add_tl(new_ptr, reg, incr);
+    tcg_gen_add_tl(end_addr, start_addr, length);
+
+    tcg_gen_sub_tl(tmp, new_ptr, length);
+    tcg_gen_movcond_tl(TCG_COND_GE, new_ptr, new_ptr, end_addr, tmp, new_ptr);
+    tcg_gen_add_tl(tmp, new_ptr, length);
+    tcg_gen_movcond_tl(TCG_COND_LT, new_ptr, new_ptr, start_addr, tmp, new_ptr);
+
+    tcg_gen_mov_tl(reg, new_ptr);
+
+    tcg_temp_free(length);
+    tcg_temp_free(new_ptr);
+    tcg_temp_free(end_addr);
+    tcg_temp_free(tmp);
+}
+
+#define fEA_BREVR(REG)      gen_fbrev(EA, REG)
+#define fEA_GPI(IMM)        tcg_gen_addi_tl(EA, fREAD_GP(), IMM)
+#define fPM_I(REG, IMM)     tcg_gen_addi_tl(REG, REG, IMM)
+#define fPM_M(REG, MVAL)    tcg_gen_add_tl(REG, REG, MVAL)
+#else
+#define fEA_IMM(IMM) EA = IMM
+#define fEA_REG(REG) EA = REG
+#define fEA_GPI(IMM) \
+    do { \
+        EA = fREAD_GP() + IMM; \
+        fGP_DOCHKPAGECROSS(fREAD_GP(), EA); \
+    } while (0)
+#define fPM_I(REG, IMM) \
+    do { \
+        REG = REG + IMM; \
+    } while (0)
+#define fPM_M(REG, MVAL) \
+    do { \
+        REG = REG + MVAL; \
+    } while (0)
+#define fEA_BREVR(REG) \
+    do { \
+        EA = fbrev(REG); \
+    } while (0)
+#endif
+#define fPM_CIRI(REG, IMM, MVAL) \
+    do { \
+        TCGv tcgv_siV = tcg_const_tl(siV); \
+        fcirc_add(REG, tcgv_siV, MuV); \
+        tcg_temp_free(tcgv_siV); \
+    } while (0)
+#define fPM_CIRR(REG, VAL, MVAL) \
+    do { \
+        fcirc_add(REG, VAL, MuV); \
+    } while (0)
+#define fMODCIRCU(N, P) ((N) & ((1 << (P)) - 1))
+#define fSCALE(N, A) (((size8s_t)(A)) << N)
+#define fVSATW(A) fVSATN(32, ((long long)A))
+#define fSATW(A) fSATN(32, ((long long)A))
+#define fVSAT(A) fVSATN(32, (A))
+#define fSAT(A) fSATN(32, (A))
+#define fSAT_ORIG_SHL(A, ORIG_REG) \
+    ((((size4s_t)((fSAT(A)) ^ ((size4s_t)(ORIG_REG)))) < 0) \
+        ? fSATVALN(32, ((size4s_t)(ORIG_REG))) \
+        : ((((ORIG_REG) > 0) && ((A) == 0)) ? fSATVALN(32, (ORIG_REG)) \
+                                            : fSAT(A)))
+#define fPASS(A) A
+#define fRND(A) (((A) + 1) >> 1)
+#define fBIDIR_SHIFTL(SRC, SHAMT, REGSTYPE) \
+    (((SHAMT) < 0) ? ((fCAST##REGSTYPE(SRC) >> ((-(SHAMT)) - 1)) >> 1) \
+                   : (fCAST##REGSTYPE(SRC) << (SHAMT)))
+#define fBIDIR_ASHIFTL(SRC, SHAMT, REGSTYPE) \
+    fBIDIR_SHIFTL(SRC, SHAMT, REGSTYPE##s)
+#define fBIDIR_LSHIFTL(SRC, SHAMT, REGSTYPE) \
+    fBIDIR_SHIFTL(SRC, SHAMT, REGSTYPE##u)
+#define fBIDIR_ASHIFTL_SAT(SRC, SHAMT, REGSTYPE) \
+    (((SHAMT) < 0) ? ((fCAST##REGSTYPE##s(SRC) >> ((-(SHAMT)) - 1)) >> 1) \
+                   : fSAT_ORIG_SHL(fCAST##REGSTYPE##s(SRC) << (SHAMT), (SRC)))
+#define fBIDIR_SHIFTR(SRC, SHAMT, REGSTYPE) \
+    (((SHAMT) < 0) ? ((fCAST##REGSTYPE(SRC) << ((-(SHAMT)) - 1)) << 1) \
+                   : (fCAST##REGSTYPE(SRC) >> (SHAMT)))
+#define fBIDIR_ASHIFTR(SRC, SHAMT, REGSTYPE) \
+    fBIDIR_SHIFTR(SRC, SHAMT, REGSTYPE##s)
+#define fBIDIR_LSHIFTR(SRC, SHAMT, REGSTYPE) \
+    fBIDIR_SHIFTR(SRC, SHAMT, REGSTYPE##u)
+#define fBIDIR_ASHIFTR_SAT(SRC, SHAMT, REGSTYPE) \
+    (((SHAMT) < 0) ? fSAT_ORIG_SHL((fCAST##REGSTYPE##s(SRC) \
+                        << ((-(SHAMT)) - 1)) << 1, (SRC)) \
+                   : (fCAST##REGSTYPE##s(SRC) >> (SHAMT)))
+#ifdef QEMU_GENERATE
+#define fASHIFTR(DST, SRC, SHAMT, REGSTYPE) \
+    gen_ashiftr_##REGSTYPE##s(DST, SRC, SHAMT)
+#define fLSHIFTR(DST, SRC, SHAMT, REGSTYPE) \
+    gen_lshiftr_##REGSTYPE##u(DST, SRC, SHAMT)
+#else
+#define fASHIFTR(SRC, SHAMT, REGSTYPE) (fCAST##REGSTYPE##s(SRC) >> (SHAMT))
+#define fLSHIFTR(SRC, SHAMT, REGSTYPE) \
+    (((SHAMT) >= 64) ? 0 : (fCAST##REGSTYPE##u(SRC) >> (SHAMT)))
+#endif
+#define fROTL(SRC, SHAMT, REGSTYPE) \
+    (((SHAMT) == 0) ? (SRC) : ((fCAST##REGSTYPE##u(SRC) << (SHAMT)) | \
+                              ((fCAST##REGSTYPE##u(SRC) >> \
+                                 ((sizeof(SRC) * 8) - (SHAMT))))))
+#define fROTR(SRC, SHAMT, REGSTYPE) \
+    (((SHAMT) == 0) ? (SRC) : ((fCAST##REGSTYPE##u(SRC) >> (SHAMT)) | \
+                              ((fCAST##REGSTYPE##u(SRC) << \
+                                 ((sizeof(SRC) * 8) - (SHAMT))))))
+#ifdef QEMU_GENERATE
+#define fASHIFTL(DST, SRC, SHAMT, REGSTYPE) \
+    gen_ashiftl_##REGSTYPE##s(DST, SRC, SHAMT)
+#else
+#define fASHIFTL(SRC, SHAMT, REGSTYPE) \
+    (((SHAMT) >= 64) ? 0 : (fCAST##REGSTYPE##s(SRC) << (SHAMT)))
+#endif
+#define fFLOAT(A) \
+    ({ union { float f; size4u_t i; } _fipun; _fipun.i = (A); _fipun.f; })
+#define fUNFLOAT(A) \
+    ({ union { float f; size4u_t i; } _fipun; \
+     _fipun.f = (A); isnan(_fipun.f) ? 0xFFFFFFFFU : _fipun.i; })
+#define fSFNANVAL() 0xffffffff
+#define fSFINFVAL(A) (((A) & 0x80000000) | 0x7f800000)
+#define fSFONEVAL(A) (((A) & 0x80000000) | fUNFLOAT(1.0))
+#define fCHECKSFNAN(DST, A) \
+    do { \
+        if (isnan(fFLOAT(A))) { \
+            if ((fGETBIT(22, A)) == 0) { \
+                fRAISEFLAGS(FE_INVALID); \
+            } \
+            DST = fSFNANVAL(); \
+        } \
+    } while (0)
+#define fCHECKSFNAN3(DST, A, B, C) \
+    do { \
+        fCHECKSFNAN(DST, A); \
+        fCHECKSFNAN(DST, B); \
+        fCHECKSFNAN(DST, C); \
+    } while (0)
+#define fSF_BIAS() 127
+#define fSF_MANTBITS() 23
+#define fSF_RECIP_LOOKUP(IDX) arch_recip_lookup(IDX)
+#define fSF_INVSQRT_LOOKUP(IDX) arch_invsqrt_lookup(IDX)
+#define fSF_MUL_POW2(A, B) \
+    (fUNFLOAT(fFLOAT(A) * fFLOAT((fSF_BIAS() + (B)) << fSF_MANTBITS())))
+#define fSF_GETEXP(A) (((A) >> fSF_MANTBITS()) & 0xff)
+#define fSF_MAXEXP() (254)
+#define fSF_RECIP_COMMON(N, D, O, A) arch_sf_recip_common(&N, &D, &O, &A)
+#define fSF_INVSQRT_COMMON(N, O, A) arch_sf_invsqrt_common(&N, &O, &A)
+#define fFMAFX(A, B, C, ADJ) internal_fmafx(A, B, C, fSXTN(8, 64, ADJ))
+#define fFMAF(A, B, C) internal_fmafx(A, B, C, 0)
+#define fSFMPY(A, B) internal_mpyf(A, B)
+#define fMAKESF(SIGN, EXP, MANT) \
+    ((((SIGN) & 1) << 31) | \
+     (((EXP) & 0xff) << fSF_MANTBITS()) | \
+     ((MANT) & ((1 << fSF_MANTBITS()) - 1)))
+#define fDOUBLE(A) \
+    ({ union { double f; size8u_t i; } _fipun; _fipun.i = (A); _fipun.f; })
+#define fUNDOUBLE(A) \
+    ({ union { double f; size8u_t i; } _fipun; \
+     _fipun.f = (A); \
+     isnan(_fipun.f) ? 0xFFFFFFFFFFFFFFFFULL : _fipun.i; })
+#define fDFNANVAL() 0xffffffffffffffffULL
+#define fDFINFVAL(A) (((A) & 0x8000000000000000ULL) | 0x7ff0000000000000ULL)
+#define fDFONEVAL(A) (((A) & 0x8000000000000000ULL) | fUNDOUBLE(1.0))
+#define fCHECKDFNAN(DST, A) \
+    do { \
+        if (isnan(fDOUBLE(A))) { \
+            if ((fGETBIT(51, A)) == 0) { \
+                fRAISEFLAGS(FE_INVALID); \
+            } \
+            DST = fDFNANVAL(); \
+        } \
+    } while (0)
+#define fCHECKDFNAN3(DST, A, B, C) \
+    do { \
+        fCHECKDFNAN(DST, A); \
+        fCHECKDFNAN(DST, B); \
+        fCHECKDFNAN(DST, C); \
+    } while (0)
+#define fDF_BIAS() 1023
+#define fDF_ISNORMAL(X) (fpclassify(fDOUBLE(X)) == FP_NORMAL)
+#define fDF_ISDENORM(X) (fpclassify(fDOUBLE(X)) == FP_SUBNORMAL)
+#define fDF_ISBIG(X) (fDF_GETEXP(X) >= 512)
+#define fDF_MANTBITS() 52
+#define fDF_RECIP_LOOKUP(IDX) (size8u_t)(arch_recip_lookup(IDX))
+#define fDF_INVSQRT_LOOKUP(IDX) (size8u_t)(arch_invsqrt_lookup(IDX))
+#define fDF_MUL_POW2(A, B) \
+    (fUNDOUBLE(fDOUBLE(A) * fDOUBLE((0ULL + fDF_BIAS() + (B)) << \
+     fDF_MANTBITS())))
+#define fDF_GETEXP(A) (((A) >> fDF_MANTBITS()) & 0x7ff)
+#define fDF_MAXEXP() (2046)
+#define fDF_RECIP_COMMON(N, D, O, A) arch_df_recip_common(&N, &D, &O, &A)
+#define fDF_INVSQRT_COMMON(N, O, A) arch_df_invsqrt_common(&N, &O, &A)
+#define fFMA(A, B, C) internal_fma(A, B, C)
+#define fDFMPY(A, B) internal_mpy(A, B)
+#define fDF_MPY_HH(A, B, ACC) internal_mpyhh(A, B, ACC)
+#define fFMAX(A, B, C, ADJ) internal_fmax(A, B, C, fSXTN(8, 64, ADJ) * 2)
+#define fMAKEDF(SIGN, EXP, MANT) \
+    ((((SIGN) & 1ULL) << 63) | \
+     (((EXP) & 0x7ffULL) << fDF_MANTBITS()) | \
+     ((MANT) & ((1ULL << fDF_MANTBITS()) - 1)))
+
+#ifdef QEMU_GENERATE
+/* These will be needed if we write any FP instructions with TCG */
+#define fFPOP_START()      /* nothing */
+#define fFPOP_END()        /* nothing */
+#else
+#define fFPOP_START() arch_fpop_start(env)
+#define fFPOP_END() arch_fpop_end(env)
+#endif
+
+#define fFPSETROUND_NEAREST() fesetround(FE_TONEAREST)
+#define fFPSETROUND_CHOP() fesetround(FE_TOWARDZERO)
+#define fFPCANCELFLAGS() feclearexcept(FE_ALL_EXCEPT)
+#define fISINFPROD(A, B) \
+    ((isinf(A) && isinf(B)) || \
+     (isinf(A) && isfinite(B) && ((B) != 0.0)) || \
+     (isinf(B) && isfinite(A) && ((A) != 0.0)))
+#define fISZEROPROD(A, B) \
+    ((((A) == 0.0) && isfinite(B)) || (((B) == 0.0) && isfinite(A)))
+#define fRAISEFLAGS(A) arch_raise_fpflag(A)
+#define fDF_MAX(A, B) \
+    (((A) == (B)) ? fDOUBLE(fUNDOUBLE(A) & fUNDOUBLE(B)) : fmax(A, B))
+#define fDF_MIN(A, B) \
+    (((A) == (B)) ? fDOUBLE(fUNDOUBLE(A) | fUNDOUBLE(B)) : fmin(A, B))
+#define fSF_MAX(A, B) \
+    (((A) == (B)) ? fFLOAT(fUNFLOAT(A) & fUNFLOAT(B)) : fmaxf(A, B))
+#define fSF_MIN(A, B) \
+    (((A) == (B)) ? fFLOAT(fUNFLOAT(A) | fUNFLOAT(B)) : fminf(A, B))
+#define fMMU(ADDR) ADDR
+
+#ifdef QEMU_GENERATE
+#define fcirc_add(REG, INCR, MV) \
+    gen_fcircadd(REG, INCR, MV, fREAD_CSREG(MuN))
+#else
+#define fcirc_add(REG, INCR, IMMED)  /* Not possible in helpers */
+#endif
+
+#define fbrev(REG) (fbrevaddr(REG))
+
+#ifdef QEMU_GENERATE
+#define fLOAD(NUM, SIZE, SIGN, EA, DST) MEM_LOAD##SIZE##SIGN(DST, EA)
+#else
+#define fLOAD(NUM, SIZE, SIGN, EA, DST) \
+    DST = (size##SIZE##SIGN##_t)MEM_LOAD##SIZE##SIGN(EA)
+#endif
+
+#define fMEMOP(NUM, SIZE, SIGN, EA, FNTYPE, VALUE)
+
+#ifdef QEMU_GENERATE
+#define fGET_FRAMEKEY() READ_REG(tmp, HEX_REG_FRAMEKEY)
+static inline TCGv_i64 gen_frame_scramble(TCGv_i64 result)
+{
+    /* ((LR << 32) | FP) ^ (FRAMEKEY << 32)) */
+    TCGv_i64 LR_i64 = tcg_temp_new_i64();
+    TCGv_i64 FP_i64 = tcg_temp_new_i64();
+    TCGv_i64 FRAMEKEY_i64 = tcg_temp_new_i64();
+
+    tcg_gen_extu_i32_i64(LR_i64, hex_gpr[HEX_REG_LR]);
+    tcg_gen_extu_i32_i64(FP_i64, hex_gpr[HEX_REG_FP]);
+    tcg_gen_extu_i32_i64(FRAMEKEY_i64, hex_gpr[HEX_REG_FRAMEKEY]);
+
+    tcg_gen_shli_i64(LR_i64, LR_i64, 32);
+    tcg_gen_shli_i64(FRAMEKEY_i64, FRAMEKEY_i64, 32);
+    tcg_gen_or_i64(result, LR_i64, FP_i64);
+    tcg_gen_xor_i64(result, result, FRAMEKEY_i64);
+
+    tcg_temp_free_i64(LR_i64);
+    tcg_temp_free_i64(FP_i64);
+    tcg_temp_free_i64(FRAMEKEY_i64);
+    return result;
+}
+#define fFRAME_SCRAMBLE(VAL) gen_frame_scramble(scramble_tmp)
+static inline TCGv_i64 gen_frame_unscramble(TCGv_i64 frame)
+{
+    TCGv_i64 FRAMEKEY_i64 = tcg_temp_new_i64();
+    tcg_gen_extu_i32_i64(FRAMEKEY_i64, hex_gpr[HEX_REG_FRAMEKEY]);
+    tcg_gen_shli_i64(FRAMEKEY_i64, FRAMEKEY_i64, 32);
+    tcg_gen_xor_i64(frame, frame, FRAMEKEY_i64);
+    tcg_temp_free_i64(FRAMEKEY_i64);
+    return frame;
+}
+
+#define fFRAME_UNSCRAMBLE(VAL) gen_frame_unscramble(VAL)
+#else
+#define fGET_FRAMEKEY() READ_REG(HEX_REG_FRAMEKEY)
+#define fFRAME_SCRAMBLE(VAL) ((VAL) ^ (fCAST8u(fGET_FRAMEKEY()) << 32))
+#define fFRAME_UNSCRAMBLE(VAL) fFRAME_SCRAMBLE(VAL)
+#endif
+
+#ifdef CONFIG_USER_ONLY
+#define fFRAMECHECK(ADDR, EA) do { } while (0) /* Not modelled in linux-user */
+#else
+/* System mode not implemented yet */
+#define fFRAMECHECK(ADDR, EA)  g_assert_not_reached();
+#endif
+
+#ifdef QEMU_GENERATE
+#define fLOAD_LOCKED(NUM, SIZE, SIGN, EA, DST) \
+    gen_load_locked##SIZE##SIGN(DST, EA, ctx->mem_idx);
+#else
+#define fLOAD_LOCKED(NUM, SIZE, SIGN, EA, DST) \
+    DST = (size##SIZE##SIGN##_t)mem_load_locked##SIZE(env, EA);
+#endif
+
+#define fLOAD_PHYS(NUM, SIZE, SIGN, SRC1, SRC2, DST)
+
+#ifdef QEMU_GENERATE
+#define fSTORE(NUM, SIZE, EA, SRC) MEM_STORE##SIZE(EA, SRC, insn->slot)
+#else
+#define fSTORE(NUM, SIZE, EA, SRC) MEM_STORE##SIZE(EA, SRC, slot)
+#endif
+
+#ifdef QEMU_GENERATE
+#define fSTORE_LOCKED(NUM, SIZE, EA, SRC, PRED) \
+    gen_store_conditional##SIZE(env, ctx, PdN, PRED, EA, SRC);
+#else
+#define fSTORE_LOCKED(NUM, SIZE, EA, SRC, PRED) \
+    PRED = (mem_store_conditional(env, EA, SRC, SIZE) ? 0xff : 0);
+#endif
+
+#define fVTCM_MEMCPY(DST, SRC, SIZE)
+#define fPERMUTEH(SRC0, SRC1, CTRL) fpermuteh((SRC0), (SRC1), CTRL)
+#define fPERMUTEB(SRC0, SRC1, CTRL) fpermuteb((SRC0), (SRC1), CTRL)
+
+#ifdef QEMU_GENERATE
+#define GETBYTE_FUNC(X) \
+    _Generic((X), TCGv_i32 : gen_get_byte, TCGv_i64 : gen_get_byte_i64)
+#define fGETBYTE(N, SRC) GETBYTE_FUNC(SRC)(BYTE, N, SRC, true)
+#define fGETUBYTE(N, SRC) GETBYTE_FUNC(SRC)(BYTE, N, SRC, false)
+#else
+#define fGETBYTE(N, SRC) ((size1s_t)((SRC >> ((N) * 8)) & 0xff))
+#define fGETUBYTE(N, SRC) ((size1u_t)((SRC >> ((N) * 8)) & 0xff))
+#endif
+
+#ifdef QEMU_GENERATE
+#define SETBYTE_FUNC(X) \
+    _Generic((X), TCGv_i32 : gen_set_byte, TCGv_i64 : gen_set_byte_i64)
+#define fSETBYTE(N, DST, VAL) SETBYTE_FUNC(DST)(N, DST, VAL)
+
+#define fGETHALF(N, SRC)  gen_get_half(HALF, N, SRC, true)
+#define fGETUHALF(N, SRC) gen_get_half(HALF, N, SRC, false)
+
+#define SETHALF_FUNC(X) \
+    _Generic((X), TCGv_i32 : gen_set_half, TCGv_i64 : gen_set_half_i64)
+#define fSETHALF(N, DST, VAL) SETHALF_FUNC(DST)(N, DST, VAL)
+#define fSETHALFw(N, DST, VAL) gen_set_half(N, DST, VAL)
+#define fSETHALFd(N, DST, VAL) gen_set_half_i64(N, DST, VAL)
+#else
+#define fSETBYTE(N, DST, VAL) \
+    do { \
+        DST = (DST & ~(0x0ffLL << ((N) * 8))) | \
+        (((size8u_t)((VAL) & 0x0ffLL)) << ((N) * 8)); \
+    } while (0)
+#define fGETHALF(N, SRC) ((size2s_t)((SRC >> ((N) * 16)) & 0xffff))
+#define fGETUHALF(N, SRC) ((size2u_t)((SRC >> ((N) * 16)) & 0xffff))
+#define fSETHALF(N, DST, VAL) \
+    do { \
+        DST = (DST & ~(0x0ffffLL << ((N) * 16))) | \
+        (((size8u_t)((VAL) & 0x0ffff)) << ((N) * 16)); \
+    } while (0)
+#define fSETHALFw fSETHALF
+#define fSETHALFd fSETHALF
+#endif
+
+#ifdef QEMU_GENERATE
+#define GETWORD_FUNC(X) \
+    _Generic((X), TCGv_i32 : gen_get_word, TCGv_i64 : gen_get_word_i64)
+#define fGETWORD(N, SRC)  GETWORD_FUNC(WORD)(WORD, N, SRC, true)
+#define fGETUWORD(N, SRC) GETWORD_FUNC(WORD)(WORD, N, SRC, false)
+#else
+#define fGETWORD(N, SRC) \
+    ((size8s_t)((size4s_t)((SRC >> ((N) * 32)) & 0x0ffffffffLL)))
+#define fGETUWORD(N, SRC) \
+    ((size8u_t)((size4u_t)((SRC >> ((N) * 32)) & 0x0ffffffffLL)))
+#endif
+
+#define fSETWORD(N, DST, VAL) \
+    do { \
+        DST = (DST & ~(0x0ffffffffLL << ((N) * 32))) | \
+              (((VAL) & 0x0ffffffffLL) << ((N) * 32)); \
+    } while (0)
+#define fACC()
+#define fEXTENSION_AUDIO(A) A
+
+#ifdef QEMU_GENERATE
+#define fSETBIT(N, DST, VAL) gen_set_bit((N), (DST), (VAL));
+#else
+#define fSETBIT(N, DST, VAL) \
+    do { \
+        DST = (DST & ~(1ULL << (N))) | (((size8u_t)(VAL)) << (N)); \
+    } while (0)
+#endif
+
+#define fGETBIT(N, SRC) (((SRC) >> N) & 1)
+#define fSETBITS(HI, LO, DST, VAL) \
+    do { \
+        int j; \
+        for (j = LO; j <= HI; j++) { \
+            fSETBIT(j, DST, VAL); \
+        } \
+    } while (0)
+#define fCOUNTONES_2(VAL) count_ones_2(VAL)
+#define fCOUNTONES_4(VAL) count_ones_4(VAL)
+#define fCOUNTONES_8(VAL) count_ones_8(VAL)
+#define fBREV_8(VAL) reverse_bits_8(VAL)
+#define fBREV_4(VAL) reverse_bits_4(VAL)
+#define fBREV_2(VAL) reverse_bits_2(VAL)
+#define fBREV_1(VAL) reverse_bits_1(VAL)
+#define fCL1_8(VAL) count_leading_ones_8(VAL)
+#define fCL1_4(VAL) count_leading_ones_4(VAL)
+#define fCL1_2(VAL) count_leading_ones_2(VAL)
+#define fCL1_1(VAL) count_leading_ones_1(VAL)
+#define fINTERLEAVE(ODD, EVEN) interleave(ODD, EVEN)
+#define fDEINTERLEAVE(MIXED) deinterleave(MIXED)
+#define fNORM16(VAL) \
+    ((VAL == 0) ? (31) : (fMAX(fCL1_2(VAL), fCL1_2(~VAL)) - 1))
+#define fHIDE(A) A
+#define fCONSTLL(A) A##LL
+#define fCONSTULL(A) A##ULL
+#define fECHO(A) (A)
+
+#define fDO_TRACE(SREG)
+#define fBREAK()
+#define fGP_DOCHKPAGECROSS(BASE, SUM)
+#define fDOCHKPAGECROSS(BASE, SUM)
+#define fPAUSE(IMM)
+#define fTRAP(TRAPTYPE, IMM) helper_raise_exception(env, HEX_EXCP_TRAP0)
+
+#define fALIGN_REG_FIELD_VALUE(FIELD, VAL) \
+    ((VAL) << reg_field_info[FIELD].offset)
+#define fGET_REG_FIELD_MASK(FIELD) \
+    (((1 << reg_field_info[FIELD].width) - 1) << reg_field_info[FIELD].offset)
+#define fLOG_REG_FIELD(REG, FIELD, VAL)
+#define fWRITE_GLOBAL_REG_FIELD(REG, FIELD, VAL)
+#define fLOG_GLOBAL_REG_FIELD(REG, FIELD, VAL)
+#define fREAD_REG_FIELD(REG, FIELD) \
+    fEXTRACTU_BITS(env->gpr[HEX_REG_##REG], \
+                   reg_field_info[FIELD].width, \
+                   reg_field_info[FIELD].offset)
+#define fREAD_GLOBAL_REG_FIELD(REG, FIELD)
+#define fGET_FIELD(VAL, FIELD)
+#define fSET_FIELD(VAL, FIELD, NEWVAL)
+#define fPOW2_HELP_ROUNDUP(VAL) \
+    ((VAL) | \
+     ((VAL) >> 1) | \
+     ((VAL) >> 2) | \
+     ((VAL) >> 4) | \
+     ((VAL) >> 8) | \
+     ((VAL) >> 16))
+#define fPOW2_ROUNDUP(VAL) (fPOW2_HELP_ROUNDUP((VAL) - 1) + 1)
+#define fBARRIER()
+#define fSYNCH()
+#define fISYNC()
+#define fICFETCH(REG)
+#define fDCFETCH(REG) do { REG = REG; } while (0) /* Nothing to do in qemu */
+#define fICINVIDX(REG)
+#define fICINVA(REG) do { REG = REG; } while (0) /* Nothing to do in qemu */
+#define fICKILL()
+#define fDCKILL()
+#define fL2KILL()
+#define fL2UNLOCK()
+#define fL2CLEAN()
+#define fL2CLEANINV()
+#define fL2CLEANPA(REG)
+#define fL2CLEANINVPA(REG)
+#define fL2CLEANINVIDX(REG)
+#define fL2CLEANIDX(REG)
+#define fL2INVIDX(REG)
+#define fL2FETCH(ADDR, HEIGHT, WIDTH, STRIDE, FLAGS)
+#define fL2TAGR(INDEX, DST, DSTREG)
+#define fL2LOCKA(VA, DST, PREGDST)
+#define fL2UNLOCKA(VA)
+#define fL2TAGW(INDEX, PART2)
+#define fDCCLEANIDX(REG)
+#define fDCCLEANA(REG) do { REG = REG; } while (0) /* Nothing to do in qemu */
+#define fDCCLEANINVIDX(REG)
+#define fDCCLEANINVA(REG) \
+    do { REG = REG; } while (0) /* Nothing to do in qemu */
+
+#ifdef QEMU_GENERATE
+#define fDCZEROA(REG) tcg_gen_mov_tl(hex_dczero_addr, (REG))
+#else
+#define fDCZEROA(REG) do { REG = REG; g_assert_not_reached(); } while (0)
+#endif
+
+#define fDCINVIDX(REG)
+#define fDCINVA(REG) do { REG = REG; } while (0) /* Nothing to do in qemu */
+#define fBRANCH_SPECULATED_RIGHT(JC, SD, DOTNEWVAL) \
+    (((JC) ^ (SD) ^ (DOTNEWVAL & 1)) & 0x1)
+#define fBRANCH_SPECULATE_STALL(DOTNEWVAL, JUMP_COND, SPEC_DIR, HINTBITNUM, \
+                                STRBITNUM) /* Nothing */
+
+#define IV1DEAD()
+#define fVIRTINSN_SPSWAP(IMM, REG)
+#define fVIRTINSN_GETIE(IMM, REG) { REG = 0xdeafbeef; }
+#define fVIRTINSN_SETIE(IMM, REG)
+#define fVIRTINSN_RTE(IMM, REG)
+#define fTRAP1_VIRTINSN(IMM) \
+    (((IMM) == 1) || ((IMM) == 3) || ((IMM) == 4) || ((IMM) == 6))
+#define fNOP_EXECUTED
+#define fPREDUSE_TIMING()
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 33/67] Hexagon instruction classes
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (31 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 32/67] Hexagon macros referenced in instruction semantics Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 34/67] Hexagon TCG generation helpers - step 1 Taylor Simpson
                   ` (34 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Used to determine legal VLIW slots for each instruction

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/iclass.h |  46 +++++++++++++++++++++
 target/hexagon/iclass.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+)
 create mode 100644 target/hexagon/iclass.h
 create mode 100644 target/hexagon/iclass.c

diff --git a/target/hexagon/iclass.h b/target/hexagon/iclass.h
new file mode 100644
index 0000000..89288ac
--- /dev/null
+++ b/target/hexagon/iclass.h
@@ -0,0 +1,46 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_ICLASS_H
+#define HEXAGON_ICLASS_H
+
+#include "opcodes.h"
+
+#define ICLASS_FROM_TYPE(TYPE) ICLASS_##TYPE
+
+typedef enum {
+
+#define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS)    ICLASS_FROM_TYPE(TYPE),
+#define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS)    /* nothing */
+#include "imported/iclass.def"
+#undef DEF_PP_ICLASS32
+#undef DEF_EE_ICLASS32
+
+#define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS)    ICLASS_FROM_TYPE(TYPE),
+#define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS)    /* nothing */
+#include "imported/iclass.def"
+#undef DEF_PP_ICLASS32
+#undef DEF_EE_ICLASS32
+
+    ICLASS_FROM_TYPE(COPROC_VX),
+    ICLASS_FROM_TYPE(COPROC_VMEM),
+    NUM_ICLASSES
+} iclass_t;
+
+extern const char *find_iclass_slots(opcode_t opcode, int itype);
+
+#endif
diff --git a/target/hexagon/iclass.c b/target/hexagon/iclass.c
new file mode 100644
index 0000000..9ada8cd
--- /dev/null
+++ b/target/hexagon/iclass.c
@@ -0,0 +1,107 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "iclass.h"
+
+typedef struct {
+    const char * const slots;
+} iclass_info_t;
+
+static const iclass_info_t iclass_info[] = {
+
+#define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS)    /* nothing */
+#define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS) \
+    [ICLASS_FROM_TYPE(TYPE)] = { .slots = #SLOTS },
+
+#include "imported/iclass.def"
+#undef DEF_PP_ICLASS32
+#undef DEF_EE_ICLASS32
+
+#define DEF_PP_ICLASS32(TYPE, SLOTS, UNITS)    /* nothing */
+#define DEF_EE_ICLASS32(TYPE, SLOTS, UNITS) \
+    [ICLASS_FROM_TYPE(TYPE)] = { .slots = #SLOTS },
+
+#include "imported/iclass.def"
+#undef DEF_PP_ICLASS32
+#undef DEF_EE_ICLASS32
+
+    {0}
+};
+
+const char *find_iclass_slots(opcode_t opcode, int itype)
+{
+    /* There are some exceptions to what the iclass dictates */
+    if (GET_ATTRIB(opcode, A_ICOP)) {
+        return "2";
+    } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT0ONLY)) {
+        return "0";
+    } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT1ONLY)) {
+        return "1";
+    } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT2ONLY)) {
+        return "2";
+    } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT3ONLY)) {
+        return "3";
+    } else if (GET_ATTRIB(opcode, A_COF) &&
+               GET_ATTRIB(opcode, A_INDIRECT) &&
+               !GET_ATTRIB(opcode, A_MEMLIKE) &&
+               !GET_ATTRIB(opcode, A_MEMLIKE_PACKET_RULES)) {
+        return "2";
+    } else if (GET_ATTRIB(opcode, A_RESTRICT_NOSLOT1)) {
+        return "0";
+    } else if ((opcode == J2_trap0) || (opcode == J2_trap1) ||
+               (opcode == Y2_isync) || (opcode == J2_rte) ||
+               (opcode == J2_pause) || (opcode == J4_hintjumpr)) {
+        return "2";
+    } else if ((itype == ICLASS_V2LDST) && (GET_ATTRIB(opcode, A_STORE))) {
+        return "01";
+    } else if ((itype == ICLASS_V2LDST) && (!GET_ATTRIB(opcode, A_STORE))) {
+        return "01";
+    } else if (GET_ATTRIB(opcode, A_CRSLOT23)) {
+        return "23";
+    } else if (GET_ATTRIB(opcode, A_RESTRICT_PREFERSLOT0)) {
+        return "0";
+    } else if (GET_ATTRIB(opcode, A_SUBINSN)) {
+        return "01";
+    } else if (GET_ATTRIB(opcode, A_CALL)) {
+        return "23";
+    } else if ((opcode == J4_jumpseti) || (opcode == J4_jumpsetr)) {
+        return "23";
+    } else if (GET_ATTRIB(opcode, A_EXTENSION) && GET_ATTRIB(opcode, A_CVI)) {
+        /* CVI EXTENSIONS */
+        if (GET_ATTRIB(opcode, A_CVI_VM)) {
+            return "01";
+        } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT2ONLY)) {
+            return "2";
+        } else if (GET_ATTRIB(opcode, A_CVI_SLOT23)) {
+            return "23";
+        } else if (GET_ATTRIB(opcode, A_CVI_VX)) {
+            return "23";
+        } else if (GET_ATTRIB(opcode, A_CVI_VX_DV)) {
+            return "23";
+        } else if (GET_ATTRIB(opcode, A_CVI_VS_VX)) {
+            return "23";
+        } else if (GET_ATTRIB(opcode, A_MEMLIKE)) {
+            return "01";
+        } else {
+            return "0123";
+        }
+    } else {
+        return iclass_info[itype].slots;
+    }
+}
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 34/67] Hexagon TCG generation helpers - step 1
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (32 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 33/67] Hexagon instruction classes Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 35/67] Hexagon TCG generation helpers - step 2 Taylor Simpson
                   ` (33 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Helpers for reading and writing registers
Helpers for getting and setting parts of values (e.g., set bit)

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/genptr_helpers.h | 337 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 337 insertions(+)
 create mode 100644 target/hexagon/genptr_helpers.h

diff --git a/target/hexagon/genptr_helpers.h b/target/hexagon/genptr_helpers.h
new file mode 100644
index 0000000..d8d5d95
--- /dev/null
+++ b/target/hexagon/genptr_helpers.h
@@ -0,0 +1,337 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_GENPTR_HELPERS_H
+#define HEXAGON_GENPTR_HELPERS_H
+
+#include "tcg/tcg.h"
+
+static inline TCGv gen_read_reg(TCGv result, int num)
+{
+    tcg_gen_mov_tl(result, hex_gpr[num]);
+    return result;
+}
+
+static inline TCGv gen_read_preg(TCGv pred, uint8_t num)
+{
+    tcg_gen_mov_tl(pred, hex_pred[num]);
+    return pred;
+}
+
+static inline TCGv gen_newreg_st(TCGv result, TCGv_env cpu_env, TCGv rnum)
+{
+    gen_helper_new_value(result, cpu_env, rnum);
+    return result;
+}
+
+static inline bool is_preloaded(DisasContext *ctx, int num)
+{
+    int i;
+    for (i = 0; i < ctx->ctx_reg_log_idx; i++) {
+        if (ctx->ctx_reg_log[i] == num) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static inline void gen_log_reg_write(int rnum, TCGv val, int slot,
+                                     int is_predicated)
+{
+    if (is_predicated) {
+        TCGv one = tcg_const_tl(1);
+        TCGv zero = tcg_const_tl(0);
+        TCGv slot_mask = tcg_temp_new();
+
+        tcg_gen_andi_tl(slot_mask, hex_slot_cancelled, 1 << slot);
+        tcg_gen_movcond_tl(TCG_COND_EQ, hex_new_value[rnum], slot_mask, zero,
+                           val, hex_new_value[rnum]);
+#if HEX_DEBUG
+        /* Do this so HELPER(debug_commit_end) will know */
+        tcg_gen_movcond_tl(TCG_COND_EQ, hex_reg_written[rnum], slot_mask, zero,
+                           one, hex_reg_written[rnum]);
+#endif
+
+        tcg_temp_free(one);
+        tcg_temp_free(zero);
+        tcg_temp_free(slot_mask);
+    } else {
+        tcg_gen_mov_tl(hex_new_value[rnum], val);
+#if HEX_DEBUG
+        /* Do this so HELPER(debug_commit_end) will know */
+        tcg_gen_movi_tl(hex_reg_written[rnum], 1);
+#endif
+    }
+}
+
+static inline void gen_log_reg_write_pair(int rnum, TCGv_i64 val, int slot,
+                                          int is_predicated)
+{
+    TCGv val32 = tcg_temp_new();
+
+    if (is_predicated) {
+        TCGv one = tcg_const_tl(1);
+        TCGv zero = tcg_const_tl(0);
+        TCGv slot_mask = tcg_temp_new();
+
+        tcg_gen_andi_tl(slot_mask, hex_slot_cancelled, 1 << slot);
+        /* Low word */
+        tcg_gen_extrl_i64_i32(val32, val);
+        tcg_gen_movcond_tl(TCG_COND_EQ, hex_new_value[rnum], slot_mask, zero,
+                           val32, hex_new_value[rnum]);
+        /* High word */
+        tcg_gen_extrh_i64_i32(val32, val);
+        tcg_gen_movcond_tl(TCG_COND_EQ, hex_new_value[rnum + 1],
+                           slot_mask, zero,
+                           val32, hex_new_value[rnum + 1]);
+
+        tcg_temp_free(one);
+        tcg_temp_free(zero);
+        tcg_temp_free(slot_mask);
+    } else {
+        /* Low word */
+        tcg_gen_extrl_i64_i32(val32, val);
+        tcg_gen_mov_tl(hex_new_value[rnum], val32);
+        /* High word */
+        tcg_gen_extrh_i64_i32(val32, val);
+        tcg_gen_mov_tl(hex_new_value[rnum + 1], val32);
+    }
+
+    tcg_temp_free(val32);
+}
+
+static inline void gen_log_pred_write(int pnum, TCGv val)
+{
+    TCGv zero = tcg_const_tl(0);
+    TCGv base_val = tcg_temp_new();
+    TCGv and_val = tcg_temp_new();
+    TCGv pred_written = tcg_temp_new();
+
+    /* Multiple writes to the same preg are and'ed together */
+    tcg_gen_andi_tl(base_val, val, 0xff);
+    tcg_gen_and_tl(and_val, base_val, hex_new_pred_value[pnum]);
+    tcg_gen_andi_tl(pred_written, hex_pred_written, 1 << pnum);
+    tcg_gen_movcond_tl(TCG_COND_NE, hex_new_pred_value[pnum],
+                       pred_written, zero,
+                       and_val, base_val);
+    tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
+
+    tcg_temp_free(zero);
+    tcg_temp_free(base_val);
+    tcg_temp_free(and_val);
+    tcg_temp_free(pred_written);
+}
+
+static inline void gen_read_p3_0(TCGv control_reg)
+{
+    TCGv pval = tcg_temp_new();
+    int i;
+    tcg_gen_movi_tl(control_reg, 0);
+    for (i = NUM_PREGS - 1; i >= 0; i--) {
+        tcg_gen_shli_tl(control_reg, control_reg, 8);
+        tcg_gen_andi_tl(pval, hex_pred[i], 0xff);
+        tcg_gen_or_tl(control_reg, control_reg, pval);
+    }
+    tcg_temp_free(pval);
+}
+
+static inline void gen_write_p3_0(TCGv tmp)
+{
+    TCGv control_reg = tcg_temp_new();
+    TCGv pred_val = tcg_temp_new();
+    int i;
+
+    tcg_gen_mov_tl(control_reg, tmp);
+    for (i = 0; i < NUM_PREGS; i++) {
+        tcg_gen_andi_tl(pred_val, control_reg, 0xff);
+        tcg_gen_mov_tl(hex_pred[i], pred_val);
+        tcg_gen_shri_tl(control_reg, control_reg, 8);
+    }
+    tcg_temp_free(control_reg);
+    tcg_temp_free(pred_val);
+}
+
+static inline TCGv gen_get_byte(TCGv result, int N, TCGv src, bool sign)
+{
+    TCGv shift = tcg_const_tl(8 * N);
+    TCGv mask = tcg_const_tl(0xff);
+
+    tcg_gen_shr_tl(result, src, shift);
+    tcg_gen_and_tl(result, result, mask);
+    if (sign) {
+        tcg_gen_ext8s_tl(result, result);
+    } else {
+        tcg_gen_ext8u_tl(result, result);
+    }
+    tcg_temp_free(mask);
+    tcg_temp_free(shift);
+
+    return result;
+}
+
+static inline TCGv gen_get_byte_i64(TCGv result, int N, TCGv_i64 src, bool sign)
+{
+    TCGv_i64 result_i64 = tcg_temp_new_i64();
+    TCGv_i64 shift = tcg_const_i64(8 * N);
+    TCGv_i64 mask = tcg_const_i64(0xff);
+    tcg_gen_shr_i64(result_i64, src, shift);
+    tcg_gen_and_i64(result_i64, result_i64, mask);
+    tcg_gen_extrl_i64_i32(result, result_i64);
+    if (sign) {
+        tcg_gen_ext8s_tl(result, result);
+    } else {
+        tcg_gen_ext8u_tl(result, result);
+    }
+    tcg_temp_free_i64(result_i64);
+    tcg_temp_free_i64(shift);
+    tcg_temp_free_i64(mask);
+
+    return result;
+
+}
+static inline TCGv gen_get_half(TCGv result, int N, TCGv src, bool sign)
+{
+    TCGv shift = tcg_const_tl(16 * N);
+    TCGv mask = tcg_const_tl(0xffff);
+
+    tcg_gen_shr_tl(result, src, shift);
+    tcg_gen_and_tl(result, result, mask);
+    if (sign) {
+        tcg_gen_ext16s_tl(result, result);
+    } else {
+        tcg_gen_ext16u_tl(result, result);
+    }
+    tcg_temp_free(mask);
+    tcg_temp_free(shift);
+
+    return result;
+}
+
+static inline void gen_set_half(int N, TCGv result, TCGv src)
+{
+    TCGv mask1 = tcg_const_tl(~(0xffff << (N * 16)));
+    TCGv mask2 = tcg_const_tl(0xffff);
+    TCGv shift = tcg_const_tl(N * 16);
+    TCGv tmp = tcg_temp_new();
+
+    tcg_gen_and_tl(result, result, mask1);
+    tcg_gen_and_tl(tmp, src, mask2);
+    tcg_gen_shli_tl(tmp, tmp, N * 16);
+    tcg_gen_or_tl(result, result, tmp);
+
+    tcg_temp_free(mask1);
+    tcg_temp_free(mask2);
+    tcg_temp_free(shift);
+    tcg_temp_free(tmp);
+}
+
+static inline void gen_set_half_i64(int N, TCGv_i64 result, TCGv src)
+{
+    TCGv_i64 mask1 = tcg_const_i64(~(0xffffLL << (N * 16)));
+    TCGv_i64 mask2 = tcg_const_i64(0xffffLL);
+    TCGv_i64 shift = tcg_const_i64(N * 16);
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    tcg_gen_and_i64(result, result, mask1);
+    tcg_gen_concat_i32_i64(tmp, src, src);
+    tcg_gen_and_i64(tmp, tmp, mask2);
+    tcg_gen_shli_i64(tmp, tmp, N * 16);
+    tcg_gen_or_i64(result, result, tmp);
+
+    tcg_temp_free_i64(mask1);
+    tcg_temp_free_i64(mask2);
+    tcg_temp_free_i64(shift);
+    tcg_temp_free_i64(tmp);
+}
+
+static inline void gen_set_byte(int N, TCGv result, TCGv src)
+{
+    TCGv mask1 = tcg_const_tl(~(0xff << (N * 8)));
+    TCGv mask2 = tcg_const_tl(0xff);
+    TCGv shift = tcg_const_tl(N * 8);
+    TCGv tmp = tcg_temp_new();
+
+    tcg_gen_and_tl(result, result, mask1);
+    tcg_gen_and_tl(tmp, src, mask2);
+    tcg_gen_shli_tl(tmp, tmp, N * 8);
+    tcg_gen_or_tl(result, result, tmp);
+
+    tcg_temp_free(mask1);
+    tcg_temp_free(mask2);
+    tcg_temp_free(shift);
+    tcg_temp_free(tmp);
+}
+
+static inline void gen_set_byte_i64(int N, TCGv_i64 result, TCGv src)
+{
+    TCGv_i64 mask1 = tcg_const_i64(~(0xffLL << (N * 8)));
+    TCGv_i64 mask2 = tcg_const_i64(0xffLL);
+    TCGv_i64 shift = tcg_const_i64(N * 8);
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    tcg_gen_and_i64(result, result, mask1);
+    tcg_gen_concat_i32_i64(tmp, src, src);
+    tcg_gen_and_i64(tmp, tmp, mask2);
+    tcg_gen_shli_i64(tmp, tmp, N * 8);
+    tcg_gen_or_i64(result, result, tmp);
+
+    tcg_temp_free_i64(mask1);
+    tcg_temp_free_i64(mask2);
+    tcg_temp_free_i64(shift);
+    tcg_temp_free_i64(tmp);
+}
+
+static inline TCGv gen_get_word(TCGv result, int N, TCGv_i64 src, bool sign)
+{
+    if (N == 0) {
+        tcg_gen_extrl_i64_i32(result, src);
+    } else if (N == 1) {
+        tcg_gen_extrh_i64_i32(result, src);
+    } else {
+      g_assert_not_reached();
+    }
+    return result;
+}
+
+static inline TCGv_i64 gen_get_word_i64(TCGv_i64 result, int N, TCGv_i64 src,
+                                        bool sign)
+{
+    TCGv word = tcg_temp_new();
+    gen_get_word(word, N, src, sign);
+    if (sign) {
+        tcg_gen_ext_i32_i64(result, word);
+    } else {
+        tcg_gen_extu_i32_i64(result, word);
+    }
+    tcg_temp_free(word);
+    return result;
+}
+
+static inline TCGv gen_set_bit(int i, TCGv result, TCGv src)
+{
+    TCGv mask = tcg_const_tl(~(1 << i));
+    TCGv bit = tcg_temp_new();
+    tcg_gen_shli_tl(bit, src, i);
+    tcg_gen_and_tl(result, result, mask);
+    tcg_gen_or_tl(result, result, bit);
+    tcg_temp_free(mask);
+    tcg_temp_free(bit);
+
+    return result;
+}
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 35/67] Hexagon TCG generation helpers - step 2
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (33 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 34/67] Hexagon TCG generation helpers - step 1 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 36/67] Hexagon TCG generation helpers - step 3 Taylor Simpson
                   ` (32 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Helpers for load-locked/store-conditional

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/genptr_helpers.h | 52 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/target/hexagon/genptr_helpers.h b/target/hexagon/genptr_helpers.h
index d8d5d95..c0e4c39 100644
--- a/target/hexagon/genptr_helpers.h
+++ b/target/hexagon/genptr_helpers.h
@@ -334,4 +334,56 @@ static inline TCGv gen_set_bit(int i, TCGv result, TCGv src)
     return result;
 }
 
+static inline void gen_load_locked4u(TCGv dest, TCGv vaddr, int mem_index)
+{
+    tcg_gen_qemu_ld32u(dest, vaddr, mem_index);
+    tcg_gen_mov_tl(llsc_addr, vaddr);
+    tcg_gen_mov_tl(llsc_val, dest);
+}
+
+static inline void gen_load_locked8u(TCGv_i64 dest, TCGv vaddr, int mem_index)
+{
+    tcg_gen_qemu_ld64(dest, vaddr, mem_index);
+    tcg_gen_mov_tl(llsc_addr, vaddr);
+    tcg_gen_mov_i64(llsc_val_i64, dest);
+}
+
+static inline void gen_store_conditional4(CPUHexagonState *env,
+                                          DisasContext *ctx, int prednum,
+                                          TCGv pred, TCGv vaddr, TCGv src)
+{
+    TCGv tmp = tcg_temp_new();
+    TCGLabel *fail = gen_new_label();
+
+    tcg_gen_ld_tl(tmp, cpu_env, offsetof(CPUHexagonState, llsc_addr));
+    tcg_gen_brcond_tl(TCG_COND_NE, vaddr, tmp, fail);
+    tcg_gen_movi_tl(tmp, prednum);
+    tcg_gen_st_tl(tmp, cpu_env, offsetof(CPUHexagonState, llsc_reg));
+    tcg_gen_st_tl(src, cpu_env, offsetof(CPUHexagonState, llsc_newval));
+    gen_exception(HEX_EXCP_SC4);
+
+    gen_set_label(fail);
+    tcg_gen_movi_tl(pred, 0);
+    tcg_temp_free(tmp);
+}
+
+static inline void gen_store_conditional8(CPUHexagonState *env,
+                                          DisasContext *ctx, int prednum,
+                                          TCGv pred, TCGv vaddr, TCGv_i64 src)
+{
+    TCGv tmp = tcg_temp_new();
+    TCGLabel *fail = gen_new_label();
+
+    tcg_gen_ld_tl(tmp, cpu_env, offsetof(CPUHexagonState, llsc_addr));
+    tcg_gen_brcond_tl(TCG_COND_NE, vaddr, tmp, fail);
+    tcg_gen_movi_tl(tmp, prednum);
+    tcg_gen_st_tl(tmp, cpu_env, offsetof(CPUHexagonState, llsc_reg));
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUHexagonState, llsc_newval_i64));
+    gen_exception(HEX_EXCP_SC8);
+
+    gen_set_label(fail);
+    tcg_gen_movi_tl(pred, 0);
+    tcg_temp_free(tmp);
+}
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 36/67] Hexagon TCG generation helpers - step 3
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (34 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 35/67] Hexagon TCG generation helpers - step 2 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 37/67] Hexagon TCG generation helpers - step 4 Taylor Simpson
                   ` (31 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Helpers for store instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/genptr_helpers.h | 77 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/target/hexagon/genptr_helpers.h b/target/hexagon/genptr_helpers.h
index c0e4c39..0e2d7b9 100644
--- a/target/hexagon/genptr_helpers.h
+++ b/target/hexagon/genptr_helpers.h
@@ -386,4 +386,81 @@ static inline void gen_store_conditional8(CPUHexagonState *env,
     tcg_temp_free(tmp);
 }
 
+static inline void gen_store32(TCGv vaddr, TCGv src, int width, int slot)
+{
+    tcg_gen_mov_tl(hex_store_addr[slot], vaddr);
+    tcg_gen_movi_tl(hex_store_width[slot], width);
+    tcg_gen_mov_tl(hex_store_val32[slot], src);
+}
+
+static inline void gen_store1(TCGv_env cpu_env, TCGv vaddr, TCGv src,
+                              DisasContext *ctx, int slot)
+{
+    TCGv tmp = tcg_const_tl(slot);
+    gen_store32(vaddr, src, 1, slot);
+    tcg_temp_free(tmp);
+    ctx->ctx_store_width[slot] = 1;
+}
+
+static inline void gen_store1i(TCGv_env cpu_env, TCGv vaddr, int32_t src,
+                               DisasContext *ctx, int slot)
+{
+    TCGv tmp = tcg_const_tl(src);
+    gen_store1(cpu_env, vaddr, tmp, ctx, slot);
+    tcg_temp_free(tmp);
+}
+
+static inline void gen_store2(TCGv_env cpu_env, TCGv vaddr, TCGv src,
+                              DisasContext *ctx, int slot)
+{
+    TCGv tmp = tcg_const_tl(slot);
+    gen_store32(vaddr, src, 2, slot);
+    tcg_temp_free(tmp);
+    ctx->ctx_store_width[slot] = 2;
+}
+
+static inline void gen_store2i(TCGv_env cpu_env, TCGv vaddr, int32_t src,
+                               DisasContext *ctx, int slot)
+{
+    TCGv tmp = tcg_const_tl(src);
+    gen_store2(cpu_env, vaddr, tmp, ctx, slot);
+    tcg_temp_free(tmp);
+}
+
+static inline void gen_store4(TCGv_env cpu_env, TCGv vaddr, TCGv src,
+                              DisasContext *ctx, int slot)
+{
+    TCGv tmp = tcg_const_tl(slot);
+    gen_store32(vaddr, src, 4, slot);
+    tcg_temp_free(tmp);
+    ctx->ctx_store_width[slot] = 4;
+}
+
+static inline void gen_store4i(TCGv_env cpu_env, TCGv vaddr, int32_t src,
+                               DisasContext *ctx, int slot)
+{
+    TCGv tmp = tcg_const_tl(src);
+    gen_store4(cpu_env, vaddr, tmp, ctx, slot);
+    tcg_temp_free(tmp);
+}
+
+static inline void gen_store8(TCGv_env cpu_env, TCGv vaddr, TCGv_i64 src,
+                              DisasContext *ctx, int slot)
+{
+    TCGv tmp = tcg_const_tl(slot);
+    tcg_gen_mov_tl(hex_store_addr[slot], vaddr);
+    tcg_gen_movi_tl(hex_store_width[slot], 8);
+    tcg_gen_mov_i64(hex_store_val64[slot], src);
+    tcg_temp_free(tmp);
+    ctx->ctx_store_width[slot] = 8;
+}
+
+static inline void gen_store8i(TCGv_env cpu_env, TCGv vaddr, int64_t src,
+                               DisasContext *ctx, int slot)
+{
+    TCGv_i64 tmp = tcg_const_i64(src);
+    gen_store8(cpu_env, vaddr, tmp, ctx, slot);
+    tcg_temp_free_i64(tmp);
+}
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 37/67] Hexagon TCG generation helpers - step 4
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (35 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 36/67] Hexagon TCG generation helpers - step 3 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 38/67] Hexagon TCG generation helpers - step 5 Taylor Simpson
                   ` (30 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Helpers referenced in macros.h

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/genptr_helpers.h | 67 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/target/hexagon/genptr_helpers.h b/target/hexagon/genptr_helpers.h
index 0e2d7b9..9917d72 100644
--- a/target/hexagon/genptr_helpers.h
+++ b/target/hexagon/genptr_helpers.h
@@ -463,4 +463,71 @@ static inline void gen_store8i(TCGv_env cpu_env, TCGv vaddr, int64_t src,
     tcg_temp_free_i64(tmp);
 }
 
+static inline TCGv_i64 gen_carry_from_add64(TCGv_i64 result, TCGv_i64 a,
+                                            TCGv_i64 b, TCGv_i64 c)
+{
+    TCGv_i64 WORD = tcg_temp_new_i64();
+    TCGv_i64 tmpa = tcg_temp_new_i64();
+    TCGv_i64 tmpb = tcg_temp_new_i64();
+    TCGv_i64 tmpc = tcg_temp_new_i64();
+
+    tcg_gen_mov_i64(tmpa, fGETUWORD(0, a));
+    tcg_gen_mov_i64(tmpb, fGETUWORD(0, b));
+    tcg_gen_add_i64(tmpc, tmpa, tmpb);
+    tcg_gen_add_i64(tmpc, tmpc, c);
+    tcg_gen_mov_i64(tmpa, fGETUWORD(1, a));
+    tcg_gen_mov_i64(tmpb, fGETUWORD(1, b));
+    tcg_gen_add_i64(tmpc, tmpa, tmpb);
+    tcg_gen_add_i64(tmpc, tmpc, fGETUWORD(1, tmpc));
+    tcg_gen_mov_i64(result, fGETUWORD(1, tmpc));
+
+    tcg_temp_free_i64(WORD);
+    tcg_temp_free_i64(tmpa);
+    tcg_temp_free_i64(tmpb);
+    tcg_temp_free_i64(tmpc);
+    return result;
+}
+
+static inline TCGv gen_8bitsof(TCGv result, TCGv value)
+{
+    TCGv zero = tcg_const_tl(0);
+    TCGv ones = tcg_const_tl(0xff);
+    tcg_gen_movcond_tl(TCG_COND_NE, result, value, zero, ones, zero);
+    tcg_temp_free(zero);
+    tcg_temp_free(ones);
+
+    return result;
+}
+
+static inline void gen_write_new_pc(TCGv addr)
+{
+    /* If there are multiple branches in a packet, ignore the second one */
+    TCGv zero = tcg_const_tl(0);
+    tcg_gen_movcond_tl(TCG_COND_NE, hex_next_PC, hex_branch_taken, zero,
+                       hex_next_PC, addr);
+    tcg_gen_movi_tl(hex_branch_taken, 1);
+    tcg_temp_free(zero);
+}
+
+static inline void gen_set_usr_field(int field, TCGv val)
+{
+    tcg_gen_deposit_tl(hex_gpr[HEX_REG_USR], hex_gpr[HEX_REG_USR], val,
+                       reg_field_info[field].offset,
+                       reg_field_info[field].width);
+}
+
+static inline void gen_set_usr_fieldi(int field, int x)
+{
+    TCGv val = tcg_const_tl(x);
+    gen_set_usr_field(field, val);
+    tcg_temp_free(val);
+}
+
+static inline void gen_cond_return(TCGv pred, TCGv addr)
+{
+    TCGv zero = tcg_const_tl(0);
+    tcg_gen_movcond_tl(TCG_COND_NE, hex_next_PC, pred, zero, addr, hex_next_PC);
+    tcg_temp_free(zero);
+}
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 38/67] Hexagon TCG generation helpers - step 5
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (36 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 37/67] Hexagon TCG generation helpers - step 4 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 39/67] Hexagon TCG generation - step 01 Taylor Simpson
                   ` (29 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Helpers for instructions overriden for optimization

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/genptr_helpers.h | 314 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 314 insertions(+)

diff --git a/target/hexagon/genptr_helpers.h b/target/hexagon/genptr_helpers.h
index 9917d72..e342f29 100644
--- a/target/hexagon/genptr_helpers.h
+++ b/target/hexagon/genptr_helpers.h
@@ -530,4 +530,318 @@ static inline void gen_cond_return(TCGv pred, TCGv addr)
     tcg_temp_free(zero);
 }
 
+static inline void gen_loop0r(TCGv RsV, int riV, insn_t *insn)
+{
+    TCGv tmp = tcg_temp_new();
+    fIMMEXT(riV);
+    fPCALIGN(riV);
+    /* fWRITE_LOOP_REGS0( fREAD_PC()+riV, RsV); */
+    tcg_gen_addi_tl(tmp, hex_gpr[HEX_REG_PC], riV);
+    gen_log_reg_write(HEX_REG_LC0, RsV, insn->slot, 0);
+    gen_log_reg_write(HEX_REG_SA0, tmp, insn->slot, 0);
+    fSET_LPCFG(0);
+    tcg_temp_free(tmp);
+}
+
+static inline void gen_loop1r(TCGv RsV, int riV, insn_t *insn)
+{
+    TCGv tmp = tcg_temp_new();
+    fIMMEXT(riV);
+    fPCALIGN(riV);
+    /* fWRITE_LOOP_REGS1( fREAD_PC()+riV, RsV); */
+    tcg_gen_addi_tl(tmp, hex_gpr[HEX_REG_PC], riV);
+    gen_log_reg_write(HEX_REG_LC1, RsV, insn->slot, 0);
+    gen_log_reg_write(HEX_REG_SA1, tmp, insn->slot, 0);
+    tcg_temp_free(tmp);
+}
+
+static inline void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2)
+{
+    TCGv one = tcg_const_tl(0xff);
+    TCGv zero = tcg_const_tl(0);
+
+    tcg_gen_movcond_tl(cond, res, arg1, arg2, one, zero);
+
+    tcg_temp_free(one);
+    tcg_temp_free(zero);
+}
+
+static inline void gen_comparei(TCGCond cond, TCGv res, TCGv arg1, int arg2)
+{
+    TCGv tmp = tcg_const_tl(arg2);
+    gen_compare(cond, res, arg1, tmp);
+    tcg_temp_free(tmp);
+}
+
+static inline void gen_compare_i64(TCGCond cond, TCGv res,
+                                   TCGv_i64 arg1, TCGv_i64 arg2)
+{
+    TCGv_i64 one = tcg_const_i64(0xff);
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 temp = tcg_temp_new_i64();
+
+    tcg_gen_movcond_i64(cond, temp, arg1, arg2, one, zero);
+    tcg_gen_extrl_i64_i32(res, temp);
+    tcg_gen_andi_tl(res, res, 0xff);
+
+    tcg_temp_free_i64(one);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(temp);
+}
+
+static inline void gen_cmpnd_cmp_jmp(int pnum, TCGCond cond, bool sense,
+                                     TCGv arg1, TCGv arg2, int pc_off)
+{
+    TCGv new_pc = tcg_temp_new();
+    TCGv pred = tcg_temp_new();
+    TCGv zero = tcg_const_tl(0);
+    TCGv one = tcg_const_tl(1);
+
+    tcg_gen_addi_tl(new_pc, hex_gpr[HEX_REG_PC], pc_off);
+    gen_compare(cond, pred, arg1, arg2);
+    gen_log_pred_write(pnum, pred);
+    if (!sense) {
+        tcg_gen_xori_tl(pred, pred, 0xff);
+    }
+
+    /* If there are multiple branches in a packet, ignore the second one */
+    tcg_gen_movcond_tl(TCG_COND_NE, pred, hex_branch_taken, zero, zero, pred);
+
+    tcg_gen_movcond_tl(TCG_COND_NE, hex_next_PC, pred, zero,
+                       new_pc, hex_next_PC);
+    tcg_gen_movcond_tl(TCG_COND_NE, hex_branch_taken, pred, zero,
+                       one, hex_branch_taken);
+
+    tcg_temp_free(new_pc);
+    tcg_temp_free(pred);
+    tcg_temp_free(zero);
+    tcg_temp_free(one);
+}
+
+static inline void gen_cmpnd_cmpi_jmp(int pnum, TCGCond cond, bool sense,
+                                      TCGv arg1, int arg2, int pc_off)
+{
+    TCGv tmp = tcg_const_tl(arg2);
+    gen_cmpnd_cmp_jmp(pnum, cond, sense, arg1, tmp, pc_off);
+    tcg_temp_free(tmp);
+
+}
+
+static inline void gen_cmpnd_cmp_n1_jmp(int pnum, TCGCond cond, bool sense,
+                                        TCGv arg, int pc_off)
+{
+    gen_cmpnd_cmpi_jmp(pnum, cond, sense, arg, -1, pc_off);
+}
+
+
+static inline void gen_jump(int pc_off)
+{
+    TCGv new_pc = tcg_temp_new();
+    tcg_gen_addi_tl(new_pc, hex_gpr[HEX_REG_PC], pc_off);
+    gen_write_new_pc(new_pc);
+    tcg_temp_free(new_pc);
+}
+
+static inline void gen_cond_jumpr(TCGv pred, TCGv dst_pc)
+{
+    TCGv zero = tcg_const_tl(0);
+    TCGv one = tcg_const_tl(1);
+    TCGv new_pc = tcg_temp_new();
+
+    tcg_gen_movcond_tl(TCG_COND_EQ, new_pc, pred, zero, hex_next_PC, dst_pc);
+
+    /* If there are multiple jumps in a packet, only the first one is taken */
+    tcg_gen_movcond_tl(TCG_COND_NE, hex_next_PC, hex_branch_taken, zero,
+                       hex_next_PC, new_pc);
+    tcg_gen_movcond_tl(TCG_COND_EQ, hex_branch_taken, pred, zero,
+                       hex_branch_taken, one);
+
+    tcg_temp_free(zero);
+    tcg_temp_free(one);
+    tcg_temp_free(new_pc);
+}
+
+static inline void gen_cond_jump(TCGv pred, int pc_off)
+{
+    TCGv new_pc = tcg_temp_new();
+
+    tcg_gen_addi_tl(new_pc, hex_gpr[HEX_REG_PC], pc_off);
+    gen_cond_jumpr(pred, new_pc);
+
+    tcg_temp_free(new_pc);
+}
+
+static inline void gen_call(int pc_off)
+{
+    gen_log_reg_write(HEX_REG_LR, hex_next_PC, 4, false);
+    gen_jump(pc_off);
+}
+
+static inline void gen_callr(TCGv new_pc)
+{
+    gen_log_reg_write(HEX_REG_LR, hex_next_PC, 4, false);
+    gen_write_new_pc(new_pc);
+}
+
+static inline void gen_endloop0(void)
+{
+    TCGv lpcfg = tcg_temp_local_new();
+
+    GET_USR_FIELD(USR_LPCFG, lpcfg);
+
+    /*
+     *    if (lpcfg == 1) {
+     *        hex_new_pred_value[3] = 0xff;
+     *        hex_pred_written |= 1 << 3;
+     *    }
+     */
+    TCGLabel *label1 = gen_new_label();
+    tcg_gen_brcondi_tl(TCG_COND_NE, lpcfg, 1, label1);
+    {
+        tcg_gen_movi_tl(hex_new_pred_value[3], 0xff);
+        tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << 3);
+    }
+    gen_set_label(label1);
+
+    /*
+     *    if (lpcfg) {
+     *        SET_USR_FIELD(USR_LPCFG, lpcfg - 1);
+     *    }
+     */
+    TCGLabel *label2 = gen_new_label();
+    tcg_gen_brcondi_tl(TCG_COND_EQ, lpcfg, 0, label2);
+    {
+        tcg_gen_subi_tl(lpcfg, lpcfg, 1);
+        SET_USR_FIELD(USR_LPCFG, lpcfg);
+    }
+    gen_set_label(label2);
+
+    /*
+     *    if (hex_gpr[HEX_REG_LC0] > 1) {
+     *        hex_next_PC = hex_gpr[HEX_REG_SA0];
+     *        hex_branch_taken = 1;
+     *        hex_gpr[HEX_REG_LC0] = hex_gpr[HEX_REG_LC0] - 1;
+     *    }
+     */
+    TCGLabel *label3 = gen_new_label();
+    tcg_gen_brcondi_tl(TCG_COND_LEU, hex_gpr[HEX_REG_LC0], 1, label3);
+    {
+        tcg_gen_mov_tl(hex_next_PC, hex_gpr[HEX_REG_SA0]);
+        tcg_gen_movi_tl(hex_branch_taken, 1);
+        TCGv lc0 = tcg_temp_local_new();
+        tcg_gen_mov_tl(lc0, hex_gpr[HEX_REG_LC0]);
+        tcg_gen_subi_tl(lc0, lc0, 1);
+        tcg_gen_mov_tl(hex_new_value[HEX_REG_LC0], lc0);
+        tcg_temp_free(lc0);
+    }
+    gen_set_label(label3);
+
+    tcg_temp_free(lpcfg);
+}
+
+static inline void gen_endloop1(void)
+{
+    /*
+     *    if (hex_gpr[HEX_REG_LC1] > 1) {
+     *        hex_next_PC = hex_gpr[HEX_REG_SA1];
+     *        hex_branch_taken = 1;
+     *        hex_gpr[HEX_REG_LC1] = hex_gpr[HEX_REG_LC1] - 1;
+     *    }
+     */
+    TCGLabel *label = gen_new_label();
+    tcg_gen_brcondi_tl(TCG_COND_LEU, hex_gpr[HEX_REG_LC1], 1, label);
+    {
+        tcg_gen_mov_tl(hex_next_PC, hex_gpr[HEX_REG_SA1]);
+        tcg_gen_movi_tl(hex_branch_taken, 1);
+        TCGv lc1 = tcg_temp_local_new();
+        tcg_gen_mov_tl(lc1, hex_gpr[HEX_REG_LC1]);
+        tcg_gen_subi_tl(lc1, lc1, 1);
+        tcg_gen_mov_tl(hex_new_value[HEX_REG_LC1], lc1);
+        tcg_temp_free(lc1);
+    }
+    gen_set_label(label);
+}
+
+static inline void gen_ashiftr_4_4s(TCGv dst, TCGv src, int32_t shift_amt)
+{
+    tcg_gen_sari_tl(dst, src, shift_amt);
+}
+
+static inline void gen_ashiftl_4_4s(TCGv dst, TCGv src, int32_t shift_amt)
+{
+    if (shift_amt >= 64) {
+        tcg_gen_movi_tl(dst, 0);
+    } else {
+        tcg_gen_shli_tl(dst, src, shift_amt);
+    }
+}
+
+static inline void gen_cmp_jumpnv(TCGCond cond, int rnum, TCGv src, int pc_off)
+{
+    TCGv pred = tcg_temp_new();
+    tcg_gen_setcond_tl(cond, pred, hex_new_value[rnum], src);
+    gen_cond_jump(pred, pc_off);
+    tcg_temp_free(pred);
+}
+
+static inline void gen_cmpi_jumpnv(TCGCond cond, int rnum, int src, int pc_off)
+{
+    TCGv pred = tcg_temp_new();
+    tcg_gen_setcondi_tl(cond, pred, hex_new_value[rnum], src);
+    gen_cond_jump(pred, pc_off);
+    tcg_temp_free(pred);
+}
+
+static inline void gen_asl_r_r_or(TCGv RxV, TCGv RsV, TCGv RtV)
+{
+    TCGv zero = tcg_const_tl(0);
+    TCGv shift_amt = tcg_temp_new();
+    TCGv_i64 shift_amt_i64 = tcg_temp_new_i64();
+    TCGv_i64 shift_left_val_i64 = tcg_temp_new_i64();
+    TCGv shift_left_val = tcg_temp_new();
+    TCGv_i64 shift_right_val_i64 = tcg_temp_new_i64();
+    TCGv shift_right_val = tcg_temp_new();
+    TCGv or_val = tcg_temp_new();
+
+    /* Sign extend 7->32 bits */
+    tcg_gen_shli_tl(shift_amt, RtV, 32 - 7);
+    tcg_gen_sari_tl(shift_amt, shift_amt, 32 - 7);
+    tcg_gen_ext_i32_i64(shift_amt_i64, shift_amt);
+
+    tcg_gen_ext_i32_i64(shift_left_val_i64, RsV);
+    tcg_gen_shl_i64(shift_left_val_i64, shift_left_val_i64, shift_amt_i64);
+    tcg_gen_extrl_i64_i32(shift_left_val, shift_left_val_i64);
+
+    /* ((-(SHAMT)) - 1) */
+    tcg_gen_neg_i64(shift_amt_i64, shift_amt_i64);
+    tcg_gen_subi_i64(shift_amt_i64, shift_amt_i64, 1);
+
+    tcg_gen_ext_i32_i64(shift_right_val_i64, RsV);
+    tcg_gen_sar_i64(shift_right_val_i64, shift_right_val_i64, shift_amt_i64);
+    tcg_gen_sari_i64(shift_right_val_i64, shift_right_val_i64, 1);
+    tcg_gen_extrl_i64_i32(shift_right_val, shift_right_val_i64);
+
+    tcg_gen_movcond_tl(TCG_COND_GE, or_val, shift_amt, zero,
+                       shift_left_val, shift_right_val);
+    tcg_gen_or_tl(RxV, RxV, or_val);
+
+    tcg_temp_free(zero);
+    tcg_temp_free(shift_amt);
+    tcg_temp_free_i64(shift_amt_i64);
+    tcg_temp_free_i64(shift_left_val_i64);
+    tcg_temp_free(shift_left_val);
+    tcg_temp_free_i64(shift_right_val_i64);
+    tcg_temp_free(shift_right_val);
+    tcg_temp_free(or_val);
+}
+
+static inline void gen_lshiftr_4_4u(TCGv dst, TCGv src, int32_t shift_amt)
+{
+    if (shift_amt >= 64) {
+        tcg_gen_movi_tl(dst, 0);
+    } else {
+        tcg_gen_shri_tl(dst, src, shift_amt);
+    }
+}
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 39/67] Hexagon TCG generation - step 01
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (37 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 38/67] Hexagon TCG generation helpers - step 5 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 40/67] Hexagon TCG generation - step 02 Taylor Simpson
                   ` (28 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Include the generated files and set up the data structures

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/genptr.h | 25 +++++++++++++++++++++
 target/hexagon/genptr.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)
 create mode 100644 target/hexagon/genptr.h
 create mode 100644 target/hexagon/genptr.c

diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
new file mode 100644
index 0000000..a3a3db1
--- /dev/null
+++ b/target/hexagon/genptr.h
@@ -0,0 +1,25 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_GENPTR_H
+#define HEXAGON_GENPTR_H
+
+#include "insn.h"
+
+extern semantic_insn_t opcode_genptr[];
+
+#endif
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
new file mode 100644
index 0000000..5a602bd
--- /dev/null
+++ b/target/hexagon/genptr.c
@@ -0,0 +1,59 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define QEMU_GENERATE
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "cpu.h"
+#include "internal.h"
+#include "tcg/tcg-op.h"
+#include "insn.h"
+#include "opcodes.h"
+#include "translate.h"
+#include "macros.h"
+#include "genptr_helpers.h"
+
+#include "qemu_wrap_generated.h"
+
+#define DEF_QEMU(TAG, SHORTCODE, HELPER, GENFN, HELPFN) \
+static void generate_##TAG(CPUHexagonState *env, DisasContext *ctx, \
+                           insn_t *insn) \
+{ \
+    GENFN \
+}
+#include "qemu_def_generated.h"
+#undef DEF_QEMU
+
+
+/* Fill in the table with NULLs because not all the opcodes have DEF_QEMU */
+semantic_insn_t opcode_genptr[] = {
+#define OPCODE(X)                              NULL
+#include "opcodes_def_generated.h"
+    NULL
+#undef OPCODE
+};
+
+/* This function overwrites the NULL entries where we have a DEF_QEMU */
+void init_genptr(void)
+{
+#define DEF_QEMU(TAG, SHORTCODE, HELPER, GENFN, HELPFN) \
+    opcode_genptr[TAG] = generate_##TAG;
+#include "qemu_def_generated.h"
+#undef DEF_QEMU
+}
+
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 40/67] Hexagon TCG generation - step 02
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (38 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 39/67] Hexagon TCG generation - step 01 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 41/67] Hexagon TCG generation - step 03 Taylor Simpson
                   ` (27 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override load instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 404 ++++++++++++++++++++++++++++++++++++++
 target/hexagon/genptr.c           |   1 +
 2 files changed, 405 insertions(+)
 create mode 100644 target/hexagon/helper_overrides.h

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
new file mode 100644
index 0000000..6a69fc6
--- /dev/null
+++ b/target/hexagon/helper_overrides.h
@@ -0,0 +1,404 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_HELPER_OVERRIDES_H
+#define HEXAGON_HELPER_OVERRIDES_H
+
+/*
+ * Here is a primer to understand the tag names for load/store instructions
+ *
+ * Data types
+ *      b        signed byte                       r0 = memb(r2+#0)
+ *     ub        unsigned byte                     r0 = memub(r2+#0)
+ *      h        signed half word (16 bits)        r0 = memh(r2+#0)
+ *     uh        unsigned half word                r0 = memuh(r2+#0)
+ *      i        integer (32 bits)                 r0 = memw(r2+#0)
+ *      d        double word (64 bits)             r1:0 = memd(r2+#0)
+ *
+ * Addressing modes
+ *     _io       indirect with offset              r0 = memw(r1+#4)
+ *     _ur       absolute with register offset     r0 = memw(r1<<#4+##variable)
+ *     _rr       indirect with register offset     r0 = memw(r1+r4<<#2)
+ *     gp        global pointer relative           r0 = memw(gp+#200)
+ *     _sp       stack pointer relative            r0 = memw(r29+#12)
+ *     _ap       absolute set                      r0 = memw(r1=##variable)
+ *     _pr       post increment register           r0 = memw(r1++m1)
+ *     _pbr      post increment bit reverse        r0 = memw(r1++m1:brev)
+ *     _pi       post increment immediate          r0 = memb(r1++#1)
+ *     _pci      post increment circular immediate r0 = memw(r1++#4:circ(m0))
+ *     _pcr      post increment circular register  r0 = memw(r1++I:circ(m0))
+ */
+
+/* Macros for complex addressing modes */
+#define GET_EA_ap \
+    do { \
+        fEA_IMM(UiV); \
+        tcg_gen_movi_tl(ReV, UiV); \
+    } while (0)
+#define GET_EA_pr \
+    do { \
+        fEA_REG(RxV); \
+        fPM_M(RxV, MuV); \
+    } while (0)
+#define GET_EA_pbr \
+    do { \
+        fEA_BREVR(RxV); \
+        fPM_M(RxV, MuV); \
+    } while (0)
+#define GET_EA_pi \
+    do { \
+        fEA_REG(RxV); \
+        fPM_I(RxV, siV); \
+    } while (0)
+#define GET_EA_pci \
+    do { \
+        fEA_REG(RxV); \
+        fPM_CIRI(RxV, siV, MuV); \
+    } while (0)
+#define GET_EA_pcr(SHIFT) \
+    do { \
+        fEA_REG(RxV); \
+        fPM_CIRR(RxV, fREAD_IREG(MuV, (SHIFT)), MuV); \
+    } while (0)
+
+/*
+ * Many instructions will work with just macro redefinitions
+ * with the caveat that they need a tmp variable to carry a
+ * value between them.
+ */
+#define fWRAP_tmp(SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        SHORTCODE; \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* Byte load instructions */
+#define fWRAP_L2_loadrub_io(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L2_loadrb_io(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L4_loadrub_ur(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L4_loadrb_ur(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L4_loadrub_rr(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L4_loadrb_rr(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadrubgp(GENHLPR, SHORTCODE)       fWRAP_tmp(SHORTCODE)
+#define fWRAP_L2_loadrbgp(GENHLPR, SHORTCODE)        fWRAP_tmp(SHORTCODE)
+#define fWRAP_SL1_loadrub_io(GENHLPR, SHORTCODE)     SHORTCODE
+#define fWRAP_SL2_loadrb_io(GENHLPR, SHORTCODE)      SHORTCODE
+
+/* Half word load instruction */
+#define fWRAP_L2_loadruh_io(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L2_loadrh_io(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L4_loadruh_ur(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L4_loadrh_ur(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L4_loadruh_rr(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L4_loadrh_rr(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadruhgp(GENHLPR, SHORTCODE)       fWRAP_tmp(SHORTCODE)
+#define fWRAP_L2_loadrhgp(GENHLPR, SHORTCODE)        fWRAP_tmp(SHORTCODE)
+#define fWRAP_SL2_loadruh_io(GENHLPR, SHORTCODE)     SHORTCODE
+#define fWRAP_SL2_loadrh_io(GENHLPR, SHORTCODE)      SHORTCODE
+
+/* Word load instructions */
+#define fWRAP_L2_loadri_io(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L4_loadri_ur(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L4_loadri_rr(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadrigp(GENHLPR, SHORTCODE)        fWRAP_tmp(SHORTCODE)
+#define fWRAP_SL1_loadri_io(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_SL2_loadri_sp(GENHLPR, SHORTCODE)      fWRAP_tmp(SHORTCODE)
+
+/* Double word load instructions */
+#define fWRAP_L2_loadrd_io(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L4_loadrd_ur(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L4_loadrd_rr(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadrdgp(GENHLPR, SHORTCODE)        fWRAP_tmp(SHORTCODE)
+#define fWRAP_SL2_loadrd_sp(GENHLPR, SHORTCODE)      fWRAP_tmp(SHORTCODE)
+
+/* Instructions with multiple definitions */
+#define fWRAP_LOAD_AP(RES, SIZE, SIGN) \
+    do { \
+        fMUST_IMMEXT(UiV); \
+        fEA_IMM(UiV); \
+        fLOAD(1, SIZE, SIGN, EA, RES); \
+        tcg_gen_movi_tl(ReV, UiV); \
+    } while (0)
+
+#define fWRAP_L4_loadrub_ap(GENHLPR, SHORTCODE) \
+    fWRAP_LOAD_AP(RdV, 1, u)
+#define fWRAP_L4_loadrb_ap(GENHLPR, SHORTCODE) \
+    fWRAP_LOAD_AP(RdV, 1, s)
+#define fWRAP_L4_loadruh_ap(GENHLPR, SHORTCODE) \
+    fWRAP_LOAD_AP(RdV, 2, u)
+#define fWRAP_L4_loadrh_ap(GENHLPR, SHORTCODE) \
+    fWRAP_LOAD_AP(RdV, 2, s)
+#define fWRAP_L4_loadri_ap(GENHLPR, SHORTCODE) \
+    fWRAP_LOAD_AP(RdV, 4, u)
+#define fWRAP_L4_loadrd_ap(GENHLPR, SHORTCODE) \
+    fWRAP_LOAD_AP(RddV, 8, u)
+
+#define fWRAP_L2_loadrub_pci(GENHLPR, SHORTCODE) \
+      fWRAP_tmp(SHORTCODE)
+#define fWRAP_L2_loadrb_pci(GENHLPR, SHORTCODE) \
+      fWRAP_tmp(SHORTCODE)
+#define fWRAP_L2_loadruh_pci(GENHLPR, SHORTCODE) \
+      fWRAP_tmp(SHORTCODE)
+#define fWRAP_L2_loadrh_pci(GENHLPR, SHORTCODE) \
+      fWRAP_tmp(SHORTCODE)
+#define fWRAP_L2_loadri_pci(GENHLPR, SHORTCODE) \
+      fWRAP_tmp(SHORTCODE)
+#define fWRAP_L2_loadrd_pci(GENHLPR, SHORTCODE) \
+      fWRAP_tmp(SHORTCODE)
+
+#define fWRAP_PCR(SHIFT, LOAD) \
+    do { \
+        TCGv ireg = tcg_temp_new(); \
+        TCGv tmp = tcg_temp_new(); \
+        fEA_REG(RxV); \
+        fREAD_IREG(MuV, SHIFT); \
+        gen_fcircadd(RxV, ireg, MuV, fREAD_CSREG(MuN)); \
+        LOAD; \
+        tcg_temp_free(tmp); \
+        tcg_temp_free(ireg); \
+    } while (0)
+
+#define fWRAP_L2_loadrub_pcr(GENHLPR, SHORTCODE) \
+      fWRAP_PCR(0, fLOAD(1, 1, u, EA, RdV))
+#define fWRAP_L2_loadrb_pcr(GENHLPR, SHORTCODE) \
+      fWRAP_PCR(0, fLOAD(1, 1, s, EA, RdV))
+#define fWRAP_L2_loadruh_pcr(GENHLPR, SHORTCODE) \
+      fWRAP_PCR(1, fLOAD(1, 2, u, EA, RdV))
+#define fWRAP_L2_loadrh_pcr(GENHLPR, SHORTCODE) \
+      fWRAP_PCR(1, fLOAD(1, 2, s, EA, RdV))
+#define fWRAP_L2_loadri_pcr(GENHLPR, SHORTCODE) \
+      fWRAP_PCR(2, fLOAD(1, 4, u, EA, RdV))
+#define fWRAP_L2_loadrd_pcr(GENHLPR, SHORTCODE) \
+      fWRAP_PCR(3, fLOAD(1, 8, u, EA, RddV))
+
+#define fWRAP_L2_loadrub_pr(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L2_loadrub_pbr(GENHLPR, SHORTCODE)     SHORTCODE
+#define fWRAP_L2_loadrub_pi(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L2_loadrb_pr(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadrb_pbr(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L2_loadrb_pi(GENHLPR, SHORTCODE)       SHORTCODE;
+#define fWRAP_L2_loadruh_pr(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L2_loadruh_pbr(GENHLPR, SHORTCODE)     SHORTCODE
+#define fWRAP_L2_loadruh_pi(GENHLPR, SHORTCODE)      SHORTCODE;
+#define fWRAP_L2_loadrh_pr(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadrh_pbr(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L2_loadrh_pi(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadri_pr(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadri_pbr(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L2_loadri_pi(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadrd_pr(GENHLPR, SHORTCODE)       SHORTCODE
+#define fWRAP_L2_loadrd_pbr(GENHLPR, SHORTCODE)      SHORTCODE
+#define fWRAP_L2_loadrd_pi(GENHLPR, SHORTCODE)       SHORTCODE
+
+/*
+ * These instructions load 2 bytes and places them in
+ * two halves of the destination register.
+ * The GET_EA macro determines the addressing mode.
+ * The fGB macro determines whether to zero-extend or
+ * sign-extend.
+ */
+#define fWRAP_loadbXw2(GET_EA, fGB) \
+    do { \
+        TCGv ireg = tcg_temp_new(); \
+        TCGv tmp = tcg_temp_new(); \
+        TCGv tmpV = tcg_temp_new(); \
+        TCGv BYTE = tcg_temp_new(); \
+        int i; \
+        GET_EA; \
+        fLOAD(1, 2, u, EA, tmpV); \
+        tcg_gen_movi_tl(RdV, 0); \
+        for (i = 0; i < 2; i++) { \
+            fSETHALF(i, RdV, fGB(i, tmpV)); \
+        } \
+        tcg_temp_free(ireg); \
+        tcg_temp_free(tmp); \
+        tcg_temp_free(tmpV); \
+        tcg_temp_free(BYTE); \
+    } while (0)
+
+#define fWRAP_L2_loadbzw2_io(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(fEA_RI(RsV, siV), fGETUBYTE)
+#define fWRAP_L4_loadbzw2_ur(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(fEA_IRs(UiV, RtV, uiV), fGETUBYTE)
+#define fWRAP_L2_loadbsw2_io(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(fEA_RI(RsV, siV), fGETBYTE)
+#define fWRAP_L4_loadbsw2_ur(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(fEA_IRs(UiV, RtV, uiV), fGETBYTE)
+#define fWRAP_L4_loadbzw2_ap(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_ap, fGETUBYTE)
+#define fWRAP_L2_loadbzw2_pr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pr, fGETUBYTE)
+#define fWRAP_L2_loadbzw2_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pbr, fGETUBYTE)
+#define fWRAP_L2_loadbzw2_pi(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pi, fGETUBYTE)
+#define fWRAP_L4_loadbsw2_ap(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_ap, fGETBYTE)
+#define fWRAP_L2_loadbsw2_pr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pr, fGETBYTE)
+#define fWRAP_L2_loadbsw2_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pbr, fGETBYTE)
+#define fWRAP_L2_loadbsw2_pi(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pi, fGETBYTE)
+#define fWRAP_L2_loadbzw2_pci(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pci, fGETUBYTE)
+#define fWRAP_L2_loadbsw2_pci(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pci, fGETBYTE)
+#define fWRAP_L2_loadbzw2_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pcr(1), fGETUBYTE)
+#define fWRAP_L2_loadbsw2_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw2(GET_EA_pcr(1), fGETBYTE)
+
+/*
+ * These instructions load 4 bytes and places them in
+ * four halves of the destination register pair.
+ * The GET_EA macro determines the addressing mode.
+ * The fGB macro determines whether to zero-extend or
+ * sign-extend.
+ */
+#define fWRAP_loadbXw4(GET_EA, fGB) \
+    do { \
+        TCGv ireg = tcg_temp_new(); \
+        TCGv tmp = tcg_temp_new(); \
+        TCGv tmpV = tcg_temp_new(); \
+        TCGv BYTE = tcg_temp_new(); \
+        int i; \
+        GET_EA; \
+        fLOAD(1, 4, u, EA, tmpV);  \
+        tcg_gen_movi_i64(RddV, 0); \
+        for (i = 0; i < 4; i++) { \
+            fSETHALF(i, RddV, fGB(i, tmpV));  \
+        }  \
+        tcg_temp_free(ireg); \
+        tcg_temp_free(tmp); \
+        tcg_temp_free(tmpV); \
+        tcg_temp_free(BYTE); \
+    } while (0)
+
+#define fWRAP_L2_loadbzw4_io(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(fEA_RI(RsV, siV), fGETUBYTE)
+#define fWRAP_L4_loadbzw4_ur(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(fEA_IRs(UiV, RtV, uiV), fGETUBYTE)
+#define fWRAP_L2_loadbsw4_io(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(fEA_RI(RsV, siV), fGETBYTE)
+#define fWRAP_L4_loadbsw4_ur(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(fEA_IRs(UiV, RtV, uiV), fGETBYTE)
+#define fWRAP_L2_loadbzw4_pci(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pci, fGETUBYTE)
+#define fWRAP_L2_loadbsw4_pci(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pci, fGETBYTE)
+#define fWRAP_L2_loadbzw4_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pcr(2), fGETUBYTE)
+#define fWRAP_L2_loadbsw4_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pcr(2), fGETBYTE)
+#define fWRAP_L4_loadbzw4_ap(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_ap, fGETUBYTE)
+#define fWRAP_L2_loadbzw4_pr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pr, fGETUBYTE)
+#define fWRAP_L2_loadbzw4_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pbr, fGETUBYTE)
+#define fWRAP_L2_loadbzw4_pi(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pi, fGETUBYTE)
+#define fWRAP_L4_loadbsw4_ap(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_ap, fGETBYTE)
+#define fWRAP_L2_loadbsw4_pr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pr, fGETBYTE)
+#define fWRAP_L2_loadbsw4_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pbr, fGETBYTE)
+#define fWRAP_L2_loadbsw4_pi(GENHLPR, SHORTCODE) \
+    fWRAP_loadbXw4(GET_EA_pi, fGETBYTE)
+
+/*
+ * These instructions load a half word, shift the destination right by 16 bits
+ * and place the loaded value in the high half word of the destination pair.
+ * The GET_EA macro determines the addressing mode.
+ */
+#define fWRAP_loadalignh(GET_EA) \
+    do { \
+        TCGv ireg = tcg_temp_new(); \
+        TCGv tmp = tcg_temp_new(); \
+        TCGv tmpV = tcg_temp_new(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        READ_REG_PAIR(RyyV, RyyN); \
+        GET_EA;  \
+        fLOAD(1, 2, u, EA, tmpV);  \
+        tcg_gen_extu_i32_i64(tmp_i64, tmpV); \
+        tcg_gen_shli_i64(tmp_i64, tmp_i64, 48); \
+        tcg_gen_shri_i64(RyyV, RyyV, 16); \
+        tcg_gen_or_i64(RyyV, RyyV, tmp_i64); \
+        tcg_temp_free(ireg); \
+        tcg_temp_free(tmp); \
+        tcg_temp_free(tmpV); \
+        tcg_temp_free_i64(tmp_i64); \
+    } while (0)
+
+#define fWRAP_L4_loadalignh_ur(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignh(fEA_IRs(UiV, RtV, uiV))
+#define fWRAP_L2_loadalignh_io(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignh(fEA_RI(RsV, siV))
+#define fWRAP_L2_loadalignh_pci(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignh(GET_EA_pci)
+#define fWRAP_L2_loadalignh_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignh(GET_EA_pcr(1))
+#define fWRAP_L4_loadalignh_ap(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignh(GET_EA_ap)
+#define fWRAP_L2_loadalignh_pr(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignh(GET_EA_pr)
+#define fWRAP_L2_loadalignh_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignh(GET_EA_pbr)
+#define fWRAP_L2_loadalignh_pi(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignh(GET_EA_pi)
+
+/* Same as above, but loads a byte instead of half word */
+#define fWRAP_loadalignb(GET_EA) \
+    do { \
+        TCGv ireg = tcg_temp_new(); \
+        TCGv tmp = tcg_temp_new(); \
+        TCGv tmpV = tcg_temp_new(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        READ_REG_PAIR(RyyV, RyyN); \
+        GET_EA;  \
+        fLOAD(1, 1, u, EA, tmpV);  \
+        tcg_gen_extu_i32_i64(tmp_i64, tmpV); \
+        tcg_gen_shli_i64(tmp_i64, tmp_i64, 56); \
+        tcg_gen_shri_i64(RyyV, RyyV, 8); \
+        tcg_gen_or_i64(RyyV, RyyV, tmp_i64); \
+        tcg_temp_free(ireg); \
+        tcg_temp_free(tmp); \
+        tcg_temp_free(tmpV); \
+        tcg_temp_free_i64(tmp_i64); \
+    } while (0)
+
+#define fWRAP_L2_loadalignb_io(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignb(fEA_RI(RsV, siV))
+#define fWRAP_L4_loadalignb_ur(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignb(fEA_IRs(UiV, RtV, uiV))
+#define fWRAP_L2_loadalignb_pci(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignb(GET_EA_pci)
+#define fWRAP_L2_loadalignb_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignb(GET_EA_pcr(0))
+#define fWRAP_L4_loadalignb_ap(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignb(GET_EA_ap)
+#define fWRAP_L2_loadalignb_pr(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignb(GET_EA_pr)
+#define fWRAP_L2_loadalignb_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignb(GET_EA_pbr)
+#define fWRAP_L2_loadalignb_pi(GENHLPR, SHORTCODE) \
+    fWRAP_loadalignb(GET_EA_pi)
+
+#endif
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 5a602bd..6b9cc0d 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -26,6 +26,7 @@
 #include "translate.h"
 #include "macros.h"
 #include "genptr_helpers.h"
+#include "helper_overrides.h"
 
 #include "qemu_wrap_generated.h"
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 41/67] Hexagon TCG generation - step 03
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (39 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 40/67] Hexagon TCG generation - step 02 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 42/67] Hexagon TCG generation - step 04 Taylor Simpson
                   ` (26 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override predicated load instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 235 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 235 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index 6a69fc6..20d8584 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -401,4 +401,239 @@
 #define fWRAP_L2_loadalignb_pi(GENHLPR, SHORTCODE) \
     fWRAP_loadalignb(GET_EA_pi)
 
+/*
+ * Predicated loads
+ * Here is a primer to understand the tag names
+ *
+ * Predicate used
+ *      t        true "old" value                  if (p0) r0 = memb(r2+#0)
+ *      f        false "old" value                 if (!p0) r0 = memb(r2+#0)
+ *      tnew     true "new" value                  if (p0.new) r0 = memb(r2+#0)
+ *      fnew     false "new" value                 if (!p0.new) r0 = memb(r2+#0)
+ */
+#define fWRAP_PRED_LOAD(GET_EA, PRED, SIZE, SIGN) \
+    do { \
+        TCGv LSB = tcg_temp_local_new(); \
+        TCGLabel *label = gen_new_label(); \
+        GET_EA; \
+        PRED;  \
+        PRED_LOAD_CANCEL(LSB, EA); \
+        tcg_gen_movi_tl(RdV, 0); \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, label); \
+            fLOAD(1, SIZE, SIGN, EA, RdV); \
+        gen_set_label(label); \
+        tcg_temp_free(LSB); \
+    } while (0)
+
+#define fWRAP_L2_ploadrubt_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLD(PtV), 1, u)
+#define fWRAP_L2_ploadrubt_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 1, u)
+#define fWRAP_L2_ploadrubf_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLDNOT(PtV), 1, u)
+#define fWRAP_L2_ploadrubf_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 1, u)
+#define fWRAP_L2_ploadrubtnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEW(PtN), 1, u)
+#define fWRAP_L2_ploadrubfnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEWNOT(PtN), 1, u)
+#define fWRAP_L4_ploadrubt_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLD(PvV), 1, u)
+#define fWRAP_L4_ploadrubf_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLDNOT(PvV), 1, u)
+#define fWRAP_L4_ploadrubtnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEW(PvN), 1, u)
+#define fWRAP_L4_ploadrubfnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEWNOT(PvN), 1, u)
+#define fWRAP_L2_ploadrubtnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 1, u)
+#define fWRAP_L2_ploadrubfnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBNEWNOT(PtN), 1, u)
+#define fWRAP_L4_ploadrubt_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLD(PtV), 1, u)
+#define fWRAP_L4_ploadrubf_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLDNOT(PtV), 1, u)
+#define fWRAP_L4_ploadrubtnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEW(PtN), 1, u)
+#define fWRAP_L4_ploadrubfnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEWNOT(PtN), 1, u)
+#define fWRAP_L2_ploadrbt_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLD(PtV), 1, s)
+#define fWRAP_L2_ploadrbt_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 1, s)
+#define fWRAP_L2_ploadrbf_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLDNOT(PtV), 1, s)
+#define fWRAP_L2_ploadrbf_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 1, s)
+#define fWRAP_L2_ploadrbtnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEW(PtN), 1, s)
+#define fWRAP_L2_ploadrbfnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEWNOT(PtN), 1, s)
+#define fWRAP_L4_ploadrbt_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLD(PvV), 1, s)
+#define fWRAP_L4_ploadrbf_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLDNOT(PvV), 1, s)
+#define fWRAP_L4_ploadrbtnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEW(PvN), 1, s)
+#define fWRAP_L4_ploadrbfnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEWNOT(PvN), 1, s)
+#define fWRAP_L2_ploadrbtnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 1, s)
+#define fWRAP_L2_ploadrbfnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD({ fEA_REG(RxV); fPM_I(RxV, siV); }, fLSBNEWNOT(PtN), 1, s)
+#define fWRAP_L4_ploadrbt_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLD(PtV), 1, s)
+#define fWRAP_L4_ploadrbf_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLDNOT(PtV), 1, s)
+#define fWRAP_L4_ploadrbtnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEW(PtN), 1, s)
+#define fWRAP_L4_ploadrbfnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEWNOT(PtN), 1, s)
+
+#define fWRAP_L2_ploadruht_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLD(PtV), 2, u)
+#define fWRAP_L2_ploadruht_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 2, u)
+#define fWRAP_L2_ploadruhf_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLDNOT(PtV), 2, u)
+#define fWRAP_L2_ploadruhf_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 2, u)
+#define fWRAP_L2_ploadruhtnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEW(PtN), 2, u)
+#define fWRAP_L2_ploadruhfnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEWNOT(PtN), 2, u)
+#define fWRAP_L4_ploadruht_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLD(PvV), 2, u)
+#define fWRAP_L4_ploadruhf_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLDNOT(PvV), 2, u)
+#define fWRAP_L4_ploadruhtnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEW(PvN), 2, u)
+#define fWRAP_L4_ploadruhfnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEWNOT(PvN), 2, u)
+#define fWRAP_L2_ploadruhtnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 2, u)
+#define fWRAP_L2_ploadruhfnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBNEWNOT(PtN), 2, u)
+#define fWRAP_L4_ploadruht_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLD(PtV), 2, u)
+#define fWRAP_L4_ploadruhf_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLDNOT(PtV), 2, u)
+#define fWRAP_L4_ploadruhtnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEW(PtN), 2, u)
+#define fWRAP_L4_ploadruhfnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEWNOT(PtN), 2, u)
+#define fWRAP_L2_ploadrht_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLD(PtV), 2, s)
+#define fWRAP_L2_ploadrht_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 2, s)
+#define fWRAP_L2_ploadrhf_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLDNOT(PtV), 2, s)
+#define fWRAP_L2_ploadrhf_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 2, s)
+#define fWRAP_L2_ploadrhtnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEW(PtN), 2, s)
+#define fWRAP_L2_ploadrhfnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEWNOT(PtN), 2, s)
+#define fWRAP_L4_ploadrht_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLD(PvV), 2, s)
+#define fWRAP_L4_ploadrhf_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLDNOT(PvV), 2, s)
+#define fWRAP_L4_ploadrhtnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEW(PvN), 2, s)
+#define fWRAP_L4_ploadrhfnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEWNOT(PvN), 2, s)
+#define fWRAP_L2_ploadrhtnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 2, s)
+#define fWRAP_L2_ploadrhfnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBNEWNOT(PtN), 2, s)
+#define fWRAP_L4_ploadrht_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLD(PtV), 2, s)
+#define fWRAP_L4_ploadrhf_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLDNOT(PtV), 2, s)
+#define fWRAP_L4_ploadrhtnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEW(PtN), 2, s)
+#define fWRAP_L4_ploadrhfnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEWNOT(PtN), 2, s)
+
+#define fWRAP_L2_ploadrit_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLD(PtV), 4, u)
+#define fWRAP_L2_ploadrit_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 4, u)
+#define fWRAP_L2_ploadrif_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBOLDNOT(PtV), 4, u)
+#define fWRAP_L2_ploadrif_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 4, u)
+#define fWRAP_L2_ploadritnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEW(PtN), 4, u)
+#define fWRAP_L2_ploadrifnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RI(RsV, uiV), fLSBNEWNOT(PtN), 4, u)
+#define fWRAP_L4_ploadrit_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLD(PvV), 4, u)
+#define fWRAP_L4_ploadrif_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBOLDNOT(PvV), 4, u)
+#define fWRAP_L4_ploadritnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEW(PvN), 4, u)
+#define fWRAP_L4_ploadrifnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_RRs(RsV, RtV, uiV), fLSBNEWNOT(PvN), 4, u)
+#define fWRAP_L2_ploadritnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 4, u)
+#define fWRAP_L2_ploadrifnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(GET_EA_pi, fLSBNEWNOT(PtN), 4, u)
+#define fWRAP_L4_ploadrit_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLD(PtV), 4, u)
+#define fWRAP_L4_ploadrif_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBOLDNOT(PtV), 4, u)
+#define fWRAP_L4_ploadritnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEW(PtN), 4, u)
+#define fWRAP_L4_ploadrifnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD(fEA_IMM(uiV), fLSBNEWNOT(PtN), 4, u)
+
+/* Predicated loads into a register pair */
+#define fWRAP_PRED_LOAD_PAIR(GET_EA, PRED) \
+    do { \
+        TCGv LSB = tcg_temp_local_new(); \
+        TCGLabel *label = gen_new_label(); \
+        GET_EA; \
+        PRED;  \
+        PRED_LOAD_CANCEL(LSB, EA); \
+        tcg_gen_movi_i64(RddV, 0); \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, label); \
+            fLOAD(1, 8, u, EA, RddV); \
+        gen_set_label(label); \
+        tcg_temp_free(LSB); \
+    } while (0)
+
+#define fWRAP_L2_ploadrdt_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_RI(RsV, uiV), fLSBOLD(PtV))
+#define fWRAP_L2_ploadrdt_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(GET_EA_pi, fLSBOLD(PtV))
+#define fWRAP_L2_ploadrdf_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_RI(RsV, uiV), fLSBOLDNOT(PtV))
+#define fWRAP_L2_ploadrdf_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(GET_EA_pi, fLSBOLDNOT(PtV))
+#define fWRAP_L2_ploadrdtnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_RI(RsV, uiV), fLSBNEW(PtN))
+#define fWRAP_L2_ploadrdfnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_RI(RsV, uiV), fLSBNEWNOT(PtN))
+#define fWRAP_L4_ploadrdt_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_RRs(RsV, RtV, uiV), fLSBOLD(PvV))
+#define fWRAP_L4_ploadrdf_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_RRs(RsV, RtV, uiV), fLSBOLDNOT(PvV))
+#define fWRAP_L4_ploadrdtnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_RRs(RsV, RtV, uiV), fLSBNEW(PvN))
+#define fWRAP_L4_ploadrdfnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_RRs(RsV, RtV, uiV), fLSBNEWNOT(PvN))
+#define fWRAP_L2_ploadrdtnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(GET_EA_pi, fLSBNEW(PtN))
+#define fWRAP_L2_ploadrdfnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(GET_EA_pi, fLSBNEWNOT(PtN))
+#define fWRAP_L4_ploadrdt_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_IMM(uiV), fLSBOLD(PtV))
+#define fWRAP_L4_ploadrdf_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_IMM(uiV), fLSBOLDNOT(PtV))
+#define fWRAP_L4_ploadrdtnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_IMM(uiV), fLSBNEW(PtN))
+#define fWRAP_L4_ploadrdfnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_LOAD_PAIR(fEA_IMM(uiV), fLSBNEWNOT(PtN))
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 42/67] Hexagon TCG generation - step 04
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (40 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 41/67] Hexagon TCG generation - step 03 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 43/67] Hexagon TCG generation - step 05 Taylor Simpson
                   ` (25 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override store instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 241 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 241 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index 20d8584..fdcc517 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -636,4 +636,245 @@
 #define fWRAP_L4_ploadrdfnew_abs(GENHLPR, SHORTCODE) \
     fWRAP_PRED_LOAD_PAIR(fEA_IMM(uiV), fLSBNEWNOT(PtN))
 
+/* load-locked and store-locked */
+#define fWRAP_L2_loadw_locked(GENHLPR, SHORTCODE) \
+    SHORTCODE
+#define fWRAP_L4_loadd_locked(GENHLPR, SHORTCODE) \
+    SHORTCODE
+#define fWRAP_S2_storew_locked(GENHLPR, SHORTCODE) \
+    do { SHORTCODE; READ_PREG(PdV, PdN); } while (0)
+#define fWRAP_S4_stored_locked(GENHLPR, SHORTCODE) \
+    do { SHORTCODE; READ_PREG(PdV, PdN); } while (0)
+
+#define fWRAP_STORE(SHORTCODE) \
+    do { \
+        TCGv HALF = tcg_temp_new(); \
+        TCGv BYTE = tcg_temp_new(); \
+        TCGv NEWREG_ST = tcg_temp_new(); \
+        TCGv tmp = tcg_temp_new(); \
+        SHORTCODE; \
+        tcg_temp_free(HALF); \
+        tcg_temp_free(BYTE); \
+        tcg_temp_free(NEWREG_ST); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+#define fWRAP_STORE_ap(STORE) \
+    do { \
+        TCGv HALF = tcg_temp_new(); \
+        TCGv BYTE = tcg_temp_new(); \
+        TCGv NEWREG_ST = tcg_temp_new(); \
+        { \
+            fEA_IMM(UiV); \
+            STORE; \
+            tcg_gen_movi_tl(ReV, UiV); \
+        } \
+        tcg_temp_free(HALF); \
+        tcg_temp_free(BYTE); \
+        tcg_temp_free(NEWREG_ST); \
+    } while (0)
+
+#define fWRAP_STORE_pcr(SHIFT, STORE) \
+    do { \
+        TCGv ireg = tcg_temp_new(); \
+        TCGv HALF = tcg_temp_new(); \
+        TCGv BYTE = tcg_temp_new(); \
+        TCGv NEWREG_ST = tcg_temp_new(); \
+        TCGv tmp = tcg_temp_new(); \
+        fEA_REG(RxV); \
+        fPM_CIRR(RxV, fREAD_IREG(MuV, SHIFT), MuV); \
+        STORE; \
+        tcg_temp_free(ireg); \
+        tcg_temp_free(HALF); \
+        tcg_temp_free(BYTE); \
+        tcg_temp_free(NEWREG_ST); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+#define fWRAP_S2_storerb_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerb_pi(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerb_ap(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_ap(fSTORE(1, 1, EA, fGETBYTE(0, RtV)))
+#define fWRAP_S2_storerb_pr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerb_ur(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerb_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerb_pci(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerb_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_pcr(0, fSTORE(1, 1, EA, fGETBYTE(0, RtV)))
+#define fWRAP_S4_storerb_rr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerbnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storeirb_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerbgp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_SS1_storeb_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_SS2_storebi0(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+
+#define fWRAP_S2_storerh_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerh_pi(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerh_ap(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_ap(fSTORE(1, 2, EA, fGETHALF(0, RtV)))
+#define fWRAP_S2_storerh_pr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerh_ur(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerh_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerh_pci(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerh_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_pcr(1, fSTORE(1, 2, EA, fGETHALF(0, RtV)))
+#define fWRAP_S4_storerh_rr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storeirh_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerhgp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_SS2_storeh_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+
+#define fWRAP_S2_storerf_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerf_pi(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerf_ap(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_ap(fSTORE(1, 2, EA, fGETHALF(1, RtV)))
+#define fWRAP_S2_storerf_pr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerf_ur(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerf_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerf_pci(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerf_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_pcr(1, fSTORE(1, 2, EA, fGETHALF(1, RtV)))
+#define fWRAP_S4_storerf_rr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerfgp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+
+#define fWRAP_S2_storeri_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storeri_pi(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storeri_ap(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_ap(fSTORE(1, 4, EA, RtV))
+#define fWRAP_S2_storeri_pr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storeri_ur(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storeri_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storeri_pci(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storeri_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_pcr(2, fSTORE(1, 4, EA, RtV))
+#define fWRAP_S4_storeri_rr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerinew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storeiri_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerigp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_SS1_storew_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_SS2_storew_sp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_SS2_storewi0(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+
+#define fWRAP_S2_storerd_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerd_pi(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerd_ap(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_ap(fSTORE(1, 8, EA, RttV))
+#define fWRAP_S2_storerd_pr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerd_ur(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerd_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerd_pci(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerd_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_pcr(3, fSTORE(1, 8, EA, RttV))
+#define fWRAP_S4_storerd_rr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerdgp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_SS2_stored_sp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+
+#define fWRAP_S2_storerbnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerbnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerbnew_ap(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_ap(fSTORE(1, 1, EA, fGETBYTE(0, fNEWREG_ST(NtN))))
+#define fWRAP_S2_storerbnew_pr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerbnew_ur(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerbnew_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerbnew_pci(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerbnew_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_pcr(0, fSTORE(1, 1, EA, fGETBYTE(0, fNEWREG_ST(NtN))))
+#define fWRAP_S2_storerbnewgp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+
+#define fWRAP_S2_storerhnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerhnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerhnew_ap(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_ap(fSTORE(1, 2, EA, fGETHALF(0, fNEWREG_ST(NtN))))
+#define fWRAP_S2_storerhnew_pr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerhnew_ur(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerhnew_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerhnew_pci(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerhnew_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_pcr(1, fSTORE(1, 2, EA, fGETHALF(0, fNEWREG_ST(NtN))))
+#define fWRAP_S2_storerhnewgp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+
+#define fWRAP_S2_storerinew_io(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerinew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerinew_ap(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_ap(fSTORE(1, 4, EA, fNEWREG_ST(NtN)))
+#define fWRAP_S2_storerinew_pr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S4_storerinew_ur(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerinew_pbr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerinew_pci(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+#define fWRAP_S2_storerinew_pcr(GENHLPR, SHORTCODE) \
+    fWRAP_STORE_pcr(2, fSTORE(1, 4, EA, fNEWREG_ST(NtN)))
+#define fWRAP_S2_storerinewgp(GENHLPR, SHORTCODE) \
+    fWRAP_STORE(SHORTCODE)
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 43/67] Hexagon TCG generation - step 05
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (41 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 42/67] Hexagon TCG generation - step 04 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 44/67] Hexagon TCG generation - step 06 Taylor Simpson
                   ` (24 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override predicated store instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 54 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index fdcc517..a6cbce0 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -877,4 +877,58 @@
 #define fWRAP_S2_storerinewgp(GENHLPR, SHORTCODE) \
     fWRAP_STORE(SHORTCODE)
 
+/* Predicated stores */
+#define fWRAP_PRED_STORE(GET_EA, PRED, SRC, SIZE, INC) \
+    do { \
+        TCGv LSB = tcg_temp_local_new(); \
+        TCGv NEWREG_ST = tcg_temp_local_new(); \
+        TCGv BYTE = tcg_temp_local_new(); \
+        TCGv HALF = tcg_temp_local_new(); \
+        TCGLabel *label = gen_new_label(); \
+        GET_EA; \
+        PRED;  \
+        PRED_STORE_CANCEL(LSB, EA); \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, label); \
+            INC; \
+            fSTORE(1, SIZE, EA, SRC); \
+        gen_set_label(label); \
+        tcg_temp_free(LSB); \
+        tcg_temp_free(NEWREG_ST); \
+        tcg_temp_free(BYTE); \
+        tcg_temp_free(HALF); \
+    } while (0)
+
+#define NOINC    do {} while (0)
+
+#define fWRAP_S4_pstorerinewfnew_rr(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_RRs(RsV, RuV, uiV), fLSBNEWNOT(PvN), \
+                     hex_new_value[NtX], 4, NOINC)
+#define fWRAP_S2_pstorerdtnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_REG(RxV), fLSBNEW(PvN), \
+                     RttV, 8, tcg_gen_addi_tl(RxV, RxV, IMMNO(0)))
+#define fWRAP_S4_pstorerdtnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_RI(RsV, uiV), fLSBNEW(PvN), \
+                     RttV, 8, NOINC)
+#define fWRAP_S4_pstorerbtnew_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_RI(RsV, uiV), fLSBNEW(PvN), \
+                     fGETBYTE(0, RtV), 1, NOINC)
+#define fWRAP_S2_pstorerhtnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_REG(RxV), fLSBNEW(PvN), \
+                     fGETHALF(0, RtV), 2, tcg_gen_addi_tl(RxV, RxV, IMMNO(0)))
+#define fWRAP_S2_pstoreritnew_pi(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_REG(RxV), fLSBNEW(PvN), \
+                     RtV, 4, tcg_gen_addi_tl(RxV, RxV, IMMNO(0)))
+#define fWRAP_S2_pstorerif_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_RI(RsV, uiV), fLSBOLDNOT(PvV), \
+                     RtV, 4, NOINC)
+#define fWRAP_S4_pstorerit_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_IMM(uiV), fLSBOLD(PvV), \
+                     RtV, 4, NOINC)
+#define fWRAP_S2_pstorerinewf_io(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_RI(RsV, uiV), fLSBOLDNOT(PvV), \
+                     hex_new_value[NtX], 4, NOINC)
+#define fWRAP_S4_pstorerbnewfnew_abs(GENHLPR, SHORTCODE) \
+    fWRAP_PRED_STORE(fEA_IMM(uiV), fLSBNEWNOT(PvN), \
+                     fGETBYTE(0, hex_new_value[NtX]), 1, NOINC)
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 44/67] Hexagon TCG generation - step 06
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (42 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 43/67] Hexagon TCG generation - step 05 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 45/67] Hexagon TCG generation - step 07 Taylor Simpson
                   ` (23 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override memop instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 60 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index a6cbce0..00647cb 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -931,4 +931,64 @@
     fWRAP_PRED_STORE(fEA_IMM(uiV), fLSBNEWNOT(PvN), \
                      fGETBYTE(0, hex_new_value[NtX]), 1, NOINC)
 
+/* We have to brute force memops because they have C math in the semantics */
+#define fWRAP_MEMOP(GENHLPR, SHORTCODE, SIZE, OP) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        fEA_RI(RsV, uiV); \
+        fLOAD(1, SIZE, s, EA, tmp); \
+        OP; \
+        fSTORE(1, SIZE, EA, tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+#define fWRAP_L4_add_memopw_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 4, tcg_gen_add_tl(tmp, tmp, RtV))
+#define fWRAP_L4_add_memopb_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 1, tcg_gen_add_tl(tmp, tmp, RtV))
+#define fWRAP_L4_add_memoph_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 2, tcg_gen_add_tl(tmp, tmp, RtV))
+#define fWRAP_L4_sub_memopw_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 4, tcg_gen_sub_tl(tmp, tmp, RtV))
+#define fWRAP_L4_sub_memopb_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 1, tcg_gen_sub_tl(tmp, tmp, RtV))
+#define fWRAP_L4_sub_memoph_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 2, tcg_gen_sub_tl(tmp, tmp, RtV))
+#define fWRAP_L4_and_memopw_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 4, tcg_gen_and_tl(tmp, tmp, RtV))
+#define fWRAP_L4_and_memopb_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 1, tcg_gen_and_tl(tmp, tmp, RtV))
+#define fWRAP_L4_and_memoph_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 2, tcg_gen_and_tl(tmp, tmp, RtV))
+#define fWRAP_L4_or_memopw_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 4, tcg_gen_or_tl(tmp, tmp, RtV))
+#define fWRAP_L4_or_memopb_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 1, tcg_gen_or_tl(tmp, tmp, RtV))
+#define fWRAP_L4_or_memoph_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 2, tcg_gen_or_tl(tmp, tmp, RtV))
+#define fWRAP_L4_iadd_memopw_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 4, tcg_gen_addi_tl(tmp, tmp, UiV))
+#define fWRAP_L4_iadd_memopb_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 1, tcg_gen_addi_tl(tmp, tmp, UiV))
+#define fWRAP_L4_iadd_memoph_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 2, tcg_gen_addi_tl(tmp, tmp, UiV))
+#define fWRAP_L4_isub_memopw_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 4, tcg_gen_subi_tl(tmp, tmp, UiV))
+#define fWRAP_L4_isub_memopb_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 1, tcg_gen_subi_tl(tmp, tmp, UiV))
+#define fWRAP_L4_isub_memoph_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 2, tcg_gen_subi_tl(tmp, tmp, UiV))
+#define fWRAP_L4_iand_memopw_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 4, tcg_gen_andi_tl(tmp, tmp, ~(1 << UiV)))
+#define fWRAP_L4_iand_memopb_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 1, tcg_gen_andi_tl(tmp, tmp, ~(1 << UiV)))
+#define fWRAP_L4_iand_memoph_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 2, tcg_gen_andi_tl(tmp, tmp, ~(1 << UiV)))
+#define fWRAP_L4_ior_memopw_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 4, tcg_gen_ori_tl(tmp, tmp, 1 << UiV))
+#define fWRAP_L4_ior_memopb_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 1, tcg_gen_ori_tl(tmp, tmp, 1 << UiV))
+#define fWRAP_L4_ior_memoph_io(GENHLPR, SHORTCODE) \
+    fWRAP_MEMOP(GENHLPR, SHORTCODE, 2, tcg_gen_ori_tl(tmp, tmp, 1 << UiV))
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 45/67] Hexagon TCG generation - step 07
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (43 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 44/67] Hexagon TCG generation - step 06 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 46/67] Hexagon TCG generation - step 08 Taylor Simpson
                   ` (22 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override dczeroa, allocframe, and return instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 209 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 209 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index 00647cb..1ac363e 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -991,4 +991,213 @@
 #define fWRAP_L4_ior_memoph_io(GENHLPR, SHORTCODE) \
     fWRAP_MEMOP(GENHLPR, SHORTCODE, 2, tcg_gen_ori_tl(tmp, tmp, 1 << UiV))
 
+/* dczeroa clears the 32 byte cache line at the address given */
+#define fWRAP_Y2_dczeroa(GENHLPR, SHORTCODE) SHORTCODE
+
+/* We have to brute force allocframe because it has C math in the semantics */
+#define fWRAP_S2_allocframe(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv_i64 scramble_tmp = tcg_temp_new_i64(); \
+        TCGv tmp = tcg_temp_new(); \
+        { fEA_RI(RxV, -8); \
+          fSTORE(1, 8, EA, fFRAME_SCRAMBLE((fCAST8_8u(fREAD_LR()) << 32) | \
+                                           fCAST4_4u(fREAD_FP()))); \
+          fWRITE_FP(EA); \
+          fFRAMECHECK(EA - uiV, EA); \
+          tcg_gen_subi_tl(RxV, EA, uiV); \
+        } \
+        tcg_temp_free_i64(scramble_tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+#define fWRAP_SS2_allocframe(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv_i64 scramble_tmp = tcg_temp_new_i64(); \
+        TCGv tmp = tcg_temp_new(); \
+        { fEA_RI(fREAD_SP(), -8); \
+          fSTORE(1, 8, EA, fFRAME_SCRAMBLE((fCAST8_8u(fREAD_LR()) << 32) | \
+                                           fCAST4_4u(fREAD_FP()))); \
+          fWRITE_FP(EA); \
+          fFRAMECHECK(EA - uiV, EA); \
+          tcg_gen_subi_tl(tmp, EA, uiV); \
+          fWRITE_SP(tmp); \
+        } \
+        tcg_temp_free_i64(scramble_tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* Also have to brute force the deallocframe variants */
+#define fWRAP_L2_deallocframe(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        { \
+          fEA_REG(RsV); \
+          fLOAD(1, 8, u, EA, tmp_i64); \
+          tcg_gen_mov_i64(RddV, fFRAME_UNSCRAMBLE(tmp_i64)); \
+          tcg_gen_addi_tl(tmp, EA, 8); \
+          fWRITE_SP(tmp); \
+        } \
+        tcg_temp_free(tmp); \
+        tcg_temp_free_i64(tmp_i64); \
+    } while (0)
+
+#define fWRAP_SL2_deallocframe(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv WORD = tcg_temp_new(); \
+        TCGv tmp = tcg_temp_new(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        { \
+          fEA_REG(fREAD_FP()); \
+          fLOAD(1, 8, u, EA, tmp_i64); \
+          fFRAME_UNSCRAMBLE(tmp_i64); \
+          fWRITE_LR(fGETWORD(1, tmp_i64)); \
+          fWRITE_FP(fGETWORD(0, tmp_i64)); \
+          tcg_gen_addi_tl(tmp, EA, 8); \
+          fWRITE_SP(tmp); \
+        } \
+        tcg_temp_free(WORD); \
+        tcg_temp_free(tmp); \
+        tcg_temp_free_i64(tmp_i64); \
+    } while (0)
+
+#define fWRAP_L4_return(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        TCGv WORD = tcg_temp_new(); \
+        { \
+          fEA_REG(RsV); \
+          fLOAD(1, 8, u, EA, tmp_i64); \
+          tcg_gen_mov_i64(RddV, fFRAME_UNSCRAMBLE(tmp_i64)); \
+          tcg_gen_addi_tl(tmp, EA, 8); \
+          fWRITE_SP(tmp); \
+          fJUMPR(REG_LR, fGETWORD(1, RddV), COF_TYPE_JUMPR);\
+        } \
+        tcg_temp_free(tmp); \
+        tcg_temp_free_i64(tmp_i64); \
+        tcg_temp_free(WORD); \
+    } while (0)
+
+#define fWRAP_SL2_return(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        TCGv WORD = tcg_temp_new(); \
+        { \
+          fEA_REG(fREAD_FP()); \
+          fLOAD(1, 8, u, EA, tmp_i64); \
+          fFRAME_UNSCRAMBLE(tmp_i64); \
+          fWRITE_LR(fGETWORD(1, tmp_i64)); \
+          fWRITE_FP(fGETWORD(0, tmp_i64)); \
+          tcg_gen_addi_tl(tmp, EA, 8); \
+          fWRITE_SP(tmp); \
+          fJUMPR(REG_LR, fGETWORD(1, tmp_i64), COF_TYPE_JUMPR);\
+        } \
+        tcg_temp_free(tmp); \
+        tcg_temp_free_i64(tmp_i64); \
+        tcg_temp_free(WORD); \
+    } while (0)
+
+/*
+ * Conditional returns follow the same predicate naming convention as
+ * predicated loads above
+ */
+#define fWRAP_COND_RETURN(PRED) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        TCGv_i64 LSB_i64 = tcg_temp_new_i64(); \
+        TCGv zero = tcg_const_tl(0); \
+        TCGv_i64 zero_i64 = tcg_const_i64(0); \
+        TCGv_i64 unscramble = tcg_temp_new_i64(); \
+        TCGv WORD = tcg_temp_new(); \
+        TCGv SP = tcg_temp_new(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        TCGv tmp = tcg_temp_new(); \
+        fEA_REG(RsV); \
+        PRED; \
+        tcg_gen_extu_i32_i64(LSB_i64, LSB); \
+        fLOAD(1, 8, u, EA, tmp_i64); \
+        tcg_gen_mov_i64(unscramble, fFRAME_UNSCRAMBLE(tmp_i64)); \
+        READ_REG_PAIR(RddV, HEX_REG_FP); \
+        tcg_gen_movcond_i64(TCG_COND_NE, RddV, LSB_i64, zero_i64, \
+                            unscramble, RddV); \
+        tcg_gen_mov_tl(SP, hex_gpr[HEX_REG_SP]); \
+        tcg_gen_addi_tl(tmp, EA, 8); \
+        tcg_gen_movcond_tl(TCG_COND_NE, SP, LSB, zero, tmp, SP); \
+        fWRITE_SP(SP); \
+        gen_cond_return(LSB, fGETWORD(1, RddV)); \
+        tcg_temp_free(LSB); \
+        tcg_temp_free_i64(LSB_i64); \
+        tcg_temp_free(zero); \
+        tcg_temp_free_i64(zero_i64); \
+        tcg_temp_free_i64(unscramble); \
+        tcg_temp_free(WORD); \
+        tcg_temp_free(SP); \
+        tcg_temp_free_i64(tmp_i64); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+#define fWRAP_L4_return_t(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN(fLSBOLD(PvV))
+#define fWRAP_L4_return_f(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN(fLSBOLDNOT(PvV))
+#define fWRAP_L4_return_tnew_pt(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN(fLSBNEW(PvN))
+#define fWRAP_L4_return_fnew_pt(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN(fLSBNEWNOT(PvN))
+#define fWRAP_L4_return_tnew_pnt(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN(fLSBNEW(PvN))
+#define fWRAP_L4_return_tnew_pnt(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN(fLSBNEW(PvN))
+#define fWRAP_L4_return_fnew_pnt(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN(fLSBNEWNOT(PvN))
+
+#define fWRAP_COND_RETURN_SUBINSN(PRED) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        TCGv_i64 LSB_i64 = tcg_temp_new_i64(); \
+        TCGv zero = tcg_const_tl(0); \
+        TCGv_i64 zero_i64 = tcg_const_i64(0); \
+        TCGv_i64 unscramble = tcg_temp_new_i64(); \
+        TCGv_i64 RddV = tcg_temp_new_i64(); \
+        TCGv WORD = tcg_temp_new(); \
+        TCGv SP = tcg_temp_new(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        TCGv tmp = tcg_temp_new(); \
+        fEA_REG(fREAD_FP()); \
+        PRED; \
+        tcg_gen_extu_i32_i64(LSB_i64, LSB); \
+        fLOAD(1, 8, u, EA, tmp_i64); \
+        tcg_gen_mov_i64(unscramble, fFRAME_UNSCRAMBLE(tmp_i64)); \
+        READ_REG_PAIR(RddV, HEX_REG_FP); \
+        tcg_gen_movcond_i64(TCG_COND_NE, RddV, LSB_i64, zero_i64, \
+                            unscramble, RddV); \
+        tcg_gen_mov_tl(SP, hex_gpr[HEX_REG_SP]); \
+        tcg_gen_addi_tl(tmp, EA, 8); \
+        tcg_gen_movcond_tl(TCG_COND_NE, SP, LSB, zero, tmp, SP); \
+        fWRITE_SP(SP); \
+        WRITE_REG_PAIR(HEX_REG_FP, RddV); \
+        gen_cond_return(LSB, fGETWORD(1, RddV)); \
+        tcg_temp_free(LSB); \
+        tcg_temp_free_i64(LSB_i64); \
+        tcg_temp_free(zero); \
+        tcg_temp_free_i64(zero_i64); \
+        tcg_temp_free_i64(unscramble); \
+        tcg_temp_free_i64(RddV); \
+        tcg_temp_free(WORD); \
+        tcg_temp_free(SP); \
+        tcg_temp_free_i64(tmp_i64); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+#define fWRAP_SL2_return_t(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN_SUBINSN(fLSBOLD(fREAD_P0()))
+#define fWRAP_SL2_return_f(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN_SUBINSN(fLSBOLDNOT(fREAD_P0()))
+#define fWRAP_SL2_return_tnew(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN_SUBINSN(fLSBNEW0)
+#define fWRAP_SL2_return_fnew(GENHLPR, SHORTCODE) \
+    fWRAP_COND_RETURN_SUBINSN(fLSBNEW0NOT)
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 46/67] Hexagon TCG generation - step 08
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (44 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 45/67] Hexagon TCG generation - step 07 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 47/67] Hexagon TCG generation - step 09 Taylor Simpson
                   ` (21 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override mathematical operations with more than one definition

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index 1ac363e..d18aea4 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -1200,4 +1200,34 @@
 #define fWRAP_SL2_return_fnew(GENHLPR, SHORTCODE) \
     fWRAP_COND_RETURN_SUBINSN(fLSBNEW0NOT)
 
+/*
+ * Mathematical operations with more than one definition require
+ * special handling
+ */
+/*
+ * Approximate reciprocal
+ * r3,p1 = sfrecipa(r0, r1)
+ */
+#define fWRAP_F2_sfrecipa(GENHLPR, SHORTCODE) \
+    do { \
+        gen_helper_sfrecipa_val(RdV, cpu_env, RsV, RtV);  \
+        gen_helper_sfrecipa_pred(PeV, cpu_env, RsV, RtV);  \
+    } while (0)
+
+/*
+ * Approximation of the reciprocal square root
+ * r1,p0 = sfinvsqrta(r0)
+ */
+#define fWRAP_F2_sfinvsqrta(GENHLPR, SHORTCODE) \
+    do { \
+        gen_helper_sfinvsqrta_val(RdV, cpu_env, RsV); \
+        gen_helper_sfinvsqrta_pred(PeV, cpu_env, RsV); \
+    } while (0)
+
+#define fWRAP_A5_ACS(GENHLPR, SHORTCODE) \
+    do { \
+        gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV); \
+        gen_helper_vacsh_pred(PeV, cpu_env, RxxV, RssV, RttV); \
+    } while (0)
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 47/67] Hexagon TCG generation - step 09
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (45 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 46/67] Hexagon TCG generation - step 08 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 48/67] Hexagon TCG generation - step 10 Taylor Simpson
                   ` (20 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override instructions to speed up qemu

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 97 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index d18aea4..5443a94e 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -1230,4 +1230,101 @@
         gen_helper_vacsh_pred(PeV, cpu_env, RxxV, RssV, RttV); \
     } while (0)
 
+/*
+ * The following fWRAP macros are to speed up qemu
+ * We can add more over time
+ */
+
+/*
+ * Add or subtract with carry.
+ * Predicate register is used as an extra input and output.
+ * r5:4 = add(r1:0, r3:2, p1):carry
+ */
+#define fWRAP_A4_addp_c(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        TCGv_i64 LSB_i64 = tcg_temp_new_i64(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        TCGv tmp = tcg_temp_new(); \
+        tcg_gen_add_i64(RddV, RssV, RttV); \
+        fLSBOLD(PxV); \
+        tcg_gen_extu_i32_i64(LSB_i64, LSB); \
+        tcg_gen_add_i64(RddV, RddV, LSB_i64); \
+        fCARRY_FROM_ADD(RssV, RttV, LSB_i64); \
+        tcg_gen_extrl_i64_i32(tmp, RssV); \
+        f8BITSOF(PxV, tmp); \
+        tcg_temp_free(LSB); \
+        tcg_temp_free_i64(LSB_i64); \
+        tcg_temp_free_i64(tmp_i64); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* r5:4 = sub(r1:0, r3:2, p1):carry */
+#define fWRAP_A4_subp_c(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        TCGv_i64 LSB_i64 = tcg_temp_new_i64(); \
+        TCGv_i64 tmp_i64 = tcg_temp_new_i64(); \
+        TCGv tmp = tcg_temp_new(); \
+        tcg_gen_not_i64(tmp_i64, RttV); \
+        tcg_gen_add_i64(RddV, RssV, tmp_i64); \
+        fLSBOLD(PxV); \
+        tcg_gen_extu_i32_i64(LSB_i64, LSB); \
+        tcg_gen_add_i64(RddV, RddV, LSB_i64); \
+        fCARRY_FROM_ADD(RssV, tmp_i64, LSB_i64); \
+        tcg_gen_extrl_i64_i32(tmp, RssV); \
+        f8BITSOF(PxV, tmp); \
+        tcg_temp_free(LSB); \
+        tcg_temp_free_i64(LSB_i64); \
+        tcg_temp_free_i64(tmp_i64); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/*
+ * Compare each of the 8 unsigned bytes
+ * The minimum is places in each byte of the destination.
+ * Each bit of the predicate is set true if the bit from the first operand
+ * is greater than the bit from the second operand.
+ * r5:4,p1 = vminub(r1:0, r3:2)
+ */
+#define fWRAP_A6_vminub_RdP(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv BYTE = tcg_temp_new(); \
+        TCGv left = tcg_temp_new(); \
+        TCGv right = tcg_temp_new(); \
+        TCGv tmp = tcg_temp_new(); \
+        int i; \
+        tcg_gen_movi_tl(PeV, 0); \
+        tcg_gen_movi_i64(RddV, 0); \
+        for (i = 0; i < 8; i++) { \
+            fGETUBYTE(i, RttV); \
+            tcg_gen_mov_tl(left, BYTE); \
+            fGETUBYTE(i, RssV); \
+            tcg_gen_mov_tl(right, BYTE); \
+            tcg_gen_setcond_tl(TCG_COND_GT, tmp, left, right); \
+            fSETBIT(i, PeV, tmp); \
+            fMIN(tmp, left, right); \
+            fSETBYTE(i, RddV, tmp); \
+        } \
+        tcg_temp_free(BYTE); \
+        tcg_temp_free(left); \
+        tcg_temp_free(right); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+#define fWRAP_J2_call(GENHLPR, SHORTCODE) \
+    gen_call(riV)
+#define fWRAP_J2_callr(GENHLPR, SHORTCODE) \
+    gen_callr(RsV)
+
+#define fWRAP_J2_loop0r(GENHLPR, SHORTCODE) \
+    gen_loop0r(RsV, riV, insn)
+#define fWRAP_J2_loop1r(GENHLPR, SHORTCODE) \
+    gen_loop1r(RsV, riV, insn)
+
+#define fWRAP_J2_endloop0(GENHLPR, SHORTCODE) \
+    gen_endloop0()
+#define fWRAP_J2_endloop1(GENHLPR, SHORTCODE) \
+    gen_endloop1()
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 48/67] Hexagon TCG generation - step 10
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (46 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 47/67] Hexagon TCG generation - step 09 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 49/67] Hexagon TCG generation - step 11 Taylor Simpson
                   ` (19 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override compound compare and jump instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 105 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index 5443a94e..d3311be 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -1327,4 +1327,109 @@
 #define fWRAP_J2_endloop1(GENHLPR, SHORTCODE) \
     gen_endloop1()
 
+/*
+ * Compound compare and jump instructions
+ * Here is a primer to understand the tag names
+ *
+ * Comparison
+ *      cmpeqi   compare equal to an immediate
+ *      cmpgti   compare greater than an immediate
+ *      cmpgtiu  compare greater than an unsigned immediate
+ *      cmpeqn1  compare equal to negative 1
+ *      cmpgtn1  compare greater than negative 1
+ *      cmpeq    compare equal (two registers)
+ *
+ * Condition
+ *      tp0      p0 is true     p0 = cmp.eq(r0,#5); if (p0.new) jump:nt address
+ *      fp0      p0 is false    p0 = cmp.eq(r0,#5); if (!p0.new) jump:nt address
+ *      tp1      p1 is true     p1 = cmp.eq(r0,#5); if (p1.new) jump:nt address
+ *      fp1      p1 is false    p1 = cmp.eq(r0,#5); if (!p1.new) jump:nt address
+ *
+ * Prediction (not modelled in qemu)
+ *      _nt      not taken
+ *      _t       taken
+ */
+#define fWRAP_J4_cmpeqi_tp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_EQ, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpeqi_fp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_EQ, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpeqi_tp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_EQ, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpeqi_fp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_EQ, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpeqi_tp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_EQ, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpeqi_fp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_EQ, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpeqi_tp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_EQ, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpeqi_fp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_EQ, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpgti_tp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_GT, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpgti_fp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_GT, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpgti_tp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_GT, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpgti_fp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_GT, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpgti_tp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_GT, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpgti_fp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_GT, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpgti_tp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_GT, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpgti_fp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_GT, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpgtui_tp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_GTU, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpgtui_fp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_GTU, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpgtui_tp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_GTU, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpgtui_fp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(0, TCG_COND_GTU, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpgtui_tp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_GTU, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpgtui_fp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_GTU, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpgtui_tp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_GTU, true, RsV, UiV, riV)
+#define fWRAP_J4_cmpgtui_fp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmpi_jmp(1, TCG_COND_GTU, false, RsV, UiV, riV)
+#define fWRAP_J4_cmpeqn1_tp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(0, TCG_COND_EQ, true, RsV, riV)
+#define fWRAP_J4_cmpeqn1_fp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(0, TCG_COND_EQ, false, RsV, riV)
+#define fWRAP_J4_cmpeqn1_tp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(0, TCG_COND_EQ, true, RsV, riV)
+#define fWRAP_J4_cmpeqn1_fp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(0, TCG_COND_EQ, false, RsV, riV)
+#define fWRAP_J4_cmpeqn1_tp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(1, TCG_COND_EQ, true, RsV, riV)
+#define fWRAP_J4_cmpeqn1_fp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(1, TCG_COND_EQ, false, RsV, riV)
+#define fWRAP_J4_cmpeqn1_tp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(1, TCG_COND_EQ, true, RsV, riV)
+#define fWRAP_J4_cmpeqn1_fp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(1, TCG_COND_EQ, false, RsV, riV)
+#define fWRAP_J4_cmpgtn1_tp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(0, TCG_COND_GT, true, RsV, riV)
+#define fWRAP_J4_cmpgtn1_fp0_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(0, TCG_COND_GT, false, RsV, riV)
+#define fWRAP_J4_cmpgtn1_tp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(0, TCG_COND_GT, true, RsV, riV)
+#define fWRAP_J4_cmpgtn1_fp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(0, TCG_COND_GT, false, RsV, riV)
+#define fWRAP_J4_cmpgtn1_tp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(1, TCG_COND_GT, true, RsV, riV)
+#define fWRAP_J4_cmpgtn1_fp1_jump_nt(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(1, TCG_COND_GT, false, RsV, riV)
+#define fWRAP_J4_cmpgtn1_tp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(1, TCG_COND_GT, true, RsV, riV)
+#define fWRAP_J4_cmpgtn1_fp1_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_n1_jmp(1, TCG_COND_GT, false, RsV, riV)
+#define fWRAP_J4_cmpeq_tp0_jump_t(GENHLPR, SHORTCODE) \
+    gen_cmpnd_cmp_jmp(0, TCG_COND_EQ, true, RsV, RtV, riV)
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 49/67] Hexagon TCG generation - step 11
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (47 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 48/67] Hexagon TCG generation - step 10 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 50/67] Hexagon TCG generation - step 12 Taylor Simpson
                   ` (18 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override compare, transfer, conditional jump instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 119 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index d3311be..ad9a88c 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -1432,4 +1432,123 @@
 #define fWRAP_J4_cmpeq_tp0_jump_t(GENHLPR, SHORTCODE) \
     gen_cmpnd_cmp_jmp(0, TCG_COND_EQ, true, RsV, RtV, riV)
 
+/* p0 = cmp.eq(r0, #7) */
+#define fWRAP_SA1_cmpeqi(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        gen_comparei(TCG_COND_EQ, tmp, RsV, uiV); \
+        gen_log_pred_write(0, tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* r0 = add(r29,#8) */
+#define fWRAP_SA1_addsp(GENHLPR, SHORTCODE) \
+    tcg_gen_addi_tl(RdV, hex_gpr[HEX_REG_SP], IMMNO(0))
+
+/* r0 = add(r0, r1) */
+#define fWRAP_SA1_addrx(GENHLPR, SHORTCODE) \
+    tcg_gen_add_tl(RxV, RxV, RsV)
+
+#define fWRAP_A2_add(GENHLPR, SHORTCODE) \
+    tcg_gen_add_tl(RdV, RsV, RtV)
+
+#define fWRAP_A2_sub(GENHLPR, SHORTCODE) \
+    tcg_gen_sub_tl(RdV, RtV, RsV)
+
+/* r0 = sub(#10, r1) */
+#define fWRAP_A2_subri(GENHLPR, SHORTCODE) \
+    tcg_gen_subfi_tl(RdV, siV, RsV)
+
+#define fWRAP_A2_addi(GENHLPR, SHORTCODE) \
+    tcg_gen_addi_tl(RdV, RsV, siV)
+
+#define fWRAP_A2_and(GENHLPR, SHORTCODE) \
+    tcg_gen_and_tl(RdV, RsV, RtV)
+
+#define fWRAP_A2_andir(GENHLPR, SHORTCODE) \
+    tcg_gen_andi_tl(RdV, RsV, siV)
+
+#define fWRAP_A2_xor(GENHLPR, SHORTCODE) \
+    tcg_gen_xor_tl(RdV, RsV, RtV)
+
+/* Transfer instructions */
+#define fWRAP_A2_tfr(GENHLPR, SHORTCODE) \
+    tcg_gen_mov_tl(RdV, RsV)
+#define fWRAP_SA1_tfr(GENHLPR, SHORTCODE) \
+    tcg_gen_mov_tl(RdV, RsV)
+#define fWRAP_A2_tfrsi(GENHLPR, SHORTCODE) \
+    tcg_gen_movi_tl(RdV, siV)
+#define fWRAP_A2_tfrcrr(GENHLPR, SHORTCODE) \
+    tcg_gen_mov_tl(RdV, CsV)
+#define fWRAP_A2_tfrrcr(GENHLPR, SHORTCODE) \
+    tcg_gen_mov_tl(CdV, RsV)
+
+#define fWRAP_A2_nop(GENHLPR, SHORTCODE) \
+    do { } while (0)
+
+/* Compare instructions */
+#define fWRAP_C2_cmpeq(GENHLPR, SHORTCODE) \
+    gen_compare(TCG_COND_EQ, PdV, RsV, RtV)
+#define fWRAP_C4_cmpneq(GENHLPR, SHORTCODE) \
+    gen_compare(TCG_COND_NE, PdV, RsV, RtV)
+#define fWRAP_C2_cmpgt(GENHLPR, SHORTCODE) \
+    gen_compare(TCG_COND_GT, PdV, RsV, RtV)
+#define fWRAP_C2_cmpgtu(GENHLPR, SHORTCODE) \
+    gen_compare(TCG_COND_GTU, PdV, RsV, RtV)
+#define fWRAP_C4_cmplte(GENHLPR, SHORTCODE) \
+    gen_compare(TCG_COND_LE, PdV, RsV, RtV)
+#define fWRAP_C4_cmplteu(GENHLPR, SHORTCODE) \
+    gen_compare(TCG_COND_LEU, PdV, RsV, RtV)
+#define fWRAP_C2_cmpeqp(GENHLPR, SHORTCODE) \
+    gen_compare_i64(TCG_COND_EQ, PdV, RssV, RttV)
+#define fWRAP_C2_cmpgtp(GENHLPR, SHORTCODE) \
+    gen_compare_i64(TCG_COND_GT, PdV, RssV, RttV)
+#define fWRAP_C2_cmpgtup(GENHLPR, SHORTCODE) \
+    gen_compare_i64(TCG_COND_GTU, PdV, RssV, RttV)
+#define fWRAP_C2_cmpeqi(GENHLPR, SHORTCODE) \
+    gen_comparei(TCG_COND_EQ, PdV, RsV, siV)
+#define fWRAP_C2_cmpgti(GENHLPR, SHORTCODE) \
+    gen_comparei(TCG_COND_GT, PdV, RsV, siV)
+#define fWRAP_C2_cmpgtui(GENHLPR, SHORTCODE) \
+    gen_comparei(TCG_COND_GTU, PdV, RsV, uiV)
+
+#define fWRAP_SA1_zxtb(GENHLPR, SHORTCODE) \
+    tcg_gen_ext8u_tl(RdV, RsV)
+
+#define fWRAP_J2_jump(GENHLPR, SHORTCODE) \
+    gen_jump(riV)
+#define fWRAP_J2_jumpr(GENHLPR, SHORTCODE) \
+    gen_write_new_pc(RsV)
+
+#define fWRAP_cond_jump(COND) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        COND; \
+        gen_cond_jump(LSB, riV); \
+        tcg_temp_free(LSB); \
+    } while (0)
+
+#define fWRAP_J2_jumpt(GENHLPR, SHORTCODE) \
+    fWRAP_cond_jump(fLSBOLD(PuV))
+#define fWRAP_J2_jumpf(GENHLPR, SHORTCODE) \
+    fWRAP_cond_jump(fLSBOLDNOT(PuV))
+#define fWRAP_J2_jumpfnewpt(GENHLPR, SHORTCODE) \
+    fWRAP_cond_jump(fLSBNEWNOT(PuN))
+#define fWRAP_J2_jumpfnew(GENHLPR, SHORTCODE) \
+    fWRAP_cond_jump(fLSBNEWNOT(PuN))
+
+#define fWRAP_J2_jumprfnew(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        tcg_gen_andi_tl(LSB, PuN, 1); \
+        tcg_gen_xori_tl(LSB, LSB, 1); \
+        gen_cond_jumpr(LSB, RsV); \
+        tcg_temp_free(LSB); \
+    } while (0)
+
+#define fWRAP_J2_jumptnew(GENHLPR, SHORTCODE) \
+    gen_cond_jump(PuN, riV)
+#define fWRAP_J2_jumptnewpt(GENHLPR, SHORTCODE) \
+    gen_cond_jump(PuN, riV)
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 50/67] Hexagon TCG generation - step 12
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (48 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 49/67] Hexagon TCG generation - step 11 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 51/67] Hexagon translation Taylor Simpson
                   ` (17 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Override miscellaneous instructions identified during profiling

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper_overrides.h | 296 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 296 insertions(+)

diff --git a/target/hexagon/helper_overrides.h b/target/hexagon/helper_overrides.h
index ad9a88c..b796ced 100644
--- a/target/hexagon/helper_overrides.h
+++ b/target/hexagon/helper_overrides.h
@@ -1551,4 +1551,300 @@
 #define fWRAP_J2_jumptnewpt(GENHLPR, SHORTCODE) \
     gen_cond_jump(PuN, riV)
 
+/*
+ * New value compare & jump instructions
+ * if ([!]COND(r0.new, r1) jump:t address
+ * if ([!]COND(r0.new, #7) jump:t address
+ */
+#define fWRAP_J4_cmpgt_f_jumpnv_t(GENHLPR, SHORTCODE) \
+    gen_cmp_jumpnv(TCG_COND_LE, NsX, RtV, riV)
+#define fWRAP_J4_cmpeq_f_jumpnv_nt(GENHLPR, SHORTCODE) \
+    gen_cmp_jumpnv(TCG_COND_NE, NsX, RtV, riV)
+#define fWRAP_J4_cmpgt_t_jumpnv_t(GENHLPR, SHORTCODE) \
+    gen_cmp_jumpnv(TCG_COND_GT, NsX, RtV, riV)
+#define fWRAP_J4_cmpeqi_t_jumpnv_nt(GENHLPR, SHORTCODE) \
+    gen_cmpi_jumpnv(TCG_COND_EQ, NsX, UiV, riV)
+#define fWRAP_J4_cmpltu_f_jumpnv_t(GENHLPR, SHORTCODE) \
+    gen_cmp_jumpnv(TCG_COND_GEU, NsX, RtV, riV)
+#define fWRAP_J4_cmpgtui_t_jumpnv_t(GENHLPR, SHORTCODE) \
+    gen_cmpi_jumpnv(TCG_COND_GTU, NsX, UiV, riV)
+#define fWRAP_J4_cmpeq_f_jumpnv_t(GENHLPR, SHORTCODE) \
+    gen_cmp_jumpnv(TCG_COND_NE, NsX, RtV, riV)
+#define fWRAP_J4_cmpeqi_f_jumpnv_t(GENHLPR, SHORTCODE) \
+    gen_cmpi_jumpnv(TCG_COND_NE, NsX, UiV, riV)
+#define fWRAP_J4_cmpgtu_t_jumpnv_t(GENHLPR, SHORTCODE) \
+    gen_cmp_jumpnv(TCG_COND_GTU, NsX, RtV, riV)
+#define fWRAP_J4_cmpgtu_f_jumpnv_t(GENHLPR, SHORTCODE) \
+    gen_cmp_jumpnv(TCG_COND_LEU, NsX, RtV, riV)
+#define fWRAP_J4_cmplt_t_jumpnv_t(GENHLPR, SHORTCODE) \
+    gen_cmp_jumpnv(TCG_COND_LT, NsX, RtV, riV)
+
+/* r0 = r1 ; jump address */
+#define fWRAP_J4_jumpsetr(GENHLPR, SHORTCODE) \
+    do { \
+        tcg_gen_mov_tl(RdV, RsV); \
+        gen_jump(riV); \
+    } while (0)
+
+/* r0 = lsr(r1, #5) */
+#define fWRAP_S2_lsr_i_r(GENHLPR, SHORTCODE) \
+    fLSHIFTR(RdV, RsV, IMMNO(0), 4_4)
+
+/* r0 += lsr(r1, #5) */
+#define fWRAP_S2_lsr_i_r_acc(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        fLSHIFTR(tmp, RsV, IMMNO(0), 4_4); \
+        tcg_gen_add_tl(RxV, RxV, tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* r0 ^= lsr(r1, #5) */
+#define fWRAP_S2_lsr_i_r_xacc(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        fLSHIFTR(tmp, RsV, IMMNO(0), 4_4); \
+        tcg_gen_xor_tl(RxV, RxV, tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* r0 = asr(r1, #5) */
+#define fWRAP_S2_asr_i_r(GENHLPR, SHORTCODE) \
+    fASHIFTR(RdV, RsV, IMMNO(0), 4_4)
+
+/* r0 = addasl(r1, r2, #3) */
+#define fWRAP_S2_addasl_rrri(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        fASHIFTL(tmp, RsV, IMMNO(0), 4_4); \
+        tcg_gen_add_tl(RdV, RtV, tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* r0 |= asl(r1, r2) */
+#define fWRAP_S2_asl_r_r_or(GENHLPR, SHORTCODE) \
+    gen_asl_r_r_or(RxV, RsV, RtV)
+
+/* r0 = asl(r1, #5) */
+#define fWRAP_S2_asl_i_r(GENHLPR, SHORTCODE) \
+    fASHIFTL(RdV, RsV, IMMNO(0), 4_4)
+
+/* r0 |= asl(r1, #5) */
+#define fWRAP_S2_asl_i_r_or(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        fASHIFTL(tmp, RsV, IMMNO(0), 4_4); \
+        tcg_gen_or_tl(RxV, RxV, tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* r0 = vsplatb(r1) */
+#define fWRAP_S2_vsplatrb(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        int i; \
+        tcg_gen_movi_tl(RdV, 0); \
+        tcg_gen_andi_tl(tmp, RsV, 0xff); \
+        for (i = 0; i < 4; i++) { \
+            tcg_gen_shli_tl(RdV, RdV, 8); \
+            tcg_gen_or_tl(RdV, RdV, tmp); \
+        } \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+#define fWRAP_SA1_seti(GENHLPR, SHORTCODE) \
+    tcg_gen_movi_tl(RdV, IMMNO(0))
+
+#define fWRAP_S2_insert(GENHLPR, SHORTCODE) \
+    tcg_gen_deposit_i32(RxV, RxV, RsV, IMMNO(1), IMMNO(0))
+
+#define fWRAP_S2_extractu(GENHLPR, SHORTCODE) \
+    tcg_gen_extract_i32(RdV, RsV, IMMNO(1), IMMNO(0))
+
+#define fWRAP_A2_combinew(GENHLPR, SHORTCODE) \
+    tcg_gen_concat_i32_i64(RddV, RtV, RsV)
+#define fWRAP_A2_combineii(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp_lo = tcg_const_tl(SiV); \
+        TCGv tmp_hi = tcg_const_tl(siV); \
+        tcg_gen_concat_i32_i64(RddV, tmp_lo, tmp_hi); \
+        tcg_temp_free(tmp_lo); \
+        tcg_temp_free(tmp_hi); \
+    } while (0)
+#define fWRAP_A4_combineri(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp_lo = tcg_const_tl(siV); \
+        tcg_gen_concat_i32_i64(RddV, tmp_lo, RsV); \
+        tcg_temp_free(tmp_lo); \
+    } while (0)
+#define fWRAP_A4_combineir(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp_hi = tcg_const_tl(siV); \
+        tcg_gen_concat_i32_i64(RddV, RsV, tmp_hi); \
+        tcg_temp_free(tmp_hi); \
+    } while (0)
+#define fWRAP_A4_combineii(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp_lo = tcg_const_tl(UiV); \
+        TCGv tmp_hi = tcg_const_tl(siV); \
+        tcg_gen_concat_i32_i64(RddV, tmp_lo, tmp_hi); \
+        tcg_temp_free(tmp_lo); \
+        tcg_temp_free(tmp_hi); \
+    } while (0)
+
+#define fWRAP_SA1_combine0i(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp_lo = tcg_const_tl(uiV); \
+        TCGv zero = tcg_const_tl(0); \
+        tcg_gen_concat_i32_i64(RddV, tmp_lo, zero); \
+        tcg_temp_free(tmp_lo); \
+        tcg_temp_free(zero); \
+    } while (0)
+
+/* r0 = or(#8, asl(r1, #5)) */
+#define fWRAP_S4_ori_asl_ri(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        tcg_gen_shli_tl(tmp, RxV, IMMNO(1)); \
+        tcg_gen_ori_tl(RxV, tmp, IMMNO(0)); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* r0 = add(r1, sub(#6, r2)) */
+#define fWRAP_S4_subaddi(GENHLPR, SHORTCODE) \
+    do { \
+        tcg_gen_sub_tl(RdV, RsV, RuV); \
+        tcg_gen_addi_tl(RdV, RdV, IMMNO(0)); \
+    } while (0)
+
+#define fWRAP_SA1_inc(GENHLPR, SHORTCODE) \
+    tcg_gen_addi_tl(RdV, RsV, 1)
+
+#define fWRAP_SA1_dec(GENHLPR, SHORTCODE) \
+    tcg_gen_subi_tl(RdV, RsV, 1)
+
+/* if (p0.new) r0 = #0 */
+#define fWRAP_SA1_clrtnew(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv mask = tcg_temp_new(); \
+        TCGv zero = tcg_const_tl(0); \
+        tcg_gen_movi_tl(RdV, 0); \
+        tcg_gen_movi_tl(mask, 1 << insn->slot); \
+        tcg_gen_or_tl(mask, hex_slot_cancelled, mask); \
+        tcg_gen_movcond_tl(TCG_COND_EQ, hex_slot_cancelled, \
+                           hex_new_pred_value[0], zero, \
+                           mask, hex_slot_cancelled); \
+        tcg_temp_free(mask); \
+        tcg_temp_free(zero); \
+    } while (0)
+
+/* r0 = add(r1 , mpyi(#6, r2)) */
+#define fWRAP_M4_mpyri_addr_u2(GENHLPR, SHORTCODE) \
+    do { \
+        tcg_gen_muli_tl(RdV, RsV, IMMNO(0)); \
+        tcg_gen_add_tl(RdV, RuV, RdV); \
+    } while (0)
+
+/* Predicated add instructions */
+#define WRAP_padd(PRED, ADD) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        TCGv mask = tcg_temp_new(); \
+        TCGv zero = tcg_const_tl(0); \
+        PRED; \
+        ADD; \
+        tcg_gen_movi_tl(mask, 1 << insn->slot); \
+        tcg_gen_or_tl(mask, hex_slot_cancelled, mask); \
+        tcg_gen_movcond_tl(TCG_COND_NE, hex_slot_cancelled, LSB, zero, \
+                           hex_slot_cancelled, mask); \
+        tcg_temp_free(LSB); \
+        tcg_temp_free(mask); \
+        tcg_temp_free(zero); \
+    } while (0)
+
+#define fWRAP_A2_paddt(GENHLPR, SHORTCODE) \
+    WRAP_padd(fLSBOLD(PuV), tcg_gen_add_tl(RdV, RsV, RtV))
+#define fWRAP_A2_paddf(GENHLPR, SHORTCODE) \
+    WRAP_padd(fLSBOLDNOT(PuV), tcg_gen_add_tl(RdV, RsV, RtV))
+#define fWRAP_A2_paddit(GENHLPR, SHORTCODE) \
+    WRAP_padd(fLSBOLD(PuV), tcg_gen_addi_tl(RdV, RsV, IMMNO(0)))
+#define fWRAP_A2_paddif(GENHLPR, SHORTCODE) \
+    WRAP_padd(fLSBOLDNOT(PuV), tcg_gen_addi_tl(RdV, RsV, IMMNO(0)))
+#define fWRAP_A2_padditnew(GENHLPR, SHORTCODE) \
+    WRAP_padd(fLSBNEW(PuN), tcg_gen_addi_tl(RdV, RsV, IMMNO(0)))
+
+/* Conditional move instructions */
+#define fWRAP_COND_MOVE(VAL, COND) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        TCGv zero = tcg_const_tl(0); \
+        TCGv mask = tcg_temp_new(); \
+        TCGv value = tcg_const_tl(siV); \
+        VAL; \
+        tcg_gen_movcond_tl(COND, RdV, LSB, zero, value, zero); \
+        tcg_gen_movi_tl(mask, 1 << insn->slot); \
+        tcg_gen_movcond_tl(TCG_COND_EQ, mask, LSB, zero, mask, zero); \
+        tcg_gen_or_tl(hex_slot_cancelled, hex_slot_cancelled, mask); \
+        tcg_temp_free(LSB); \
+        tcg_temp_free(zero); \
+        tcg_temp_free(mask); \
+        tcg_temp_free(value); \
+    } while (0)
+
+#define fWRAP_C2_cmoveit(GENHLPR, SHORTCODE) \
+    fWRAP_COND_MOVE(fLSBOLD(PuV), TCG_COND_NE)
+#define fWRAP_C2_cmovenewit(GENHLPR, SHORTCODE) \
+    fWRAP_COND_MOVE(fLSBNEW(PuN), TCG_COND_NE)
+#define fWRAP_C2_cmovenewif(GENHLPR, SHORTCODE) \
+    fWRAP_COND_MOVE(fLSBNEWNOT(PuN), TCG_COND_NE)
+
+/* p0 = tstbit(r0, #5) */
+#define fWRAP_S2_tstbit_i(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        tcg_gen_andi_tl(tmp, RsV, (1 << IMMNO(0))); \
+        gen_8bitsof(PdV, tmp); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* p0 = !tstbit(r0, #5) */
+#define fWRAP_S4_ntstbit_i(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        tcg_gen_andi_tl(tmp, RsV, (1 << IMMNO(0))); \
+        gen_8bitsof(PdV, tmp); \
+        tcg_gen_xori_tl(PdV, PdV, 0xff); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* r0 = setbit(r1, #5) */
+#define fWRAP_S2_setbit_i(GENHLPR, SHORTCODE) \
+    tcg_gen_ori_tl(RdV, RsV, 1 << IMMNO(0))
+
+/* r0 += add(r1, #8) */
+#define fWRAP_M2_accii(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        tcg_gen_add_tl(tmp, RxV, RsV); \
+        tcg_gen_addi_tl(RxV, tmp, IMMNO(0)); \
+        tcg_temp_free(tmp); \
+    } while (0)
+
+/* p0 = bitsclr(r1, #6) */
+#define fWRAP_C2_bitsclri(GENHLPR, SHORTCODE) \
+    do { \
+        TCGv tmp = tcg_temp_new(); \
+        TCGv zero = tcg_const_tl(0); \
+        tcg_gen_andi_tl(tmp, RsV, IMMNO(0)); \
+        gen_compare(TCG_COND_EQ, PdV, tmp, zero); \
+        tcg_temp_free(tmp); \
+        tcg_temp_free(zero); \
+    } while (0)
+
+#define fWRAP_SL2_jumpr31(GENHLPR, SHORTCODE) \
+    gen_write_new_pc(hex_gpr[HEX_REG_LR])
+
+#define fWRAP_SL2_jumpr31_tnew(GENHLPR, SHORTCODE) \
+    gen_cond_jumpr(hex_new_pred_value[0], hex_gpr[HEX_REG_LR])
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 51/67] Hexagon translation
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (49 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 50/67] Hexagon TCG generation - step 12 Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 52/67] Hexagon Linux user emulation Taylor Simpson
                   ` (16 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Read the instruction memory
Create a packet data structure
Generate TCG code for the start of the packet
Invoke the generate function for each instruction
Generate TCG code for the end of the packet

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/translate.h |  82 +++++
 target/hexagon/translate.c | 728 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 810 insertions(+)
 create mode 100644 target/hexagon/translate.h
 create mode 100644 target/hexagon/translate.c

diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
new file mode 100644
index 0000000..65b721e
--- /dev/null
+++ b/target/hexagon/translate.h
@@ -0,0 +1,82 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_TRANSLATE_H
+#define HEXAGON_TRANSLATE_H
+
+#include "cpu.h"
+#include "exec/translator.h"
+#include "exec/helper-proto.h"
+#include "exec/helper-gen.h"
+#include "internal.h"
+
+typedef struct DisasContext {
+    DisasContextBase base;
+    uint32_t mem_idx;
+    int ctx_reg_log[REG_WRITES_MAX];
+    int ctx_reg_log_idx;
+    int ctx_preg_log[PRED_WRITES_MAX];
+    int ctx_preg_log_idx;
+    uint8_t ctx_store_width[STORES_MAX];
+} DisasContext;
+
+static inline void ctx_log_reg_write(DisasContext *ctx, int rnum)
+{
+#if HEX_DEBUG
+    int i;
+    for (i = 0; i < ctx->ctx_reg_log_idx; i++) {
+        if (ctx->ctx_reg_log[i] == rnum) {
+            HEX_DEBUG_LOG("WARNING: Multiple writes to r%d\n", rnum);
+        }
+    }
+#endif
+    ctx->ctx_reg_log[ctx->ctx_reg_log_idx] = rnum;
+    ctx->ctx_reg_log_idx++;
+}
+
+static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
+{
+    ctx->ctx_preg_log[ctx->ctx_preg_log_idx] = pnum;
+    ctx->ctx_preg_log_idx++;
+}
+
+extern TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
+extern TCGv hex_pred[NUM_PREGS];
+extern TCGv hex_next_PC;
+extern TCGv hex_this_PC;
+extern TCGv hex_slot_cancelled;
+extern TCGv hex_branch_taken;
+extern TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
+extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
+extern TCGv hex_new_pred_value[NUM_PREGS];
+extern TCGv hex_pred_written;
+extern TCGv hex_store_addr[STORES_MAX];
+extern TCGv hex_store_width[STORES_MAX];
+extern TCGv hex_store_val32[STORES_MAX];
+extern TCGv_i64 hex_store_val64[STORES_MAX];
+extern TCGv hex_dczero_addr;
+extern TCGv llsc_addr;
+extern TCGv llsc_val;
+extern TCGv_i64 llsc_val_i64;
+extern TCGv hex_is_gather_store_insn;
+extern TCGv hex_gather_issued;
+
+void hexagon_translate_init(void);
+extern void gen_exception(int excp);
+extern void gen_exception_debug(void);
+
+#endif
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
new file mode 100644
index 0000000..57ab294
--- /dev/null
+++ b/target/hexagon/translate.c
@@ -0,0 +1,728 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define QEMU_GENERATE
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "tcg/tcg-op.h"
+#include "exec/cpu_ldst.h"
+#include "exec/log.h"
+#include "internal.h"
+#include "attribs.h"
+#include "insn.h"
+#include "decode.h"
+#include "translate.h"
+#include "printinsn.h"
+#include "macros.h"
+
+TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
+TCGv hex_pred[NUM_PREGS];
+TCGv hex_next_PC;
+TCGv hex_this_PC;
+TCGv hex_slot_cancelled;
+TCGv hex_branch_taken;
+TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
+#if HEX_DEBUG
+TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
+#endif
+TCGv hex_new_pred_value[NUM_PREGS];
+TCGv hex_pred_written;
+TCGv hex_store_addr[STORES_MAX];
+TCGv hex_store_width[STORES_MAX];
+TCGv hex_store_val32[STORES_MAX];
+TCGv_i64 hex_store_val64[STORES_MAX];
+TCGv hex_dczero_addr;
+TCGv llsc_addr;
+TCGv llsc_val;
+TCGv_i64 llsc_val_i64;
+TCGv hex_is_gather_store_insn;
+TCGv hex_gather_issued;
+
+static const char * const hexagon_prednames[] = {
+  "p0", "p1", "p2", "p3"
+};
+
+void gen_exception(int excp)
+{
+    TCGv_i32 helper_tmp = tcg_const_i32(excp);
+    gen_helper_raise_exception(cpu_env, helper_tmp);
+    tcg_temp_free_i32(helper_tmp);
+}
+
+void gen_exception_debug(void)
+{
+    gen_exception(EXCP_DEBUG);
+}
+
+#if HEX_DEBUG
+#define PACKET_BUFFER_LEN              1028
+static void print_pkt(packet_t *pkt)
+{
+    char buf[PACKET_BUFFER_LEN];
+    snprint_a_pkt(buf, PACKET_BUFFER_LEN, pkt);
+    HEX_DEBUG_LOG("%s", buf);
+}
+#define HEX_DEBUG_PRINT_PKT(pkt)  print_pkt(pkt)
+#else
+#define HEX_DEBUG_PRINT_PKT(pkt)  /* nothing */
+#endif
+
+static int read_packet_words(CPUHexagonState *env, DisasContext *ctx,
+                             uint32_t words[])
+{
+    bool found_end = false;
+    int max_words;
+    int nwords;
+    int i;
+
+    /* Make sure we don't cross a page boundary */
+    max_words = -(ctx->base.pc_next | TARGET_PAGE_MASK) / sizeof(uint32_t);
+    if (max_words < PACKET_WORDS_MAX) {
+        /* Might cross a page boundary */
+        if (ctx->base.num_insns == 1) {
+            /* OK if it's the first packet in the TB */
+            max_words = PACKET_WORDS_MAX;
+        }
+    } else {
+        max_words = PACKET_WORDS_MAX;
+    }
+
+    memset(words, 0, PACKET_WORDS_MAX * sizeof(uint32_t));
+    for (nwords = 0; !found_end && nwords < max_words; nwords++) {
+        words[nwords] = cpu_ldl_code(env,
+                                ctx->base.pc_next + nwords * sizeof(uint32_t));
+        found_end = is_packet_end(words[nwords]);
+    }
+    if (!found_end) {
+        if (nwords == PACKET_WORDS_MAX) {
+            /* Read too many words without finding the end */
+            gen_exception(HEX_EXCP_INVALID_PACKET);
+            ctx->base.is_jmp = DISAS_NORETURN;
+            return 0;
+        }
+        /* Crosses page boundary - defer to next TB */
+        ctx->base.is_jmp = DISAS_TOO_MANY;
+        return 0;
+    }
+
+    HEX_DEBUG_LOG("decode_packet: pc = 0x%x\n", ctx->base.pc_next);
+    HEX_DEBUG_LOG("    words = { ");
+    for (i = 0; i < nwords; i++) {
+        HEX_DEBUG_LOG("0x%x, ", words[i]);
+    }
+    HEX_DEBUG_LOG("}\n");
+
+    return nwords;
+}
+
+static void gen_start_packet(DisasContext *ctx, packet_t *pkt)
+{
+    target_ulong next_PC = ctx->base.pc_next + pkt->encod_pkt_size_in_bytes;
+    int i;
+
+    /* Clear out the disassembly context */
+    ctx->ctx_reg_log_idx = 0;
+    ctx->ctx_preg_log_idx = 0;
+    for (i = 0; i < STORES_MAX; i++) {
+        ctx->ctx_store_width[i] = 0;
+    }
+
+#if HEX_DEBUG
+    /* Handy place to set a breakpoint before the packet executes */
+    gen_helper_debug_start_packet(cpu_env);
+    tcg_gen_movi_tl(hex_this_PC, ctx->base.pc_next);
+#endif
+
+    /* Initialize the runtime state for packet semantics */
+    tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->base.pc_next);
+    tcg_gen_movi_tl(hex_slot_cancelled, 0);
+    if (pkt->pkt_has_cof) {
+        tcg_gen_movi_tl(hex_branch_taken, 0);
+        tcg_gen_movi_tl(hex_next_PC, next_PC);
+    }
+    tcg_gen_movi_tl(hex_pred_written, 0);
+}
+
+static int is_gather_store_insn(insn_t *insn)
+{
+    int check = GET_ATTRIB(insn->opcode, A_CVI_NEW);
+    check &= (insn->new_value_producer_slot == 1);
+    return check;
+}
+
+/*
+ * The LOG_*_WRITE macros mark most of the writes in a packet
+ * However, there are some implicit writes marked as attributes
+ * of the applicable instructions.
+ */
+static void mark_implicit_reg_write(DisasContext *ctx, insn_t *insn,
+                                    int attrib, int rnum)
+{
+    if (GET_ATTRIB(insn->opcode, attrib)) {
+        ctx->ctx_reg_log[ctx->ctx_reg_log_idx] = rnum;
+        ctx->ctx_reg_log_idx++;
+    }
+}
+
+static void mark_implicit_pred_write(DisasContext *ctx, insn_t *insn,
+                                     int attrib, int pnum)
+{
+    if (GET_ATTRIB(insn->opcode, attrib)) {
+        ctx->ctx_preg_log[ctx->ctx_preg_log_idx] = pnum;
+        ctx->ctx_preg_log_idx++;
+    }
+}
+
+static void mark_implicit_writes(DisasContext *ctx, insn_t *insn)
+{
+    mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_LR,  HEX_REG_LR);
+    mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_LC0, HEX_REG_LC0);
+    mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_SA0, HEX_REG_SA0);
+    mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_LC1, HEX_REG_LC1);
+    mark_implicit_reg_write(ctx, insn, A_IMPLICIT_WRITES_SA1, HEX_REG_SA1);
+
+    mark_implicit_pred_write(ctx, insn, A_IMPLICIT_WRITES_P0, 0);
+    mark_implicit_pred_write(ctx, insn, A_IMPLICIT_WRITES_P1, 1);
+    mark_implicit_pred_write(ctx, insn, A_IMPLICIT_WRITES_P2, 2);
+    mark_implicit_pred_write(ctx, insn, A_IMPLICIT_WRITES_P3, 3);
+}
+
+static void gen_insn(CPUHexagonState *env, DisasContext *ctx, insn_t *insn)
+{
+    if (insn->generate) {
+        bool is_gather_store = is_gather_store_insn(insn);
+        if (is_gather_store) {
+            tcg_gen_movi_tl(hex_is_gather_store_insn, 1);
+        }
+        insn->generate(env, ctx, insn);
+        mark_implicit_writes(ctx, insn);
+        if (is_gather_store) {
+            tcg_gen_movi_tl(hex_is_gather_store_insn, 0);
+        }
+    } else {
+        gen_exception(HEX_EXCP_INVALID_OPCODE);
+        ctx->base.is_jmp = DISAS_NORETURN;
+    }
+}
+
+/*
+ * Helpers for generating the packet commit
+ */
+static void gen_reg_writes(DisasContext *ctx)
+{
+    int i;
+
+    for (i = 0; i < ctx->ctx_reg_log_idx; i++) {
+        int reg_num = ctx->ctx_reg_log[i];
+
+        tcg_gen_mov_tl(hex_gpr[reg_num], hex_new_value[reg_num]);
+    }
+}
+
+static void gen_pred_writes(DisasContext *ctx, packet_t *pkt)
+{
+    /* Early exit if the log is empty */
+    if (!ctx->ctx_preg_log_idx) {
+        return;
+    }
+
+    TCGv zero = tcg_const_tl(0);
+    TCGv control_reg = tcg_temp_new();
+    TCGv pval = tcg_temp_new();
+    int i;
+
+    /*
+     * Only endloop instructions will conditionally
+     * write a predicate.  If there are no endloop
+     * instructions, we can use the non-conditional
+     * write of the predicates.
+     */
+    if (pkt->pkt_has_endloop) {
+        TCGv pred_written = tcg_temp_new();
+        for (i = 0; i < ctx->ctx_preg_log_idx; i++) {
+            int pred_num = ctx->ctx_preg_log[i];
+
+            tcg_gen_andi_tl(pred_written, hex_pred_written, 1 << pred_num);
+            tcg_gen_movcond_tl(TCG_COND_NE, hex_pred[pred_num],
+                               pred_written, zero,
+                               hex_new_pred_value[pred_num],
+                               hex_pred[pred_num]);
+        }
+        tcg_temp_free(pred_written);
+    } else {
+        for (i = 0; i < ctx->ctx_preg_log_idx; i++) {
+            int pred_num = ctx->ctx_preg_log[i];
+            tcg_gen_mov_tl(hex_pred[pred_num], hex_new_pred_value[pred_num]);
+#if HEX_DEBUG
+            /* Do this so HELPER(debug_commit_end) will know */
+            tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pred_num);
+#endif
+        }
+    }
+
+    tcg_temp_free(zero);
+    tcg_temp_free(control_reg);
+    tcg_temp_free(pval);
+}
+
+#if HEX_DEBUG
+static inline void gen_check_store_width(DisasContext *ctx, int slot_num)
+{
+    TCGv slot = tcg_const_tl(slot_num);
+    TCGv check = tcg_const_tl(ctx->ctx_store_width[slot_num]);
+    gen_helper_debug_check_store_width(cpu_env, slot, check);
+    tcg_temp_free(slot);
+    tcg_temp_free(check);
+}
+#define HEX_DEBUG_GEN_CHECK_STORE_WIDTH(ctx, slot_num) \
+    gen_check_store_width(ctx, slot_num)
+#else
+#define HEX_DEBUG_GEN_CHECK_STORE_WIDTH(ctx, slot_num)  /* nothing */
+#endif
+
+static void process_store(DisasContext *ctx, int slot_num)
+{
+    TCGv cancelled = tcg_temp_local_new();
+    TCGLabel *label_end = gen_new_label();
+
+    /* Don't do anything if the slot was cancelled */
+    gen_slot_cancelled_check(cancelled, slot_num);
+    tcg_gen_brcondi_tl(TCG_COND_NE, cancelled, 0, label_end);
+    {
+        int ctx_width = ctx->ctx_store_width[slot_num];
+        TCGv address = tcg_temp_local_new();
+        tcg_gen_mov_tl(address, hex_store_addr[slot_num]);
+
+        /*
+         * If we know the width from the DisasContext, we can
+         * generate much cleaner code.
+         * Unfortunately, not all instructions execute the fSTORE
+         * macro during code generation.  Anything that uses the
+         * generic helper will have this problem.  Instructions
+         * that use fWRAP to generate proper TCG code will be OK.
+         */
+        if (ctx_width == 1) {
+            HEX_DEBUG_GEN_CHECK_STORE_WIDTH(ctx, slot_num);
+            TCGv value = tcg_temp_new();
+            tcg_gen_mov_tl(value, hex_store_val32[slot_num]);
+            tcg_gen_qemu_st8(value, address, ctx->mem_idx);
+            tcg_temp_free(value);
+        } else if (ctx_width == 2) {
+            HEX_DEBUG_GEN_CHECK_STORE_WIDTH(ctx, slot_num);
+            TCGv value = tcg_temp_new();
+            tcg_gen_mov_tl(value, hex_store_val32[slot_num]);
+            tcg_gen_qemu_st16(value, address, ctx->mem_idx);
+            tcg_temp_free(value);
+        } else if (ctx_width == 4) {
+            HEX_DEBUG_GEN_CHECK_STORE_WIDTH(ctx, slot_num);
+            TCGv value = tcg_temp_new();
+            tcg_gen_mov_tl(value, hex_store_val32[slot_num]);
+            tcg_gen_qemu_st32(value, address, ctx->mem_idx);
+            tcg_temp_free(value);
+        } else if (ctx_width == 8) {
+            HEX_DEBUG_GEN_CHECK_STORE_WIDTH(ctx, slot_num);
+            TCGv_i64 value = tcg_temp_new_i64();
+            tcg_gen_mov_i64(value, hex_store_val64[slot_num]);
+            tcg_gen_qemu_st64(value, address, ctx->mem_idx);
+            tcg_temp_free_i64(value);
+        } else {
+            /*
+             * If we get to here, we don't know the width at
+             * TCG generation time, we'll generate branching
+             * based on the width at runtime.
+             */
+            TCGLabel *label_w2 = gen_new_label();
+            TCGLabel *label_w4 = gen_new_label();
+            TCGLabel *label_w8 = gen_new_label();
+            TCGv width = tcg_temp_local_new();
+
+            tcg_gen_mov_tl(width, hex_store_width[slot_num]);
+            tcg_gen_brcondi_tl(TCG_COND_NE, width, 1, label_w2);
+            {
+                /* Width is 1 byte */
+                TCGv value = tcg_temp_new();
+                tcg_gen_mov_tl(value, hex_store_val32[slot_num]);
+                tcg_gen_qemu_st8(value, address, ctx->mem_idx);
+                tcg_gen_br(label_end);
+                tcg_temp_free(value);
+            }
+            gen_set_label(label_w2);
+            tcg_gen_brcondi_tl(TCG_COND_NE, width, 2, label_w4);
+            {
+                /* Width is 2 bytes */
+                TCGv value = tcg_temp_new();
+                tcg_gen_mov_tl(value, hex_store_val32[slot_num]);
+                tcg_gen_qemu_st16(value, address, ctx->mem_idx);
+                tcg_gen_br(label_end);
+                tcg_temp_free(value);
+            }
+            gen_set_label(label_w4);
+            tcg_gen_brcondi_tl(TCG_COND_NE, width, 4, label_w8);
+            {
+                /* Width is 4 bytes */
+                TCGv value = tcg_temp_new();
+                tcg_gen_mov_tl(value, hex_store_val32[slot_num]);
+                tcg_gen_qemu_st32(value, address, ctx->mem_idx);
+                tcg_gen_br(label_end);
+                tcg_temp_free(value);
+            }
+            gen_set_label(label_w8);
+            {
+                /* Width is 8 bytes */
+                TCGv_i64 value = tcg_temp_new_i64();
+                tcg_gen_mov_i64(value, hex_store_val64[slot_num]);
+                tcg_gen_qemu_st64(value, address, ctx->mem_idx);
+                tcg_gen_br(label_end);
+                tcg_temp_free_i64(value);
+            }
+
+            tcg_temp_free(width);
+        }
+        tcg_temp_free(address);
+    }
+    gen_set_label(label_end);
+
+    tcg_temp_free(cancelled);
+}
+
+static void process_store_log(DisasContext *ctx, packet_t *pkt)
+{
+    /*
+     *  When a packet has two stores, the hardware processes
+     *  slot 1 and then slot 2.  This will be important when
+     *  the memory accesses overlap.
+     */
+    if (pkt->pkt_has_store_s1 && !pkt->pkt_has_dczeroa) {
+        process_store(ctx, 1);
+    }
+    if (pkt->pkt_has_store_s0 && !pkt->pkt_has_dczeroa) {
+        process_store(ctx, 0);
+    }
+}
+
+/* Zero out a 32-bit cache line */
+static void process_dczeroa(DisasContext *ctx, packet_t *pkt)
+{
+    if (pkt->pkt_has_dczeroa) {
+        /* Store 32 bytes of zero starting at (addr & ~0x1f) */
+        TCGv addr = tcg_temp_new();
+        TCGv_i64 zero = tcg_const_i64(0);
+
+        tcg_gen_andi_tl(addr, hex_dczero_addr, ~0x1f);
+        tcg_gen_qemu_st64(zero, addr, ctx->mem_idx);
+        tcg_gen_addi_tl(addr, addr, 8);
+        tcg_gen_qemu_st64(zero, addr, ctx->mem_idx);
+        tcg_gen_addi_tl(addr, addr, 8);
+        tcg_gen_qemu_st64(zero, addr, ctx->mem_idx);
+        tcg_gen_addi_tl(addr, addr, 8);
+        tcg_gen_qemu_st64(zero, addr, ctx->mem_idx);
+
+        tcg_temp_free(addr);
+        tcg_temp_free_i64(zero);
+    }
+}
+
+static bool process_change_of_flow(DisasContext *ctx, packet_t *pkt)
+{
+    if (pkt->pkt_has_cof) {
+        tcg_gen_mov_tl(hex_gpr[HEX_REG_PC], hex_next_PC);
+        return true;
+    }
+    return false;
+}
+
+static void gen_exec_counters(packet_t *pkt)
+{
+    int num_insns = pkt->num_insns;
+    int num_real_insns = 0;
+    int i;
+
+    for (i = 0; i < num_insns; i++) {
+        if (!pkt->insn[i].is_endloop &&
+            !pkt->insn[i].part1 &&
+            !GET_ATTRIB(pkt->insn[i].opcode, A_IT_NOP)) {
+            num_real_insns++;
+        }
+    }
+
+    tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_PKT_CNT],
+                    hex_gpr[HEX_REG_QEMU_PKT_CNT], 1);
+    if (num_real_insns) {
+        tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_INSN_CNT],
+                        hex_gpr[HEX_REG_QEMU_INSN_CNT], num_real_insns);
+    }
+}
+
+static void gen_commit_packet(DisasContext *ctx, packet_t *pkt)
+{
+    bool end_tb = false;
+
+    gen_reg_writes(ctx);
+    gen_pred_writes(ctx, pkt);
+    process_store_log(ctx, pkt);
+    process_dczeroa(ctx, pkt);
+    end_tb |= process_change_of_flow(ctx, pkt);
+    gen_exec_counters(pkt);
+#if HEX_DEBUG
+    {
+        TCGv has_st0 =
+            tcg_const_tl(pkt->pkt_has_store_s0 && !pkt->pkt_has_dczeroa);
+        TCGv has_st1 =
+            tcg_const_tl(pkt->pkt_has_store_s1 && !pkt->pkt_has_dczeroa);
+
+        /* Handy place to set a breakpoint at the end of execution */
+        gen_helper_debug_commit_end(cpu_env, has_st0, has_st1);
+
+        tcg_temp_free(has_st0);
+        tcg_temp_free(has_st1);
+    }
+#endif
+
+    if (end_tb) {
+        tcg_gen_exit_tb(NULL, 0);
+        ctx->base.is_jmp = DISAS_NORETURN;
+    }
+}
+
+static void decode_and_translate_packet(CPUHexagonState *env, DisasContext *ctx)
+{
+    uint32_t words[PACKET_WORDS_MAX];
+    int nwords;
+    packet_t pkt;
+    int i;
+
+    nwords = read_packet_words(env, ctx, words);
+    if (!nwords) {
+        return;
+    }
+
+    if (decode_this(nwords, words, &pkt)) {
+        HEX_DEBUG_PRINT_PKT(&pkt);
+        gen_start_packet(ctx, &pkt);
+        for (i = 0; i < pkt.num_insns; i++) {
+            gen_insn(env, ctx, &pkt.insn[i]);
+        }
+        gen_commit_packet(ctx, &pkt);
+        ctx->base.pc_next += pkt.encod_pkt_size_in_bytes;
+    } else {
+        gen_exception(HEX_EXCP_INVALID_PACKET);
+        ctx->base.is_jmp = DISAS_NORETURN;
+    }
+}
+
+static void hexagon_tr_init_disas_context(DisasContextBase *dcbase,
+                                          CPUState *cs)
+{
+    DisasContext *ctx = container_of(dcbase, DisasContext, base);
+
+    ctx->mem_idx = MMU_USER_IDX;
+}
+
+static void hexagon_tr_tb_start(DisasContextBase *db, CPUState *cpu)
+{
+}
+
+static void hexagon_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
+{
+    DisasContext *ctx = container_of(dcbase, DisasContext, base);
+
+    tcg_gen_insn_start(ctx->base.pc_next);
+}
+
+static bool hexagon_tr_breakpoint_check(DisasContextBase *dcbase, CPUState *cpu,
+                                        const CPUBreakpoint *bp)
+{
+    DisasContext *ctx = container_of(dcbase, DisasContext, base);
+
+    tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->base.pc_next);
+    ctx->base.is_jmp = DISAS_NORETURN;
+    gen_exception_debug();
+    /*
+     * The address covered by the breakpoint must be included in
+     * [tb->pc, tb->pc + tb->size) in order to for it to be
+     * properly cleared -- thus we increment the PC here so that
+     * the logic setting tb->size below does the right thing.
+     */
+    ctx->base.pc_next += 4;
+    return true;
+}
+
+
+static void hexagon_tr_translate_packet(DisasContextBase *dcbase, CPUState *cpu)
+{
+    DisasContext *ctx = container_of(dcbase, DisasContext, base);
+    CPUHexagonState *env = cpu->env_ptr;
+
+    decode_and_translate_packet(env, ctx);
+
+    if (ctx->base.is_jmp == DISAS_NEXT) {
+        target_ulong page_start;
+
+        page_start = ctx->base.pc_first & TARGET_PAGE_MASK;
+        if (ctx->base.pc_next - page_start >= TARGET_PAGE_SIZE) {
+            ctx->base.is_jmp = DISAS_TOO_MANY;
+        }
+
+        /*
+         * The CPU log is used to compare against LLDB single stepping,
+         * so end the TLB after every packet.
+         */
+        if (qemu_loglevel_mask(CPU_LOG_TB_CPU)) {
+            ctx->base.is_jmp = DISAS_TOO_MANY;
+        }
+#if HEX_DEBUG
+        /* When debugging, only put one packet per TB */
+        ctx->base.is_jmp = DISAS_TOO_MANY;
+#endif
+    }
+}
+
+static void hexagon_tr_tb_stop(DisasContextBase *dcbase, CPUState *cpu)
+{
+    DisasContext *ctx = container_of(dcbase, DisasContext, base);
+
+    switch (ctx->base.is_jmp) {
+    case DISAS_TOO_MANY:
+        tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->base.pc_next);
+        if (ctx->base.singlestep_enabled) {
+            gen_exception_debug();
+        } else {
+            tcg_gen_exit_tb(NULL, 0);
+        }
+        break;
+    case DISAS_NORETURN:
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void hexagon_tr_disas_log(const DisasContextBase *dcbase, CPUState *cpu)
+{
+    qemu_log("IN: %s\n", lookup_symbol(dcbase->pc_first));
+    log_target_disas(cpu, dcbase->pc_first, dcbase->tb->size);
+}
+
+
+static const TranslatorOps hexagon_tr_ops = {
+    .init_disas_context = hexagon_tr_init_disas_context,
+    .tb_start           = hexagon_tr_tb_start,
+    .insn_start         = hexagon_tr_insn_start,
+    .breakpoint_check   = hexagon_tr_breakpoint_check,
+    .translate_insn     = hexagon_tr_translate_packet,
+    .tb_stop            = hexagon_tr_tb_stop,
+    .disas_log          = hexagon_tr_disas_log,
+};
+
+void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
+{
+    DisasContext ctx;
+
+    translator_loop(&hexagon_tr_ops, &ctx.base, cs, tb, max_insns);
+}
+
+#define NAME_LEN               64
+static char new_value_names[TOTAL_PER_THREAD_REGS][NAME_LEN];
+#if HEX_DEBUG
+static char reg_written_names[TOTAL_PER_THREAD_REGS][NAME_LEN];
+#endif
+static char new_pred_value_names[NUM_PREGS][NAME_LEN];
+static char store_addr_names[STORES_MAX][NAME_LEN];
+static char store_width_names[STORES_MAX][NAME_LEN];
+static char store_val32_names[STORES_MAX][NAME_LEN];
+static char store_val64_names[STORES_MAX][NAME_LEN];
+
+void hexagon_translate_init(void)
+{
+    int i;
+
+    opcode_init();
+
+    for (i = 0; i < TOTAL_PER_THREAD_REGS; i++) {
+        hex_gpr[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, gpr[i]),
+            hexagon_regnames[i]);
+
+        sprintf(new_value_names[i], "new_%s", hexagon_regnames[i]);
+        hex_new_value[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, new_value[i]),
+            new_value_names[i]);
+
+#if HEX_DEBUG
+        sprintf(reg_written_names[i], "reg_written_%s", hexagon_regnames[i]);
+        hex_reg_written[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, reg_written[i]),
+            reg_written_names[i]);
+#endif
+    }
+    for (i = 0; i < NUM_PREGS; i++) {
+        hex_pred[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, pred[i]),
+            hexagon_prednames[i]);
+
+        sprintf(new_pred_value_names[i], "new_pred_%s", hexagon_prednames[i]);
+        hex_new_pred_value[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, new_pred_value[i]),
+            new_pred_value_names[i]);
+    }
+    hex_pred_written = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, pred_written), "pred_written");
+    hex_next_PC = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, next_PC), "next_PC");
+    hex_this_PC = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, this_PC), "this_PC");
+    hex_slot_cancelled = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, slot_cancelled), "slot_cancelled");
+    hex_branch_taken = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, branch_taken), "branch_taken");
+    hex_dczero_addr = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, dczero_addr), "dczero_addr");
+    llsc_addr = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, llsc_addr), "llsc_addr");
+    llsc_val = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, llsc_val), "llsc_val");
+    llsc_val_i64 = tcg_global_mem_new_i64(cpu_env,
+        offsetof(CPUHexagonState, llsc_val_i64), "llsc_val_i64");
+    hex_is_gather_store_insn = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, is_gather_store_insn),
+        "is_gather_store_insn");
+    hex_gather_issued = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, gather_issued), "gather_issued");
+    for (i = 0; i < STORES_MAX; i++) {
+        sprintf(store_addr_names[i], "store_addr_%d", i);
+        hex_store_addr[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, mem_log_stores[i].va),
+            store_addr_names[i]);
+
+        sprintf(store_width_names[i], "store_width_%d", i);
+        hex_store_width[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, mem_log_stores[i].width),
+            store_width_names[i]);
+
+        sprintf(store_val32_names[i], "store_val32_%d", i);
+        hex_store_val32[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, mem_log_stores[i].data32),
+            store_val32_names[i]);
+
+        sprintf(store_val64_names[i], "store_val64_%d", i);
+        hex_store_val64[i] = tcg_global_mem_new_i64(cpu_env,
+            offsetof(CPUHexagonState, mem_log_stores[i].data64),
+            store_val64_names[i]);
+    }
+
+    init_genptr();
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 52/67] Hexagon Linux user emulation
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (50 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 51/67] Hexagon translation Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 53/67] Hexagon build infrastructure Taylor Simpson
                   ` (15 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Implementation of Linux user emulation for RISC-V
Some common files modified in addition to new files in linux-user/hexagon

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 linux-user/hexagon/sockbits.h       |  18 ++
 linux-user/hexagon/syscall_nr.h     | 346 ++++++++++++++++++++++++++++++++++++
 linux-user/hexagon/target_cpu.h     |  44 +++++
 linux-user/hexagon/target_elf.h     |  38 ++++
 linux-user/hexagon/target_fcntl.h   |  18 ++
 linux-user/hexagon/target_signal.h  |  34 ++++
 linux-user/hexagon/target_structs.h |  46 +++++
 linux-user/hexagon/target_syscall.h |  32 ++++
 linux-user/hexagon/termbits.h       |  18 ++
 linux-user/syscall_defs.h           |  33 ++++
 linux-user/elfload.c                |  16 ++
 linux-user/hexagon/cpu_loop.c       | 173 ++++++++++++++++++
 linux-user/hexagon/signal.c         | 276 ++++++++++++++++++++++++++++
 linux-user/syscall.c                |   2 +
 14 files changed, 1094 insertions(+)
 create mode 100644 linux-user/hexagon/sockbits.h
 create mode 100644 linux-user/hexagon/syscall_nr.h
 create mode 100644 linux-user/hexagon/target_cpu.h
 create mode 100644 linux-user/hexagon/target_elf.h
 create mode 100644 linux-user/hexagon/target_fcntl.h
 create mode 100644 linux-user/hexagon/target_signal.h
 create mode 100644 linux-user/hexagon/target_structs.h
 create mode 100644 linux-user/hexagon/target_syscall.h
 create mode 100644 linux-user/hexagon/termbits.h
 create mode 100644 linux-user/hexagon/cpu_loop.c
 create mode 100644 linux-user/hexagon/signal.c

diff --git a/linux-user/hexagon/sockbits.h b/linux-user/hexagon/sockbits.h
new file mode 100644
index 0000000..a6e8966
--- /dev/null
+++ b/linux-user/hexagon/sockbits.h
@@ -0,0 +1,18 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "../generic/sockbits.h"
diff --git a/linux-user/hexagon/syscall_nr.h b/linux-user/hexagon/syscall_nr.h
new file mode 100644
index 0000000..9ea38cc
--- /dev/null
+++ b/linux-user/hexagon/syscall_nr.h
@@ -0,0 +1,346 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Syscall numbers
+ */
+
+#define  TARGET_NR_io_setup 0
+#define  TARGET_NR_io_destroy 1
+#define  TARGET_NR_io_submit 2
+#define  TARGET_NR_io_cancel 3
+#define  TARGET_NR_io_getevents 4
+#define  TARGET_NR_setxattr 5
+#define  TARGET_NR_lsetxattr 6
+#define  TARGET_NR_fsetxattr 7
+#define  TARGET_NR_getxattr 8
+#define  TARGET_NR_lgetxattr 9
+#define  TARGET_NR_fgetxattr 10
+#define  TARGET_NR_listxattr 11
+#define  TARGET_NR_llistxattr 12
+#define  TARGET_NR_flistxattr 13
+#define  TARGET_NR_removexattr 14
+#define  TARGET_NR_lremovexattr 15
+#define  TARGET_NR_fremovexattr 16
+#define  TARGET_NR_getcwd 17
+#define  TARGET_NR_lookup_dcookie 18
+#define  TARGET_NR_eventfd2 19
+#define  TARGET_NR_epoll_create1 20
+#define  TARGET_NR_epoll_ctl 21
+#define  TARGET_NR_epoll_pwait 22
+#define  TARGET_NR_dup 23
+#define  TARGET_NR_dup3 24
+#define  TARGET_NR3264_fcntl 25
+#define  TARGET_NR_inotify_init1 26
+#define  TARGET_NR_inotify_add_watch 27
+#define  TARGET_NR_inotify_rm_watch 28
+#define  TARGET_NR_ioctl 29
+#define  TARGET_NR_ioprio_set 30
+#define  TARGET_NR_ioprio_get 31
+#define  TARGET_NR_flock 32
+#define  TARGET_NR_mknodat 33
+#define  TARGET_NR_mkdirat 34
+#define  TARGET_NR_unlinkat 35
+#define  TARGET_NR_symlinkat 36
+#define  TARGET_NR_linkat 37
+#define  TARGET_NR_renameat 38
+#define  TARGET_NR_umount2 39
+#define  TARGET_NR_mount 40
+#define  TARGET_NR_pivot_root 41
+#define  TARGET_NR_nfsservctl 42
+#define  TARGET_NR3264_statfs 43
+#define  TARGET_NR_statfs TARGET_NR3264_statfs
+#define  TARGET_NR3264_fstatfs 44
+#define  TARGET_NR3264_truncate 45
+#define  TARGET_NR3264_ftruncate 46
+#define  TARGET_NR_fallocate 47
+#define  TARGET_NR_faccessat 48
+#define  TARGET_NR_chdir 49
+#define  TARGET_NR_fchdir 50
+#define  TARGET_NR_chroot 51
+#define  TARGET_NR_fchmod 52
+#define  TARGET_NR_fchmodat 53
+#define  TARGET_NR_fchownat 54
+#define  TARGET_NR_fchown 55
+#define  TARGET_NR_openat 56
+#define  TARGET_NR_close 57
+#define  TARGET_NR_vhangup 58
+#define  TARGET_NR_pipe2 59
+#define  TARGET_NR_quotactl 60
+#define  TARGET_NR_getdents64 61
+#define  TARGET_NR3264_lseek 62
+#define  TARGET_NR_read 63
+#define  TARGET_NR_write 64
+#define  TARGET_NR_readv 65
+#define  TARGET_NR_writev 66
+#define  TARGET_NR_pread64 67
+#define  TARGET_NR_pwrite64 68
+#define  TARGET_NR_preadv 69
+#define  TARGET_NR_pwritev 70
+#define  TARGET_NR3264_sendfile 71
+#define  TARGET_NR_pselect6 72
+#define  TARGET_NR_ppoll 73
+#define  TARGET_NR_signalfd4 74
+#define  TARGET_NR_vmsplice 75
+#define  TARGET_NR_splice 76
+#define  TARGET_NR_tee 77
+#define  TARGET_NR_readlinkat 78
+#define  TARGET_NR3264_fstatat 79
+#define  TARGET_NR3264_fstat 80
+#define  TARGET_NR_fstat  TARGET_NR3264_fstat
+#define  TARGET_NR_sync 81
+#define  TARGET_NR_fsync 82
+#define  TARGET_NR_fdatasync 83
+#define  TARGET_NR_sync_file_range 84
+#define  TARGET_NR_timerfd_create 85
+#define  TARGET_NR_timerfd_settime 86
+#define  TARGET_NR_timerfd_gettime 87
+#define  TARGET_NR_utimensat 88
+#define  TARGET_NR_acct 89
+#define  TARGET_NR_capget 90
+#define  TARGET_NR_capset 91
+#define  TARGET_NR_personality 92
+#define  TARGET_NR_exit 93
+#define  TARGET_NR_exit_group 94
+#define  TARGET_NR_waitid 95
+#define  TARGET_NR_set_tid_address 96
+#define  TARGET_NR_unshare 97
+#define  TARGET_NR_futex 98
+#define  TARGET_NR_set_robust_list 99
+#define  TARGET_NR_get_robust_list 100
+#define  TARGET_NR_nanosleep 101
+#define  TARGET_NR_getitimer 102
+#define  TARGET_NR_setitimer 103
+#define  TARGET_NR_kexec_load 104
+#define  TARGET_NR_init_module 105
+#define  TARGET_NR_delete_module 106
+#define  TARGET_NR_timer_create 107
+#define  TARGET_NR_timer_gettime 108
+#define  TARGET_NR_timer_getoverrun 109
+#define  TARGET_NR_timer_settime 110
+#define  TARGET_NR_timer_delete 111
+#define  TARGET_NR_clock_settime 112
+#define  TARGET_NR_clock_gettime 113
+#define  TARGET_NR_clock_getres 114
+#define  TARGET_NR_clock_nanosleep 115
+#define  TARGET_NR_syslog 116
+#define  TARGET_NR_ptrace 117
+#define  TARGET_NR_sched_setparam 118
+#define  TARGET_NR_sched_setscheduler 119
+#define  TARGET_NR_sched_getscheduler 120
+#define  TARGET_NR_sched_getparam 121
+#define  TARGET_NR_sched_setaffinity 122
+#define  TARGET_NR_sched_getaffinity 123
+#define  TARGET_NR_sched_yield 124
+#define  TARGET_NR_sched_get_priority_max 125
+#define  TARGET_NR_sched_get_priority_min 126
+#define  TARGET_NR_sched_rr_get_interval 127
+#define  TARGET_NR_restart_syscall 128
+#define  TARGET_NR_kill 129
+#define  TARGET_NR_tkill 130
+#define  TARGET_NR_tgkill 131
+#define  TARGET_NR_sigaltstack 132
+#define  TARGET_NR_rt_sigsuspend 133
+#define  TARGET_NR_rt_sigaction 134
+#define  TARGET_NR_rt_sigprocmask 135
+#define  TARGET_NR_rt_sigpending 136
+#define  TARGET_NR_rt_sigtimedwait 137
+#define  TARGET_NR_rt_sigqueueinfo 138
+#define  TARGET_NR_rt_sigreturn 139
+#define  TARGET_NR_setpriority 140
+#define  TARGET_NR_getpriority 141
+#define  TARGET_NR_reboot 142
+#define  TARGET_NR_setregid 143
+#define  TARGET_NR_setgid 144
+#define  TARGET_NR_setreuid 145
+#define  TARGET_NR_setuid 146
+#define  TARGET_NR_setresuid 147
+#define  TARGET_NR_getresuid 148
+#define  TARGET_NR_setresgid 149
+#define  TARGET_NR_getresgid 150
+#define  TARGET_NR_setfsuid 151
+#define  TARGET_NR_setfsgid 152
+#define  TARGET_NR_times 153
+#define  TARGET_NR_setpgid 154
+#define  TARGET_NR_getpgid 155
+#define  TARGET_NR_getsid 156
+#define  TARGET_NR_setsid 157
+#define  TARGET_NR_getgroups 158
+#define  TARGET_NR_setgroups 159
+#define  TARGET_NR_uname 160
+#define  TARGET_NR_sethostname 161
+#define  TARGET_NR_setdomainname 162
+#define  TARGET_NR_getrlimit 163
+#define  TARGET_NR_setrlimit 164
+#define  TARGET_NR_getrusage 165
+#define  TARGET_NR_umask 166
+#define  TARGET_NR_prctl 167
+#define  TARGET_NR_getcpu 168
+#define  TARGET_NR_gettimeofday 169
+#define  TARGET_NR_settimeofday 170
+#define  TARGET_NR_adjtimex 171
+#define  TARGET_NR_getpid 172
+#define  TARGET_NR_getppid 173
+#define  TARGET_NR_getuid 174
+#define  TARGET_NR_geteuid 175
+#define  TARGET_NR_getgid 176
+#define  TARGET_NR_getegid 177
+#define  TARGET_NR_gettid 178
+#define  TARGET_NR_sysinfo 179
+#define  TARGET_NR_mq_open 180
+#define  TARGET_NR_mq_unlink 181
+#define  TARGET_NR_mq_timedsend 182
+#define  TARGET_NR_mq_timedreceive 183
+#define  TARGET_NR_mq_notify 184
+#define  TARGET_NR_mq_getsetattr 185
+#define  TARGET_NR_msgget 186
+#define  TARGET_NR_msgctl 187
+#define  TARGET_NR_msgrcv 188
+#define  TARGET_NR_msgsnd 189
+#define  TARGET_NR_semget 190
+#define  TARGET_NR_semctl 191
+#define  TARGET_NR_semtimedop 192
+#define  TARGET_NR_semop 193
+#define  TARGET_NR_shmget 194
+#define  TARGET_NR_shmctl 195
+#define  TARGET_NR_shmat 196
+#define  TARGET_NR_shmdt 197
+#define  TARGET_NR_socket 198
+#define  TARGET_NR_socketpair 199
+#define  TARGET_NR_bind 200
+#define  TARGET_NR_listen 201
+#define  TARGET_NR_accept 202
+#define  TARGET_NR_connect 203
+#define  TARGET_NR_getsockname 204
+#define  TARGET_NR_getpeername 205
+#define  TARGET_NR_sendto 206
+#define  TARGET_NR_recvfrom 207
+#define  TARGET_NR_setsockopt 208
+#define  TARGET_NR_getsockopt 209
+#define  TARGET_NR_shutdown 210
+#define  TARGET_NR_sendmsg 211
+#define  TARGET_NR_recvmsg 212
+#define  TARGET_NR_readahead 213
+#define  TARGET_NR_brk 214
+#define  TARGET_NR_munmap 215
+#define  TARGET_NR_mremap 216
+#define  TARGET_NR_add_key 217
+#define  TARGET_NR_request_key 218
+#define  TARGET_NR_keyctl 219
+#define  TARGET_NR_clone 220
+#define  TARGET_NR_execve 221
+#define  TARGET_NR3264_mmap 222
+#define  TARGET_NR3264_fadvise64 223
+#define  TARGET_NR_swapon 224
+#define  TARGET_NR_swapoff 225
+#define  TARGET_NR_mprotect 226
+#define  TARGET_NR_msync 227
+#define  TARGET_NR_mlock 228
+#define  TARGET_NR_munlock 229
+#define  TARGET_NR_mlockall 230
+#define  TARGET_NR_munlockall 231
+#define  TARGET_NR_mincore 232
+#define  TARGET_NR_madvise 233
+#define  TARGET_NR_remap_file_pages 234
+#define  TARGET_NR_mbind 235
+#define  TARGET_NR_get_mempolicy 236
+#define  TARGET_NR_set_mempolicy 237
+#define  TARGET_NR_migrate_pages 238
+#define  TARGET_NR_move_pages 239
+#define  TARGET_NR_rt_tgsigqueueinfo 240
+#define  TARGET_NR_perf_event_open 241
+#define  TARGET_NR_accept4 242
+#define  TARGET_NR_recvmmsg 243
+#define  TARGET_NR_arch_specific_syscall 244
+#define  TARGET_NR_wait4 260
+#define  TARGET_NR_prlimit64 261
+#define  TARGET_NR_fanotify_init 262
+#define  TARGET_NR_fanotify_mark 263
+#define  TARGET_NR_name_to_handle_at         264
+#define  TARGET_NR_open_by_handle_at         265
+#define  TARGET_NR_clock_adjtime 266
+#define  TARGET_NR_syncfs 267
+#define  TARGET_NR_setns 268
+#define  TARGET_NR_sendmmsg 269
+#define  TARGET_NR_process_vm_readv 270
+#define  TARGET_NR_process_vm_writev 271
+#define  TARGET_NR_kcmp 272
+#define  TARGET_NR_finit_module 273
+#define  TARGET_NR_sched_setattr 274
+#define  TARGET_NR_sched_getattr 275
+#define  TARGET_NR_renameat2 276
+#define  TARGET_NR_seccomp 277
+#define  TARGET_NR_getrandom 278
+#define  TARGET_NR_memfd_create 279
+#define  TARGET_NR_bpf 280
+#define  TARGET_NR_execveat 281
+#define  TARGET_NR_open 1024
+#define  TARGET_NR_link 1025
+#define  TARGET_NR_unlink 1026
+#define  TARGET_NR_mknod 1027
+#define  TARGET_NR_chmod 1028
+#define  TARGET_NR_chown 1029
+#define  TARGET_NR_mkdir 1030
+#define  TARGET_NR_rmdir 1031
+#define  TARGET_NR_lchown 1032
+#define  TARGET_NR_access 1033
+#define  TARGET_NR_rename 1034
+#define  TARGET_NR_readlink 1035
+#define  TARGET_NR_symlink 1036
+#define  TARGET_NR_utimes 1037
+#define  TARGET_NR3264_stat 1038
+#define  TARGET_NR3264_lstat 1039
+#define  TARGET_NR_pipe 1040
+#define  TARGET_NR_dup2 1041
+#define  TARGET_NR_epoll_create 1042
+#define  TARGET_NR_inotify_init 1043
+#define  TARGET_NR_eventfd 1044
+#define  TARGET_NR_signalfd 1045
+#define  TARGET_NR_alarm 1059
+#define  TARGET_NR_getpgrp 1060
+#define  TARGET_NR_pause 1061
+#define  TARGET_NR_time 1062
+#define  TARGET_NR_utime 1063
+#define  TARGET_NR_creat 1064
+#define  TARGET_NR_getdents 1065
+#define  TARGET_NR_futimesat 1066
+#define  TARGET_NR_select 1067
+#define  TARGET_NR_poll 1068
+#define  TARGET_NR_epoll_wait 1069
+#define  TARGET_NR_ustat 1070
+#define  TARGET_NR_vfork 1071
+#define  TARGET_NR_oldwait4 1072
+#define  TARGET_NR_recv 1073
+#define  TARGET_NR_send 1074
+#define  TARGET_NR_bdflush 1075
+#define  TARGET_NR_umount 1076
+#define  TARGET_NR_uselib 1077
+#define  TARGET_NR__sysctl 1078
+#define  TARGET_NR_fork 1079
+#define  TARGET_NR_syscalls (TARGET_NR_fork + 1)
+#define  TARGET_NR_fstatfs  TARGET_NR3264_fstatfs
+#define  TARGET_NR_truncate64  TARGET_NR3264_truncate
+#define  TARGET_NR_ftruncate64  TARGET_NR3264_ftruncate
+#define  TARGET_NR_sendfile  TARGET_NR3264_sendfile
+#define  TARGET_NR_newfstatat  TARGET_NR3264_fstatat
+#define  TARGET_NR_lstat  TARGET_NR3264_lstat
+#define  TARGET_NR_fcntl64  TARGET_NR3264_fcntl
+#define  TARGET_NR__llseek  TARGET_NR3264_lseek
+#define  TARGET_NR_mmap2  TARGET_NR3264_mmap
+#define  TARGET_NR_fadvise64_64  TARGET_NR3264_fadvise64
+#define  TARGET_NR_stat64  TARGET_NR3264_stat
+
diff --git a/linux-user/hexagon/target_cpu.h b/linux-user/hexagon/target_cpu.h
new file mode 100644
index 0000000..53d45f5
--- /dev/null
+++ b/linux-user/hexagon/target_cpu.h
@@ -0,0 +1,44 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_TARGET_CPU_H
+#define HEXAGON_TARGET_CPU_H
+
+static inline void cpu_clone_regs_child(CPUHexagonState *env,
+                                        target_ulong newsp, unsigned flags)
+{
+    if (newsp) {
+        env->gpr[HEX_REG_SP] = newsp;
+    }
+    env->gpr[0] = 0;
+}
+
+static inline void cpu_clone_regs_parent(CPUHexagonState *env, unsigned flags)
+{
+}
+
+static inline void cpu_set_tls(CPUHexagonState *env, target_ulong newtls)
+{
+    env->gpr[HEX_REG_UGP] = newtls;
+}
+
+static inline abi_ulong get_sp_from_cpustate(CPUHexagonState *state)
+{
+    return state->gpr[HEX_REG_SP];
+}
+
+#endif
diff --git a/linux-user/hexagon/target_elf.h b/linux-user/hexagon/target_elf.h
new file mode 100644
index 0000000..6aea113
--- /dev/null
+++ b/linux-user/hexagon/target_elf.h
@@ -0,0 +1,38 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_TARGET_ELF_H
+#define HEXAGON_TARGET_ELF_H
+
+static inline const char *cpu_get_model(uint32_t eflags)
+{
+    /* For now, treat anything newer than v5 as a v67 */
+    /* FIXME - Disable instructions that are newer than the specified arch */
+    if (eflags == 0x04 ||    /* v5  */
+        eflags == 0x05 ||    /* v55 */
+        eflags == 0x60 ||    /* v60 */
+        eflags == 0x61 ||    /* v61 */
+        eflags == 0x62 ||    /* v62 */
+        eflags == 0x65 ||    /* v65 */
+        eflags == 0x66 ||    /* v66 */
+        eflags == 0x67) {    /* v67 */
+        return "v67";
+    }
+    return "unknown";
+}
+
+#endif
diff --git a/linux-user/hexagon/target_fcntl.h b/linux-user/hexagon/target_fcntl.h
new file mode 100644
index 0000000..08162e6
--- /dev/null
+++ b/linux-user/hexagon/target_fcntl.h
@@ -0,0 +1,18 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "../generic/fcntl.h"
diff --git a/linux-user/hexagon/target_signal.h b/linux-user/hexagon/target_signal.h
new file mode 100644
index 0000000..12a6187
--- /dev/null
+++ b/linux-user/hexagon/target_signal.h
@@ -0,0 +1,34 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_TARGET_SIGNAL_H
+#define HEXAGON_TARGET_SIGNAL_H
+
+typedef struct target_sigaltstack {
+    abi_ulong ss_sp;
+    abi_int ss_flags;
+    abi_ulong ss_size;
+} target_stack_t;
+
+#define TARGET_SS_ONSTACK 1
+#define TARGET_SS_DISABLE 2
+
+#define TARGET_MINSIGSTKSZ 2048
+
+#include "../generic/signal.h"
+
+#endif /* TARGET_SIGNAL_H */
diff --git a/linux-user/hexagon/target_structs.h b/linux-user/hexagon/target_structs.h
new file mode 100644
index 0000000..2e06227
--- /dev/null
+++ b/linux-user/hexagon/target_structs.h
@@ -0,0 +1,46 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Hexagon specific structures for linux-user
+ */
+#ifndef HEXAGON_TARGET_STRUCTS_H
+#define HEXAGON_TARGET_STRUCTS_H
+
+struct target_ipc_perm {
+    abi_int __key;
+    abi_int uid;
+    abi_int gid;
+    abi_int cuid;
+    abi_int cgid;
+    abi_ushort mode;
+    abi_ushort __pad1;
+    abi_ushort __seq;
+};
+
+struct target_shmid_ds {
+    struct target_ipc_perm shm_perm;
+    abi_long shm_segsz;
+    abi_ulong shm_atime;
+    abi_ulong shm_dtime;
+    abi_ulong shm_ctime;
+    abi_int shm_cpid;
+    abi_int shm_lpid;
+    abi_ulong shm_nattch;
+};
+
+#endif
diff --git a/linux-user/hexagon/target_syscall.h b/linux-user/hexagon/target_syscall.h
new file mode 100644
index 0000000..a3bd307
--- /dev/null
+++ b/linux-user/hexagon/target_syscall.h
@@ -0,0 +1,32 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_TARGET_SYSCALL_H
+#define HEXAGON_TARGET_SYSCALL_H
+
+struct target_pt_regs {
+    abi_long sepc;
+    abi_long sp;
+};
+
+#define UNAME_MACHINE "hexagon"
+#define UNAME_MINIMUM_RELEASE "4.15.0"
+
+#define TARGET_MLOCKALL_MCL_CURRENT 1
+#define TARGET_MLOCKALL_MCL_FUTURE  2
+
+#endif
diff --git a/linux-user/hexagon/termbits.h b/linux-user/hexagon/termbits.h
new file mode 100644
index 0000000..c5f92f1
--- /dev/null
+++ b/linux-user/hexagon/termbits.h
@@ -0,0 +1,18 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "../i386/termbits.h"
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 152ec63..049379d 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -102,6 +102,14 @@
 #define TARGET_IOC_WRITE  2U
 #define TARGET_IOC_READ   1U
 
+#elif defined(TARGET_HEXAGON)
+
+#define TARGET_IOC_SIZEBITS     14
+
+#define TARGET_IOC_NONE   0U
+#define TARGET_IOC_WRITE  1U
+#define TARGET_IOC_READ          2U
+
 #else
 #error unsupported CPU
 #endif
@@ -2133,6 +2141,31 @@ struct target_stat64 {
     uint64_t   st_ino;
 };
 
+#elif defined(TARGET_HEXAGON)
+
+struct target_stat {
+    unsigned long long st_dev;
+    unsigned long long st_ino;
+    unsigned int st_mode;
+    unsigned int st_nlink;
+    unsigned int st_uid;
+    unsigned int st_gid;
+    unsigned long long st_rdev;
+    target_ulong __pad1;
+    long long st_size;
+    target_long st_blksize;
+    int __pad2;
+    long long st_blocks;
+
+    target_long target_st_atime;
+    target_long target_st_atime_nsec;
+    target_long target_st_mtime;
+    target_long target_st_mtime_nsec;
+    target_long target_st_ctime;
+    target_long target_st_ctime_nsec;
+    int __unused[2];
+};
+
 #else
 #error unsupported CPU
 #endif
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index b1a895f..cafbeb2 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1507,6 +1507,22 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 
 #endif /* TARGET_XTENSA */
 
+#ifdef TARGET_HEXAGON
+
+#define ELF_START_MMAP 0x20000000
+
+#define ELF_CLASS       ELFCLASS32
+#define ELF_ARCH        EM_HEXAGON
+
+static inline void init_thread(struct target_pt_regs *regs,
+                               struct image_info *infop)
+{
+    regs->sepc = infop->entry;
+    regs->sp = infop->start_stack;
+}
+
+#endif /* TARGET_HEXAGON */
+
 #ifndef ELF_PLATFORM
 #define ELF_PLATFORM (NULL)
 #endif
diff --git a/linux-user/hexagon/cpu_loop.c b/linux-user/hexagon/cpu_loop.c
new file mode 100644
index 0000000..ef4ca6c
--- /dev/null
+++ b/linux-user/hexagon/cpu_loop.c
@@ -0,0 +1,173 @@
+/*
+ *  qemu user cpu loop
+ *
+ *  Copyright (c) 2003-2008 Fabrice Bellard
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu.h"
+#include "cpu_loop-common.h"
+#include "internal.h"
+#include "translate.h"
+
+static int do_store_exclusive(CPUHexagonState *env, bool single)
+{
+    target_ulong addr;
+    target_ulong val;
+    uint64_t val_i64;
+    int segv = 0;
+    int reg;
+
+    addr = env->llsc_addr;
+    reg = env->llsc_reg;
+
+    start_exclusive();
+    mmap_lock();
+
+    env->pred[reg] = 0;
+    if (single) {
+        segv = get_user_s32(val, addr);
+    } else {
+        segv = get_user_s64(val_i64, addr);
+    }
+    if (!segv) {
+        if (single) {
+            if (val == env->llsc_val) {
+                segv = put_user_u32(env->llsc_newval, addr);
+                if (!segv) {
+                    env->pred[reg] = 0xff;
+                }
+            }
+        } else {
+            if (val_i64 == env->llsc_val_i64) {
+                segv = put_user_u64(env->llsc_newval_i64, addr);
+                if (!segv) {
+                    env->pred[reg] = 0xff;
+                }
+            }
+        }
+    }
+    env->llsc_addr = ~0;
+    if (!segv) {
+        env->next_PC += 4;
+    }
+
+    mmap_unlock();
+    end_exclusive();
+    return segv;
+}
+
+void cpu_loop(CPUHexagonState *env)
+{
+    CPUState *cs = CPU(hexagon_env_get_cpu(env));
+    target_siginfo_t info;
+    int trapnr, signum, sigcode;
+    target_ulong sigaddr;
+    target_ulong syscallnum;
+    target_ulong ret;
+
+    for (;;) {
+        cpu_exec_start(cs);
+        trapnr = cpu_exec(cs);
+        cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
+        signum = 0;
+        sigcode = 0;
+        sigaddr = 0;
+
+        switch (trapnr) {
+        case EXCP_INTERRUPT:
+            /* just indicate that signals should be handled asap */
+            break;
+        case HEX_EXCP_TRAP0:
+            syscallnum = env->gpr[6];
+            env->gpr[HEX_REG_PC] += 4;
+#if COUNT_HEX_HELPERS
+            if (syscallnum == TARGET_NR_exit_group) {
+                print_helper_counts();
+            }
+#endif
+            ret = do_syscall(env,
+                             syscallnum,
+                             env->gpr[0],
+                             env->gpr[1],
+                             env->gpr[2],
+                             env->gpr[3],
+                             env->gpr[4],
+                             env->gpr[5],
+                             0, 0);
+            if (ret == -TARGET_ERESTARTSYS) {
+                env->gpr[HEX_REG_PC] -= 4;
+            } else if (ret != -TARGET_QEMU_ESIGRETURN) {
+                env->gpr[0] = ret;
+            }
+            break;
+        case HEX_EXCP_TRAP1:
+            EXCP_DUMP(env, "\nqemu: trap1 exception %#x - aborting\n",
+                     trapnr);
+            exit(EXIT_FAILURE);
+            break;
+        case HEX_EXCP_SC4:
+            if (do_store_exclusive(env, true)) {
+                info.si_signo = TARGET_SIGSEGV;
+                info.si_errno = 0;
+                info.si_code = TARGET_SEGV_MAPERR;
+                info._sifields._sigfault._addr = env->next_PC;
+                queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
+            }
+            break;
+        case HEX_EXCP_SC8:
+            if (do_store_exclusive(env, false)) {
+                info.si_signo = TARGET_SIGSEGV;
+                info.si_errno = 0;
+                info.si_code = TARGET_SEGV_MAPERR;
+                info._sifields._sigfault._addr = env->next_PC;
+                queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
+            }
+            break;
+        case HEX_EXCP_FETCH_NO_UPAGE:
+        case HEX_EXCP_PRIV_NO_UREAD:
+        case HEX_EXCP_PRIV_NO_UWRITE:
+            signum = TARGET_SIGSEGV;
+            sigcode = TARGET_SEGV_MAPERR;
+            break;
+        default:
+            EXCP_DUMP(env, "\nqemu: unhandled CPU exception %#x - aborting\n",
+                     trapnr);
+            exit(EXIT_FAILURE);
+        }
+
+        if (signum) {
+            target_siginfo_t info = {
+                .si_signo = signum,
+                .si_errno = 0,
+                .si_code = sigcode,
+                ._sifields._sigfault._addr = sigaddr
+            };
+            queue_signal(env, info.si_signo, QEMU_SI_KILL, &info);
+        }
+
+        process_pending_signals(env);
+    }
+}
+
+void target_cpu_copy_regs(CPUArchState *env, struct target_pt_regs *regs)
+{
+    env->gpr[HEX_REG_PC] = regs->sepc;
+    env->gpr[HEX_REG_SP] = regs->sp;
+}
diff --git a/linux-user/hexagon/signal.c b/linux-user/hexagon/signal.c
new file mode 100644
index 0000000..99837e1
--- /dev/null
+++ b/linux-user/hexagon/signal.c
@@ -0,0 +1,276 @@
+/*
+ *  Emulation of Linux signals
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/osdep.h"
+#include "qemu.h"
+#include "signal-common.h"
+#include "linux-user/trace.h"
+
+struct target_sigcontext {
+    target_ulong r0,  r1,  r2,  r3;
+    target_ulong r4,  r5,  r6,  r7;
+    target_ulong r8,  r9, r10, r11;
+    target_ulong r12, r13, r14, r15;
+    target_ulong r16, r17, r18, r19;
+    target_ulong r20, r21, r22, r23;
+    target_ulong r24, r25, r26, r27;
+    target_ulong r28, r29, r30, r31;
+    target_ulong sa0;
+    target_ulong lc0;
+    target_ulong sa1;
+    target_ulong lc1;
+    target_ulong m0;
+    target_ulong m1;
+    target_ulong usr;
+    target_ulong p3_0;
+    target_ulong gp;
+    target_ulong ugp;
+    target_ulong pc;
+    target_ulong cause;
+    target_ulong badva;
+    target_ulong pad1;
+    target_ulong pad2;
+    target_ulong pad3;
+};
+
+struct target_ucontext {
+    unsigned long uc_flags;
+    target_ulong uc_link; /* target pointer */
+    target_stack_t uc_stack;
+    struct target_sigcontext uc_mcontext;
+    target_sigset_t uc_sigmask;
+};
+
+struct target_rt_sigframe {
+    uint32_t tramp[2];
+    struct target_siginfo info;
+    struct target_ucontext uc;
+};
+
+static abi_ulong get_sigframe(struct target_sigaction *ka,
+                              CPUHexagonState *regs, size_t framesize)
+{
+    abi_ulong sp = get_sp_from_cpustate(regs);
+
+    /* This is the X/Open sanctioned signal stack switching.  */
+    sp = target_sigsp(sp, ka) - framesize;
+
+    sp = QEMU_ALIGN_DOWN(sp, 8);
+
+    return sp;
+}
+
+static void setup_sigcontext(struct target_sigcontext *sc, CPUHexagonState *env)
+{
+    __put_user(env->gpr[HEX_REG_R00], &sc->r0);
+    __put_user(env->gpr[HEX_REG_R01], &sc->r1);
+    __put_user(env->gpr[HEX_REG_R02], &sc->r2);
+    __put_user(env->gpr[HEX_REG_R03], &sc->r3);
+    __put_user(env->gpr[HEX_REG_R04], &sc->r4);
+    __put_user(env->gpr[HEX_REG_R05], &sc->r5);
+    __put_user(env->gpr[HEX_REG_R06], &sc->r6);
+    __put_user(env->gpr[HEX_REG_R07], &sc->r7);
+    __put_user(env->gpr[HEX_REG_R08], &sc->r8);
+    __put_user(env->gpr[HEX_REG_R09], &sc->r9);
+    __put_user(env->gpr[HEX_REG_R10], &sc->r10);
+    __put_user(env->gpr[HEX_REG_R11], &sc->r11);
+    __put_user(env->gpr[HEX_REG_R12], &sc->r12);
+    __put_user(env->gpr[HEX_REG_R13], &sc->r13);
+    __put_user(env->gpr[HEX_REG_R14], &sc->r14);
+    __put_user(env->gpr[HEX_REG_R15], &sc->r15);
+    __put_user(env->gpr[HEX_REG_R16], &sc->r16);
+    __put_user(env->gpr[HEX_REG_R17], &sc->r17);
+    __put_user(env->gpr[HEX_REG_R18], &sc->r18);
+    __put_user(env->gpr[HEX_REG_R19], &sc->r19);
+    __put_user(env->gpr[HEX_REG_R20], &sc->r20);
+    __put_user(env->gpr[HEX_REG_R21], &sc->r21);
+    __put_user(env->gpr[HEX_REG_R22], &sc->r22);
+    __put_user(env->gpr[HEX_REG_R23], &sc->r23);
+    __put_user(env->gpr[HEX_REG_R24], &sc->r24);
+    __put_user(env->gpr[HEX_REG_R25], &sc->r25);
+    __put_user(env->gpr[HEX_REG_R26], &sc->r26);
+    __put_user(env->gpr[HEX_REG_R27], &sc->r27);
+    __put_user(env->gpr[HEX_REG_R28], &sc->r28);
+    __put_user(env->gpr[HEX_REG_R29], &sc->r29);
+    __put_user(env->gpr[HEX_REG_R30], &sc->r30);
+    __put_user(env->gpr[HEX_REG_R31], &sc->r31);
+    __put_user(env->gpr[HEX_REG_SA0], &sc->sa0);
+    __put_user(env->gpr[HEX_REG_LC0], &sc->lc0);
+    __put_user(env->gpr[HEX_REG_SA1], &sc->sa1);
+    __put_user(env->gpr[HEX_REG_LC1], &sc->lc1);
+    __put_user(env->gpr[HEX_REG_M0], &sc->m0);
+    __put_user(env->gpr[HEX_REG_M1], &sc->m1);
+    __put_user(env->gpr[HEX_REG_USR], &sc->usr);
+    __put_user(env->gpr[HEX_REG_P3_0], &sc->p3_0);
+    __put_user(env->gpr[HEX_REG_GP], &sc->gp);
+    __put_user(env->gpr[HEX_REG_UGP], &sc->ugp);
+    __put_user(env->gpr[HEX_REG_PC], &sc->pc);
+}
+
+static void setup_ucontext(struct target_ucontext *uc,
+                           CPUHexagonState *env, target_sigset_t *set)
+{
+    __put_user(0,    &(uc->uc_flags));
+    __put_user(0,    &(uc->uc_link));
+
+    target_save_altstack(&uc->uc_stack, env);
+
+    int i;
+    for (i = 0; i < TARGET_NSIG_WORDS; i++) {
+        __put_user(set->sig[i], &(uc->uc_sigmask.sig[i]));
+    }
+
+    setup_sigcontext(&uc->uc_mcontext, env);
+}
+
+static inline void install_sigtramp(uint32_t *tramp)
+{
+    __put_user(0x7800d166, tramp + 0); /*  { r6=#__NR_rt_sigreturn } */
+    __put_user(0x5400c004, tramp + 1); /*  { trap0(#1) } */
+}
+
+void setup_rt_frame(int sig, struct target_sigaction *ka,
+                    target_siginfo_t *info,
+                    target_sigset_t *set, CPUHexagonState *env)
+{
+    abi_ulong frame_addr;
+    struct target_rt_sigframe *frame;
+
+    frame_addr = get_sigframe(ka, env, sizeof(*frame));
+    trace_user_setup_rt_frame(env, frame_addr);
+
+    if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
+        goto badframe;
+    }
+
+    setup_ucontext(&frame->uc, env, set);
+    tswap_siginfo(&frame->info, info);
+    install_sigtramp(frame->tramp);
+
+    env->gpr[HEX_REG_PC] = ka->_sa_handler;
+    env->gpr[HEX_REG_SP] = frame_addr;
+    env->gpr[HEX_REG_R00] = sig;
+    env->gpr[HEX_REG_R01] =
+        frame_addr + offsetof(struct target_rt_sigframe, info);
+    env->gpr[HEX_REG_R02] =
+        frame_addr + offsetof(struct target_rt_sigframe, uc);
+    env->gpr[HEX_REG_LR] =
+        frame_addr + offsetof(struct target_rt_sigframe, tramp);
+
+    return;
+
+badframe:
+    unlock_user_struct(frame, frame_addr, 1);
+    if (sig == TARGET_SIGSEGV) {
+        ka->_sa_handler = TARGET_SIG_DFL;
+    }
+    force_sig(TARGET_SIGSEGV);
+}
+
+static void restore_sigcontext(CPUHexagonState *env,
+                               struct target_sigcontext *sc)
+{
+    __get_user(env->gpr[HEX_REG_R00], &sc->r0);
+    __get_user(env->gpr[HEX_REG_R01], &sc->r1);
+    __get_user(env->gpr[HEX_REG_R02], &sc->r2);
+    __get_user(env->gpr[HEX_REG_R03], &sc->r3);
+    __get_user(env->gpr[HEX_REG_R04], &sc->r4);
+    __get_user(env->gpr[HEX_REG_R05], &sc->r5);
+    __get_user(env->gpr[HEX_REG_R06], &sc->r6);
+    __get_user(env->gpr[HEX_REG_R07], &sc->r7);
+    __get_user(env->gpr[HEX_REG_R08], &sc->r8);
+    __get_user(env->gpr[HEX_REG_R09], &sc->r9);
+    __get_user(env->gpr[HEX_REG_R10], &sc->r10);
+    __get_user(env->gpr[HEX_REG_R11], &sc->r11);
+    __get_user(env->gpr[HEX_REG_R12], &sc->r12);
+    __get_user(env->gpr[HEX_REG_R13], &sc->r13);
+    __get_user(env->gpr[HEX_REG_R14], &sc->r14);
+    __get_user(env->gpr[HEX_REG_R15], &sc->r15);
+    __get_user(env->gpr[HEX_REG_R16], &sc->r16);
+    __get_user(env->gpr[HEX_REG_R17], &sc->r17);
+    __get_user(env->gpr[HEX_REG_R18], &sc->r18);
+    __get_user(env->gpr[HEX_REG_R19], &sc->r19);
+    __get_user(env->gpr[HEX_REG_R20], &sc->r20);
+    __get_user(env->gpr[HEX_REG_R21], &sc->r21);
+    __get_user(env->gpr[HEX_REG_R22], &sc->r22);
+    __get_user(env->gpr[HEX_REG_R23], &sc->r23);
+    __get_user(env->gpr[HEX_REG_R24], &sc->r24);
+    __get_user(env->gpr[HEX_REG_R25], &sc->r25);
+    __get_user(env->gpr[HEX_REG_R26], &sc->r26);
+    __get_user(env->gpr[HEX_REG_R27], &sc->r27);
+    __get_user(env->gpr[HEX_REG_R28], &sc->r28);
+    __get_user(env->gpr[HEX_REG_R29], &sc->r29);
+    __get_user(env->gpr[HEX_REG_R30], &sc->r30);
+    __get_user(env->gpr[HEX_REG_R31], &sc->r31);
+    __get_user(env->gpr[HEX_REG_SA0], &sc->sa0);
+    __get_user(env->gpr[HEX_REG_LC0], &sc->lc0);
+    __get_user(env->gpr[HEX_REG_SA1], &sc->sa1);
+    __get_user(env->gpr[HEX_REG_LC1], &sc->lc1);
+    __get_user(env->gpr[HEX_REG_M0], &sc->m0);
+    __get_user(env->gpr[HEX_REG_M1], &sc->m1);
+    __get_user(env->gpr[HEX_REG_USR], &sc->usr);
+    __get_user(env->gpr[HEX_REG_P3_0], &sc->p3_0);
+    __get_user(env->gpr[HEX_REG_GP], &sc->gp);
+    __get_user(env->gpr[HEX_REG_UGP], &sc->ugp);
+    __get_user(env->gpr[HEX_REG_PC], &sc->pc);
+}
+
+static void restore_ucontext(CPUHexagonState *env, struct target_ucontext *uc)
+{
+    sigset_t blocked;
+    target_sigset_t target_set;
+    int i;
+
+    target_sigemptyset(&target_set);
+    for (i = 0; i < TARGET_NSIG_WORDS; i++) {
+        __get_user(target_set.sig[i], &(uc->uc_sigmask.sig[i]));
+    }
+
+    target_to_host_sigset_internal(&blocked, &target_set);
+    set_sigmask(&blocked);
+
+    restore_sigcontext(env, &uc->uc_mcontext);
+}
+
+long do_rt_sigreturn(CPUHexagonState *env)
+{
+    struct target_rt_sigframe *frame;
+    abi_ulong frame_addr;
+
+    frame_addr = env->gpr[HEX_REG_SP];
+    trace_user_do_sigreturn(env, frame_addr);
+    if (!lock_user_struct(VERIFY_READ, frame, frame_addr, 1)) {
+        goto badframe;
+    }
+
+    restore_ucontext(env, &frame->uc);
+
+    if (do_sigaltstack(frame_addr + offsetof(struct target_rt_sigframe,
+            uc.uc_stack), 0, get_sp_from_cpustate(env)) == -EFAULT) {
+        goto badframe;
+    }
+
+    unlock_user_struct(frame, frame_addr, 0);
+    return -TARGET_QEMU_ESIGRETURN;
+
+badframe:
+    unlock_user_struct(frame, frame_addr, 0);
+    force_sig(TARGET_SIGSEGV);
+    return 0;
+}
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8d27d10..26cbedc 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -516,6 +516,8 @@ static inline int regpairs_aligned(void *cpu_env, int num)
 }
 #elif defined(TARGET_XTENSA)
 static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+#elif defined(TARGET_HEXAGON)
+static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
 #else
 static inline int regpairs_aligned(void *cpu_env, int num) { return 0; }
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 53/67] Hexagon build infrastructure
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (51 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 52/67] Hexagon Linux user emulation Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 54/67] Hexagon - Add Hexagon Vector eXtensions (HVX) to core definition Taylor Simpson
                   ` (14 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Add file to default-configs
Change configure
Add target/hexagon/Makefile.objs
Change scripts/qemu-binfmt-conf.sh
Modify tests/tcg/configure.sh
Add reference files to tests/tcg/hexagon

At this point in the patch series, you can build a hexagon-linux-user target
and run programs on the Hexagon scalar core.  With hexagon-linux-clang
installed, "make check-tcg TIMEOUT=60" will pass.

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 configure                              |   9 +
 default-configs/hexagon-linux-user.mak |   1 +
 scripts/qemu-binfmt-conf.sh            |   6 +-
 target/hexagon/Makefile.objs           | 121 ++++++
 tests/tcg/configure.sh                 |   4 +-
 tests/tcg/hexagon/float_convs.ref      | 748 ++++++++++++++++++++++++++++++++
 tests/tcg/hexagon/float_madds.ref      | 768 +++++++++++++++++++++++++++++++++
 7 files changed, 1655 insertions(+), 2 deletions(-)
 create mode 100644 default-configs/hexagon-linux-user.mak
 create mode 100644 target/hexagon/Makefile.objs
 create mode 100644 tests/tcg/hexagon/float_convs.ref
 create mode 100644 tests/tcg/hexagon/float_madds.ref

diff --git a/configure b/configure
index 48d6f89..a10885f 100755
--- a/configure
+++ b/configure
@@ -7862,6 +7862,12 @@ case "$target_name" in
     bflt="yes"
     mttcg="yes"
   ;;
+  hexagon)
+    TARGET_BASE_ARCH=hexagon
+    TARGET_ABI_DIR=hexagon
+    mttcg=yes
+    target_compiler=$cross_cc_hexagon
+  ;;
   *)
     error_exit "Unsupported target CPU"
   ;;
@@ -8025,6 +8031,9 @@ for i in $ARCH $TARGET_BASE_ARCH ; do
   xtensa*)
     disas_config "XTENSA"
   ;;
+  hexagon)
+    disas_config "HEXAGON"
+  ;;
   esac
 done
 if test "$tcg_interpreter" = "yes" ; then
diff --git a/default-configs/hexagon-linux-user.mak b/default-configs/hexagon-linux-user.mak
new file mode 100644
index 0000000..ad55af0
--- /dev/null
+++ b/default-configs/hexagon-linux-user.mak
@@ -0,0 +1 @@
+# Default configuration for hexagon-linux-user
diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
index 9f1580a..7b5d54b 100755
--- a/scripts/qemu-binfmt-conf.sh
+++ b/scripts/qemu-binfmt-conf.sh
@@ -4,7 +4,7 @@
 qemu_target_list="i386 i486 alpha arm armeb sparc sparc32plus sparc64 \
 ppc ppc64 ppc64le m68k mips mipsel mipsn32 mipsn32el mips64 mips64el \
 sh4 sh4eb s390x aarch64 aarch64_be hppa riscv32 riscv64 xtensa xtensaeb \
-microblaze microblazeel or1k x86_64"
+microblaze microblazeel or1k x86_64 hexagon"
 
 i386_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00'
 i386_mask='\xff\xff\xff\xff\xff\xfe\xfe\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
@@ -136,6 +136,10 @@ or1k_magic='\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\
 or1k_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff'
 or1k_family=or1k
 
+hexagon_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xa4\x00'
+hexagon_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
+hexagon_family=hexagon
+
 qemu_get_family() {
     cpu=${HOST_ARCH:-$(uname -m)}
     case "$cpu" in
diff --git a/target/hexagon/Makefile.objs b/target/hexagon/Makefile.objs
new file mode 100644
index 0000000..be0d08f
--- /dev/null
+++ b/target/hexagon/Makefile.objs
@@ -0,0 +1,121 @@
+##
+##  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+##
+##  This program is free software; you can redistribute it and/or modify
+##  it under the terms of the GNU General Public License as published by
+##  the Free Software Foundation; either version 2 of the License, or
+##  (at your option) any later version.
+##
+##  This program is distributed in the hope that it will be useful,
+##  but WITHOUT ANY WARRANTY; without even the implied warranty of
+##  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+##  GNU General Public License for more details.
+##
+##  You should have received a copy of the GNU General Public License
+##  along with this program; if not, see <http://www.gnu.org/licenses/>.
+##
+
+obj-y += \
+    cpu.o \
+    translate.o \
+    op_helper.o \
+    gdbstub.o \
+    genptr.o \
+    reg_fields.o \
+    decode.o \
+    iclass.o \
+    opcodes.o \
+    printinsn.o \
+    arch.o \
+    fma_emu.o \
+    conv_emu.o
+
+#
+#  Step 1
+#  We use a C program to create semantics_generated.pyinc
+#
+BUILD_USER_DIR = $(BUILD_DIR)/hexagon-linux-user
+GEN_SEMANTICS = gen_semantics
+GEN_SEMANTICS_SRC = $(SRC_PATH)/target/hexagon/gen_semantics.c
+
+IDEF_FILES = \
+    $(SRC_PATH)/target/hexagon/imported/alu.idef \
+    $(SRC_PATH)/target/hexagon/imported/branch.idef \
+    $(SRC_PATH)/target/hexagon/imported/compare.idef \
+    $(SRC_PATH)/target/hexagon/imported/float.idef \
+    $(SRC_PATH)/target/hexagon/imported/ldst.idef \
+    $(SRC_PATH)/target/hexagon/imported/mpy.idef \
+    $(SRC_PATH)/target/hexagon/imported/shift.idef \
+    $(SRC_PATH)/target/hexagon/imported/subinsns.idef \
+    $(SRC_PATH)/target/hexagon/imported/system.idef
+DEF_FILES = \
+    $(SRC_PATH)/target/hexagon/imported/allidefs.def \
+    $(SRC_PATH)/target/hexagon/imported/macros.def
+
+$(GEN_SEMANTICS): $(GEN_SEMANTICS_SRC) $(IDEF_FILES) $(DEF_FILES)
+	$(CC) $(CFLAGS) $(GEN_SEMANTICS_SRC) -o $(GEN_SEMANTICS)
+
+SEMANTICS=semantics_generated.pyinc
+$(SEMANTICS): $(GEN_SEMANTICS)
+	$(call quiet-command, \
+	    $(BUILD_USER_DIR)/$(GEN_SEMANTICS) $(SEMANTICS), \
+	    "GEN", $(SEMANTICS))
+
+#
+# Step 2
+# We use the do_qemu.py script to generate the following files
+#
+QEMU_DEF_H = $(BUILD_USER_DIR)/qemu_def_generated.h
+QEMU_WRAP_H = $(BUILD_USER_DIR)/qemu_wrap_generated.h
+OPCODES_DEF_H = $(BUILD_USER_DIR)/opcodes_def_generated.h
+OP_ATTRIBS_H = $(BUILD_USER_DIR)/op_attribs_generated.h
+OP_REGS_H = $(BUILD_USER_DIR)/op_regs_generated.h
+PRINTINSN_H = $(BUILD_USER_DIR)/printinsn_generated.h
+
+GENERATED_HEXAGON_FILES = \
+    $(QEMU_DEF_H) \
+    $(QEMU_WRAP_H) \
+    $(OPCODES_DEF_H) \
+    $(OP_ATTRIBS_H) \
+    $(OP_REGS_H) \
+    $(PRINTINSN_H)
+
+$(GENERATED_HEXAGON_FILES): \
+    $(SRC_PATH)/target/hexagon/do_qemu.py \
+    $(SEMANTICS) \
+    $(SRC_PATH)/target/hexagon/attribs_def.h
+	$(call quiet-command, \
+	    $(SRC_PATH)/target/hexagon/do_qemu.py \
+                $(SEMANTICS) \
+                $(SRC_PATH)/target/hexagon/attribs_def.h, \
+	    "GEN", "Hexagon generated files")
+
+#
+# Step 3
+# We use a C program to create iset.py which is imported into dectree.py
+# to create the decode tree
+#
+GEN_DECTREE_IMPORT=gen_dectree_import
+GEN_DECTREE_IMPORT_SRC = $(SRC_PATH)/target/hexagon/gen_dectree_import.c
+
+$(GEN_DECTREE_IMPORT): $(GEN_DECTREE_IMPORT_SRC) $(GENERATED_HEXAGON_FILES)
+	$(CC) $(CFLAGS) -I$(BUILD_USER_DIR) $(GEN_DECTREE_IMPORT_SRC) -o $(GEN_DECTREE_IMPORT)
+
+DECTREE_IMPORT=iset.py
+$(DECTREE_IMPORT): $(GEN_DECTREE_IMPORT)
+	$(call quiet-command, \
+	    $(BUILD_USER_DIR)/$(GEN_DECTREE_IMPORT) $(DECTREE_IMPORT), \
+	    "GEN", $(DECTREE_IMPORT))
+
+#
+# Step 4
+# We use the dectree.py script to generate the decode tree header file
+#
+DECTREE_HEADER=dectree_generated.h
+$(DECTREE_HEADER): $(SRC_PATH)/target/hexagon/dectree.py $(DECTREE_IMPORT)
+	$(call quiet-command, \
+	    $(SRC_PATH)/target/hexagon/dectree.py \
+                $(BUILD_USER_DIR), \
+	    "GEN", "Hexagon decode tree")
+
+generated-files-y += $(GENERATED_HEXAGON_FILES) $(DECTREE_HEADER)
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 9eb6ba3..31f156c 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -60,6 +60,8 @@ fi
 : ${cross_cc_cflags_s390x="-m64"}
 : ${cross_cc_cflags_sparc="-m32 -mv8plus -mcpu=ultrasparc"}
 : ${cross_cc_cflags_sparc64="-m64 -mcpu=ultrasparc"}
+: ${cross_cc_hexagon="hexagon-linux-clang"}
+: ${cross_cc_cflags_hexagon="-mv67"}
 
 for target in $target_list; do
   arch=${target%%-*}
@@ -85,7 +87,7 @@ for target in $target_list; do
     xtensa|xtensaeb)
       arches=xtensa
       ;;
-    alpha|cris|hppa|i386|lm32|m68k|openrisc|riscv64|s390x|sh4|sparc64)
+    alpha|cris|hppa|i386|lm32|m68k|openrisc|riscv64|s390x|sh4|sparc64|hexagon)
       arches=$target
       ;;
     *)
diff --git a/tests/tcg/hexagon/float_convs.ref b/tests/tcg/hexagon/float_convs.ref
new file mode 100644
index 0000000..9ec9ffc
--- /dev/null
+++ b/tests/tcg/hexagon/float_convs.ref
@@ -0,0 +1,748 @@
+### Rounding to nearest
+from single: f32(-nan:0xffa00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (INVALID)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0xffc00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (OK)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-inf:0xff800000)
+  to double: f64(-inf:0x00fff0000000000000) (OK)
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+  to double: f64(-0x1.fffffe00000000000000p+127:0x00c7efffffe0000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+  to double: f64(-0x1.1874b200000000000000p+103:0x00c661874b20000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+  to double: f64(-0x1.c0bab600000000000000p+99:0x00c62c0bab60000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+  to double: f64(-0x1.31f75000000000000000p-40:0x00bd731f7500000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+  to double: f64(-0x1.50544400000000000000p-66:0x00bbd5054440000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.00000000000000000000p-126:0x80800000)
+  to double: f64(-0x1.00000000000000000000p-126:0x00b810000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(0x0.00000000000000000000p+0:0000000000)
+  to double: f64(0x0.00000000000000000000p+0:00000000000000000000) (OK)
+   to int32: 0 (OK)
+   to int64: 0 (OK)
+  to uint32: 0 (OK)
+  to uint64: 0 (OK)
+from single: f32(0x1.00000000000000000000p-126:0x00800000)
+  to double: f64(0x1.00000000000000000000p-126:0x003810000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000000000000000000p-25:0x33000000)
+  to double: f64(0x1.00000000000000000000p-25:0x003e60000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+  to double: f64(0x1.ffffe600000000000000p-25:0x003e6ffffe60000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+  to double: f64(0x1.ff801a00000000000000p-15:0x003f0ff801a0000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000c00000000000000p-14:0x38800006)
+  to double: f64(0x1.00000c00000000000000p-14:0x003f100000c0000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000000000000000000p+0:0x3f800000)
+  to double: f64(0x1.00000000000000000000p+0:0x003ff0000000000000) (OK)
+   to int32: 1 (OK)
+   to int64: 1 (OK)
+  to uint32: 1 (OK)
+  to uint64: 1 (OK)
+from single: f32(0x1.00400000000000000000p+0:0x3f802000)
+  to double: f64(0x1.00400000000000000000p+0:0x003ff0040000000000) (INEXACT )
+   to int32: 1 (INEXACT )
+   to int64: 1 (INEXACT )
+  to uint32: 1 (INEXACT )
+  to uint64: 1 (INEXACT )
+from single: f32(0x1.00000000000000000000p+1:0x40000000)
+  to double: f64(0x1.00000000000000000000p+1:0x004000000000000000) (OK)
+   to int32: 2 (OK)
+   to int64: 2 (OK)
+  to uint32: 2 (OK)
+  to uint64: 2 (OK)
+from single: f32(0x1.5bf0a800000000000000p+1:0x402df854)
+  to double: f64(0x1.5bf0a800000000000000p+1:0x004005bf0a80000000) (INEXACT )
+   to int32: 2 (INEXACT )
+   to int64: 2 (INEXACT )
+  to uint32: 2 (INEXACT )
+  to uint64: 2 (INEXACT )
+from single: f32(0x1.921fb600000000000000p+1:0x40490fdb)
+  to double: f64(0x1.921fb600000000000000p+1:0x00400921fb60000000) (INEXACT )
+   to int32: 3 (INEXACT )
+   to int64: 3 (INEXACT )
+  to uint32: 3 (INEXACT )
+  to uint64: 3 (INEXACT )
+from single: f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+  to double: f64(0x1.ffbe0000000000000000p+15:0x0040effbe000000000) (INEXACT )
+   to int32: 65503 (OK)
+   to int64: 65503 (OK)
+  to uint32: 65503 (OK)
+  to uint64: 65503 (OK)
+from single: f32(0x1.ffc00000000000000000p+15:0x477fe000)
+  to double: f64(0x1.ffc00000000000000000p+15:0x0040effc0000000000) (INEXACT )
+   to int32: 65504 (OK)
+   to int64: 65504 (OK)
+  to uint32: 65504 (OK)
+  to uint64: 65504 (OK)
+from single: f32(0x1.ffc20000000000000000p+15:0x477fe100)
+  to double: f64(0x1.ffc20000000000000000p+15:0x0040effc2000000000) (INEXACT )
+   to int32: 65505 (OK)
+   to int64: 65505 (OK)
+  to uint32: 65505 (OK)
+  to uint64: 65505 (OK)
+from single: f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+  to double: f64(0x1.ffbf0000000000000000p+16:0x0040fffbf000000000) (INEXACT )
+   to int32: 131007 (OK)
+   to int64: 131007 (OK)
+  to uint32: 131007 (OK)
+  to uint64: 131007 (OK)
+from single: f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+  to double: f64(0x1.ffc00000000000000000p+16:0x0040fffc0000000000) (INEXACT )
+   to int32: 131008 (OK)
+   to int64: 131008 (OK)
+  to uint32: 131008 (OK)
+  to uint64: 131008 (OK)
+from single: f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+  to double: f64(0x1.ffc10000000000000000p+16:0x0040fffc1000000000) (INEXACT )
+   to int32: 131009 (OK)
+   to int64: 131009 (OK)
+  to uint32: 131009 (OK)
+  to uint64: 131009 (OK)
+from single: f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+  to double: f64(0x1.c0bab600000000000000p+99:0x00462c0bab60000000) (INEXACT )
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+  to double: f64(0x1.fffffe00000000000000p+127:0x0047efffffe0000000) (INEXACT )
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(inf:0x7f800000)
+  to double: f64(inf:0x007ff0000000000000) (OK)
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0x7fc00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (OK)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0x7fa00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (INVALID)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+### Rounding upwards
+from single: f32(-nan:0xffa00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (INVALID)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0xffc00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (OK)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-inf:0xff800000)
+  to double: f64(-inf:0x00fff0000000000000) (OK)
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+  to double: f64(-0x1.fffffe00000000000000p+127:0x00c7efffffe0000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+  to double: f64(-0x1.1874b200000000000000p+103:0x00c661874b20000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+  to double: f64(-0x1.c0bab600000000000000p+99:0x00c62c0bab60000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+  to double: f64(-0x1.31f75000000000000000p-40:0x00bd731f7500000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+  to double: f64(-0x1.50544400000000000000p-66:0x00bbd5054440000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.00000000000000000000p-126:0x80800000)
+  to double: f64(-0x1.00000000000000000000p-126:0x00b810000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(0x0.00000000000000000000p+0:0000000000)
+  to double: f64(0x0.00000000000000000000p+0:00000000000000000000) (OK)
+   to int32: 0 (OK)
+   to int64: 0 (OK)
+  to uint32: 0 (OK)
+  to uint64: 0 (OK)
+from single: f32(0x1.00000000000000000000p-126:0x00800000)
+  to double: f64(0x1.00000000000000000000p-126:0x003810000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000000000000000000p-25:0x33000000)
+  to double: f64(0x1.00000000000000000000p-25:0x003e60000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+  to double: f64(0x1.ffffe600000000000000p-25:0x003e6ffffe60000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+  to double: f64(0x1.ff801a00000000000000p-15:0x003f0ff801a0000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000c00000000000000p-14:0x38800006)
+  to double: f64(0x1.00000c00000000000000p-14:0x003f100000c0000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000000000000000000p+0:0x3f800000)
+  to double: f64(0x1.00000000000000000000p+0:0x003ff0000000000000) (OK)
+   to int32: 1 (OK)
+   to int64: 1 (OK)
+  to uint32: 1 (OK)
+  to uint64: 1 (OK)
+from single: f32(0x1.00400000000000000000p+0:0x3f802000)
+  to double: f64(0x1.00400000000000000000p+0:0x003ff0040000000000) (INEXACT )
+   to int32: 1 (INEXACT )
+   to int64: 1 (INEXACT )
+  to uint32: 1 (INEXACT )
+  to uint64: 1 (INEXACT )
+from single: f32(0x1.00000000000000000000p+1:0x40000000)
+  to double: f64(0x1.00000000000000000000p+1:0x004000000000000000) (OK)
+   to int32: 2 (OK)
+   to int64: 2 (OK)
+  to uint32: 2 (OK)
+  to uint64: 2 (OK)
+from single: f32(0x1.5bf0a800000000000000p+1:0x402df854)
+  to double: f64(0x1.5bf0a800000000000000p+1:0x004005bf0a80000000) (INEXACT )
+   to int32: 2 (INEXACT )
+   to int64: 2 (INEXACT )
+  to uint32: 2 (INEXACT )
+  to uint64: 2 (INEXACT )
+from single: f32(0x1.921fb600000000000000p+1:0x40490fdb)
+  to double: f64(0x1.921fb600000000000000p+1:0x00400921fb60000000) (INEXACT )
+   to int32: 3 (INEXACT )
+   to int64: 3 (INEXACT )
+  to uint32: 3 (INEXACT )
+  to uint64: 3 (INEXACT )
+from single: f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+  to double: f64(0x1.ffbe0000000000000000p+15:0x0040effbe000000000) (INEXACT )
+   to int32: 65503 (OK)
+   to int64: 65503 (OK)
+  to uint32: 65503 (OK)
+  to uint64: 65503 (OK)
+from single: f32(0x1.ffc00000000000000000p+15:0x477fe000)
+  to double: f64(0x1.ffc00000000000000000p+15:0x0040effc0000000000) (INEXACT )
+   to int32: 65504 (OK)
+   to int64: 65504 (OK)
+  to uint32: 65504 (OK)
+  to uint64: 65504 (OK)
+from single: f32(0x1.ffc20000000000000000p+15:0x477fe100)
+  to double: f64(0x1.ffc20000000000000000p+15:0x0040effc2000000000) (INEXACT )
+   to int32: 65505 (OK)
+   to int64: 65505 (OK)
+  to uint32: 65505 (OK)
+  to uint64: 65505 (OK)
+from single: f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+  to double: f64(0x1.ffbf0000000000000000p+16:0x0040fffbf000000000) (INEXACT )
+   to int32: 131007 (OK)
+   to int64: 131007 (OK)
+  to uint32: 131007 (OK)
+  to uint64: 131007 (OK)
+from single: f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+  to double: f64(0x1.ffc00000000000000000p+16:0x0040fffc0000000000) (INEXACT )
+   to int32: 131008 (OK)
+   to int64: 131008 (OK)
+  to uint32: 131008 (OK)
+  to uint64: 131008 (OK)
+from single: f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+  to double: f64(0x1.ffc10000000000000000p+16:0x0040fffc1000000000) (INEXACT )
+   to int32: 131009 (OK)
+   to int64: 131009 (OK)
+  to uint32: 131009 (OK)
+  to uint64: 131009 (OK)
+from single: f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+  to double: f64(0x1.c0bab600000000000000p+99:0x00462c0bab60000000) (INEXACT )
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+  to double: f64(0x1.fffffe00000000000000p+127:0x0047efffffe0000000) (INEXACT )
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(inf:0x7f800000)
+  to double: f64(inf:0x007ff0000000000000) (OK)
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0x7fc00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (OK)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0x7fa00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (INVALID)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+### Rounding downwards
+from single: f32(-nan:0xffa00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (INVALID)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0xffc00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (OK)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-inf:0xff800000)
+  to double: f64(-inf:0x00fff0000000000000) (OK)
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+  to double: f64(-0x1.fffffe00000000000000p+127:0x00c7efffffe0000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+  to double: f64(-0x1.1874b200000000000000p+103:0x00c661874b20000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+  to double: f64(-0x1.c0bab600000000000000p+99:0x00c62c0bab60000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+  to double: f64(-0x1.31f75000000000000000p-40:0x00bd731f7500000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+  to double: f64(-0x1.50544400000000000000p-66:0x00bbd5054440000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.00000000000000000000p-126:0x80800000)
+  to double: f64(-0x1.00000000000000000000p-126:0x00b810000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(0x0.00000000000000000000p+0:0000000000)
+  to double: f64(0x0.00000000000000000000p+0:00000000000000000000) (OK)
+   to int32: 0 (OK)
+   to int64: 0 (OK)
+  to uint32: 0 (OK)
+  to uint64: 0 (OK)
+from single: f32(0x1.00000000000000000000p-126:0x00800000)
+  to double: f64(0x1.00000000000000000000p-126:0x003810000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000000000000000000p-25:0x33000000)
+  to double: f64(0x1.00000000000000000000p-25:0x003e60000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+  to double: f64(0x1.ffffe600000000000000p-25:0x003e6ffffe60000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+  to double: f64(0x1.ff801a00000000000000p-15:0x003f0ff801a0000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000c00000000000000p-14:0x38800006)
+  to double: f64(0x1.00000c00000000000000p-14:0x003f100000c0000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000000000000000000p+0:0x3f800000)
+  to double: f64(0x1.00000000000000000000p+0:0x003ff0000000000000) (OK)
+   to int32: 1 (OK)
+   to int64: 1 (OK)
+  to uint32: 1 (OK)
+  to uint64: 1 (OK)
+from single: f32(0x1.00400000000000000000p+0:0x3f802000)
+  to double: f64(0x1.00400000000000000000p+0:0x003ff0040000000000) (INEXACT )
+   to int32: 1 (INEXACT )
+   to int64: 1 (INEXACT )
+  to uint32: 1 (INEXACT )
+  to uint64: 1 (INEXACT )
+from single: f32(0x1.00000000000000000000p+1:0x40000000)
+  to double: f64(0x1.00000000000000000000p+1:0x004000000000000000) (OK)
+   to int32: 2 (OK)
+   to int64: 2 (OK)
+  to uint32: 2 (OK)
+  to uint64: 2 (OK)
+from single: f32(0x1.5bf0a800000000000000p+1:0x402df854)
+  to double: f64(0x1.5bf0a800000000000000p+1:0x004005bf0a80000000) (INEXACT )
+   to int32: 2 (INEXACT )
+   to int64: 2 (INEXACT )
+  to uint32: 2 (INEXACT )
+  to uint64: 2 (INEXACT )
+from single: f32(0x1.921fb600000000000000p+1:0x40490fdb)
+  to double: f64(0x1.921fb600000000000000p+1:0x00400921fb60000000) (INEXACT )
+   to int32: 3 (INEXACT )
+   to int64: 3 (INEXACT )
+  to uint32: 3 (INEXACT )
+  to uint64: 3 (INEXACT )
+from single: f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+  to double: f64(0x1.ffbe0000000000000000p+15:0x0040effbe000000000) (INEXACT )
+   to int32: 65503 (OK)
+   to int64: 65503 (OK)
+  to uint32: 65503 (OK)
+  to uint64: 65503 (OK)
+from single: f32(0x1.ffc00000000000000000p+15:0x477fe000)
+  to double: f64(0x1.ffc00000000000000000p+15:0x0040effc0000000000) (INEXACT )
+   to int32: 65504 (OK)
+   to int64: 65504 (OK)
+  to uint32: 65504 (OK)
+  to uint64: 65504 (OK)
+from single: f32(0x1.ffc20000000000000000p+15:0x477fe100)
+  to double: f64(0x1.ffc20000000000000000p+15:0x0040effc2000000000) (INEXACT )
+   to int32: 65505 (OK)
+   to int64: 65505 (OK)
+  to uint32: 65505 (OK)
+  to uint64: 65505 (OK)
+from single: f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+  to double: f64(0x1.ffbf0000000000000000p+16:0x0040fffbf000000000) (INEXACT )
+   to int32: 131007 (OK)
+   to int64: 131007 (OK)
+  to uint32: 131007 (OK)
+  to uint64: 131007 (OK)
+from single: f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+  to double: f64(0x1.ffc00000000000000000p+16:0x0040fffc0000000000) (INEXACT )
+   to int32: 131008 (OK)
+   to int64: 131008 (OK)
+  to uint32: 131008 (OK)
+  to uint64: 131008 (OK)
+from single: f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+  to double: f64(0x1.ffc10000000000000000p+16:0x0040fffc1000000000) (INEXACT )
+   to int32: 131009 (OK)
+   to int64: 131009 (OK)
+  to uint32: 131009 (OK)
+  to uint64: 131009 (OK)
+from single: f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+  to double: f64(0x1.c0bab600000000000000p+99:0x00462c0bab60000000) (INEXACT )
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+  to double: f64(0x1.fffffe00000000000000p+127:0x0047efffffe0000000) (INEXACT )
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(inf:0x7f800000)
+  to double: f64(inf:0x007ff0000000000000) (OK)
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0x7fc00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (OK)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0x7fa00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (INVALID)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+### Rounding to zero
+from single: f32(-nan:0xffa00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (INVALID)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0xffc00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (OK)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-inf:0xff800000)
+  to double: f64(-inf:0x00fff0000000000000) (OK)
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+  to double: f64(-0x1.fffffe00000000000000p+127:0x00c7efffffe0000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+  to double: f64(-0x1.1874b200000000000000p+103:0x00c661874b20000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+  to double: f64(-0x1.c0bab600000000000000p+99:0x00c62c0bab60000000) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+  to double: f64(-0x1.31f75000000000000000p-40:0x00bd731f7500000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+  to double: f64(-0x1.50544400000000000000p-66:0x00bbd5054440000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(-0x1.00000000000000000000p-126:0x80800000)
+  to double: f64(-0x1.00000000000000000000p-126:0x00b810000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from single: f32(0x0.00000000000000000000p+0:0000000000)
+  to double: f64(0x0.00000000000000000000p+0:00000000000000000000) (OK)
+   to int32: 0 (OK)
+   to int64: 0 (OK)
+  to uint32: 0 (OK)
+  to uint64: 0 (OK)
+from single: f32(0x1.00000000000000000000p-126:0x00800000)
+  to double: f64(0x1.00000000000000000000p-126:0x003810000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000000000000000000p-25:0x33000000)
+  to double: f64(0x1.00000000000000000000p-25:0x003e60000000000000) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+  to double: f64(0x1.ffffe600000000000000p-25:0x003e6ffffe60000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+  to double: f64(0x1.ff801a00000000000000p-15:0x003f0ff801a0000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000c00000000000000p-14:0x38800006)
+  to double: f64(0x1.00000c00000000000000p-14:0x003f100000c0000000) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from single: f32(0x1.00000000000000000000p+0:0x3f800000)
+  to double: f64(0x1.00000000000000000000p+0:0x003ff0000000000000) (OK)
+   to int32: 1 (OK)
+   to int64: 1 (OK)
+  to uint32: 1 (OK)
+  to uint64: 1 (OK)
+from single: f32(0x1.00400000000000000000p+0:0x3f802000)
+  to double: f64(0x1.00400000000000000000p+0:0x003ff0040000000000) (INEXACT )
+   to int32: 1 (INEXACT )
+   to int64: 1 (INEXACT )
+  to uint32: 1 (INEXACT )
+  to uint64: 1 (INEXACT )
+from single: f32(0x1.00000000000000000000p+1:0x40000000)
+  to double: f64(0x1.00000000000000000000p+1:0x004000000000000000) (OK)
+   to int32: 2 (OK)
+   to int64: 2 (OK)
+  to uint32: 2 (OK)
+  to uint64: 2 (OK)
+from single: f32(0x1.5bf0a800000000000000p+1:0x402df854)
+  to double: f64(0x1.5bf0a800000000000000p+1:0x004005bf0a80000000) (INEXACT )
+   to int32: 2 (INEXACT )
+   to int64: 2 (INEXACT )
+  to uint32: 2 (INEXACT )
+  to uint64: 2 (INEXACT )
+from single: f32(0x1.921fb600000000000000p+1:0x40490fdb)
+  to double: f64(0x1.921fb600000000000000p+1:0x00400921fb60000000) (INEXACT )
+   to int32: 3 (INEXACT )
+   to int64: 3 (INEXACT )
+  to uint32: 3 (INEXACT )
+  to uint64: 3 (INEXACT )
+from single: f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+  to double: f64(0x1.ffbe0000000000000000p+15:0x0040effbe000000000) (INEXACT )
+   to int32: 65503 (OK)
+   to int64: 65503 (OK)
+  to uint32: 65503 (OK)
+  to uint64: 65503 (OK)
+from single: f32(0x1.ffc00000000000000000p+15:0x477fe000)
+  to double: f64(0x1.ffc00000000000000000p+15:0x0040effc0000000000) (INEXACT )
+   to int32: 65504 (OK)
+   to int64: 65504 (OK)
+  to uint32: 65504 (OK)
+  to uint64: 65504 (OK)
+from single: f32(0x1.ffc20000000000000000p+15:0x477fe100)
+  to double: f64(0x1.ffc20000000000000000p+15:0x0040effc2000000000) (INEXACT )
+   to int32: 65505 (OK)
+   to int64: 65505 (OK)
+  to uint32: 65505 (OK)
+  to uint64: 65505 (OK)
+from single: f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+  to double: f64(0x1.ffbf0000000000000000p+16:0x0040fffbf000000000) (INEXACT )
+   to int32: 131007 (OK)
+   to int64: 131007 (OK)
+  to uint32: 131007 (OK)
+  to uint64: 131007 (OK)
+from single: f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+  to double: f64(0x1.ffc00000000000000000p+16:0x0040fffc0000000000) (INEXACT )
+   to int32: 131008 (OK)
+   to int64: 131008 (OK)
+  to uint32: 131008 (OK)
+  to uint64: 131008 (OK)
+from single: f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+  to double: f64(0x1.ffc10000000000000000p+16:0x0040fffc1000000000) (INEXACT )
+   to int32: 131009 (OK)
+   to int64: 131009 (OK)
+  to uint32: 131009 (OK)
+  to uint64: 131009 (OK)
+from single: f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+  to double: f64(0x1.c0bab600000000000000p+99:0x00462c0bab60000000) (INEXACT )
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+  to double: f64(0x1.fffffe00000000000000p+127:0x0047efffffe0000000) (INEXACT )
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(inf:0x7f800000)
+  to double: f64(inf:0x007ff0000000000000) (OK)
+   to int32: 2147483647 (INVALID)
+   to int64: 9223372036854775807 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0x7fc00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (OK)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
+from single: f32(-nan:0x7fa00000)
+  to double: f64(-nan:0x00ffffffffffffffff) (INVALID)
+   to int32: -1 (INVALID)
+   to int64: -1 (INVALID)
+  to uint32: -1 (INVALID)
+  to uint64: -1 (INVALID)
diff --git a/tests/tcg/hexagon/float_madds.ref b/tests/tcg/hexagon/float_madds.ref
new file mode 100644
index 0000000..ceed3bb
--- /dev/null
+++ b/tests/tcg/hexagon/float_madds.ref
@@ -0,0 +1,768 @@
+### Rounding to nearest
+op : f32(-nan:0xffa00000) * f32(-nan:0xffc00000) + f32(-inf:0xff800000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/0)
+op : f32(-nan:0xffc00000) * f32(-inf:0xff800000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/1)
+op : f32(-inf:0xff800000) * f32(-nan:0xffa00000) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/2)
+op : f32(-nan:0xffc00000) * f32(-inf:0xff800000) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(-nan:0xffffffff) flags=OK (1/0)
+op : f32(-inf:0xff800000) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=OK (1/1)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-nan:0xffc00000) + f32(-inf:0xff800000)
+res: f32(-nan:0xffffffff) flags=OK (1/2)
+op : f32(-inf:0xff800000) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(inf:0x7f800000) flags=OK (2/0)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-inf:0xff800000)
+res: f32(-inf:0xff800000) flags=OK (2/1)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-inf:0xff800000) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(inf:0x7f800000) flags=OK (2/2)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (3/0)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (3/1)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (3/2)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (4/0)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(-0x1.1874b200000000000000p+103:0xf30c3a59) flags=INEXACT  (4/1)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) flags=INEXACT  (4/2)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(0x1.0c27fa00000000000000p+60:0x5d8613fd) flags=INEXACT  (5/0)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) flags=INEXACT  (5/1)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(0x1.26c46200000000000000p+34:0x50936231) flags=INEXACT  (5/2)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(0x1.91f94000000000000000p-106:0x0ac8fca0) flags=INEXACT  (6/0)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(-0x1.31f75000000000000000p-40:0xab98fba8) flags=INEXACT  (6/1)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(-0x1.50544400000000000000p-66:0x9ea82a22) flags=INEXACT  (6/2)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=UNDERFLOW INEXACT  (7/0)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(-0x1.50544400000000000000p-66:0x9ea82a22) flags=INEXACT  (7/1)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(-0x1.00000000000000000000p-126:0x80800000) flags=OK (7/2)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.00000000000000000000p-126:0x00800000) flags=OK (8/0)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(-0x1.00000000000000000000p-126:0x80800000) flags=OK (8/1)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(-0x0.00000000000000000000p+0:0x80000000) flags=UNDERFLOW INEXACT  (8/2)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.00000000000000000000p-25:0x33000000) flags=OK (9/0)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=UNDERFLOW INEXACT  (9/1)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.00000000000000000000p-126:0x00800000) flags=OK (9/2)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.ffffe600000000000000p-25:0x337ffff3) flags=INEXACT  (10/0)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.ffffe600000000000000p-50:0x26fffff3) flags=INEXACT  (10/1)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.00000000000000000000p-25:0x33000000) flags=INEXACT  (10/2)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ff801a00000000000000p-15:0x387fc00d) flags=INEXACT  (11/0)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.0007fe00000000000000p-25:0x330003ff) flags=INEXACT  (11/1)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.0001f200000000000000p-24:0x338000f9) flags=INEXACT  (11/2)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.00000c00000000000000p-14:0x38800006) flags=INEXACT  (12/0)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.0ffbf400000000000000p-24:0x3387fdfa) flags=INEXACT  (12/1)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ff801c00000000000000p-15:0x387fc00e) flags=INEXACT  (12/2)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.00000000000000000000p+0:0x3f800000) flags=INEXACT  (13/0)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ffc01800000000000000p-14:0x38ffe00c) flags=INEXACT  (13/1)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.ffc01800000000000000p-14:0x38ffe00c) flags=INEXACT  (13/2)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.00440000000000000000p+0:0x3f802200) flags=INEXACT  (14/0)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.00440000000000000000p+0:0x3f802200) flags=INEXACT  (14/1)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.00040200000000000000p+0:0x3f800201) flags=INEXACT  (14/2)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.80200000000000000000p+1:0x40401000) flags=INEXACT  (15/0)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.80400000000000000000p+1:0x40402000) flags=INEXACT  (15/1)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.80200000000000000000p+1:0x40401000) flags=INEXACT  (15/2)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.2e185400000000000000p+2:0x40970c2a) flags=INEXACT  (16/0)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.9c00a800000000000000p+2:0x40ce0054) flags=INEXACT  (16/1)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.2e23d200000000000000p+2:0x409711e9) flags=INEXACT  (16/2)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.12804200000000000000p+3:0x41094021) flags=INEXACT  (17/0)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.51458000000000000000p+3:0x4128a2c0) flags=INEXACT  (17/1)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.200c0400000000000000p+3:0x41100602) flags=INEXACT  (17/2)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.ffcf1400000000000000p+15:0x477fe78a) flags=INEXACT  (18/0)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.91ed3c00000000000000p+17:0x4848f69e) flags=INEXACT  (18/1)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.5bc56000000000000000p+17:0x482de2b0) flags=INEXACT  (18/2)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.08edf000000000000000p+18:0x488476f8) flags=INEXACT  (19/0)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.ff7e0800000000000000p+31:0x4f7fbf04) flags=INEXACT  (19/1)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.08ee7a00000000000000p+18:0x4884773d) flags=INEXACT  (19/2)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff800800000000000000p+31:0x4f7fc004) flags=INEXACT  (20/0)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.ff840800000000000000p+31:0x4f7fc204) flags=INEXACT  (20/1)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.ff820800000000000000p+31:0x4f7fc104) flags=INEXACT  (20/2)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff860800000000000000p+31:0x4f7fc304) flags=INEXACT  (21/0)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.ff820800000000000000p+32:0x4fffc104) flags=INEXACT  (21/1)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff800800000000000000p+32:0x4fffc004) flags=INEXACT  (21/2)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.ff830800000000000000p+32:0x4fffc184) flags=INEXACT  (22/0)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff7f8800000000000000p+33:0x507fbfc4) flags=INEXACT  (22/1)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff840800000000000000p+32:0x4fffc204) flags=INEXACT  (22/2)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.ff800800000000000000p+33:0x507fc004) flags=INEXACT  (23/0)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff820800000000000000p+33:0x507fc104) flags=INEXACT  (23/1)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.ff810800000000000000p+33:0x507fc084) flags=INEXACT  (23/2)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(0x1.c0bab600000000000000p+99:0x71605d5b) flags=INEXACT  (24/0)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.c0838000000000000000p+116:0x79e041c0) flags=INEXACT  (24/1)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.c0829e00000000000000p+116:0x79e0414f) flags=INEXACT  (24/2)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (25/0)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (25/1)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (25/2)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(inf:0x7f800000)
+res: f32(inf:0x7f800000) flags=OK (26/0)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(inf:0x7f800000) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(inf:0x7f800000) flags=OK (26/1)
+op : f32(inf:0x7f800000) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(inf:0x7f800000) flags=OK (26/2)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(inf:0x7f800000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=OK (27/0)
+op : f32(inf:0x7f800000) * f32(-nan:0x7fc00000) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(-nan:0xffffffff) flags=OK (27/1)
+op : f32(-nan:0x7fc00000) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(inf:0x7f800000)
+res: f32(-nan:0xffffffff) flags=OK (27/2)
+op : f32(inf:0x7f800000) * f32(-nan:0x7fc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/0)
+op : f32(-nan:0x7fc00000) * f32(-nan:0x7fa00000) + f32(inf:0x7f800000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/1)
+op : f32(-nan:0x7fa00000) * f32(inf:0x7f800000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/2)
+op : f32(-nan:0x7fc00000) * f32(-nan:0x7fa00000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/0)
+op : f32(-nan:0x7fa00000) * f32(-nan:0xffa00000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/1)
+op : f32(-nan:0xffa00000) * f32(-nan:0x7fc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/2)
+op : f32(-nan:0x7fa00000) * f32(-nan:0xffa00000) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/0)
+op : f32(-nan:0xffa00000) * f32(-nan:0xffc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/1)
+op : f32(-nan:0xffc00000) * f32(-nan:0x7fa00000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/2)
+# LP184149
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-1:0x3f000000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=OK (31/0)
+op : f32(0x1.00000000000000000000p-149:0x00000001) * f32(0x1.00000000000000000000p-149:0x00000001) + f32(0x1.00000000000000000000p-149:0x00000001)
+res: f32(0x1.00000000000000000000p-149:0x00000001) flags=UNDERFLOW INEXACT  (32/0)
+### Rounding upwards
+op : f32(-nan:0xffa00000) * f32(-nan:0xffc00000) + f32(-inf:0xff800000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/0)
+op : f32(-nan:0xffc00000) * f32(-inf:0xff800000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/1)
+op : f32(-inf:0xff800000) * f32(-nan:0xffa00000) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/2)
+op : f32(-nan:0xffc00000) * f32(-inf:0xff800000) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(-nan:0xffffffff) flags=OK (1/0)
+op : f32(-inf:0xff800000) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=OK (1/1)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-nan:0xffc00000) + f32(-inf:0xff800000)
+res: f32(-nan:0xffffffff) flags=OK (1/2)
+op : f32(-inf:0xff800000) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(inf:0x7f800000) flags=OK (2/0)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-inf:0xff800000)
+res: f32(-inf:0xff800000) flags=OK (2/1)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-inf:0xff800000) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(inf:0x7f800000) flags=OK (2/2)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (3/0)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (3/1)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (3/2)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (4/0)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(-0x1.1874b000000000000000p+103:0xf30c3a58) flags=INEXACT  (4/1)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(-0x1.c0bab400000000000000p+99:0xf1605d5a) flags=INEXACT  (4/2)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(0x1.0c27fa00000000000000p+60:0x5d8613fd) flags=INEXACT  (5/0)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(-0x1.c0bab400000000000000p+99:0xf1605d5a) flags=INEXACT  (5/1)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(0x1.26c46200000000000000p+34:0x50936231) flags=INEXACT  (5/2)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(0x1.91f94000000000000000p-106:0x0ac8fca0) flags=INEXACT  (6/0)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(-0x1.31f74e00000000000000p-40:0xab98fba7) flags=INEXACT  (6/1)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(-0x1.50544200000000000000p-66:0x9ea82a21) flags=INEXACT  (6/2)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x1.00000000000000000000p-149:0x00000001) flags=UNDERFLOW INEXACT  (7/0)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(-0x1.50544400000000000000p-66:0x9ea82a22) flags=INEXACT  (7/1)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(-0x1.00000000000000000000p-126:0x80800000) flags=OK (7/2)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.00000000000000000000p-126:0x00800000) flags=OK (8/0)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(-0x1.00000000000000000000p-126:0x80800000) flags=OK (8/1)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(-0x0.00000000000000000000p+0:0x80000000) flags=UNDERFLOW INEXACT  (8/2)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.00000000000000000000p-25:0x33000000) flags=OK (9/0)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x1.00000000000000000000p-149:0x00000001) flags=UNDERFLOW INEXACT  (9/1)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.00000000000000000000p-126:0x00800000) flags=OK (9/2)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.ffffe800000000000000p-25:0x337ffff4) flags=INEXACT  (10/0)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.ffffe800000000000000p-50:0x26fffff4) flags=INEXACT  (10/1)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.00000200000000000000p-25:0x33000001) flags=INEXACT  (10/2)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ff801c00000000000000p-15:0x387fc00e) flags=INEXACT  (11/0)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.00080000000000000000p-25:0x33000400) flags=INEXACT  (11/1)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.0001f400000000000000p-24:0x338000fa) flags=INEXACT  (11/2)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.00000e00000000000000p-14:0x38800007) flags=INEXACT  (12/0)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.0ffbf600000000000000p-24:0x3387fdfb) flags=INEXACT  (12/1)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ff801c00000000000000p-15:0x387fc00e) flags=INEXACT  (12/2)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.00000200000000000000p+0:0x3f800001) flags=INEXACT  (13/0)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ffc01a00000000000000p-14:0x38ffe00d) flags=INEXACT  (13/1)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.ffc01a00000000000000p-14:0x38ffe00d) flags=INEXACT  (13/2)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.00440200000000000000p+0:0x3f802201) flags=INEXACT  (14/0)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.00440200000000000000p+0:0x3f802201) flags=INEXACT  (14/1)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.00040200000000000000p+0:0x3f800201) flags=INEXACT  (14/2)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.80200000000000000000p+1:0x40401000) flags=INEXACT  (15/0)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.80400000000000000000p+1:0x40402000) flags=INEXACT  (15/1)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.80200000000000000000p+1:0x40401000) flags=INEXACT  (15/2)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.2e185400000000000000p+2:0x40970c2a) flags=INEXACT  (16/0)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.9c00a800000000000000p+2:0x40ce0054) flags=INEXACT  (16/1)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.2e23d400000000000000p+2:0x409711ea) flags=INEXACT  (16/2)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.12804200000000000000p+3:0x41094021) flags=INEXACT  (17/0)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.51458200000000000000p+3:0x4128a2c1) flags=INEXACT  (17/1)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.200c0600000000000000p+3:0x41100603) flags=INEXACT  (17/2)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.ffcf1600000000000000p+15:0x477fe78b) flags=INEXACT  (18/0)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.91ed3c00000000000000p+17:0x4848f69e) flags=INEXACT  (18/1)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.5bc56200000000000000p+17:0x482de2b1) flags=INEXACT  (18/2)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.08edf000000000000000p+18:0x488476f8) flags=INEXACT  (19/0)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.ff7e0a00000000000000p+31:0x4f7fbf05) flags=INEXACT  (19/1)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.08ee7a00000000000000p+18:0x4884773d) flags=INEXACT  (19/2)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff800a00000000000000p+31:0x4f7fc005) flags=INEXACT  (20/0)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.ff840800000000000000p+31:0x4f7fc204) flags=INEXACT  (20/1)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.ff820800000000000000p+31:0x4f7fc104) flags=INEXACT  (20/2)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff860800000000000000p+31:0x4f7fc304) flags=INEXACT  (21/0)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.ff820800000000000000p+32:0x4fffc104) flags=INEXACT  (21/1)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff800a00000000000000p+32:0x4fffc005) flags=INEXACT  (21/2)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.ff830800000000000000p+32:0x4fffc184) flags=INEXACT  (22/0)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff7f8a00000000000000p+33:0x507fbfc5) flags=INEXACT  (22/1)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff840800000000000000p+32:0x4fffc204) flags=INEXACT  (22/2)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.ff800a00000000000000p+33:0x507fc005) flags=INEXACT  (23/0)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff820800000000000000p+33:0x507fc104) flags=INEXACT  (23/1)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.ff810800000000000000p+33:0x507fc084) flags=INEXACT  (23/2)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(0x1.c0bab800000000000000p+99:0x71605d5c) flags=INEXACT  (24/0)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.c0838000000000000000p+116:0x79e041c0) flags=INEXACT  (24/1)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.c082a000000000000000p+116:0x79e04150) flags=INEXACT  (24/2)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (25/0)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (25/1)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(inf:0x7f800000) flags=OVERFLOW INEXACT  (25/2)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(inf:0x7f800000)
+res: f32(inf:0x7f800000) flags=OK (26/0)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(inf:0x7f800000) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(inf:0x7f800000) flags=OK (26/1)
+op : f32(inf:0x7f800000) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(inf:0x7f800000) flags=OK (26/2)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(inf:0x7f800000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=OK (27/0)
+op : f32(inf:0x7f800000) * f32(-nan:0x7fc00000) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(-nan:0xffffffff) flags=OK (27/1)
+op : f32(-nan:0x7fc00000) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(inf:0x7f800000)
+res: f32(-nan:0xffffffff) flags=OK (27/2)
+op : f32(inf:0x7f800000) * f32(-nan:0x7fc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/0)
+op : f32(-nan:0x7fc00000) * f32(-nan:0x7fa00000) + f32(inf:0x7f800000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/1)
+op : f32(-nan:0x7fa00000) * f32(inf:0x7f800000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/2)
+op : f32(-nan:0x7fc00000) * f32(-nan:0x7fa00000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/0)
+op : f32(-nan:0x7fa00000) * f32(-nan:0xffa00000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/1)
+op : f32(-nan:0xffa00000) * f32(-nan:0x7fc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/2)
+op : f32(-nan:0x7fa00000) * f32(-nan:0xffa00000) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/0)
+op : f32(-nan:0xffa00000) * f32(-nan:0xffc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/1)
+op : f32(-nan:0xffc00000) * f32(-nan:0x7fa00000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/2)
+# LP184149
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-1:0x3f000000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=OK (31/0)
+op : f32(0x1.00000000000000000000p-149:0x00000001) * f32(0x1.00000000000000000000p-149:0x00000001) + f32(0x1.00000000000000000000p-149:0x00000001)
+res: f32(0x1.00000000000000000000p-148:0x00000002) flags=UNDERFLOW INEXACT  (32/0)
+### Rounding downwards
+op : f32(-nan:0xffa00000) * f32(-nan:0xffc00000) + f32(-inf:0xff800000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/0)
+op : f32(-nan:0xffc00000) * f32(-inf:0xff800000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/1)
+op : f32(-inf:0xff800000) * f32(-nan:0xffa00000) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/2)
+op : f32(-nan:0xffc00000) * f32(-inf:0xff800000) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(-nan:0xffffffff) flags=OK (1/0)
+op : f32(-inf:0xff800000) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=OK (1/1)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-nan:0xffc00000) + f32(-inf:0xff800000)
+res: f32(-nan:0xffffffff) flags=OK (1/2)
+op : f32(-inf:0xff800000) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(inf:0x7f800000) flags=OK (2/0)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-inf:0xff800000)
+res: f32(-inf:0xff800000) flags=OK (2/1)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-inf:0xff800000) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(inf:0x7f800000) flags=OK (2/2)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (3/0)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (3/1)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (3/2)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (4/0)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(-0x1.1874b200000000000000p+103:0xf30c3a59) flags=INEXACT  (4/1)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) flags=INEXACT  (4/2)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(0x1.0c27f800000000000000p+60:0x5d8613fc) flags=INEXACT  (5/0)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) flags=INEXACT  (5/1)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(0x1.26c46000000000000000p+34:0x50936230) flags=INEXACT  (5/2)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(0x1.91f93e00000000000000p-106:0x0ac8fc9f) flags=INEXACT  (6/0)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(-0x1.31f75000000000000000p-40:0xab98fba8) flags=INEXACT  (6/1)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(-0x1.50544400000000000000p-66:0x9ea82a22) flags=INEXACT  (6/2)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=UNDERFLOW INEXACT  (7/0)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(-0x1.50544400000000000000p-66:0x9ea82a22) flags=INEXACT  (7/1)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(-0x1.00000000000000000000p-126:0x80800000) flags=OK (7/2)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.00000000000000000000p-126:0x00800000) flags=OK (8/0)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(-0x1.00000000000000000000p-126:0x80800000) flags=OK (8/1)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(-0x1.00000000000000000000p-149:0x80000001) flags=UNDERFLOW INEXACT  (8/2)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.00000000000000000000p-25:0x33000000) flags=OK (9/0)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=UNDERFLOW INEXACT  (9/1)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.00000000000000000000p-126:0x00800000) flags=OK (9/2)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.ffffe600000000000000p-25:0x337ffff3) flags=INEXACT  (10/0)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.ffffe600000000000000p-50:0x26fffff3) flags=INEXACT  (10/1)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.00000000000000000000p-25:0x33000000) flags=INEXACT  (10/2)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ff801a00000000000000p-15:0x387fc00d) flags=INEXACT  (11/0)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.0007fe00000000000000p-25:0x330003ff) flags=INEXACT  (11/1)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.0001f200000000000000p-24:0x338000f9) flags=INEXACT  (11/2)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.00000c00000000000000p-14:0x38800006) flags=INEXACT  (12/0)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.0ffbf400000000000000p-24:0x3387fdfa) flags=INEXACT  (12/1)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ff801a00000000000000p-15:0x387fc00d) flags=INEXACT  (12/2)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.00000000000000000000p+0:0x3f800000) flags=INEXACT  (13/0)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ffc01800000000000000p-14:0x38ffe00c) flags=INEXACT  (13/1)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.ffc01800000000000000p-14:0x38ffe00c) flags=INEXACT  (13/2)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.00440000000000000000p+0:0x3f802200) flags=INEXACT  (14/0)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.00440000000000000000p+0:0x3f802200) flags=INEXACT  (14/1)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.00040000000000000000p+0:0x3f800200) flags=INEXACT  (14/2)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.80200000000000000000p+1:0x40401000) flags=INEXACT  (15/0)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.80400000000000000000p+1:0x40402000) flags=INEXACT  (15/1)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.80200000000000000000p+1:0x40401000) flags=INEXACT  (15/2)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.2e185400000000000000p+2:0x40970c2a) flags=INEXACT  (16/0)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.9c00a800000000000000p+2:0x40ce0054) flags=INEXACT  (16/1)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.2e23d200000000000000p+2:0x409711e9) flags=INEXACT  (16/2)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.12804000000000000000p+3:0x41094020) flags=INEXACT  (17/0)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.51458000000000000000p+3:0x4128a2c0) flags=INEXACT  (17/1)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.200c0400000000000000p+3:0x41100602) flags=INEXACT  (17/2)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.ffcf1400000000000000p+15:0x477fe78a) flags=INEXACT  (18/0)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.91ed3a00000000000000p+17:0x4848f69d) flags=INEXACT  (18/1)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.5bc56000000000000000p+17:0x482de2b0) flags=INEXACT  (18/2)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.08edee00000000000000p+18:0x488476f7) flags=INEXACT  (19/0)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.ff7e0800000000000000p+31:0x4f7fbf04) flags=INEXACT  (19/1)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.08ee7800000000000000p+18:0x4884773c) flags=INEXACT  (19/2)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff800800000000000000p+31:0x4f7fc004) flags=INEXACT  (20/0)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.ff840600000000000000p+31:0x4f7fc203) flags=INEXACT  (20/1)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.ff820600000000000000p+31:0x4f7fc103) flags=INEXACT  (20/2)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff860600000000000000p+31:0x4f7fc303) flags=INEXACT  (21/0)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.ff820600000000000000p+32:0x4fffc103) flags=INEXACT  (21/1)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff800800000000000000p+32:0x4fffc004) flags=INEXACT  (21/2)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.ff830600000000000000p+32:0x4fffc183) flags=INEXACT  (22/0)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff7f8800000000000000p+33:0x507fbfc4) flags=INEXACT  (22/1)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff840600000000000000p+32:0x4fffc203) flags=INEXACT  (22/2)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.ff800800000000000000p+33:0x507fc004) flags=INEXACT  (23/0)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff820600000000000000p+33:0x507fc103) flags=INEXACT  (23/1)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.ff810600000000000000p+33:0x507fc083) flags=INEXACT  (23/2)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(0x1.c0bab600000000000000p+99:0x71605d5b) flags=INEXACT  (24/0)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.c0837e00000000000000p+116:0x79e041bf) flags=INEXACT  (24/1)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.c0829e00000000000000p+116:0x79e0414f) flags=INEXACT  (24/2)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (25/0)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (25/1)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (25/2)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(inf:0x7f800000)
+res: f32(inf:0x7f800000) flags=OK (26/0)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(inf:0x7f800000) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(inf:0x7f800000) flags=OK (26/1)
+op : f32(inf:0x7f800000) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(inf:0x7f800000) flags=OK (26/2)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(inf:0x7f800000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=OK (27/0)
+op : f32(inf:0x7f800000) * f32(-nan:0x7fc00000) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(-nan:0xffffffff) flags=OK (27/1)
+op : f32(-nan:0x7fc00000) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(inf:0x7f800000)
+res: f32(-nan:0xffffffff) flags=OK (27/2)
+op : f32(inf:0x7f800000) * f32(-nan:0x7fc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/0)
+op : f32(-nan:0x7fc00000) * f32(-nan:0x7fa00000) + f32(inf:0x7f800000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/1)
+op : f32(-nan:0x7fa00000) * f32(inf:0x7f800000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/2)
+op : f32(-nan:0x7fc00000) * f32(-nan:0x7fa00000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/0)
+op : f32(-nan:0x7fa00000) * f32(-nan:0xffa00000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/1)
+op : f32(-nan:0xffa00000) * f32(-nan:0x7fc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/2)
+op : f32(-nan:0x7fa00000) * f32(-nan:0xffa00000) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/0)
+op : f32(-nan:0xffa00000) * f32(-nan:0xffc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/1)
+op : f32(-nan:0xffc00000) * f32(-nan:0x7fa00000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/2)
+# LP184149
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-1:0x3f000000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=OK (31/0)
+op : f32(0x1.00000000000000000000p-149:0x00000001) * f32(0x1.00000000000000000000p-149:0x00000001) + f32(0x1.00000000000000000000p-149:0x00000001)
+res: f32(0x1.00000000000000000000p-149:0x00000001) flags=UNDERFLOW INEXACT  (32/0)
+### Rounding to zero
+op : f32(-nan:0xffa00000) * f32(-nan:0xffc00000) + f32(-inf:0xff800000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/0)
+op : f32(-nan:0xffc00000) * f32(-inf:0xff800000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/1)
+op : f32(-inf:0xff800000) * f32(-nan:0xffa00000) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (0/2)
+op : f32(-nan:0xffc00000) * f32(-inf:0xff800000) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(-nan:0xffffffff) flags=OK (1/0)
+op : f32(-inf:0xff800000) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=OK (1/1)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-nan:0xffc00000) + f32(-inf:0xff800000)
+res: f32(-nan:0xffffffff) flags=OK (1/2)
+op : f32(-inf:0xff800000) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(inf:0x7f800000) flags=OK (2/0)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-inf:0xff800000)
+res: f32(-inf:0xff800000) flags=OK (2/1)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-inf:0xff800000) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(inf:0x7f800000) flags=OK (2/2)
+op : f32(-0x1.fffffe00000000000000p+127:0xff7fffff) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (3/0)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.fffffe00000000000000p+127:0xff7fffff)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (3/1)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.fffffe00000000000000p+127:0xff7fffff) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (3/2)
+op : f32(-0x1.1874b200000000000000p+103:0xf30c3a59) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (4/0)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.1874b200000000000000p+103:0xf30c3a59)
+res: f32(-0x1.1874b000000000000000p+103:0xf30c3a58) flags=INEXACT  (4/1)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.1874b200000000000000p+103:0xf30c3a59) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(-0x1.c0bab400000000000000p+99:0xf1605d5a) flags=INEXACT  (4/2)
+op : f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(0x1.0c27f800000000000000p+60:0x5d8613fc) flags=INEXACT  (5/0)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.c0bab600000000000000p+99:0xf1605d5b)
+res: f32(-0x1.c0bab400000000000000p+99:0xf1605d5a) flags=INEXACT  (5/1)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.c0bab600000000000000p+99:0xf1605d5b) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(0x1.26c46000000000000000p+34:0x50936230) flags=INEXACT  (5/2)
+op : f32(-0x1.31f75000000000000000p-40:0xab98fba8) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(0x1.91f93e00000000000000p-106:0x0ac8fc9f) flags=INEXACT  (6/0)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(-0x1.31f75000000000000000p-40:0xab98fba8)
+res: f32(-0x1.31f74e00000000000000p-40:0xab98fba7) flags=INEXACT  (6/1)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(-0x1.31f75000000000000000p-40:0xab98fba8) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(-0x1.50544200000000000000p-66:0x9ea82a21) flags=INEXACT  (6/2)
+op : f32(-0x1.50544400000000000000p-66:0x9ea82a22) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=UNDERFLOW INEXACT  (7/0)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(-0x1.50544400000000000000p-66:0x9ea82a22)
+res: f32(-0x1.50544400000000000000p-66:0x9ea82a22) flags=INEXACT  (7/1)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(-0x1.50544400000000000000p-66:0x9ea82a22) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(-0x1.00000000000000000000p-126:0x80800000) flags=OK (7/2)
+op : f32(-0x1.00000000000000000000p-126:0x80800000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.00000000000000000000p-126:0x00800000) flags=OK (8/0)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(-0x1.00000000000000000000p-126:0x80800000)
+res: f32(-0x1.00000000000000000000p-126:0x80800000) flags=OK (8/1)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(-0x1.00000000000000000000p-126:0x80800000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(-0x0.00000000000000000000p+0:0x80000000) flags=UNDERFLOW INEXACT  (8/2)
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.00000000000000000000p-25:0x33000000) flags=OK (9/0)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=UNDERFLOW INEXACT  (9/1)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x0.00000000000000000000p+0:0000000000) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.00000000000000000000p-126:0x00800000) flags=OK (9/2)
+op : f32(0x1.00000000000000000000p-126:0x00800000) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.ffffe600000000000000p-25:0x337ffff3) flags=INEXACT  (10/0)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.00000000000000000000p-126:0x00800000)
+res: f32(0x1.ffffe600000000000000p-50:0x26fffff3) flags=INEXACT  (10/1)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.00000000000000000000p-126:0x00800000) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.00000000000000000000p-25:0x33000000) flags=INEXACT  (10/2)
+op : f32(0x1.00000000000000000000p-25:0x33000000) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ff801a00000000000000p-15:0x387fc00d) flags=INEXACT  (11/0)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000000000000000000p-25:0x33000000)
+res: f32(0x1.0007fe00000000000000p-25:0x330003ff) flags=INEXACT  (11/1)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000000000000000000p-25:0x33000000) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.0001f200000000000000p-24:0x338000f9) flags=INEXACT  (11/2)
+op : f32(0x1.ffffe600000000000000p-25:0x337ffff3) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.00000c00000000000000p-14:0x38800006) flags=INEXACT  (12/0)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.ffffe600000000000000p-25:0x337ffff3)
+res: f32(0x1.0ffbf400000000000000p-24:0x3387fdfa) flags=INEXACT  (12/1)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.ffffe600000000000000p-25:0x337ffff3) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ff801a00000000000000p-15:0x387fc00d) flags=INEXACT  (12/2)
+op : f32(0x1.ff801a00000000000000p-15:0x387fc00d) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.00000000000000000000p+0:0x3f800000) flags=INEXACT  (13/0)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.ff801a00000000000000p-15:0x387fc00d)
+res: f32(0x1.ffc01800000000000000p-14:0x38ffe00c) flags=INEXACT  (13/1)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.ff801a00000000000000p-15:0x387fc00d) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.ffc01800000000000000p-14:0x38ffe00c) flags=INEXACT  (13/2)
+op : f32(0x1.00000c00000000000000p-14:0x38800006) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.00440000000000000000p+0:0x3f802200) flags=INEXACT  (14/0)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000c00000000000000p-14:0x38800006)
+res: f32(0x1.00440000000000000000p+0:0x3f802200) flags=INEXACT  (14/1)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000c00000000000000p-14:0x38800006) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.00040000000000000000p+0:0x3f800200) flags=INEXACT  (14/2)
+op : f32(0x1.00000000000000000000p+0:0x3f800000) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.80200000000000000000p+1:0x40401000) flags=INEXACT  (15/0)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.00000000000000000000p+0:0x3f800000)
+res: f32(0x1.80400000000000000000p+1:0x40402000) flags=INEXACT  (15/1)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.00000000000000000000p+0:0x3f800000) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.80200000000000000000p+1:0x40401000) flags=INEXACT  (15/2)
+op : f32(0x1.00400000000000000000p+0:0x3f802000) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.2e185400000000000000p+2:0x40970c2a) flags=INEXACT  (16/0)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.00400000000000000000p+0:0x3f802000)
+res: f32(0x1.9c00a800000000000000p+2:0x40ce0054) flags=INEXACT  (16/1)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.00400000000000000000p+0:0x3f802000) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.2e23d200000000000000p+2:0x409711e9) flags=INEXACT  (16/2)
+op : f32(0x1.00000000000000000000p+1:0x40000000) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.12804000000000000000p+3:0x41094020) flags=INEXACT  (17/0)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.00000000000000000000p+1:0x40000000)
+res: f32(0x1.51458000000000000000p+3:0x4128a2c0) flags=INEXACT  (17/1)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.00000000000000000000p+1:0x40000000) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.200c0400000000000000p+3:0x41100602) flags=INEXACT  (17/2)
+op : f32(0x1.5bf0a800000000000000p+1:0x402df854) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.ffcf1400000000000000p+15:0x477fe78a) flags=INEXACT  (18/0)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.5bf0a800000000000000p+1:0x402df854)
+res: f32(0x1.91ed3a00000000000000p+17:0x4848f69d) flags=INEXACT  (18/1)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.5bf0a800000000000000p+1:0x402df854) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.5bc56000000000000000p+17:0x482de2b0) flags=INEXACT  (18/2)
+op : f32(0x1.921fb600000000000000p+1:0x40490fdb) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.08edee00000000000000p+18:0x488476f7) flags=INEXACT  (19/0)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.921fb600000000000000p+1:0x40490fdb)
+res: f32(0x1.ff7e0800000000000000p+31:0x4f7fbf04) flags=INEXACT  (19/1)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.921fb600000000000000p+1:0x40490fdb) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.08ee7800000000000000p+18:0x4884773c) flags=INEXACT  (19/2)
+op : f32(0x1.ffbe0000000000000000p+15:0x477fdf00) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff800800000000000000p+31:0x4f7fc004) flags=INEXACT  (20/0)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbe0000000000000000p+15:0x477fdf00)
+res: f32(0x1.ff840600000000000000p+31:0x4f7fc203) flags=INEXACT  (20/1)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbe0000000000000000p+15:0x477fdf00) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.ff820600000000000000p+31:0x4f7fc103) flags=INEXACT  (20/2)
+op : f32(0x1.ffc00000000000000000p+15:0x477fe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff860600000000000000p+31:0x4f7fc303) flags=INEXACT  (21/0)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+15:0x477fe000)
+res: f32(0x1.ff820600000000000000p+32:0x4fffc103) flags=INEXACT  (21/1)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+15:0x477fe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff800800000000000000p+32:0x4fffc004) flags=INEXACT  (21/2)
+op : f32(0x1.ffc20000000000000000p+15:0x477fe100) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.ff830600000000000000p+32:0x4fffc183) flags=INEXACT  (22/0)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc20000000000000000p+15:0x477fe100)
+res: f32(0x1.ff7f8800000000000000p+33:0x507fbfc4) flags=INEXACT  (22/1)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc20000000000000000p+15:0x477fe100) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff840600000000000000p+32:0x4fffc203) flags=INEXACT  (22/2)
+op : f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.ff800800000000000000p+33:0x507fc004) flags=INEXACT  (23/0)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.ffbf0000000000000000p+16:0x47ffdf80)
+res: f32(0x1.ff820600000000000000p+33:0x507fc103) flags=INEXACT  (23/1)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.ffbf0000000000000000p+16:0x47ffdf80) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.ff810600000000000000p+33:0x507fc083) flags=INEXACT  (23/2)
+op : f32(0x1.ffc00000000000000000p+16:0x47ffe000) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(0x1.c0bab600000000000000p+99:0x71605d5b) flags=INEXACT  (24/0)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.ffc00000000000000000p+16:0x47ffe000)
+res: f32(0x1.c0837e00000000000000p+116:0x79e041bf) flags=INEXACT  (24/1)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.ffc00000000000000000p+16:0x47ffe000) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.c0829e00000000000000p+116:0x79e0414f) flags=INEXACT  (24/2)
+op : f32(0x1.ffc10000000000000000p+16:0x47ffe080) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (25/0)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(0x1.ffc10000000000000000p+16:0x47ffe080)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (25/1)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(0x1.ffc10000000000000000p+16:0x47ffe080) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(0x1.fffffe00000000000000p+127:0x7f7fffff) flags=OVERFLOW INEXACT  (25/2)
+op : f32(0x1.c0bab600000000000000p+99:0x71605d5b) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(inf:0x7f800000)
+res: f32(inf:0x7f800000) flags=OK (26/0)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(inf:0x7f800000) + f32(0x1.c0bab600000000000000p+99:0x71605d5b)
+res: f32(inf:0x7f800000) flags=OK (26/1)
+op : f32(inf:0x7f800000) * f32(0x1.c0bab600000000000000p+99:0x71605d5b) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(inf:0x7f800000) flags=OK (26/2)
+op : f32(0x1.fffffe00000000000000p+127:0x7f7fffff) * f32(inf:0x7f800000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=OK (27/0)
+op : f32(inf:0x7f800000) * f32(-nan:0x7fc00000) + f32(0x1.fffffe00000000000000p+127:0x7f7fffff)
+res: f32(-nan:0xffffffff) flags=OK (27/1)
+op : f32(-nan:0x7fc00000) * f32(0x1.fffffe00000000000000p+127:0x7f7fffff) + f32(inf:0x7f800000)
+res: f32(-nan:0xffffffff) flags=OK (27/2)
+op : f32(inf:0x7f800000) * f32(-nan:0x7fc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/0)
+op : f32(-nan:0x7fc00000) * f32(-nan:0x7fa00000) + f32(inf:0x7f800000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/1)
+op : f32(-nan:0x7fa00000) * f32(inf:0x7f800000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (28/2)
+op : f32(-nan:0x7fc00000) * f32(-nan:0x7fa00000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/0)
+op : f32(-nan:0x7fa00000) * f32(-nan:0xffa00000) + f32(-nan:0x7fc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/1)
+op : f32(-nan:0xffa00000) * f32(-nan:0x7fc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (29/2)
+op : f32(-nan:0x7fa00000) * f32(-nan:0xffa00000) + f32(-nan:0xffc00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/0)
+op : f32(-nan:0xffa00000) * f32(-nan:0xffc00000) + f32(-nan:0x7fa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/1)
+op : f32(-nan:0xffc00000) * f32(-nan:0x7fa00000) + f32(-nan:0xffa00000)
+res: f32(-nan:0xffffffff) flags=INVALID (30/2)
+# LP184149
+op : f32(0x0.00000000000000000000p+0:0000000000) * f32(0x1.00000000000000000000p-1:0x3f000000) + f32(0x0.00000000000000000000p+0:0000000000)
+res: f32(0x0.00000000000000000000p+0:0000000000) flags=OK (31/0)
+op : f32(0x1.00000000000000000000p-149:0x00000001) * f32(0x1.00000000000000000000p-149:0x00000001) + f32(0x1.00000000000000000000p-149:0x00000001)
+res: f32(0x1.00000000000000000000p-149:0x00000001) flags=UNDERFLOW INEXACT  (32/0)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 54/67] Hexagon - Add Hexagon Vector eXtensions (HVX) to core definition
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (52 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 53/67] Hexagon build infrastructure Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 55/67] Hexagon HVX support in gdbstub Taylor Simpson
                   ` (13 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

HVX is a set of wide vector instructions.  Machine state includes
    vector registers (VRegs)
    vector predicate registers (QRegs)
    temporary registers for packet semantics
    store buffer (masked stores and scatter/gather)

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/cpu.h         | 42 +++++++++++++++++++++
 target/hexagon/insn.h        | 16 ++++++++
 target/hexagon/internal.h    |  2 +
 target/hexagon/mmvec/mmvec.h | 87 ++++++++++++++++++++++++++++++++++++++++++++
 target/hexagon/cpu.c         | 51 +++++++++++++++++++++++++-
 5 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 target/hexagon/mmvec/mmvec.h

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 6146561..6215f91 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -30,6 +30,7 @@ typedef struct CPUHexagonState CPUHexagonState;
 #include "qemu-common.h"
 #include "exec/cpu-defs.h"
 #include "hex_regs.h"
+#include "mmvec/mmvec.h"
 
 #define NUM_PREGS 4
 #ifdef CONFIG_USER_ONLY
@@ -42,6 +43,7 @@ typedef struct CPUHexagonState CPUHexagonState;
 #define STORES_MAX 2
 #define REG_WRITES_MAX 32
 #define PRED_WRITES_MAX 5                   /* 4 insns + endloop */
+#define VSTORES_MAX 2
 
 #define TYPE_HEXAGON_CPU "hexagon-cpu"
 
@@ -60,6 +62,19 @@ struct MemLog {
     uint64_t data64;
 };
 
+typedef struct {
+    target_ulong va;
+    int size;
+    mmvector_t mask;
+    mmvector_t data;
+} vstorelog_t;
+
+typedef struct {
+    unsigned char cdata[256];
+    uint32_t range;
+    uint8_t format;
+} mem_access_info_t;
+
 #define EXEC_STATUS_OK          0x0000
 #define EXEC_STATUS_STOP        0x0002
 #define EXEC_STATUS_REPLAY      0x0010
@@ -72,6 +87,9 @@ struct MemLog {
 #define CLEAR_EXCEPTION         (env->status &= (~EXEC_STATUS_EXCEPTION))
 #define SET_EXCEPTION           (env->status |= EXEC_STATUS_EXCEPTION)
 
+/* This needs to be large enough for all the reads and writes in a packet */
+#define TEMP_VECTORS_MAX        25
+
 struct CPUHexagonState {
     target_ulong gpr[TOTAL_PER_THREAD_REGS];
     target_ulong pred[NUM_PREGS];
@@ -110,6 +128,30 @@ struct CPUHexagonState {
 
     target_ulong is_gather_store_insn;
     target_ulong gather_issued;
+
+    mmvector_t VRegs[NUM_VREGS];
+    mmvector_t future_VRegs[NUM_VREGS];
+    mmvector_t tmp_VRegs[NUM_VREGS];
+
+    VRegMask VRegs_updated_tmp;
+    VRegMask VRegs_updated;
+    VRegMask VRegs_select;
+
+    mmqreg_t QRegs[NUM_QREGS];
+    mmqreg_t future_QRegs[NUM_QREGS];
+    QRegMask QRegs_updated;
+
+    vstorelog_t vstore[VSTORES_MAX];
+    uint8_t store_pending[VSTORES_MAX];
+    uint8_t vstore_pending[VSTORES_MAX];
+    uint8_t vtcm_pending;
+    vtcm_storelog_t vtcm_log;
+    mem_access_info_t mem_access[SLOTS_MAX];
+
+    int status;
+
+    mmvector_t temp_vregs[TEMP_VECTORS_MAX];
+    mmqreg_t temp_qregs[TEMP_VECTORS_MAX];
 };
 
 #define HEXAGON_CPU_CLASS(klass) \
diff --git a/target/hexagon/insn.h b/target/hexagon/insn.h
index a80bcb9..16d298d 100644
--- a/target/hexagon/insn.h
+++ b/target/hexagon/insn.h
@@ -49,12 +49,16 @@ struct Instruction {
     size4u_t is_dcfetch:1;   /* Has an A_DCFETCH attribute */
     size4u_t is_load:1;      /* Has A_LOAD attribute */
     size4u_t is_store:1;     /* Has A_STORE attribute */
+    size4u_t is_vmem_ld:1;   /* Has an A_LOAD and an A_VMEM attribute */
+    size4u_t is_vmem_st:1;   /* Has an A_STORE and an A_VMEM attribute */
+    size4u_t is_scatgath:1;  /* Has an A_CVI_GATHER or A_CVI_SCATTER attr */
     size4u_t is_memop:1;     /* Has A_MEMOP attribute */
     size4u_t is_dealloc:1;   /* Is a dealloc return or dealloc frame */
     size4u_t is_aia:1;       /* Is a post increment */
     size4u_t is_endloop:1;   /* This is an end of loop */
     size4u_t is_2nd_jump:1;  /* This is the second jump of a dual-jump packet */
     size4u_t new_value_producer_slot:4;
+    size4u_t hvx_resource:8;
     size4s_t immed[IMMEDS_MAX];    /* immediate field */
 };
 
@@ -121,10 +125,22 @@ struct Packet {
 
     /* Misc */
     size8u_t num_rops:4;            /* Num risc ops in the packet */
+    size8u_t pkt_has_vtcm_access:1; /* Is a vmem access going to VTCM */
     size8u_t pkt_access_count:2;    /* Is a vmem access going to VTCM */
     size8u_t pkt_ldaccess_l2:2;     /* vmem ld access to l2 */
     size8u_t pkt_ldaccess_vtcm:2;   /* vmem ld access to vtcm */
 
+    /* Count the types of HVX instructions */
+    size8u_t pkt_hvx_va:4;
+    size8u_t pkt_hvx_vx:4;
+    size8u_t pkt_hvx_vp:4;
+    size8u_t pkt_hvx_vs:4;
+    size8u_t pkt_hvx_all:4;
+    size8u_t pkt_hvx_none:4;
+
+    size8u_t pkt_has_hvx:1;
+    size8u_t pkt_has_extension:1;
+
     insn_t insn[INSTRUCTIONS_MAX];
 };
 
diff --git a/target/hexagon/internal.h b/target/hexagon/internal.h
index 092dedc..4ff7020 100644
--- a/target/hexagon/internal.h
+++ b/target/hexagon/internal.h
@@ -39,6 +39,8 @@
 extern int hexagon_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg);
 extern int hexagon_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 
+extern void hexagon_debug_vreg(CPUHexagonState *env, int regnum);
+extern void hexagon_debug_qreg(CPUHexagonState *env, int regnum);
 extern void hexagon_debug(CPUHexagonState *env);
 
 #if COUNT_HEX_HELPERS
diff --git a/target/hexagon/mmvec/mmvec.h b/target/hexagon/mmvec/mmvec.h
new file mode 100644
index 0000000..145af15
--- /dev/null
+++ b/target/hexagon/mmvec/mmvec.h
@@ -0,0 +1,87 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_MMVEC_H
+#define HEXAGON_MMVEC_H
+
+#define MAX_VEC_SIZE_LOGBYTES 7
+#define MAX_VEC_SIZE_BYTES  (1 << MAX_VEC_SIZE_LOGBYTES)
+
+#define NUM_VREGS           32
+#define NUM_QREGS           4
+
+typedef uint32_t VRegMask; /* at least NUM_VREGS bits */
+typedef uint32_t QRegMask; /* at least NUM_QREGS bits */
+
+#define VECTOR_SIZE_BYTE    (fVECSIZE())
+
+typedef union {
+    uint64_t ud[MAX_VEC_SIZE_BYTES / 8];
+    int64_t   d[MAX_VEC_SIZE_BYTES / 8];
+    uint32_t uw[MAX_VEC_SIZE_BYTES / 4];
+    int32_t   w[MAX_VEC_SIZE_BYTES / 4];
+    uint16_t uh[MAX_VEC_SIZE_BYTES / 2];
+    int16_t   h[MAX_VEC_SIZE_BYTES / 2];
+    uint8_t  ub[MAX_VEC_SIZE_BYTES / 1];
+    int8_t    b[MAX_VEC_SIZE_BYTES / 1];
+} mmvector_t;
+
+typedef union {
+    uint64_t ud[2 * MAX_VEC_SIZE_BYTES / 8];
+    int64_t   d[2 * MAX_VEC_SIZE_BYTES / 8];
+    uint32_t uw[2 * MAX_VEC_SIZE_BYTES / 4];
+    int32_t   w[2 * MAX_VEC_SIZE_BYTES / 4];
+    uint16_t uh[2 * MAX_VEC_SIZE_BYTES / 2];
+    int16_t   h[2 * MAX_VEC_SIZE_BYTES / 2];
+    uint8_t  ub[2 * MAX_VEC_SIZE_BYTES / 1];
+    int8_t    b[2 * MAX_VEC_SIZE_BYTES / 1];
+    mmvector_t v[2];
+} mmvector_pair_t;
+
+typedef union {
+    uint64_t ud[MAX_VEC_SIZE_BYTES / 8 / 8];
+    int64_t   d[MAX_VEC_SIZE_BYTES / 8 / 8];
+    uint32_t uw[MAX_VEC_SIZE_BYTES / 4 / 8];
+    int32_t   w[MAX_VEC_SIZE_BYTES / 4 / 8];
+    uint16_t uh[MAX_VEC_SIZE_BYTES / 2 / 8];
+    int16_t   h[MAX_VEC_SIZE_BYTES / 2 / 8];
+    uint8_t  ub[MAX_VEC_SIZE_BYTES / 1 / 8];
+    int8_t    b[MAX_VEC_SIZE_BYTES / 1 / 8];
+} mmqreg_t;
+
+typedef struct {
+    mmvector_t data;
+    mmvector_t mask;
+    mmvector_pair_t offsets;
+    int size;
+    target_ulong va_base;
+    target_ulong va[MAX_VEC_SIZE_BYTES];
+    int oob_access;
+    int op;
+    int op_size;
+} vtcm_storelog_t;
+
+
+/* Types of vector register assignment */
+typedef enum {
+    EXT_DFL,      /* Default */
+    EXT_NEW,      /* New - value used in the same packet */
+    EXT_TMP       /* Temp - value used but not stored to register */
+} vector_dst_type_t;
+
+#endif
+
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index 576c566..ccbc113 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -122,6 +122,39 @@ static void print_reg(FILE *f, CPUHexagonState *env, int regnum)
     fprintf(f, "  %s = 0x" TARGET_FMT_lx "\n", hexagon_regnames[regnum], value);
 }
 
+static void print_vreg(FILE *f, CPUHexagonState *env, int regnum)
+{
+    int i;
+    fprintf(f, "  v%d = (", regnum);
+    fprintf(f, "0x%02x", env->VRegs[regnum].ub[MAX_VEC_SIZE_BYTES - 1]);
+    for (i = MAX_VEC_SIZE_BYTES - 2; i >= 0; i--) {
+        fprintf(f, ", 0x%02x", env->VRegs[regnum].ub[i]);
+    }
+    fprintf(f, ")\n");
+}
+
+void hexagon_debug_vreg(CPUHexagonState *env, int regnum)
+{
+    print_vreg(stdout, env, regnum);
+}
+
+static void print_qreg(FILE *f, CPUHexagonState *env, int regnum)
+{
+    int i;
+    fprintf(f, "  q%d = (", regnum);
+    fprintf(f, ", 0x%02x",
+                env->QRegs[regnum].ub[MAX_VEC_SIZE_BYTES / 8 - 1]);
+    for (i = MAX_VEC_SIZE_BYTES / 8 - 2; i >= 0; i--) {
+        fprintf(f, ", 0x%02x", env->QRegs[regnum].ub[i]);
+    }
+    fprintf(f, ")\n");
+}
+
+void hexagon_debug_qreg(CPUHexagonState *env, int regnum)
+{
+    print_qreg(stdout, env, regnum);
+}
+
 static void hexagon_dump(CPUHexagonState *env, FILE *f)
 {
     static target_ulong last_pc;
@@ -166,6 +199,22 @@ static void hexagon_dump(CPUHexagonState *env, FILE *f)
     print_reg(f, env, HEX_REG_CS1);
 #endif
     fprintf(f, "}\n");
+
+/*
+ * The HVX register dump takes up a ton of space in the log
+ * Don't print it unless it is needed
+ */
+#define DUMP_HVX 0
+#if DUMP_HVX
+    fprintf(f, "Vector Registers = {\n");
+    for (i = 0; i < NUM_VREGS; i++) {
+        print_vreg(f, env, i);
+    }
+    for (i = 0; i < NUM_QREGS; i++) {
+        print_qreg(f, env, i);
+    }
+    fprintf(f, "}\n");
+#endif
 }
 
 static void hexagon_dump_state(CPUState *cs, FILE *f, int flags)
@@ -291,7 +340,7 @@ static void hexagon_cpu_class_init(ObjectClass *c, void *data)
     cc->gdb_core_xml_file = "hexagon-core.xml";
     cc->gdb_read_register = hexagon_gdb_read_register;
     cc->gdb_write_register = hexagon_gdb_write_register;
-    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
+    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS + NUM_QREGS;
     cc->gdb_stop_before_watchpoint = true;
     cc->disas_set_info = hexagon_cpu_disas_set_info;
 #ifdef CONFIG_TCG
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 55/67] Hexagon HVX support in gdbstub
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (53 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 54/67] Hexagon - Add Hexagon Vector eXtensions (HVX) to core definition Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 56/67] Hexagon HVX import instruction encodings Taylor Simpson
                   ` (12 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gdbstub.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/target/hexagon/gdbstub.c b/target/hexagon/gdbstub.c
index e678aea..79a4f33 100644
--- a/target/hexagon/gdbstub.c
+++ b/target/hexagon/gdbstub.c
@@ -21,6 +21,28 @@
 #include "cpu.h"
 #include "internal.h"
 
+static int gdb_get_vreg(CPUHexagonState *env, uint8_t *mem_buf, int n)
+{
+    int total = 0;
+    int i;
+    for (i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
+        total += gdb_get_regl(mem_buf, env->VRegs[n].uw[i]);
+        mem_buf += 4;
+    }
+    return total;
+}
+
+static int gdb_get_qreg(CPUHexagonState *env, uint8_t *mem_buf, int n)
+{
+    int total = 0;
+    int i;
+    for (i = 0; i < MAX_VEC_SIZE_BYTES / 4 / 8; i++) {
+        total += gdb_get_regl(mem_buf, env->QRegs[n].uw[i]);
+        mem_buf += 4;
+    }
+    return total;
+}
+
 int hexagon_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
 {
     HexagonCPU *cpu = HEXAGON_CPU(cs);
@@ -29,11 +51,41 @@ int hexagon_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
     if (n < TOTAL_PER_THREAD_REGS) {
         return gdb_get_regl(mem_buf, env->gpr[n]);
     }
+    n -= TOTAL_PER_THREAD_REGS;
+
+    if (n < NUM_VREGS) {
+        return gdb_get_vreg(env, mem_buf, n);
+    }
+    n -= NUM_VREGS;
+
+    if (n < NUM_QREGS) {
+        return gdb_get_qreg(env, mem_buf, n);
+    }
 
     g_assert_not_reached();
     return 0;
 }
 
+static int gdb_put_vreg(CPUHexagonState *env, uint8_t *mem_buf, int n)
+{
+    int i;
+    for (i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
+        env->VRegs[n].uw[i] = ldtul_p(mem_buf);
+        mem_buf += 4;
+    }
+    return MAX_VEC_SIZE_BYTES;
+}
+
+static int gdb_put_qreg(CPUHexagonState *env, uint8_t *mem_buf, int n)
+{
+    int i;
+    for (i = 0; i < MAX_VEC_SIZE_BYTES / 4 / 8; i++) {
+        env->QRegs[n].uw[i] = ldtul_p(mem_buf);
+        mem_buf += 4;
+    }
+    return MAX_VEC_SIZE_BYTES / 8;
+}
+
 int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 {
     HexagonCPU *cpu = HEXAGON_CPU(cs);
@@ -43,6 +95,16 @@ int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
         env->gpr[n] = ldtul_p(mem_buf);
         return sizeof(target_ulong);
     }
+    n -= TOTAL_PER_THREAD_REGS;
+
+    if (n < NUM_VREGS) {
+        return gdb_put_vreg(env, mem_buf, n);
+    }
+    n -= NUM_VREGS;
+
+    if (n < NUM_QREGS) {
+        return gdb_put_qreg(env, mem_buf, n);
+    }
 
     g_assert_not_reached();
     return 0;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 56/67] Hexagon HVX import instruction encodings
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (54 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 55/67] Hexagon HVX support in gdbstub Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 57/67] Hexagon HVX import semantics Taylor Simpson
                   ` (11 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/imported/allextenc.def        |  20 +
 target/hexagon/imported/encode.def           |   1 +
 target/hexagon/imported/mmvec/encode_ext.def | 830 +++++++++++++++++++++++++++
 3 files changed, 851 insertions(+)
 create mode 100644 target/hexagon/imported/allextenc.def
 create mode 100644 target/hexagon/imported/mmvec/encode_ext.def

diff --git a/target/hexagon/imported/allextenc.def b/target/hexagon/imported/allextenc.def
new file mode 100644
index 0000000..9af9412
--- /dev/null
+++ b/target/hexagon/imported/allextenc.def
@@ -0,0 +1,20 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define EXTNAME mmvec
+#include "mmvec/encode_ext.def"
+#undef EXTNAME
diff --git a/target/hexagon/imported/encode.def b/target/hexagon/imported/encode.def
index ae23301..7eb40be 100644
--- a/target/hexagon/imported/encode.def
+++ b/target/hexagon/imported/encode.def
@@ -71,6 +71,7 @@
 
 #include "encode_pp.def"
 #include "encode_subinsn.def"
+#include "allextenc.def"
 
 #ifdef __SELF_DEF_FIELD32
 #undef __SELF_DEF_FIELD32
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
new file mode 100644
index 0000000..e2db99b
--- /dev/null
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -0,0 +1,830 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define CONCAT(A,B) A##B
+#define EXTEXTNAME(X) CONCAT(EXT_,X)
+#define DEF_ENC(TAG,STR) DEF_EXT_ENC(TAG,EXTEXTNAME(EXTNAME),STR)
+
+
+#ifndef NO_MMVEC
+DEF_ENC(V6_extractw,  ICLASS_LD" 001 0 000sssss  PP0uuuuu  --1ddddd") /* coproc insn, returns Rd */
+#endif
+
+
+#ifndef NO_MMVEC
+
+
+
+DEF_CLASS32(ICLASS_NCJ" 1--- -------- PP------ --------",COPROC_VMEM)
+DEF_CLASS32(ICLASS_NCJ" 1000 0-0ttttt PPi--iii ---ddddd",BaseOffset_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1000 1-0ttttt PPivviii ---ddddd",BaseOffset_if_Pv_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1000 0-1ttttt PPi--iii --------",BaseOffset_VMEM_Stores1)
+DEF_CLASS32(ICLASS_NCJ" 1000 1-0ttttt PPi--iii 00------",BaseOffset_VMEM_Stores2)
+DEF_CLASS32(ICLASS_NCJ" 1000 1-1ttttt PPivviii --------",BaseOffset_if_Pv_VMEM_Stores)
+
+DEF_CLASS32(ICLASS_NCJ" 1001 0-0xxxxx PP---iii ---ddddd",PostImm_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1001 1-0xxxxx PP-vviii ---ddddd",PostImm_if_Pv_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1001 0-1xxxxx PP---iii --------",PostImm_VMEM_Stores1)
+DEF_CLASS32(ICLASS_NCJ" 1001 1-0xxxxx PP---iii 00------",PostImm_VMEM_Stores2)
+DEF_CLASS32(ICLASS_NCJ" 1001 1-1xxxxx PP-vviii --------",PostImm_if_Pv_VMEM_Stores)
+
+DEF_CLASS32(ICLASS_NCJ" 1011 0-0xxxxx PPu----- ---ddddd",PostM_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1011 1-0xxxxx PPuvv--- ---ddddd",PostM_if_Pv_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1011 0-1xxxxx PPu----- --------",PostM_VMEM_Stores1)
+DEF_CLASS32(ICLASS_NCJ" 1011 1-0xxxxx PPu----- 00------",PostM_VMEM_Stores2)
+DEF_CLASS32(ICLASS_NCJ" 1011 1-1xxxxx PPuvv--- --------",PostM_if_Pv_VMEM_Stores)
+
+DEF_CLASS32(ICLASS_NCJ" 110- 0------- PP------ --------",Z_Load)
+DEF_CLASS32(ICLASS_NCJ" 110- 1------- PP------ --------",Z_Load_if_Pv)
+
+DEF_CLASS32(ICLASS_NCJ" 1111 000ttttt PPu--0-- ---vvvvv",Gather)
+DEF_CLASS32(ICLASS_NCJ" 1111 000ttttt PPu--1-- -ssvvvvv",Gather_if_Qs)
+DEF_CLASS32(ICLASS_NCJ" 1111 001ttttt PPuvvvvv ---wwwww",Scatter)
+DEF_CLASS32(ICLASS_NCJ" 1111 001ttttt PPuvvvvv -----sss",Scatter_New)
+DEF_CLASS32(ICLASS_NCJ" 1111 1--ttttt PPuvvvvv -sswwwww",Scatter_if_Qs)
+
+
+DEF_FIELD32(ICLASS_NCJ" 1--- -!------ PP------ --------",NT,"NonTemporal")
+
+
+
+DEF_FIELDROW_DESC32(                ICLASS_NCJ" 1 000 --- ----- PP i --iii ----- ---","[#0] vmem(Rt+#s4)[:nt]")
+
+#define LDST_ENC(TAG,MAJ3,MID3,RREG,TINY6,MIN3,VREG) DEF_ENC(TAG, ICLASS_NCJ "1" #MAJ3 #MID3 #RREG "PP" #TINY6 #MIN3 #VREG)
+
+#define LDST_BO(TAGPRE,MID3,PRED,MIN3,VREG) LDST_ENC(TAGPRE##_ai, 000,MID3,ttttt,i PRED iii,MIN3,VREG)
+#define LDST_PI(TAGPRE,MID3,PRED,MIN3,VREG) LDST_ENC(TAGPRE##_pi, 001,MID3,xxxxx,- PRED iii,MIN3,VREG)
+#define LDST_PM(TAGPRE,MID3,PRED,MIN3,VREG) LDST_ENC(TAGPRE##_ppu,011,MID3,xxxxx,u PRED ---,MIN3,VREG)
+
+#define LDST_BASICLD(OP,TAGPRE) \
+    OP(TAGPRE,                000,00,000,ddddd) \
+    OP(TAGPRE##_nt,           010,00,000,ddddd) \
+    OP(TAGPRE##_cur,          000,00,001,ddddd) \
+    OP(TAGPRE##_nt_cur,       010,00,001,ddddd) \
+    OP(TAGPRE##_tmp,          000,00,010,ddddd) \
+    OP(TAGPRE##_nt_tmp,       010,00,010,ddddd)
+
+#define LDST_BASICST(OP,TAGPRE) \
+    OP(TAGPRE,           001,--,000,sssss) \
+    OP(TAGPRE##_nt,      011,--,000,sssss) \
+    OP(TAGPRE##_new,     001,--,001,-0sss) \
+    OP(TAGPRE##_srls,    001,--,001,-1---) \
+    OP(TAGPRE##_nt_new,  011,--,001,--sss) \
+
+
+#define LDST_QPREDST(OP,TAGPRE) \
+    OP(TAGPRE##_qpred,    100,vv,000,sssss) \
+    OP(TAGPRE##_nt_qpred, 110,vv,000,sssss) \
+    OP(TAGPRE##_nqpred,   100,vv,001,sssss) \
+    OP(TAGPRE##_nt_nqpred,110,vv,001,sssss) \
+
+#define LDST_CONDLD(OP,TAGPRE) \
+    OP(TAGPRE##_pred,         100,vv,010,ddddd) \
+    OP(TAGPRE##_nt_pred,      110,vv,010,ddddd) \
+    OP(TAGPRE##_npred,        100,vv,011,ddddd) \
+    OP(TAGPRE##_nt_npred,     110,vv,011,ddddd) \
+    OP(TAGPRE##_cur_pred,     100,vv,100,ddddd) \
+    OP(TAGPRE##_nt_cur_pred,  110,vv,100,ddddd) \
+    OP(TAGPRE##_cur_npred,    100,vv,101,ddddd) \
+    OP(TAGPRE##_nt_cur_npred, 110,vv,101,ddddd) \
+    OP(TAGPRE##_tmp_pred,     100,vv,110,ddddd) \
+    OP(TAGPRE##_nt_tmp_pred,  110,vv,110,ddddd) \
+    OP(TAGPRE##_tmp_npred,    100,vv,111,ddddd) \
+    OP(TAGPRE##_nt_tmp_npred, 110,vv,111,ddddd) \
+
+#define LDST_PREDST(OP,TAGPRE,NT,MIN2) \
+    OP(TAGPRE##_pred,      1 NT 1,vv,MIN2 0,sssss) \
+    OP(TAGPRE##_npred,     1 NT 1,vv,MIN2 1,sssss)
+
+#define LDST_PREDSTNEW(OP,TAGPRE,NT,MIN2) \
+    OP(TAGPRE##_pred,      1 NT 1,vv,MIN2 0,NT 0 sss) \
+    OP(TAGPRE##_npred,     1 NT 1,vv,MIN2 1,NT 1 sss)
+
+// 0.0,vv,0--,sssss: pred st
+#define LDST_BASICPREDST(OP,TAGPRE) \
+    LDST_PREDST(OP,TAGPRE,             0,00) \
+    LDST_PREDST(OP,TAGPRE##_nt,        1,00) \
+    LDST_PREDSTNEW(OP,TAGPRE##_new,    0,01) \
+    LDST_PREDSTNEW(OP,TAGPRE##_nt_new, 1,01)
+
+
+
+LDST_BASICLD(LDST_BO,V6_vL32b)
+LDST_CONDLD(LDST_BO,V6_vL32b)
+LDST_BASICLD(LDST_PI,V6_vL32b)
+LDST_CONDLD(LDST_PI,V6_vL32b)
+LDST_BASICLD(LDST_PM,V6_vL32b)
+LDST_CONDLD(LDST_PM,V6_vL32b)
+
+// Loads
+
+LDST_BO(V6_vL32Ub,000,00,111,ddddd)
+//Stores
+LDST_BASICST(LDST_BO,V6_vS32b)
+
+
+LDST_BO(V6_vS32Ub,001,--,111,sssss)
+
+
+
+
+// Byte Enabled Stores
+LDST_QPREDST(LDST_BO,V6_vS32b)
+
+// Scalar Predicated Stores
+LDST_BASICPREDST(LDST_BO,V6_vS32b)
+
+
+LDST_PREDST(LDST_BO,V6_vS32Ub,0,11)
+
+
+
+
+DEF_FIELDROW_DESC32(                ICLASS_NCJ" 1 001 --- ----- PP - ----- ddddd ---","[#1] vmem(Rx++#s3)[:nt]")
+
+// Loads
+LDST_BASICLD(LDST_PI,V6_vL32b)
+
+
+LDST_PI(V6_vL32Ub,000,00,111,ddddd)
+
+//Stores
+LDST_BASICST(LDST_PI,V6_vS32b)
+
+
+
+LDST_PI(V6_vS32Ub,001,--,111,sssss)
+
+
+// Byte Enabled Stores
+LDST_QPREDST(LDST_PI,V6_vS32b)
+
+
+// Scalar Predicated Stores
+LDST_BASICPREDST(LDST_PI,V6_vS32b)
+
+
+LDST_PREDST(LDST_PI,V6_vS32Ub,0,11)
+
+
+
+DEF_FIELDROW_DESC32(            ICLASS_NCJ" 1 011 --- ----- PP - ----- ----- ---","[#3] vmem(Rx++#M)[:nt]")
+
+// Loads
+LDST_BASICLD(LDST_PM,V6_vL32b)
+
+
+LDST_PM(V6_vL32Ub,000,00,111,ddddd)
+
+//Stores
+LDST_BASICST(LDST_PM,V6_vS32b)
+
+
+
+LDST_PM(V6_vS32Ub,001,--,111,sssss)
+
+// Byte Enabled Stores
+LDST_QPREDST(LDST_PM,V6_vS32b)
+
+// Scalar Predicated Stores
+LDST_BASICPREDST(LDST_PM,V6_vS32b)
+
+
+LDST_PREDST(LDST_PM,V6_vS32Ub,0,11)
+
+
+
+DEF_ENC(V6_vaddcarrysat,    ICLASS_CJ" 1 101 100 vvvvv PP 1 uuuuu 0ss ddddd") //
+DEF_ENC(V6_vaddcarryo,        ICLASS_CJ" 1 101 101 vvvvv PP 1 uuuuu 0ee ddddd") //
+DEF_ENC(V6_vsubcarryo,        ICLASS_CJ" 1 101 101 vvvvv PP 1 uuuuu 1ee ddddd") //
+DEF_ENC(V6_vsatdw,          ICLASS_CJ" 1 101 100 vvvvv PP 1 uuuuu 111 ddddd") //
+
+DEF_FIELDROW_DESC32(           ICLASS_NCJ" 1 111 --- ----- PP - ----- ----- ---","[#6] vgather,vscatter")
+DEF_ENC(V6_vgathermw,         ICLASS_NCJ" 1 111 000 ttttt PP u --000 --- vvvvv")    // vtmp.w=vmem(Rt32,Mu2,Vv32.w).w
+DEF_ENC(V6_vgathermh,         ICLASS_NCJ" 1 111 000 ttttt PP u --001 --- vvvvv")    // vtmp.h=vmem(Rt32,Mu2,Vv32.h).h
+DEF_ENC(V6_vgathermhw,         ICLASS_NCJ" 1 111 000 ttttt PP u --010 --- vvvvv")    // vtmp.h=vmem(Rt32,Mu2,Vvv32.w).h
+
+
+DEF_ENC(V6_vgathermwq,         ICLASS_NCJ" 1 111 000 ttttt PP u --100 -ss vvvvv")    // if (Qs4) vtmp.w=vmem(Rt32,Mu2,Vv32.w).w
+DEF_ENC(V6_vgathermhq,         ICLASS_NCJ" 1 111 000 ttttt PP u --101 -ss vvvvv")    // if (Qs4) vtmp.h=vmem(Rt32,Mu2,Vv32.h).h
+DEF_ENC(V6_vgathermhwq,     ICLASS_NCJ" 1 111 000 ttttt PP u --110 -ss vvvvv")    // if (Qs4) vtmp.h=vmem(Rt32,Mu2,Vvv32.w).h
+
+
+
+DEF_ENC(V6_vscattermw,         ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 000 wwwww")    // vmem(Rt32,Mu2,Vv32.w)=Vw32.w
+DEF_ENC(V6_vscattermh,         ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 001 wwwww")    // vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+DEF_ENC(V6_vscattermhw,     ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 010 wwwww")    // vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+
+DEF_ENC(V6_vscattermw_add,     ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 100 wwwww")    // vmem(Rt32,Mu2,Vv32.w) += Vw32.w
+DEF_ENC(V6_vscattermh_add,     ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 101 wwwww")    // vmem(Rt32,Mu2,Vv32.h) += Vw32.h
+DEF_ENC(V6_vscattermhw_add, ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 110 wwwww")    // vmem(Rt32,Mu2,Vv32.h) += Vw32.h
+
+
+DEF_ENC(V6_vscattermwq,     ICLASS_NCJ" 1 111 100 ttttt PP u vvvvv 0ss wwwww")    // if (Qs4) vmem(Rt32,Mu2,Vv32.w)=Vw32.w
+DEF_ENC(V6_vscattermhq,     ICLASS_NCJ" 1 111 100 ttttt PP u vvvvv 1ss wwwww")    // if (Qs4) vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+DEF_ENC(V6_vscattermhwq,     ICLASS_NCJ" 1 111 101 ttttt PP u vvvvv 0ss wwwww")    // if (Qs4) vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+
+
+
+
+
+DEF_CLASS32(ICLASS_CJ" 1--- -------- PP------ --------",COPROC_VX)
+
+
+
+/***************************************************************
+*
+*  Group #0, Uses Q6 Rt8: new in v61
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(            ICLASS_CJ" 1 000 --- ----- PP - ----- ----- ---","[#1] Vd32=(Vu32, Vv32, Rt8)")
+DEF_ENC(V6_vasrhbsat,             ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vasruwuhrndsat,         ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vasrwuhrndsat,         ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vlutvvb_nm,             ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vlutvwh_nm,             ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vasruhubrndsat,         ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vasruwuhsat,         ICLASS_CJ" 1 000 vvv vvttt PP 1 uuuuu 100 ddddd") //
+DEF_ENC(V6_vasruhubsat,            ICLASS_CJ" 1 000 vvv vvttt PP 1 uuuuu 101 ddddd") //
+
+/***************************************************************
+*
+*  Group #1, Uses Q6 Rt32
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 001 --- ----- PP - ----- ----- ---","[#1] Vd32=(Vu32, Rt32)")
+DEF_ENC(V6_vtmpyb,             ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vtmpybus,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vdmpyhb,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vrmpyub,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vrmpybus,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vdsaduh,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vdmpybus,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vdmpybus_dv,     ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vdmpyhsusat,     ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vdmpyhsuisat,     ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vdmpyhsat,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vdmpyhisat,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vdmpyhb_dv,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmpybus,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmpabus,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpahb,             ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyh,             ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vmpyhss,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vmpyhsrs,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vmpyuh,             ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vrmpybusi,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 10i ddddd") //
+DEF_ENC(V6_vrsadubi,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 11i ddddd") //
+
+DEF_ENC(V6_vmpyihb,         ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vror,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vmpyuhe,         ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vmpabuu,         ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vlut4,            ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 100 ddddd") //
+
+
+DEF_ENC(V6_vasrw,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vasrh,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vaslw,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vaslh,             ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlsrw,             ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vlsrh,             ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vlsrb,            ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 011 ddddd") //
+
+DEF_ENC(V6_vmpauhb,            ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmpyiwub,         ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpyiwh,         ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyiwb,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_lvsplatw,         ICLASS_CJ" 1 001 101 ttttt PP 0 ----0 001 ddddd") //
+
+
+
+DEF_ENC(V6_pred_scalar2,     ICLASS_CJ" 1 001 101 ttttt PP 0 ----- 010 -01dd") //
+DEF_ENC(V6_vandvrt,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 010 -10dd") //
+DEF_ENC(V6_pred_scalar2v2,     ICLASS_CJ" 1 001 101 ttttt PP 0 ----- 010 -11dd") //
+
+DEF_ENC(V6_vtmpyhb,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vandqrt,         ICLASS_CJ" 1 001 101 ttttt PP 0 --0uu 101 ddddd") //
+DEF_ENC(V6_vandnqrt,         ICLASS_CJ" 1 001 101 ttttt PP 0 --1uu 101 ddddd") //
+
+DEF_ENC(V6_vrmpyubi,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 11i ddddd") //
+
+DEF_ENC(V6_vmpyub,             ICLASS_CJ" 1 001 110 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_lvsplath,         ICLASS_CJ" 1 001 110 ttttt PP 0 ----- 001 ddddd") //
+DEF_ENC(V6_lvsplatb,         ICLASS_CJ" 1 001 110 ttttt PP 0 ----- 010 ddddd") //
+
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 001 --- ----- PP - ----- ----- ---","[#1] Vx32=(Vu32, Rt32)")
+DEF_ENC(V6_vtmpyb_acc,         ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vtmpybus_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vtmpyhb_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vdmpyhb_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vrmpyub_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vrmpybus_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vdmpybus_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vdmpybus_dv_acc, ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vdmpyhsusat_acc, ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vdmpyhsuisat_acc,ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vdmpyhisat_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vdmpyhsat_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vdmpyhb_dv_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpybus_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpabus_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vmpahb_acc,         ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vmpyhsat_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyuh_acc,         ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vmpyiwb_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vmpyiwh_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vrmpybusi_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 10i xxxxx") //
+DEF_ENC(V6_vrsadubi_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 11i xxxxx") //
+
+DEF_ENC(V6_vdsaduh_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyihb_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaslw_acc,         ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vandqrt_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 --0uu 011 xxxxx") //
+DEF_ENC(V6_vandnqrt_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 --1uu 011 xxxxx") //
+DEF_ENC(V6_vandvrt_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 100 ---xx") //
+DEF_ENC(V6_vasrw_acc,         ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vrmpyubi_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 11i xxxxx") //
+
+DEF_ENC(V6_vmpyub_acc,         ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyiwub_acc,    ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vmpauhb_acc,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vmpyuhe_acc,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 011 xxxxx")
+DEF_ENC(V6_vmpahhsat,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpauhuhsat,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpsuhuhsat,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vasrh_acc,         ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 111 xxxxx") //
+
+
+
+
+DEF_ENC(V6_vinsertwr,        ICLASS_CJ" 1 001 101 ttttt PP 1 ----- 001 xxxxx")
+
+DEF_ENC(V6_vmpabuu_acc,        ICLASS_CJ" 1 001 101 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaslh_acc,        ICLASS_CJ" 1 001 101 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpyh_acc,        ICLASS_CJ" 1 001 101 ttttt PP 1 uuuuu 110 xxxxx") //
+
+
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 001 --- ----- PP - ----- ----- ---","[#1] (Vx32, Vy32, Rt32)")
+DEF_ENC(V6_vshuff,             ICLASS_CJ" 1 001 111 ttttt PP 1 yyyyy 001 xxxxx") //
+DEF_ENC(V6_vdeal,             ICLASS_CJ" 1 001 111 ttttt PP 1 yyyyy 010 xxxxx") //
+
+
+
+
+
+/***************************************************************
+*
+*  Group #2, Uses Q6 Ps4, note SILVER uses Rt8
+*
+*  Lots of encoding space here!
+*
+****************************************************************/
+
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 010 --- ----- PP - ----- ----- ---","[#2] if (Ps) Vd=Vu")
+DEF_ENC(V6_vcmov,         ICLASS_CJ" 1 010 000 ----- PP - uuuuu -ss ddddd")
+DEF_ENC(V6_vncmov,         ICLASS_CJ" 1 010 001 ----- PP - uuuuu -ss ddddd")
+DEF_ENC(V6_vnccombine,     ICLASS_CJ" 1 010 010 vvvvv PP - uuuuu -ss ddddd")
+DEF_ENC(V6_vccombine,     ICLASS_CJ" 1 010 011 vvvvv PP - uuuuu -ss ddddd")
+
+DEF_ENC(V6_vrotr,       ICLASS_CJ" 1 010 100 vvvvv PP 1 uuuuu 111 ddddd")
+DEF_ENC(V6_vasr_into,   ICLASS_CJ" 1 010 101 vvvvv PP 1 uuuuu 111 xxxxx")
+
+/***************************************************************
+*
+*  Group #3, Uses Q6 Rt8
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 011 --- ----- PP - ----- ----- ---","[#3] Vd32=(Vu32, Vv32, Rt8)")
+DEF_ENC(V6_valignb,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlalignb,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vasrwh,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vasrwhsat,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vasrwhrndsat,     ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vasrwuhsat,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vasrhubsat,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vasrhubrndsat,     ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vasrhbrndsat,     ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlutvvb,            ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 001 ddddd")
+DEF_ENC(V6_vshuffvdd,         ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 011 ddddd") //
+DEF_ENC(V6_vdealvdd,         ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 100 ddddd") //
+DEF_ENC(V6_vlutvvb_oracc,    ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 101 xxxxx")
+DEF_ENC(V6_vlutvwh,            ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 110 ddddd")
+DEF_ENC(V6_vlutvwh_oracc,    ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 111 xxxxx")
+
+
+
+/***************************************************************
+*
+*  Group #4, No Q6 regs
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 100 --- ----- PP 0 ----- ----- ---","[#4] Vd32=(Vu32, Vv32)")
+DEF_ENC(V6_vrmpyubv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vrmpybv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vrmpybusv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vdmpyhvsat,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vmpybv,         ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmpyubv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmpybusv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpyhv,         ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyuhv,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vmpyhvsrs,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vmpyhus,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vmpabusv,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vmpyih,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vand,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vor,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vxor,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vaddw,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vaddubsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vadduhsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vaddhsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaddwsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vsubb,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsubh,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vsubw,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vsububsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vsubuhsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vsubhsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsubwsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaddb_dv,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vaddh_dv,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vaddw_dv,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vaddubsat_dv,ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vadduhsat_dv,ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vaddhsat_dv, ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vaddwsat_dv, ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsubb_dv,     ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vsubh_dv,     ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vsubw_dv,     ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsububsat_dv,ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vsubuhsat_dv,ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vsubhsat_dv,    ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vsubwsat_dv, ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vaddubh,     ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vadduhw,     ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaddhw,         ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vsububh,     ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsubuhw,        ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vsubhw,        ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vabsdiffub,    ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vabsdiffh,     ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vabsdiffuh,     ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vabsdiffw,     ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vavgub,         ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vavguh,         ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vavgh,        ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vavgw,        ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vnavgub,        ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vnavgh,         ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vnavgw,         ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vavgubrnd,     ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vavguhrnd,     ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vavghrnd,     ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vavgwrnd,    ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpabuuv,    ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 100 --- ----- PP 1 ----- ----- ---","[#4] Vx32=(Vu32, Vv32)")
+DEF_ENC(V6_vrmpyubv_acc,      ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vrmpybv_acc,       ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vrmpybusv_acc,    ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vdmpyhvsat_acc,    ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vmpybv_acc,         ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpyubv_acc,     ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpybusv_acc,    ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vmpyhv_acc,        ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vmpyuhv_acc,        ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyhus_acc,     ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaddhw_acc,        ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vmpyowh_64_acc,    ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 011 xxxxx")
+DEF_ENC(V6_vmpyih_acc,         ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpyiewuh_acc,    ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpyowh_sacc,    ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vmpyowh_rnd_sacc,ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vmpyiewh_acc,      ICLASS_CJ" 1 100 010 vvvvv PP 1 uuuuu 000 xxxxx") //
+
+DEF_ENC(V6_vadduhw_acc,          ICLASS_CJ" 1 100 010 vvvvv PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaddubh_acc,          ICLASS_CJ" 1 100 010 vvvvv PP 1 uuuuu 101 xxxxx") //
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 100 100 ----- PP 1 ----- ----- ---","[#4] Qx4=(Vu32, Vv32)")
+// Grouped by element size (lsbs), operation (next-lsbs) and operation (next-lsbs)
+DEF_ENC(V6_veqb_and,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 000xx") //
+DEF_ENC(V6_veqh_and,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 001xx") //
+DEF_ENC(V6_veqw_and,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 010xx") //
+
+DEF_ENC(V6_vgtb_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 100xx") //
+DEF_ENC(V6_vgth_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 101xx") //
+DEF_ENC(V6_vgtw_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 110xx") //
+
+DEF_ENC(V6_vgtub_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 001 000xx") //
+DEF_ENC(V6_vgtuh_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 001 001xx") //
+DEF_ENC(V6_vgtuw_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 001 010xx") //
+
+DEF_ENC(V6_veqb_or,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 000xx") //
+DEF_ENC(V6_veqh_or,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 001xx") //
+DEF_ENC(V6_veqw_or,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 010xx") //
+
+DEF_ENC(V6_vgtb_or,        ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 100xx") //
+DEF_ENC(V6_vgth_or,        ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 101xx") //
+DEF_ENC(V6_vgtw_or,        ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 110xx") //
+
+DEF_ENC(V6_vgtub_or,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 011 000xx") //
+DEF_ENC(V6_vgtuh_or,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 011 001xx") //
+DEF_ENC(V6_vgtuw_or,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 011 010xx") //
+
+DEF_ENC(V6_veqb_xor,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 000xx") //
+DEF_ENC(V6_veqh_xor,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 001xx") //
+DEF_ENC(V6_veqw_xor,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 010xx") //
+
+DEF_ENC(V6_vgtb_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 100xx") //
+DEF_ENC(V6_vgth_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 101xx") //
+DEF_ENC(V6_vgtw_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 110xx") //
+
+DEF_ENC(V6_vgtub_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 101 000xx") //
+DEF_ENC(V6_vgtuh_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 101 001xx") //
+DEF_ENC(V6_vgtuw_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 101 010xx") //
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 100 101 ----- PP 1 ----- ----- ---","[#4] Qx4,Vd32=(Vu32, Vv32)")
+DEF_ENC(V6_vaddcarry,    ICLASS_CJ" 1 100 101 vvvvv PP 1 uuuuu 0xx ddddd") //
+DEF_ENC(V6_vsubcarry,    ICLASS_CJ" 1 100 101 vvvvv PP 1 uuuuu 1xx ddddd") //
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 100 11- ----- PP 1 ----- ----- ---","[#4] Vx32|=(Vu32, Vv32,#)")
+DEF_ENC(V6_vlutvvb_oracci,    ICLASS_CJ" 1 100 110 vvvvv PP 1 uuuuu iii xxxxx") //
+DEF_ENC(V6_vlutvwh_oracci,    ICLASS_CJ" 1 100 111 vvvvv PP 1 uuuuu iii xxxxx") //
+
+
+
+/***************************************************************
+*
+*  Group #5, Reserved/Deprecated. Uses Q6 Rx. Stupid FFT.
+*
+****************************************************************/
+
+
+
+
+/************** Not Real Instructions for Debug Only  ************************/
+DEF_FIELDROW_DESC32(            ICLASS_CJ" 1 101 111 11111 PP 1 ----- ----- ---","DEBUG")
+DEF_ENC(V6_ppred,               ICLASS_CJ" 1 101 111 11111 PP 1----0 000 ---ss")
+DEF_ENC(V6_pv32 ,               ICLASS_CJ" 1 101 111 11111 PP 1----0 001 uuuuu")
+DEF_ENC(V6_pv32d,               ICLASS_CJ" 1 101 111 11111 PP 1----0 010 uuuuu")
+DEF_ENC(V6_pv16 ,               ICLASS_CJ" 1 101 111 11111 PP 1----0 011 uuuuu")
+DEF_ENC(V6_pv16d,               ICLASS_CJ" 1 101 111 11111 PP 1----0 100 uuuuu")
+DEF_ENC(V6_pv8d,                ICLASS_CJ" 1 101 111 11111 PP 1----0 101 uuuuu")
+DEF_ENC(V6_pv64d,               ICLASS_CJ" 1 101 111 11111 PP 1----0 111 uuuuu")
+DEF_ENC(V6_preg,                ICLASS_CJ" 1 101 111 11111 PP 1----1 000 uuuuu")
+DEF_ENC(V6_pregd,               ICLASS_CJ" 1 101 111 11111 PP 1----1 001 uuuuu")
+DEF_ENC(V6_pv32du,              ICLASS_CJ" 1 101 111 11111 PP 1----1 010 uuuuu")
+DEF_ENC(V6_pv8,                 ICLASS_CJ" 1 101 111 11111 PP 1----1 011 uuuuu")
+DEF_ENC(V6_pregf,                 ICLASS_CJ" 1 101 111 11111 PP 1----1 110 uuuuu")
+DEF_ENC(V6_pz,                     ICLASS_CJ" 1 101 111 11111 PP 1----1 111 --00u")
+
+
+/***************************************************************
+*
+*  Group #6, No Q6 regs
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 110 --0 ----- PP 0 ----- ----- ---","[#6] Vd32=Vu32")
+DEF_ENC(V6_vabsh,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vabsh_sat,     ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vabsw,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vabsw_sat,     ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vnot,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vdealh,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vdealb,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vunpackub,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vunpackuh,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vunpackb,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vunpackh,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vabsb,         ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vabsb_sat,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vshuffh,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vshuffb,     ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vzb,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vzh,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsb,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vsh,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vcl0w,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vpopcounth,     ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vcl0h,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 111 ddddd") //
+
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 110 --0 ---11 PP 0 ----- ----- ---","[#6] Qd4=Qt4, Qs4")
+DEF_ENC(V6_pred_and,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 000dd") //
+DEF_ENC(V6_pred_or,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 001dd") //
+DEF_ENC(V6_pred_not,     ICLASS_CJ" 1 110 --0 ---11 PP 0 ---ss 000 010dd") //
+DEF_ENC(V6_pred_xor,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 011dd") //
+DEF_ENC(V6_pred_or_n,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 100dd") //
+DEF_ENC(V6_pred_and_n,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 101dd") //
+DEF_ENC(V6_shuffeqh,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 110dd") //
+DEF_ENC(V6_shuffeqw,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 111dd") //
+
+DEF_ENC(V6_vnormamtw,        ICLASS_CJ" 1 110 --0 ---11 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vnormamth,        ICLASS_CJ" 1 110 --0 ---11 PP 0 uuuuu 101 ddddd") //
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --1 ----- PP 0 ----- ----- ---","[#6] Vd32=Vu32,Vv32")
+DEF_ENC(V6_vlutvvbi,        ICLASS_CJ" 1 110 001 vvvvv PP 0 uuuuu iii ddddd")
+DEF_ENC(V6_vlutvwhi,        ICLASS_CJ" 1 110 011 vvvvv PP 0 uuuuu iii ddddd")
+
+DEF_ENC(V6_vaddbsat_dv,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 000 ddddd")
+DEF_ENC(V6_vsubbsat_dv,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 001 ddddd")
+DEF_ENC(V6_vadduwsat_dv,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 010 ddddd")
+DEF_ENC(V6_vsubuwsat_dv,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 011 ddddd")
+DEF_ENC(V6_vaddububb_sat,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 100 ddddd")
+DEF_ENC(V6_vsubububb_sat,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 101 ddddd")
+DEF_ENC(V6_vmpyewuh_64,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 110 ddddd")
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --0 ----- PP 1 ----- ----- ---","Vx32=Vu32")
+DEF_ENC(V6_vunpackob,         ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vunpackoh,         ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 001 xxxxx") //
+//DEF_ENC(V6_vunpackow,     ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 010 xxxxx") //
+
+DEF_ENC(V6_vhist,            ICLASS_CJ" 1 110 --0 ---00 PP 1 -000- 100 -----")
+DEF_ENC(V6_vwhist256,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -0010 100 -----")
+DEF_ENC(V6_vwhist256_sat,    ICLASS_CJ" 1 110 --0 ---00 PP 1 -0011 100 -----")
+DEF_ENC(V6_vwhist128,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -010- 100 -----")
+DEF_ENC(V6_vwhist128m,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -011i 100 -----")
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --0 ----- PP 1 ----- ----- ---","if (Qv4) Vx32=Vu32")
+DEF_ENC(V6_vaddbq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vaddhq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaddwq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vaddbnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vaddhnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaddwnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vsubbq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vsubhq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vsubwq,             ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vsubbnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vsubhnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vsubwnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 011 xxxxx") //
+
+DEF_ENC(V6_vhistq,            ICLASS_CJ" 1 110 vv0 ---10 PP 1 --00- 100 -----")
+DEF_ENC(V6_vwhist256q,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --010 100 -----")
+DEF_ENC(V6_vwhist256q_sat,    ICLASS_CJ" 1 110 vv0 ---10 PP 1 --011 100 -----")
+DEF_ENC(V6_vwhist128q,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --10- 100 -----")
+DEF_ENC(V6_vwhist128qm,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --11i 100 -----")
+
+
+DEF_ENC(V6_vandvqv,            ICLASS_CJ" 1 110 vv0 ---11 PP 1 uuuuu 000 ddddd")
+DEF_ENC(V6_vandvnqv,        ICLASS_CJ" 1 110 vv0 ---11 PP 1 uuuuu 001 ddddd")
+
+
+DEF_ENC(V6_vprefixqb,       ICLASS_CJ" 1 110 vv0 ---11 PP 1 --000 010 ddddd") //
+DEF_ENC(V6_vprefixqh,       ICLASS_CJ" 1 110 vv0 ---11 PP 1 --001 010 ddddd") //
+DEF_ENC(V6_vprefixqw,       ICLASS_CJ" 1 110 vv0 ---11 PP 1 --010 010 ddddd") //
+
+
+
+
+DEF_ENC(V6_vassign,            ICLASS_CJ" 1 110 --0 ---11 PP 1 uuuuu 111 ddddd")
+
+DEF_ENC(V6_valignbi,         ICLASS_CJ" 1 110 001 vvvvv PP 1 uuuuu iii ddddd")
+DEF_ENC(V6_vlalignbi,         ICLASS_CJ" 1 110 011 vvvvv PP 1 uuuuu iii ddddd")
+DEF_ENC(V6_vswap,             ICLASS_CJ" 1 110 101 vvvvv PP 1 uuuuu -tt ddddd") //
+DEF_ENC(V6_vmux,             ICLASS_CJ" 1 110 111 vvvvv PP 1 uuuuu -tt ddddd") //
+
+
+
+/***************************************************************
+*
+*  Group #7, No Q6 regs
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 111 --- ----- PP 0 ----- ----- ---","[#7] Vd32=(Vu32, Vv32)")
+DEF_ENC(V6_vaddbsat,    ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vminub,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vminuh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vminh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vminw,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmaxub,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmaxuh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmaxh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 111 ddddd") //
+
+
+DEF_ENC(V6_vaddclbh,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 000 ddddd") //
+DEF_ENC(V6_vaddclbw,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 001 ddddd") //
+
+DEF_ENC(V6_vavguw,        ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 010 ddddd") //
+DEF_ENC(V6_vavguwrnd,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 011 ddddd") //
+DEF_ENC(V6_vavgb,        ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 100 ddddd") //
+DEF_ENC(V6_vavgbrnd,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 101 ddddd") //
+DEF_ENC(V6_vnavgb,        ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 110 ddddd") //
+
+
+DEF_ENC(V6_vmaxw,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vdelta,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vsubbsat,    ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vrdelta,     ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vminb,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmaxb,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsatuwuh,    ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vdealb4w,     ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 111 ddddd") //
+
+
+DEF_ENC(V6_vmpyowh_rnd,     ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vshuffeb,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vshuffob,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vshufeh,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vshufoh,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vshufoeh,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vshufoeb,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vcombine,     ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyieoh,     ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vadduwsat,     ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vsathub,     ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsatwh,         ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vroundwh,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 100 ddddd")
+DEF_ENC(V6_vroundwuh,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 101 ddddd")
+DEF_ENC(V6_vroundhb,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 110 ddddd")
+DEF_ENC(V6_vroundhub,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 111 ddddd")
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 111 100 ----- PP - ----- ----- ---","[#7] Qd4=(Vu32, Vv32)")
+DEF_ENC(V6_veqb,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 000dd") //
+DEF_ENC(V6_veqh,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 001dd") //
+DEF_ENC(V6_veqw,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 010dd") //
+
+DEF_ENC(V6_vgtb,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 100dd") //
+DEF_ENC(V6_vgth,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 101dd") //
+DEF_ENC(V6_vgtw,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 110dd") //
+
+DEF_ENC(V6_vgtub,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 001 000dd") //
+DEF_ENC(V6_vgtuh,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 001 001dd") //
+DEF_ENC(V6_vgtuw,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 001 010dd") //
+
+
+DEF_ENC(V6_vasrwv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlsrwv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vlsrhv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vasrhv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaslwv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vaslhv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vaddb,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vaddh,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 111 ddddd") //
+
+
+DEF_ENC(V6_vmpyiewuh,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 000 ddddd")
+DEF_ENC(V6_vmpyiowh,    ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 001 ddddd")
+DEF_ENC(V6_vpackeb,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vpackeh,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vsubuwsat,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vpackhub_sat,ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vpackhb_sat, ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vpackwuh_sat,ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vpackwh_sat, ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vpackob,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vpackoh,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vrounduhub,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vrounduwuh,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmpyewuh,    ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 101 ddddd")
+DEF_ENC(V6_vmpyowh,        ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 111 ddddd")
+
+
+#endif /* NO MMVEC */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 57/67] Hexagon HVX import semantics
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (55 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 56/67] Hexagon HVX import instruction encodings Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 58/67] Hexagon HVX import macro definitions Taylor Simpson
                   ` (10 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Imported from the Hexagon architecture library
    imported/allext.idef           Top level file for all extensions
    imported/mmvec/ext.idef        HVX instruction definitions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/imported/allext.idef    |   25 +
 target/hexagon/imported/allidefs.def   |    1 +
 target/hexagon/imported/mmvec/ext.idef | 2780 ++++++++++++++++++++++++++++++++
 3 files changed, 2806 insertions(+)
 create mode 100644 target/hexagon/imported/allext.idef
 create mode 100644 target/hexagon/imported/mmvec/ext.idef

diff --git a/target/hexagon/imported/allext.idef b/target/hexagon/imported/allext.idef
new file mode 100644
index 0000000..26db774
--- /dev/null
+++ b/target/hexagon/imported/allext.idef
@@ -0,0 +1,25 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Top level file for all instruction set extensions
+ */
+#define EXTNAME mmvec
+#define EXTSTR "mmvec"
+#include "mmvec/ext.idef"
+#undef EXTNAME
+#undef EXTSTR
diff --git a/target/hexagon/imported/allidefs.def b/target/hexagon/imported/allidefs.def
index d2f13a2..03f3f1b 100644
--- a/target/hexagon/imported/allidefs.def
+++ b/target/hexagon/imported/allidefs.def
@@ -89,3 +89,4 @@
 #include "shift.idef"
 #include "system.idef"
 #include "subinsns.idef"
+#include "allext.idef"
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
new file mode 100644
index 0000000..0bc6e9f
--- /dev/null
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -0,0 +1,2780 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/******************************************************************************
+ *
+ *     HOYA: MULTI MEDIA INSTRUCITONS
+ *
+ ******************************************************************************/
+
+#ifndef EXTINSN
+#define EXTINSN Q6INSN
+#define __SELF_DEF_EXTINSN 1
+#endif
+
+#ifndef NO_MMVEC
+
+#define DO_FOR_EACH_CODE(WIDTH, CODE) \
+{ \
+    fHIDE(int i;) \
+    fVFOREACH(WIDTH, i) {\
+        CODE ;\
+    } \
+}
+
+
+
+
+#define ITERATOR_INSN_ANY_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+
+#define ITERATOR_INSN2_ANY_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_ANY_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA_DV,A_NOTE_ANY2_RESOURCE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+#define ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+
+#define ITERATOR_INSN_SHIFT_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS,A_NOTE_SHIFT_RESOURCE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+
+#define ITERATOR_INSN_SHIFT_SLOT_VV_LATE(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS,A_CVI_VV_LATE,A_NOTE_SHIFT_RESOURCE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_SHIFT_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_SHIFT_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define ITERATOR_INSN_PERMUTE_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DEP(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_NOTE_DEPRECATED,A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE),
+
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DEP(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DEP(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS,A_NOTE_PERMUTE_RESOURCE,A_NOTE_SHIFT_RESOURCE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_DEP(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_NOTE_DEPRECATED,A_EXTENSION,A_CVI,A_CVI_VP_VS,A_NOTE_PERMUTE_RESOURCE,A_NOTE_SHIFT_RESOURCE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define ITERATOR_INSN_MPY_SLOT(WIDTH,TAG, SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX,A_NOTE_MPY_RESOURCE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX,A_NOTE_MPY_RESOURCE,A_CVI_LATE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_MPY_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define ITERATOR_INSN2_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+
+#define ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV,A_NOTE_MPYDV_RESOURCE),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+
+
+
+#define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC2(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV,A_CVI_VX_VSRC0_IS_DST,A_NOTE_MPYDV_RESOURCE), DESCR, DO_FOR_EACH_CODE(WIDTH, CODE)) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define ITERATOR_INSN_SLOT2_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV,A_CVI_EARLY,A_RESTRICT_SLOT2ONLY,A_NOTE_MPYDV_RESOURCE), DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define VEC_DEF_MAPPING2(TAG, SYNTAX1_mapped, SYNTAX1, SYNTAX2_mapped, SYNTAX2) \
+DEF_CVI_MAPPING(V6_##TAG, SYNTAX1_mapped, SYNTAX1) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX2_mapped, SYNTAX1_mapped)
+
+
+#define ITERATOR_INSN_VHISTLIKE(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_4SLOT,A_CVI_REQUIRES_TMPLOAD),  \
+DESCR, fHIDE(mmvector_t input;) input = fTMPVDATA(); DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+
+
+
+/******************************************************************************************
+*
+* MMVECTOR MEMORY OPERATIONS - NO NAPALI V1
+*
+*******************************************************************************************/
+
+
+
+#define ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV,A_NOTE_MPYDV_RESOURCE,A_NOTE_NONAPALIV1),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+
+
+#define ITERATOR_INSN_SHIFT_SLOT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS,A_NOTE_SHIFT_RESOURCE,A_NOTE_NONAPALIV1),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_SHIFT_SLOT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_SHIFT_SLOT_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+
+#define ITERATOR_INSN_ANY_SLOT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE,A_NOTE_NONAPALIV1),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_ANY_SLOT_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+
+#define ITERATOR_INSN_MPY_SLOT_NOV1(WIDTH,TAG, SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX,A_NOTE_MPY_RESOURCE,A_NOTE_NONAPALIV1),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_PERMUTE_SLOT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE,A_NOTE_NONAPALIV1),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOTT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DEPT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_NOTE_DEPRECATED,A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE,A_NOTE_NONAPALIV1),
+
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DEPT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DEP_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS,A_NOTE_PERMUTE_RESOURCE,A_NOTE_SHIFT_RESOURCE,A_NOTE_NONAPALIV1),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_DEPT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_NOTE_DEPRECATED,A_EXTENSION,A_CVI,A_CVI_VP_VS,A_NOTE_PERMUTE_RESOURCE,A_NOTE_SHIFT_RESOURCE,A_NOTE_NONAPALIV1),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE) \
+DEF_CVI_MAPPING(V6_##TAG##_alt, SYNTAX, SYNTAX2)
+
+#define NARROWING_SHIFT_NOV1(ITERSIZE,TAG,DSTM,DSTTYPE,SRCTYPE,SYNOPTS,SATFUNC,RNDFUNC,SHAMTMASK) \
+ITERATOR_INSN_SHIFT_SLOT_NOV1(ITERSIZE,TAG, \
+"Vd32." #DSTTYPE "=vasr(Vu32." #SRCTYPE ",Vv32." #SRCTYPE ",Rt8)" #SYNOPTS, \
+"Vector shift right and shuffle", \
+	fHIDE(fRT8NOTE())\
+    fHIDE(int )shamt = RtV & SHAMTMASK; \
+    DSTM(0,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VvV.SRCTYPE[i],shamt) >> shamt)); \
+    DSTM(1,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VuV.SRCTYPE[i],shamt) >> shamt)))
+
+#define MMVEC_AVGS_NOV1(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",          "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",          "Vector Average "DESCR,                                      VdV.DEST[i]  = fVAVGS(       WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",      "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",      "Vector Average % Round"DESCR,                               VdV.DEST[i]  = fVAVGSRND(    WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vnavg##TYPE,                       "Vd32=vnavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vnavg(Vu32."#SRC",Vv32."#SRC")",         "Vector Negative Average "DESCR,                             VdV.DEST[i]  = fVNAVGS(      WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+  #define MMVEC_AVGU_NOV1(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",        "Vector Average "DESCR,                                      VdV.DEST[i] = fVAVGU(   WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",     "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",    "Vector Average % Round"DESCR,                               VdV.DEST[i] = fVAVGURND(WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+
+
+/******************************************************************************************
+*
+* MMVECTOR MEMORY OPERATIONS
+*
+*******************************************************************************************/
+
+#define MMVEC_EACH_EA(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,BEH) \
+EXTINSN(V6_##TAG##_pi,      SYNTAXA "(Rx32++#s3)" NT SYNTAXB,ATTRIB,DESCR,{ fEA_REG(RxV); BEH; fPM_I(RxV,VEC_SCALE(siV)); }) \
+EXTINSN(V6_##TAG##_ai,      SYNTAXA "(Rt32+#s4)" NT SYNTAXB,ATTRIB,DESCR,{ fEA_RI(RtV,VEC_SCALE(siV)); BEH;}) \
+EXTINSN(V6_##TAG##_ppu,      SYNTAXA "(Rx32++Mu2)" NT SYNTAXB,ATTRIB,DESCR,{ fEA_REG(RxV); BEH; fPM_M(RxV,MuV); }) \
+
+
+#define MMVEC_COND_EACH_EA_TRUE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+EXTINSN(V6_##TAG##_pred_pi,      "if (" #SYNTAXP "4) " SYNTAXA "(Rx32++#s3)" NT SYNTAXB, ATTRIB,DESCR, { if (fLSBOLD(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_I(RxV,siV*fVECSIZE()); } else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_pred_ai,      "if (" #SYNTAXP "4) " SYNTAXA "(Rt32+#s4)" NT SYNTAXB, ATTRIB,DESCR,  { if (fLSBOLD(SYNTAXP##V)) { fEA_RI(RtV,siV*fVECSIZE()); BEH;} else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_pred_ppu,     "if (" #SYNTAXP "4) " SYNTAXA "(Rx32++Mu2)" NT SYNTAXB,ATTRIB,DESCR,  { if (fLSBOLD(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_M(RxV,MuV); } else {CANCEL;}}) \
+
+#define MMVEC_COND_EACH_EA_FALSE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+EXTINSN(V6_##TAG##_npred_pi,     "if (!" #SYNTAXP "4) " SYNTAXA "(Rx32++#s3)" NT SYNTAXB,ATTRIB,DESCR,{ if (fLSBOLDNOT(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_I(RxV,siV*fVECSIZE()); } else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_npred_ai,     "if (!" #SYNTAXP "4) " SYNTAXA "(Rt32+#s4)" NT SYNTAXB,ATTRIB,DESCR, { if (fLSBOLDNOT(SYNTAXP##V)) { fEA_RI(RtV,siV*fVECSIZE()); BEH;} else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_npred_ppu,    "if (!" #SYNTAXP "4) " SYNTAXA "(Rx32++Mu2)" NT SYNTAXB,ATTRIB,DESCR,{ if (fLSBOLDNOT(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_M(RxV,MuV); } else {CANCEL;}})
+
+#define MMVEC_COND_EACH_EA(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+MMVEC_COND_EACH_EA_TRUE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+MMVEC_COND_EACH_EA_FALSE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH)
+
+
+#define VEC_SCALE(X) X*fVECSIZE()
+
+
+#define MMVEC_LD(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG,DESCR,ATTRIB,NT,"Vd32=vmem","",fLOADMMV(EA,VdV))
+#define MMVEC_LDC(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG##_cur,DESCR,ATTRIB,NT,"Vd32.cur=vmem","",fLOADMMV(EA,VdV))
+#define MMVEC_LDT(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG##_tmp,DESCR,ATTRIB,NT,"Vd32.tmp=vmem","",fLOADMMV(EA,VdV))
+#define MMVEC_LDU(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG,DESCR,ATTRIB,NT,"Vd32=vmemu","",fLOADMMVU(EA,VdV))
+
+
+#define MMVEC_STQ(TAG,DESCR,ATTRIB,NT) \
+MMVEC_EACH_EA(TAG##_qpred,DESCR,ATTRIB,NT,"if (Qv4) vmem","=Vs32",fSTOREMMVQ(EA,VsV,QvV)) \
+MMVEC_EACH_EA(TAG##_nqpred,DESCR,ATTRIB,NT,"if (!Qv4) vmem","=Vs32",fSTOREMMVNQ(EA,VsV,QvV))
+
+DEF_CVI_MAPPING(V6_stup0,  "if (Pv4) vmemu(Rt32)=Vs32",  "if (Pv4) vmemu(Rt32+#0)=Vs32")
+DEF_CVI_MAPPING(V6_stunp0, "if (!Pv4) vmemu(Rt32)=Vs32",  "if (!Pv4) vmemu(Rt32+#0)=Vs32")
+
+
+
+/****************************************************************
+* MAPPING FOR VMEMs
+****************************************************************/
+
+#define ATTR_VMEM A_EXTENSION,A_CVI,A_CVI_VM,A_NOTE_VMEM,A_NOTE_ANY_RESOURCE,A_VMEM
+#define ATTR_VMEMU A_EXTENSION,A_CVI,A_CVI_VM,A_NOTE_VMEM,A_NOTE_PERMUTE_RESOURCE,A_CVI_VP,A_VMEMU
+
+
+MMVEC_LD(vL32b,  "Aligned Vector Load",        ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_VA),)
+MMVEC_LDC(vL32b,  "Aligned Vector Load Cur",	ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_NEW,A_CVI_VA),)
+MMVEC_LDT(vL32b,  "Aligned Vector Load Tmp",	ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_TMP),)
+
+MMVEC_COND_EACH_EA(vL32b,"Conditional Aligned Vector Load",ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_VA),,"Vd32=vmem",,Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_cur,"Conditional Aligned Vector Load Cur",ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_VA,A_CVI_NEW),,"Vd32.cur=vmem",,Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_tmp,"Conditional Aligned Vector Load Tmp",ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_TMP),,"Vd32.tmp=vmem",,Pv,fLOADMMV(EA,VdV);)
+
+MMVEC_EACH_EA(vS32b,"Aligned Vector Store",ATTRIBS(ATTR_VMEM,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),,"vmem","=Vs32",fSTOREMMV(EA,VsV))
+MMVEC_COND_EACH_EA(vS32b,"Aligned Vector Store",ATTRIBS(ATTR_VMEM,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),,"vmem","=Vs32",Pv,fSTOREMMV(EA,VsV))
+
+
+MMVEC_STQ(vS32b,  "Aligned Vector Store",      ATTRIBS(ATTR_VMEM,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),)
+
+MMVEC_LDU(vL32Ub, "Unaligned Vector Load",     ATTRIBS(ATTR_VMEMU,A_LOAD,A_RESTRICT_NOSLOT1),)
+
+MMVEC_EACH_EA(vS32Ub,"Unaligned Vector Store",ATTRIBS(ATTR_VMEMU,A_STORE,A_RESTRICT_NOSLOT1),,"vmemu","=Vs32",fSTOREMMVU(EA,VsV))
+
+MMVEC_COND_EACH_EA(vS32Ub,"Unaligned Vector Store",ATTRIBS(ATTR_VMEMU,A_STORE,A_RESTRICT_NOSLOT1),,"vmemu","=Vs32",Pv,fSTOREMMVU(EA,VsV))
+
+MMVEC_EACH_EA(vS32b_new,"Aligned Vector Store New",ATTRIBS(ATTR_VMEM,A_STORE,A_CVI_NEW,A_RESTRICT_SINGLE_MEM_FIRST,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),,"vmem","=Os8.new",fSTOREMMV(EA,fNEWVREG(OsN)))
+
+// V65 store relase, zero byte store
+MMVEC_EACH_EA(vS32b_srls,"Aligned Vector Scatter Release",ATTRIBS(ATTR_VMEM,A_STORE,A_CVI_SCATTER_RELEASE,A_CVI_NEW,A_RESTRICT_SLOT0ONLY),,"vmem",":scatter_release",fSTORERELEASE(EA,0))
+
+
+
+MMVEC_COND_EACH_EA(vS32b_new,"Aligned Vector Store New",ATTRIBS(ATTR_VMEM,A_STORE,A_CVI_NEW,A_RESTRICT_SINGLE_MEM_FIRST,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),,"vmem","=Os8.new",Pv,fSTOREMMV(EA,fNEWVREG(OsN)))
+
+
+// Loads
+DEF_CVI_MAPPING(V6_ld0,   "Vd32=vmem(Rt32)",  "Vd32=vmem(Rt32+#0)")
+DEF_CVI_MAPPING(V6_ldu0,  "Vd32=vmemu(Rt32)", "Vd32=vmemu(Rt32+#0)")
+
+DEF_CVI_MAPPING(V6_ldp0,   "if (Pv4) Vd32=vmem(Rt32)",  "if (Pv4) Vd32=vmem(Rt32+#0)")
+DEF_CVI_MAPPING(V6_ldnp0,  "if (!Pv4) Vd32=vmem(Rt32)",  "if (!Pv4) Vd32=vmem(Rt32+#0)")
+DEF_CVI_MAPPING(V6_ldcp0,  "if (Pv4) Vd32.cur=vmem(Rt32)", "if (Pv4) Vd32.cur=vmem(Rt32+#0)")
+DEF_CVI_MAPPING(V6_ldtp0,  "if (Pv4) Vd32.tmp=vmem(Rt32)", "if (Pv4) Vd32.tmp=vmem(Rt32+#0)")
+DEF_CVI_MAPPING(V6_ldcnp0,  "if (!Pv4) Vd32.cur=vmem(Rt32)", "if (!Pv4) Vd32.cur=vmem(Rt32+#0)")
+DEF_CVI_MAPPING(V6_ldtnp0,  "if (!Pv4) Vd32.tmp=vmem(Rt32)", "if (!Pv4) Vd32.tmp=vmem(Rt32+#0)")
+
+
+
+
+// Stores
+DEF_CVI_MAPPING(V6_st0,    "vmem(Rt32)=Vs32",     "vmem(Rt32+#0)=Vs32")
+DEF_CVI_MAPPING(V6_stn0,   "vmem(Rt32)=Os8.new",  "vmem(Rt32+#0)=Os8.new")
+DEF_CVI_MAPPING(V6_stq0,   "if (Qv4) vmem(Rt32)=Vs32",     "if (Qv4) vmem(Rt32+#0)=Vs32")
+DEF_CVI_MAPPING(V6_stnq0,  "if (!Qv4) vmem(Rt32)=Vs32",     "if (!Qv4) vmem(Rt32+#0)=Vs32")
+DEF_CVI_MAPPING(V6_stp0,   "if (Pv4) vmem(Rt32)=Vs32",     "if (Pv4) vmem(Rt32+#0)=Vs32")
+DEF_CVI_MAPPING(V6_stnp0,  "if (!Pv4) vmem(Rt32)=Vs32",     "if (!Pv4) vmem(Rt32+#0)=Vs32")
+
+
+DEF_CVI_MAPPING(V6_stu0,   "vmemu(Rt32)=Vs32",  "vmemu(Rt32+#0)=Vs32")
+
+
+
+/******************************************************************************************
+*
+* MMVECTOR MEMORY OPERATIONS - NON TEMPORAL
+*
+*******************************************************************************************/
+
+#define ATTR_VMEM_NT A_EXTENSION,A_CVI,A_CVI_VM,A_NOTE_VMEM,A_NT_VMEM,A_NOTE_NT_VMEM,A_NOTE_ANY_RESOURCE,A_VMEM
+
+MMVEC_EACH_EA(vS32b_nt,"Aligned Vector Store - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),":nt","vmem","=Vs32",fSTOREMMV(EA,VsV))
+MMVEC_COND_EACH_EA(vS32b_nt,"Aligned Vector Store - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),":nt","vmem","=Vs32",Pv,fSTOREMMV(EA,VsV))
+
+MMVEC_EACH_EA(vS32b_nt_new,"Aligned Vector Store New - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_CVI_NEW,A_RESTRICT_SINGLE_MEM_FIRST,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),":nt","vmem","=Os8.new",fSTOREMMV(EA,fNEWVREG(OsN)))
+MMVEC_COND_EACH_EA(vS32b_nt_new,"Aligned Vector Store New - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_CVI_NEW,A_RESTRICT_SINGLE_MEM_FIRST,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),":nt","vmem","=Os8.new",Pv,fSTOREMMV(EA,fNEWVREG(OsN)))
+
+
+MMVEC_STQ(vS32b_nt,  "Aligned Vector Store - Non temporal",      ATTRIBS(ATTR_VMEM_NT,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),":nt")
+
+MMVEC_LD(vL32b_nt,  "Aligned Vector Load - Non temporal",       ATTRIBS(ATTR_VMEM_NT,A_LOAD,A_CVI_VA),":nt")
+MMVEC_LDC(vL32b_nt,  "Aligned Vector Load Cur - Non temporal",	ATTRIBS(ATTR_VMEM_NT,A_LOAD,A_CVI_NEW,A_CVI_VA),":nt")
+MMVEC_LDT(vL32b_nt,  "Aligned Vector Load Tmp - Non temporal",	ATTRIBS(ATTR_VMEM_NT,A_LOAD,A_CVI_TMP),":nt")
+
+MMVEC_COND_EACH_EA(vL32b_nt,"Conditional Aligned Vector Load",ATTRIBS(ATTR_VMEM_NT,A_CVI_VA),,"Vd32=vmem",":nt",Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_nt_cur,"Conditional Aligned Vector Load Cur",ATTRIBS(ATTR_VMEM_NT,A_CVI_VA,A_CVI_NEW),,"Vd32.cur=vmem",":nt",Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_nt_tmp,"Conditional Aligned Vector Load Tmp",ATTRIBS(ATTR_VMEM_NT,A_CVI_TMP),,"Vd32.tmp=vmem",":nt",Pv,fLOADMMV(EA,VdV);)
+
+
+// Loads
+DEF_CVI_MAPPING(V6_ldnt0,   "Vd32=vmem(Rt32):nt",  "Vd32=vmem(Rt32+#0):nt")
+DEF_CVI_MAPPING(V6_ldpnt0,   "if (Pv4) Vd32=vmem(Rt32):nt",  "if (Pv4) Vd32=vmem(Rt32+#0):nt")
+DEF_CVI_MAPPING(V6_ldnpnt0,   "if (!Pv4) Vd32=vmem(Rt32):nt",  "if (!Pv4) Vd32=vmem(Rt32+#0):nt")
+DEF_CVI_MAPPING(V6_ldcpnt0,  "if (Pv4) Vd32.cur=vmem(Rt32):nt", "if (Pv4) Vd32.cur=vmem(Rt32+#0):nt")
+DEF_CVI_MAPPING(V6_ldtpnt0,  "if (Pv4) Vd32.tmp=vmem(Rt32):nt", "if (Pv4) Vd32.tmp=vmem(Rt32+#0):nt")
+DEF_CVI_MAPPING(V6_ldcnpnt0,  "if (!Pv4) Vd32.cur=vmem(Rt32):nt", "if (!Pv4) Vd32.cur=vmem(Rt32+#0):nt")
+DEF_CVI_MAPPING(V6_ldtnpnt0,  "if (!Pv4) Vd32.tmp=vmem(Rt32):nt", "if (!Pv4) Vd32.tmp=vmem(Rt32+#0):nt")
+
+
+// Stores
+DEF_CVI_MAPPING(V6_stnt0,    "vmem(Rt32):nt=Vs32",     "vmem(Rt32+#0):nt=Vs32")
+DEF_CVI_MAPPING(V6_stnnt0,   "vmem(Rt32):nt=Os8.new",  "vmem(Rt32+#0):nt=Os8.new")
+DEF_CVI_MAPPING(V6_stqnt0,   "if (Qv4) vmem(Rt32):nt=Vs32",     "if (Qv4) vmem(Rt32+#0):nt=Vs32")
+DEF_CVI_MAPPING(V6_stnqnt0,  "if (!Qv4) vmem(Rt32):nt=Vs32",     "if (!Qv4) vmem(Rt32+#0):nt=Vs32")
+DEF_CVI_MAPPING(V6_stpnt0,   "if (Pv4) vmem(Rt32):nt=Vs32",     "if (Pv4) vmem(Rt32+#0):nt=Vs32")
+DEF_CVI_MAPPING(V6_stnpnt0,  "if (!Pv4) vmem(Rt32):nt=Vs32",     "if (!Pv4) vmem(Rt32+#0):nt=Vs32")
+
+
+
+#undef VEC_SCALE
+
+
+/***************************************************
+ * Vector Alignment
+ ************************************************/
+
+#define VALIGNB(SHIFT)  \
+    fHIDE(int i;) \
+    for(i = 0; i < fVBYTES(); i++) {\
+        VdV.ub[i] = (i+SHIFT>=fVBYTES()) ? VuV.ub[i+SHIFT-fVBYTES()] : VvV.ub[i+SHIFT];\
+	}
+
+EXTINSN(V6_valignb,  "Vd32=valign(Vu32,Vv32,Rt8)",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE,A_NOTE_RT8),"Align Two vectors by Rt8 as control",
+{
+	unsigned shift = RtV & (fVBYTES()-1);
+	VALIGNB(shift)
+})
+EXTINSN(V6_vlalignb, "Vd32=vlalign(Vu32,Vv32,Rt8)", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE,A_NOTE_RT8),"Align Two vectors by Rt8 as control",
+{
+	unsigned shift = fVBYTES() - (RtV & (fVBYTES()-1));
+	VALIGNB(shift)
+})
+EXTINSN(V6_valignbi, "Vd32=valign(Vu32,Vv32,#u3)", 	ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE),"Align Two vectors by #u3 as control",
+{
+	VALIGNB(uiV)
+})
+EXTINSN(V6_vlalignbi,"Vd32=vlalign(Vu32,Vv32,#u3)", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE),"Align Two vectors by #u3 as control",
+{
+	unsigned shift = fVBYTES() - uiV;
+	VALIGNB(shift)
+})
+
+EXTINSN(V6_vror, "Vd32=vror(Vu32,Rt32)", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE),
+"Align Two vectors by Rt32 as control",
+{
+	fHIDE(int k;)
+	for (k=0;k<fVBYTES();k++) {
+		VdV.ub[k] = VuV.ub[(k+RtV)&(fVBYTES()-1)];
+	}
+	})
+
+
+
+
+
+
+
+/**************************************************************
+* Unpack elements with zero/sign extend and cross lane permute
+***************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(8,vunpackub,  "Vdd32=vunpackub(Vu32)", "Vdd32.uh=vunpack(Vu32.ub)", "Unpack byte with zero-extend",     fVARRAY_ELEMENT_ACCESS(VddV, uh, i)  = fZE8_16( VuV.ub[i]))
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(8,vunpackb,   "Vdd32=vunpackb(Vu32)",  "Vdd32.h=vunpack(Vu32.b)",   "Unpack bytes with sign-extend",    fVARRAY_ELEMENT_ACCESS(VddV, h,  i)  = fSE8_16( VuV.b[i] ))
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(16,vunpackuh, "Vdd32=vunpackuh(Vu32)", "Vdd32.uw=vunpack(Vu32.uh)", "Unpack halves with zero-extend",   fVARRAY_ELEMENT_ACCESS(VddV, uw, i)  = fZE16_32(VuV.uh[i]))
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(16,vunpackh,  "Vdd32=vunpackh(Vu32)",  "Vdd32.w=vunpack(Vu32.h)",   "Unpack halves with sign-extend",   fVARRAY_ELEMENT_ACCESS(VddV, w,  i)  = fSE16_32(VuV.h[i] ))
+
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(8, vunpackob, "Vxx32|=vunpackob(Vu32)", "Vxx32.h|=vunpacko(Vu32.b)", "Unpack byte to odd bytes ",       fVARRAY_ELEMENT_ACCESS(VxxV, uh, i) |= fZE8_16( VuV.ub[i])<<8)
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(16,vunpackoh, "Vxx32|=vunpackoh(Vu32)", "Vxx32.w|=vunpacko(Vu32.h)", "Unpack halves to odd halves",     fVARRAY_ELEMENT_ACCESS(VxxV, uw, i) |= fZE16_32(VuV.uh[i])<<16)
+
+
+/**************************************************************
+* Pack elements and cross lane permute
+***************************************************************/
+
+ ITERATOR_INSN2_PERMUTE_SLOT(16, vpackeb,  "Vd32=vpackeb(Vu32,Vv32)", "Vd32.b=vpacke(Vu32.h,Vv32.h)",
+ "Pack  bytes",
+    VdV.ub[i]               = fGETUBYTE(0, VvV.uh[i]);
+    VdV.ub[i+fVELEM(16)]    = fGETUBYTE(0, VuV.uh[i]))
+
+ ITERATOR_INSN2_PERMUTE_SLOT(32, vpackeh,  "Vd32=vpackeh(Vu32,Vv32)", "Vd32.h=vpacke(Vu32.w,Vv32.w)",
+ "Pack  halfwords",
+    VdV.uh[i]               = fGETUHALF(0, VvV.uw[i]);
+    VdV.uh[i+fVELEM(32)]    = fGETUHALF(0, VuV.uw[i]))
+
+  ITERATOR_INSN2_PERMUTE_SLOT(16, vpackob,  "Vd32=vpackob(Vu32,Vv32)", "Vd32.b=vpacko(Vu32.h,Vv32.h)",
+ "Pack  bytes",
+    VdV.ub[i]               = fGETUBYTE(1, VvV.uh[i]);
+    VdV.ub[i+fVELEM(16)]    = fGETUBYTE(1, VuV.uh[i]))
+
+ ITERATOR_INSN2_PERMUTE_SLOT(32, vpackoh,  "Vd32=vpackoh(Vu32,Vv32)", "Vd32.h=vpacko(Vu32.w,Vv32.w)",
+ "Pack  halfwords",
+    VdV.uh[i]               = fGETUHALF(1, VvV.uw[i]);
+    VdV.uh[i+fVELEM(32)]    = fGETUHALF(1, VuV.uw[i]))
+
+
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vpackhub_sat,  "Vd32=vpackhub(Vu32,Vv32):sat", "Vd32.ub=vpack(Vu32.h,Vv32.h):sat",
+ "Pack ubytes with saturation",
+    VdV.ub[i]               = fVSATUB(VvV.h[i]);
+    VdV.ub[i+fVELEM(16)]    = fVSATUB(VuV.h[i]))
+
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vpackhb_sat,  "Vd32=vpackhb(Vu32,Vv32):sat", "Vd32.b=vpack(Vu32.h,Vv32.h):sat",
+ "Pack bytes with saturation",
+    VdV.b[i]               = fVSATB(VvV.h[i]);
+    VdV.b[i+fVELEM(16)]    = fVSATB(VuV.h[i]))
+
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vpackwuh_sat,  "Vd32=vpackwuh(Vu32,Vv32):sat", "Vd32.uh=vpack(Vu32.w,Vv32.w):sat",
+ "Pack ubytes with saturation",
+    VdV.uh[i]               = fVSATUH(VvV.w[i]);
+    VdV.uh[i+fVELEM(32)]    = fVSATUH(VuV.w[i]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vpackwh_sat,  "Vd32=vpackwh(Vu32,Vv32):sat", "Vd32.h=vpack(Vu32.w,Vv32.w):sat",
+ "Pack bytes with saturation",
+    VdV.h[i]               = fVSATH(VvV.w[i]);
+    VdV.h[i+fVELEM(32)]    = fVSATH(VuV.w[i]))
+
+
+
+
+
+/**************************************************************
+* Zero/Sign Extend with in-lane permute
+***************************************************************/
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(16,vzb,"Vdd32=vzxtb(Vu32)","Vdd32.uh=vzxt(Vu32.ub)",
+"Vector Zero Extend Bytes",
+    VddV.v[0].uh[i] = fZE8_16(fGETUBYTE(0, VuV.uh[i]));
+    VddV.v[1].uh[i] = fZE8_16(fGETUBYTE(1, VuV.uh[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(16,vsb,"Vdd32=vsxtb(Vu32)","Vdd32.h=vsxt(Vu32.b)",
+"Vector Sign Extend Bytes",
+    VddV.v[0].h[i] = fSE8_16(fGETBYTE(0, VuV.h[i]));
+    VddV.v[1].h[i] = fSE8_16(fGETBYTE(1, VuV.h[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(32,vzh,"Vdd32=vzxth(Vu32)","Vdd32.uw=vzxt(Vu32.uh)",
+"Vector Zero Extend halfwords",
+    VddV.v[0].uw[i] = fZE16_32(fGETUHALF(0, VuV.uw[i]));
+    VddV.v[1].uw[i] = fZE16_32(fGETUHALF(1, VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(32,vsh,"Vdd32=vsxth(Vu32)","Vdd32.w=vsxt(Vu32.h)",
+"Vector Sign Extend halfwords",
+    VddV.v[0].w[i] = fSE16_32(fGETHALF(0, VuV.w[i]));
+    VddV.v[1].w[i] = fSE16_32(fGETHALF(1, VuV.w[i])))
+
+
+/**********************************************************************
+*
+*
+*
+*               MMVECTOR REDUCTION
+*
+*
+*
+**********************************************************************/
+
+/********************************************
+*  2-WAY REDUCTION - UNSIGNED BYTE BY BYTE
+********************************************/
+
+
+ITERATOR_INSN2_MPY_SLOT(16,vdmpybus,"Vd32=vdmpybus(Vu32,Rt32)","Vd32.h=vdmpy(Vu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by bytes",
+    VdV.h[i]   = fMPY8US( fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i) % 4, RtV));
+    VdV.h[i]  += fMPY8US( fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT(16,vdmpybus_acc,"Vx32+=vdmpybus(Vu32,Rt32)","Vx32.h+=vdmpy(Vu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by  bytes, and accumulate",
+    VxV.h[i] += fMPY8US( fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i) % 4, RtV));
+    VxV.h[i] += fMPY8US( fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vdmpybus_dv,"Vdd32=vdmpybus(Vuu32,Rt32)","Vdd32.h=vdmpy(Vuu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by  bytes, and accumulate Sliding Window Reduction",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VddV.v[0].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VddV.v[1].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[1].uh[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vdmpybus_dv_acc,"Vxx32+=vdmpybus(Vuu32,Rt32)","Vxx32.h+=vdmpy(Vuu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by  bytes, and accumulate Sliding Window Reduction",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[1].uh[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+/********************************************
+*  2-WAY REDUCTION - HALF BY BYTE
+********************************************/
+ITERATOR_INSN2_MPY_SLOT(32,vdmpyhb,"Vd32=vdmpyhb(Vu32,Rt32)","Vd32.w=vdmpy(Vu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VdV.w[i]  = fMPY16SS(fGETHALF(0, VuV.w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VdV.w[i] += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT(32,vdmpyhb_acc,"Vx32+=vdmpyhb(Vu32,Rt32)","Vx32.w+=vdmpy(Vu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VxV.w[i] += fMPY16SS(fGETHALF(0, VuV.w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VxV.w[i] += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhb_dv,"Vdd32=vdmpyhb(Vuu32,Rt32)","Vdd32.w=vdmpy(Vuu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VddV.v[0].w[i]  = fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VddV.v[0].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VddV.v[1].w[i]  = fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VddV.v[1].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhb_dv_acc,"Vxx32+=vdmpyhb(Vuu32,Rt32)","Vxx32.w+=vdmpy(Vuu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+
+
+/********************************************
+*  2-WAY REDUCTION - HALF BY HALF
+********************************************/
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhvsat,"Vd32=vdmpyh(Vu32,Vv32):sat","Vd32.w=vdmpy(Vu32.h,Vv32.h):sat",
+"Vector halfword multiply, accumulate pairs, sat to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
+    accum   += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhvsat_acc,"Vx32+=vdmpyh(Vu32,Vv32):sat","Vx32.w+=vdmpy(Vu32.h,Vv32.h):sat",
+"Vector halfword multiply, accumulate pairs, sat to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
+    accum   += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
+    VxV.w[i] = fVSATW(VxV.w[i]+accum))
+
+
+/* VDMPYH */
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsat,"Vd32=vdmpyh(Vu32,Rt32):sat","Vd32.w=vdmpy(Vu32.h,Rt32.h):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(0, VuV.w[i]),fGETHALF(0, RtV));
+    accum   += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETHALF(1, RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsat_acc,"Vx32+=vdmpyh(Vu32,Rt32):sat","Vx32.w+=vdmpy(Vu32.h,Rt32.h):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t) accum = VxV.w[i];
+    accum   += fMPY16SS(fGETHALF(0, VuV.w[i]),fGETHALF(0, RtV));
+    accum   += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETHALF(1, RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhisat,"Vd32=vdmpyh(Vuu32,Rt32):sat","Vd32.w=vdmpy(Vuu32.h,Rt32.h):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Halfword with saturation",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]),fGETHALF(0,RtV));
+    accum   += fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]),fGETHALF(1,RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhisat_acc,"Vx32+=vdmpyh(Vuu32,Rt32):sat","Vx32.w+=vdmpy(Vuu32.h,Rt32.h):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Halfword with accumulation and saturation",
+    fHIDE(size8s_t) accum = VxV.w[i];
+    accum   += fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]),fGETHALF(0,RtV));
+    accum   += fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]),fGETHALF(1,RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+
+
+
+
+/* VDMPYHSU */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsusat,"Vd32=vdmpyhsu(Vu32,Rt32):sat","Vd32.w=vdmpy(Vu32.h,Rt32.uh):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SU(fGETHALF(0, VuV.w[i]),fGETUHALF(0, RtV));
+    accum   += fMPY16SU(fGETHALF(1, VuV.w[i]),fGETUHALF(1, RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsusat_acc,"Vx32+=vdmpyhsu(Vu32,Rt32):sat","Vx32.w+=vdmpy(Vu32.h,Rt32.uh):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t) accum=VxV.w[i];
+    accum   += fMPY16SU(fGETHALF(0, VuV.w[i]),fGETUHALF(0, RtV));
+    accum   += fMPY16SU(fGETHALF(1, VuV.w[i]),fGETUHALF(1, RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsuisat,"Vd32=vdmpyhsu(Vuu32,Rt32,#1):sat","Vd32.w=vdmpy(Vuu32.h,Rt32.uh,#1):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Halfword with saturation",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SU(fGETHALF(1,VuuV.v[0].w[i]),fGETUHALF(0,RtV));
+    accum   += fMPY16SU(fGETHALF(0,VuuV.v[1].w[i]),fGETUHALF(1,RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsuisat_acc,"Vx32+=vdmpyhsu(Vuu32,Rt32,#1):sat","Vx32.w+=vdmpy(Vuu32.h,Rt32.uh,#1):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Halfword with accumulation and saturation",
+    fHIDE(size8s_t) accum=VxV.w[i];
+    accum   += fMPY16SU(fGETHALF(1, VuuV.v[0].w[i]),fGETUHALF(0,RtV));
+    accum   += fMPY16SU(fGETHALF(0, VuuV.v[1].w[i]),fGETUHALF(1,RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+/********************************************
+*  3-WAY REDUCTION - UNSIGNED BYTE BY  BYTE
+********************************************/
+
+ ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpyb, "Vdd32=vtmpyb(Vuu32,Rt32)", "Vdd32.h=vtmpy(Vuu32.b,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VddV.v[0].h[i]  = fMPY8SS(fGETBYTE(0,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[0].h[i] += fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[0].h[i] += fGETBYTE(0,VuuV.v[1].h[i]);
+
+    VddV.v[1].h[i]  = fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[1].h[i] += fMPY8SS(fGETBYTE(0,VuuV.v[1].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[1].h[i] += fGETBYTE(1,VuuV.v[1].h[i]))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpyb_acc, "Vxx32+=vtmpyb(Vuu32,Rt32)", "Vxx32.h+=vtmpy(Vuu32.b,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VxxV.v[0].h[i] += fMPY8SS(fGETBYTE(0,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[0].h[i] += fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[0].h[i] += fGETBYTE(0,VuuV.v[1].h[i]);
+
+    VxxV.v[1].h[i] += fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[1].h[i] += fMPY8SS(fGETBYTE(0,VuuV.v[1].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[1].h[i] += fGETBYTE(1,VuuV.v[1].h[i]))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpybus, "Vdd32=vtmpybus(Vuu32,Rt32)", "Vdd32.h=vtmpy(Vuu32.ub,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[0].h[i] += fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[0].h[i] += fGETUBYTE(0,VuuV.v[1].uh[i]);
+
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[1].h[i] += fMPY8US(fGETUBYTE(0,VuuV.v[1].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[1].h[i] += fGETUBYTE(1,VuuV.v[1].uh[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpybus_acc, "Vxx32+=vtmpybus(Vuu32,Rt32)", "Vxx32.h+=vtmpy(Vuu32.ub,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[0].h[i] += fGETUBYTE(0,VuuV.v[1].uh[i]);
+
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(0,VuuV.v[1].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[1].h[i] += fGETUBYTE(1,VuuV.v[1].uh[i]))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vtmpyhb, "Vdd32=vtmpyhb(Vuu32,Rt32)", "Vdd32.w=vtmpy(Vuu32.h,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VddV.v[0].w[i] = fMPY16SS(fGETHALF(0,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VddV.v[0].w[i]+= fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VddV.v[0].w[i]+= fGETHALF(0,VuuV.v[1].w[i]);
+
+    VddV.v[1].w[i] = fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VddV.v[1].w[i]+= fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VddV.v[1].w[i]+= fGETHALF(1,VuuV.v[1].w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vtmpyhb_acc, "Vxx32+=vtmpyhb(Vuu32,Rt32)", "Vxx32.w+=vtmpy(Vuu32.h,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VxxV.v[0].w[i]+= fMPY16SS(fGETHALF(0,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VxxV.v[0].w[i]+= fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VxxV.v[0].w[i]+= fGETHALF(0,VuuV.v[1].w[i]);
+
+    VxxV.v[1].w[i]+= fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VxxV.v[1].w[i]+= fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VxxV.v[1].w[i]+= fGETHALF(1,VuuV.v[1].w[i]))
+
+
+/********************************************
+*  4-WAY REDUCTION - UNSIGNED BYTE BY UNSIGNED BYTE
+********************************************/
+
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpyub,"Vd32=vrmpyub(Vu32,Rt32)","Vd32.uw=vrmpy(Vu32.ub,Rt32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.uw[i]  = fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,RtV));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,RtV));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,RtV));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,RtV)))
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpyub_acc,"Vx32+=vrmpyub(Vu32,Rt32)","Vx32.uw+=vrmpy(Vu32.ub,Rt32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients Accumulate",
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,RtV));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,RtV));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,RtV));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpyubv,"Vd32=vrmpyub(Vu32,Vv32)","Vd32.uw=vrmpy(Vu32.ub,Vv32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.uw[i]  = fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,VvV.uw[i]));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,VvV.uw[i]));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,VvV.uw[i]));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpyubv_acc,"Vx32+=vrmpyub(Vu32,Vv32)","Vx32.uw+=vrmpy(Vu32.ub,Vv32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients Accumulate",
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,VvV.uw[i]));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,VvV.uw[i]));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,VvV.uw[i]));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybv,"Vd32=vrmpyb(Vu32,Vv32)","Vd32.w=vrmpy(Vu32.b,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.w[i]  = fMPY8SS(fGETBYTE(0, VuV.w[i]), fGETBYTE(0, VvV.w[i]));
+    VdV.w[i] += fMPY8SS(fGETBYTE(1, VuV.w[i]), fGETBYTE(1, VvV.w[i]));
+    VdV.w[i] += fMPY8SS(fGETBYTE(2, VuV.w[i]), fGETBYTE(2, VvV.w[i]));
+    VdV.w[i] += fMPY8SS(fGETBYTE(3, VuV.w[i]), fGETBYTE(3, VvV.w[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybv_acc,"Vx32+=vrmpyb(Vu32,Vv32)","Vx32.w+=vrmpy(Vu32.b,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VxV.w[i] += fMPY8SS(fGETBYTE(0, VuV.w[i]), fGETBYTE(0, VvV.w[i]));
+    VxV.w[i] += fMPY8SS(fGETBYTE(1, VuV.w[i]), fGETBYTE(1, VvV.w[i]));
+    VxV.w[i] += fMPY8SS(fGETBYTE(2, VuV.w[i]), fGETBYTE(2, VvV.w[i]));
+    VxV.w[i] += fMPY8SS(fGETBYTE(3, VuV.w[i]), fGETBYTE(3, VvV.w[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpyubi,"Vdd32=vrmpyub(Vuu32,Rt32,#u1)","Vdd32.uw=vrmpy(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction to Word",
+    VddV.v[0].uw[i]  = fMPY8UU(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[0].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV));
+    VddV.v[0].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[0].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+
+    VddV.v[1].uw[i]  = fMPY8UU(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[1].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+    VddV.v[1].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[1].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpyubi_acc,"Vxx32+=vrmpyub(Vuu32,Rt32,#u1)","Vxx32.uw+=vrmpy(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction with accumulate and saturation to Word",
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV));
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV)))
+
+
+
+
+/********************************************
+*  4-WAY REDUCTION - UNSIGNED BYTE BY  BYTE
+********************************************/
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybus,"Vd32=vrmpybus(Vu32,Rt32)","Vd32.w=vrmpy(Vu32.ub,Rt32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.w[i]  = fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,RtV));
+    VdV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,RtV));
+    VdV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,RtV));
+    VdV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybus_acc,"Vx32+=vrmpybus(Vu32,Rt32)","Vx32.w+=vrmpy(Vu32.ub,Rt32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VxV.w[i] += fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,RtV));
+    VxV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,RtV));
+    VxV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,RtV));
+    VxV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybusi,"Vdd32=vrmpybus(Vuu32,Rt32,#u1)","Vdd32.w=vrmpy(Vuu32.ub,Rt32.b,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction to Word",
+    VddV.v[0].w[i]  = fMPY8US(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[0].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV));
+    VddV.v[0].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[0].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+
+    VddV.v[1].w[i]  = fMPY8US(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[1].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+    VddV.v[1].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[1].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybusi_acc,"Vxx32+=vrmpybus(Vuu32,Rt32,#u1)","Vxx32.w+=vrmpy(Vuu32.ub,Rt32.b,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction with accumulate and saturation to Word",
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV));
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV)))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybusv,"Vd32=vrmpybus(Vu32,Vv32)","Vd32.w=vrmpy(Vu32.ub,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.w[i]  = fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,VvV.w[i]));
+    VdV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,VvV.w[i]));
+    VdV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,VvV.w[i]));
+    VdV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,VvV.w[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybusv_acc,"Vx32+=vrmpybus(Vu32,Vv32)","Vx32.w+=vrmpy(Vu32.ub,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VxV.w[i] += fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,VvV.w[i]));
+    VxV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,VvV.w[i]));
+    VxV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,VvV.w[i]));
+    VxV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,VvV.w[i])))
+
+
+
+
+
+
+
+
+
+
+
+/********************************************
+*  2-WAY REDUCTION - SAD
+********************************************/
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdsaduh,"Vdd32=vdsaduh(Vuu32,Rt32)","Vdd32.uw=vdsad(Vuu32.uh,Rt32.uh)",
+"Dual Vector Halfword by Byte 4-Way Reduction to Word",
+    VddV.v[0].uw[i]  = fABS(fGETUHALF(0, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VddV.v[0].uw[i] += fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(1,RtV));
+    VddV.v[1].uw[i]  = fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VddV.v[1].uw[i] += fABS(fGETUHALF(0, VuuV.v[1].uw[i]) - fGETUHALF(1,RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdsaduh_acc,"Vxx32+=vdsaduh(Vuu32,Rt32)","Vxx32.uw+=vdsad(Vuu32.uh,Rt32.uh)",
+"Dual Vector Halfword by Byte 4-Way Reduction to Word",
+    VxxV.v[0].uw[i] += fABS(fGETUHALF(0, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VxxV.v[0].uw[i] += fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(1,RtV));
+    VxxV.v[1].uw[i] += fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VxxV.v[1].uw[i] += fABS(fGETUHALF(0, VuuV.v[1].uw[i]) - fGETUHALF(1,RtV)))
+
+
+
+
+/********************************************
+*  4-WAY REDUCTION - SAD
+********************************************/
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrsadubi,"Vdd32=vrsadub(Vuu32,Rt32,#u1)","Vdd32.uw=vrsad(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Halfword by Byte 4-Way Reduction to Word",
+    VddV.v[0].uw[i]  = fABS(fZE8_16(fGETUBYTE(0, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VddV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV)));
+    VddV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VddV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+
+    VddV.v[1].uw[i]  = fABS(fZE8_16(fGETUBYTE(0, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VddV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+    VddV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VddV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV))))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrsadubi_acc,"Vxx32+=vrsadub(Vuu32,Rt32,#u1)","Vxx32.uw+=vrsad(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Halfword by Byte 4-Way Reduction to Word",
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(0, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV)));
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(0, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV))))
+
+
+
+
+
+
+
+
+
+
+/*********************************************************************
+ * MMVECTOR SHIFTING
+ * ******************************************************************/
+// Macro to shift arithmetically left/right and by either RT or Vv
+
+#define V_SHIFT(TYPE, DESC, SIZE, LOGSIZE, CASTTYPE)   \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasr##TYPE,   "Vd32=vasr" #TYPE "(Vu32,Rt32)","Vd32."#TYPE"=vasr(Vu32."#TYPE",Rt32)",         "Vector arithmetic shift right " DESC,    VdV.TYPE[i]     = (VuV.TYPE[i]    >> (RtV & (SIZE-1)))) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasl##TYPE,   "Vd32=vasl" #TYPE "(Vu32,Rt32)","Vd32."#TYPE"=vasl(Vu32."#TYPE",Rt32)",         "Vector arithmetic shift left  " DESC,    VdV.TYPE[i]     = (VuV.TYPE[i]    << (RtV & (SIZE-1)))) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vlsr##TYPE,   "Vd32=vlsr" #TYPE "(Vu32,Rt32)","Vd32.u"#TYPE"=vlsr(Vu32.u"#TYPE",Rt32)",       "Vector logical shift right "    DESC,    VdV.u##TYPE[i]  = (VuV.u##TYPE[i] >> (RtV & (SIZE-1)))) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasr##TYPE##v,"Vd32=vasr" #TYPE "(Vu32,Vv32)","Vd32."#TYPE"=vasr(Vu32."#TYPE",Vv32."#TYPE")", "Vector arithmetic shift right " DESC,    VdV.TYPE[i]     = fBIDIR_ASHIFTR(VuV.TYPE[i], fSXTN((LOGSIZE+1),SIZE,VvV.TYPE[i]),CASTTYPE)) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasl##TYPE##v,"Vd32=vasl" #TYPE "(Vu32,Vv32)","Vd32."#TYPE"=vasl(Vu32."#TYPE",Vv32."#TYPE")", "Vector arithmetic shift left  " DESC,    VdV.TYPE[i]     = fBIDIR_ASHIFTL(VuV.TYPE[i],  fSXTN((LOGSIZE+1),SIZE,VvV.TYPE[i]),CASTTYPE)) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vlsr##TYPE##v,"Vd32=vlsr" #TYPE "(Vu32,Vv32)","Vd32."#TYPE"=vlsr(Vu32."#TYPE",Vv32."#TYPE")", "Vector logical shift right "    DESC,    VdV.u##TYPE[i]  = fBIDIR_LSHIFTR(VuV.u##TYPE[i], fSXTN((LOGSIZE+1),SIZE,VvV.TYPE[i]),CASTTYPE)) \
+
+V_SHIFT(w, "word",   32,5,4_4)
+V_SHIFT(h, "halfword", 16,4,2_2)
+
+ITERATOR_INSN_SHIFT_SLOT(8,vlsrb,"Vd32.ub=vlsr(Vu32.ub,Rt32)","vec log shift right bytes", VdV.b[i] = VuV.ub[i] >> (RtV & 0x7))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vrotr,"Vd32=vrotr(Vu32,Vv32)","Vd32.uw=vrotr(Vu32.uw,Vv32.uw)","Vector word rotate right", VdV.uw[i] = ((VuV.uw[i] >> (VvV.uw[i] & 0x1f)) | (VuV.uw[i] << (32 - (VvV.uw[i] & 0x1f)))))
+
+/*********************************************************************
+ * MMVECTOR SHIFT AND PERMUTE
+ * ******************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(32,vasr_into,"Vxx32=vasrinto(Vu32,Vv32)","Vxx32.w=vasrinto(Vu32.w,Vv32.w)","ASR vector 1 elements and overlay dropping bits to MSB of vector 2 elements",
+    fHIDE(int64_t ) shift = (fSE32_64(VuV.w[i]) << 32);
+    fHIDE(int64_t ) mask  = (((fSE32_64(VxxV.v[0].w[i])) << 32) | fZE32_64(VxxV.v[0].w[i]));
+    fHIDE(int64_t) lomask = (((fSE32_64(1)) << 32) - 1);
+    fHIDE(int ) count = -(0x40 & VvV.w[i]) + (VvV.w[i] & 0x3f);
+    fHIDE(int64_t ) result = (count == -0x40) ? 0 : (((count < 0) ? ((shift << -(count)) | (mask & (lomask << -(count)))) : ((shift >> count) | (mask & (lomask >> count)))));
+    VxxV.v[1].w[i] = ((result >> 32) & 0xffffffff);
+    VxxV.v[0].w[i] = (result & 0xffffffff))
+
+#define NEW_NARROWING_SHIFT 1
+
+#if NEW_NARROWING_SHIFT
+#define NARROWING_SHIFT(ITERSIZE,TAG,DSTM,DSTTYPE,SRCTYPE,SYNOPTS,SATFUNC,RNDFUNC,SHAMTMASK) \
+ITERATOR_INSN_SHIFT_SLOT(ITERSIZE,TAG, \
+"Vd32." #DSTTYPE "=vasr(Vu32." #SRCTYPE ",Vv32." #SRCTYPE ",Rt8)" #SYNOPTS, \
+"Vector shift right and shuffle", \
+	fHIDE(fRT8NOTE())\
+    fHIDE(int )shamt = RtV & SHAMTMASK; \
+    DSTM(0,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VvV.SRCTYPE[i],shamt) >> shamt)); \
+    DSTM(1,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VuV.SRCTYPE[i],shamt) >> shamt)))
+
+
+
+
+
+/* WORD TO HALF*/
+
+NARROWING_SHIFT(32,vasrwh,fSETHALF,h,w,,fECHO,fVNOROUND,0xF)
+NARROWING_SHIFT(32,vasrwhsat,fSETHALF,h,w,:sat,fVSATH,fVNOROUND,0xF)
+NARROWING_SHIFT(32,vasrwhrndsat,fSETHALF,h,w,:rnd:sat,fVSATH,fVROUND,0xF)
+NARROWING_SHIFT(32,vasrwuhrndsat,fSETHALF,uh,w,:rnd:sat,fVSATUH,fVROUND,0xF)
+NARROWING_SHIFT(32,vasrwuhsat,fSETHALF,uh,w,:sat,fVSATUH,fVNOROUND,0xF)
+NARROWING_SHIFT(32,vasruwuhrndsat,fSETHALF,uh,uw,:rnd:sat,fVSATUH,fVROUND,0xF)
+
+NARROWING_SHIFT_NOV1(32,vasruwuhsat,fSETHALF,uh,uw,:sat,fVSATUH,fVNOROUND,0xF)
+NARROWING_SHIFT(16,vasrhubsat,fSETBYTE,ub,h,:sat,fVSATUB,fVNOROUND,0x7)
+NARROWING_SHIFT(16,vasrhubrndsat,fSETBYTE,ub,h,:rnd:sat,fVSATUB,fVROUND,0x7)
+NARROWING_SHIFT(16,vasrhbsat,fSETBYTE,b,h,:sat,fVSATB,fVNOROUND,0x7)
+NARROWING_SHIFT(16,vasrhbrndsat,fSETBYTE,b,h,:rnd:sat,fVSATB,fVROUND,0x7)
+
+NARROWING_SHIFT_NOV1(16,vasruhubsat,fSETBYTE,ub,uh,:sat,fVSATUB,fVNOROUND,0x7)
+NARROWING_SHIFT_NOV1(16,vasruhubrndsat,fSETBYTE,ub,uh,:rnd:sat,fVSATUB,fVROUND,0x7)
+
+#else
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwh,"Vd32=vasrwh(Vu32,Vv32,Rt8)","Vd32.h=vasr(Vu32.w,Vv32.w,Rt8)",
+"Vector arithmetic shift right words, shuffle even halfwords",
+	fHIDE(fRT8NOTE())\
+    fSETHALF(0,VdV.w[i], (VvV.w[i] >> (RtV & 0xF)));
+    fSETHALF(1,VdV.w[i], (VuV.w[i] >> (RtV & 0xF))))
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwhsat,"Vd32=vasrwh(Vu32,Vv32,Rt8):sat","Vd32.h=vasr(Vu32.w,Vv32.w,Rt8):sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+	fHIDE(fRT8NOTE())\
+    fSETHALF(0,VdV.w[i], fVSATH(VvV.w[i] >> (RtV & 0xF)));
+    fSETHALF(1,VdV.w[i], fVSATH(VuV.w[i] >> (RtV & 0xF))))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwhrndsat,"Vd32=vasrwh(Vu32,Vv32,Rt8):rnd:sat","Vd32.h=vasr(Vu32.w,Vv32.w,Rt8):rnd:sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+	fHIDE(fRT8NOTE())\
+    fHIDE(int ) shamt = RtV & 0xF;
+    fSETHALF(0,VdV.w[i], fVSATH(  (VvV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt));
+    fSETHALF(1,VdV.w[i], fVSATH(  (VuV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwuhrndsat,"Vd32=vasrwuh(Vu32,Vv32,Rt8):rnd:sat","Vd32.uh=vasr(Vu32.w,Vv32.w,Rt8):rnd:sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+	fHIDE(fRT8NOTE())\
+    fHIDE(int ) shamt = RtV & 0xF;
+    fSETHALF(0,VdV.w[i], fVSATUH(  (VvV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt));
+    fSETHALF(1,VdV.w[i], fVSATUH(  (VuV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwuhsat,"Vd32=vasrwuh(Vu32,Vv32,Rt8):sat","Vd32.uh=vasr(Vu32.w,Vv32.w,Rt8):sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+	fHIDE(fRT8NOTE())\
+    fSETHALF(0, VdV.uw[i], fVSATUH(VvV.w[i] >> (RtV & 0xF)));
+    fSETHALF(1, VdV.uw[i], fVSATUH(VuV.w[i] >> (RtV & 0xF))))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasruwuhrndsat,"Vd32=vasruwuh(Vu32,Vv32,Rt8):rnd:sat","Vd32.uh=vasr(Vu32.uw,Vv32.uw,Rt8):rnd:sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+	fHIDE(fRT8NOTE())\
+    fHIDE(int ) shamt = RtV & 0xF;
+    fSETHALF(0,VdV.w[i], fVSATUH(  (VvV.uw[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt));
+    fSETHALF(1,VdV.w[i], fVSATUH(  (VuV.uw[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt)))
+#endif
+
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vroundwh,"Vd32=vroundwh(Vu32,Vv32):sat","Vd32.h=vround(Vu32.w,Vv32.w):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATH((VvV.w[i] + fCONSTLL(0x8000)) >> 16));
+    fSETHALF(1, VdV.uw[i], fVSATH((VuV.w[i] + fCONSTLL(0x8000)) >> 16)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vroundwuh,"Vd32=vroundwuh(Vu32,Vv32):sat","Vd32.uh=vround(Vu32.w,Vv32.w):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATUH((VvV.w[i] + fCONSTLL(0x8000)) >> 16));
+    fSETHALF(1, VdV.uw[i], fVSATUH((VuV.w[i] + fCONSTLL(0x8000)) >> 16)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vrounduwuh,"Vd32=vrounduwuh(Vu32,Vv32):sat","Vd32.uh=vround(Vu32.uw,Vv32.uw):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATUH((VvV.uw[i] + fCONSTLL(0x8000)) >> 16));
+    fSETHALF(1, VdV.uw[i], fVSATUH((VuV.uw[i] + fCONSTLL(0x8000)) >> 16)))
+
+
+
+
+
+/* HALF TO BYTE*/
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vroundhb,"Vd32=vroundhb(Vu32,Vv32):sat","Vd32.b=vround(Vu32.h,Vv32.h):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETBYTE(0, VdV.uh[i], fVSATB((VvV.h[i] + 0x80) >> 8));
+    fSETBYTE(1, VdV.uh[i], fVSATB((VuV.h[i] + 0x80) >> 8)))
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vroundhub,"Vd32=vroundhub(Vu32,Vv32):sat","Vd32.ub=vround(Vu32.h,Vv32.h):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETBYTE(0, VdV.uh[i], fVSATUB((VvV.h[i] + 0x80) >> 8));
+    fSETBYTE(1, VdV.uh[i], fVSATUB((VuV.h[i] + 0x80) >> 8)))
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vrounduhub,"Vd32=vrounduhub(Vu32,Vv32):sat","Vd32.ub=vround(Vu32.uh,Vv32.uh):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETBYTE(0, VdV.uh[i], fVSATUB((VvV.uh[i] + 0x80) >> 8));
+    fSETBYTE(1, VdV.uh[i], fVSATUB((VuV.uh[i] + 0x80) >> 8)))
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vaslw_acc,"Vx32+=vaslw(Vu32,Rt32)","Vx32.w+=vasl(Vu32.w,Rt32)",
+"Vector shift add word",
+    VxV.w[i]  +=  (VuV.w[i] << (RtV & (32-1))))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrw_acc,"Vx32+=vasrw(Vu32,Rt32)","Vx32.w+=vasr(Vu32.w,Rt32)",
+"Vector shift add word",
+    VxV.w[i]  +=  (VuV.w[i] >> (RtV & (32-1))))
+
+ITERATOR_INSN2_SHIFT_SLOT_NOV1(16,vaslh_acc,"Vx32+=vaslh(Vu32,Rt32)","Vx32.h+=vasl(Vu32.h,Rt32)",
+"Vector shift add halfword",
+    VxV.h[i]  +=  (VuV.h[i] << (RtV & (16-1))))
+
+ITERATOR_INSN2_SHIFT_SLOT_NOV1(16,vasrh_acc,"Vx32+=vasrh(Vu32,Rt32)","Vx32.h+=vasr(Vu32.h,Rt32)",
+"Vector shift add halfword",
+    VxV.h[i]  +=  (VuV.h[i] >> (RtV & (16-1))))
+
+/**************************************************************************
+*
+* MMVECTOR ELEMENT-WISE ARITHMETIC
+*
+**************************************************************************/
+
+/**************************************************************************
+* MACROS GO IN MACROS.DEF NOT HERE!!!
+**************************************************************************/
+
+
+#define MMVEC_ABSDIFF(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_MPY_SLOT(WIDTH, vabsdiff##TYPE,                   "Vd32=vabsdiff"TYPE2"(Vu32,Vv32)" ,"Vd32."#DEST"=vabsdiff(Vu32."#SRC",Vv32."#SRC")" ,     "Vector Absolute of Difference "DESCR,   VdV.DEST[i] = (VuV.SRC[i] > VvV.SRC[i]) ? (VuV.SRC[i] - VvV.SRC[i]) : (VvV.SRC[i] - VuV.SRC[i]))
+
+#define MMVEC_ADDU_SAT(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vadd##TYPE##sat,                  "Vd32=vadd"TYPE2"(Vu32,Vv32):sat" ,    "Vd32."#DEST"=vadd(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVUADDSAT(WIDTH,  VuV.SRC[i], VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vadd##TYPE##sat_dv,    "Vdd32=vadd"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vadd(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVUADDSAT(WIDTH, VuuV.v[0].SRC[i],VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVUADDSAT(WIDTH, VuuV.v[1].SRC[i],VvvV.v[1].SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vsub##TYPE##sat,                  "Vd32=vsub"TYPE2"(Vu32,Vv32):sat",     "Vd32."#DEST"=vsub(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVUSUBSAT(WIDTH,  VuV.SRC[i], VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vsub##TYPE##sat_dv,    "Vdd32=vsub"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vsub(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVUSUBSAT(WIDTH, VuuV.v[0].SRC[i],VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVUSUBSAT(WIDTH, VuuV.v[1].SRC[i],VvvV.v[1].SRC[i]))\
+
+#define MMVEC_ADDS_SAT(TYPE,TYPE2,DESCR, WIDTH,DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vadd##TYPE##sat,                  "Vd32=vadd"TYPE2"(Vu32,Vv32):sat" ,    "Vd32."#DEST"=vadd(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVSADDSAT(WIDTH,  VuV.SRC[i],  VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vadd##TYPE##sat_dv,    "Vdd32=vadd"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vadd(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVSADDSAT(WIDTH, VuuV.v[0].SRC[i], VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVSADDSAT(WIDTH, VuuV.v[1].SRC[i], VvvV.v[1].SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vsub##TYPE##sat,                  "Vd32=vsub"TYPE2"(Vu32,Vv32):sat",     "Vd32."#DEST"=vsub(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVSSUBSAT(WIDTH,  VuV.SRC[i],  VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vsub##TYPE##sat_dv,    "Vdd32=vsub"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vsub(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVSSUBSAT(WIDTH, VuuV.v[0].SRC[i], VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVSSUBSAT(WIDTH, VuuV.v[1].SRC[i], VvvV.v[1].SRC[i]))\
+
+#define MMVEC_AVGU(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",        "Vector Average "DESCR,                                      VdV.DEST[i] = fVAVGU(   WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",     "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",    "Vector Average % Round"DESCR,                               VdV.DEST[i] = fVAVGURND(WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+
+
+#define MMVEC_AVGS(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",          "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",          "Vector Average "DESCR,                                      VdV.DEST[i]  = fVAVGS(       WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",      "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",      "Vector Average % Round"DESCR,                               VdV.DEST[i]  = fVAVGSRND(    WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vnavg##TYPE,                       "Vd32=vnavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vnavg(Vu32."#SRC",Vv32."#SRC")",         "Vector Negative Average "DESCR,                             VdV.DEST[i]  = fVNAVGS(      WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+
+
+
+
+
+
+#define MMVEC_ADDWRAP(TYPE,TYPE2, DESCR, WIDTH , DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vadd##TYPE,                  "Vd32=vadd"TYPE2"(Vu32,Vv32)" ,     "Vd32."#DEST"=vadd(Vu32."#SRC",Vv32."#SRC")",    "Vector Add "DESCR,          VdV.DEST[i] =  VuV.SRC[i] +  VvV.SRC[i])\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vsub##TYPE,                  "Vd32=vsub"TYPE2"(Vu32,Vv32)" ,     "Vd32."#DEST"=vsub(Vu32."#SRC",Vv32."#SRC")",    "Vector Sub "DESCR,          VdV.DEST[i] =  VuV.SRC[i] -  VvV.SRC[i])\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vadd##TYPE##_dv,  "Vdd32=vadd"TYPE2"(Vuu32,Vvv32)" ,  "Vdd32."#DEST"=vadd(Vuu32."#SRC",Vvv32."#SRC")", "Double Vector Add "DESCR,   VddV.v[0].DEST[i] = VuuV.v[0].SRC[i] + VvvV.v[0].SRC[i]; VddV.v[1].DEST[i] = VuuV.v[1].SRC[i] + VvvV.v[1].SRC[i])\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vsub##TYPE##_dv,  "Vdd32=vsub"TYPE2"(Vuu32,Vvv32)" ,  "Vdd32."#DEST"=vsub(Vuu32."#SRC",Vvv32."#SRC")", "Double Vector Sub "DESCR,   VddV.v[0].DEST[i] = VuuV.v[0].SRC[i] - VvvV.v[0].SRC[i]; VddV.v[1].DEST[i] = VuuV.v[1].SRC[i] - VvvV.v[1].SRC[i]) \
+
+
+
+
+
+/* Wrapping Adds */
+MMVEC_ADDWRAP(b,    "b",    "Byte",         8,   b, b);
+MMVEC_ADDWRAP(h,    "h",    "Halfword",     16,  h, h);
+MMVEC_ADDWRAP(w,    "w",    "Word",         32,   w,    w);
+
+/* Saturating Adds */
+MMVEC_ADDU_SAT(ub, "ub",    "Unsigned Byte",        8,   ub,    ub);
+MMVEC_ADDU_SAT(uh, "uh",    "Unsigned Halfword",    16,  uh,    uh);
+MMVEC_ADDU_SAT(uw, "uw",    "Unsigned word",    32,  uw,    uw);
+MMVEC_ADDS_SAT(b,  "b",     "byte",             8,  b,     b);
+MMVEC_ADDS_SAT(h,  "h",     "Halfword",             16,  h,     h);
+MMVEC_ADDS_SAT(w,  "w",     "Word",                 32,  w,     w);
+
+
+/* Averaging Instructions */
+MMVEC_AVGU(ub,"ub",     "Unsigned Byte",     8,   ub,   ub);
+MMVEC_AVGU(uh,"uh",     "Unsigned Halfword", 16,  uh,   uh);
+MMVEC_AVGU_NOV1(uw,"uw",     "Unsigned Word",     32,  uw,   uw);
+MMVEC_AVGS_NOV1(b,   "b",    "Byte",               8,   b,   b);
+MMVEC_AVGS(h,   "h",    "Halfword",          16,   h,   h);
+MMVEC_AVGS(w,   "w",    "Word",              32,   w,   w);
+
+
+/* Absolute Difference */
+MMVEC_ABSDIFF(ub,"ub",  "Unsigned Byte",        8,   ub,    ub);
+MMVEC_ABSDIFF(uh,"uh",  "Unsigned Halfword",    16,  uh,    uh);
+MMVEC_ABSDIFF(h,"h",        "Halfword",             16,  uh,    h);
+MMVEC_ABSDIFF(w,"w",        "Word",                 32,  uw,    w);
+
+ITERATOR_INSN2_ANY_SLOT(8,vnavgub, "Vd32=vnavgub(Vu32,Vv32)", "Vd32.b=vnavg(Vu32.ub,Vv32.ub)",
+"Vector Negative Average Unsigned Byte", VdV.b[i]   = fVNAVGU(8, VuV.ub[i], VvV.ub[i]))
+
+ITERATOR_INSN_ANY_SLOT(32,vaddcarrysat,"Vd32.w=vadd(Vu32.w,Vv32.w,Qs4):carry:sat","add w/carry and saturate",
+VdV.w[i] = fVSATW(VuV.w[i]+VvV.w[i]+fGETQBIT(QsV,i*4)))
+
+ITERATOR_INSN_ANY_SLOT(32,vaddcarry,"Vd32.w=vadd(Vu32.w,Vv32.w,Qx4):carry","add w/carry",
+VdV.w[i] = VuV.w[i]+VvV.w[i]+fGETQBIT(QxV,i*4);
+fSETQBITS(QxV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],VvV.w[i],fGETQBIT(QxV,i*4))))
+
+ITERATOR_INSN_ANY_SLOT(32,vsubcarry,"Vd32.w=vsub(Vu32.w,Vv32.w,Qx4):carry","add w/carry",
+VdV.w[i] = VuV.w[i]+~VvV.w[i]+fGETQBIT(QxV,i*4);
+fSETQBITS(QxV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],~VvV.w[i],fGETQBIT(QxV,i*4))))
+
+ITERATOR_INSN_ANY_SLOT(32,vaddcarryo,"Vd32.w,Qe4=vadd(Vu32.w,Vv32.w):carry","add w/carry out-only",
+VdV.w[i] = VuV.w[i]+VvV.w[i];
+fSETQBITS(QeV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],VvV.w[i],0)))
+
+ITERATOR_INSN_ANY_SLOT(32,vsubcarryo,"Vd32.w,Qe4=vsub(Vu32.w,Vv32.w):carry","subtract w/carry out-only",
+VdV.w[i] = VuV.w[i]+~VvV.w[i]+1;
+fSETQBITS(QeV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],~VvV.w[i],1)))
+
+
+ITERATOR_INSN_ANY_SLOT(32,vsatdw,"Vd32.w=vsatdw(Vu32.w,Vv32.w)","Saturate from 64-bits (higher 32-bits come from first vector) to 32-bits",VdV.w[i] = fVSATDW(VuV.w[i],VvV.w[i]))
+
+
+#define MMVEC_ADDSAT_MIX(TAGEND,SATF,WIDTH,DEST,SRC1,SRC2)\
+ITERATOR_INSN_ANY_SLOT(WIDTH, vadd##TAGEND,"Vd32."#DEST"=vadd(Vu32."#SRC1",Vv32."#SRC2"):sat",    "Vector Add mixed", VdV.DEST[i] =  SATF(VuV.SRC1[i] +  VvV.SRC2[i]))\
+ITERATOR_INSN_ANY_SLOT(WIDTH, vsub##TAGEND,"Vd32."#DEST"=vsub(Vu32."#SRC1",Vv32."#SRC2"):sat",    "Vector Sub mixed", VdV.DEST[i] =  SATF(VuV.SRC1[i] -  VvV.SRC2[i]))\
+
+MMVEC_ADDSAT_MIX(ububb_sat,fVSATUB,8,ub,ub,b)
+
+/****************************
+*   WIDENING
+****************************/
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vaddubh,"Vdd32=vaddub(Vu32,Vv32)","Vdd32.h=vadd(Vu32.ub,Vv32.ub)",
+"Vector addition with widen into two vectors",
+    VddV.v[0].h[i] = fZE8_16(fGETUBYTE(0, VuV.uh[i])) + fZE8_16(fGETUBYTE(0, VvV.uh[i]));
+    VddV.v[1].h[i] = fZE8_16(fGETUBYTE(1, VuV.uh[i])) + fZE8_16(fGETUBYTE(1, VvV.uh[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vsububh,"Vdd32=vsubub(Vu32,Vv32)","Vdd32.h=vsub(Vu32.ub,Vv32.ub)",
+"Vector subtraction with widen into two vectors",
+    VddV.v[0].h[i] = fZE8_16(fGETUBYTE(0, VuV.uh[i])) - fZE8_16(fGETUBYTE(0, VvV.uh[i]));
+    VddV.v[1].h[i] = fZE8_16(fGETUBYTE(1, VuV.uh[i])) - fZE8_16(fGETUBYTE(1, VvV.uh[i])))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vaddhw,"Vdd32=vaddh(Vu32,Vv32)","Vdd32.w=vadd(Vu32.h,Vv32.h)",
+"Vector addition with widen into two vectors",
+    VddV.v[0].w[i] = fGETHALF(0, VuV.w[i]) + fGETHALF(0, VvV.w[i]);
+    VddV.v[1].w[i] = fGETHALF(1, VuV.w[i]) + fGETHALF(1, VvV.w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vsubhw,"Vdd32=vsubh(Vu32,Vv32)","Vdd32.w=vsub(Vu32.h,Vv32.h)",
+"Vector subtraction with widen into two vectors",
+    VddV.v[0].w[i] = fGETHALF(0, VuV.w[i]) - fGETHALF(0, VvV.w[i]);
+    VddV.v[1].w[i] = fGETHALF(1, VuV.w[i]) - fGETHALF(1, VvV.w[i]))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vadduhw,"Vdd32=vadduh(Vu32,Vv32)","Vdd32.w=vadd(Vu32.uh,Vv32.uh)",
+"Vector addition with widen into two vectors",
+    VddV.v[0].w[i] = fZE16_32(fGETUHALF(0, VuV.uw[i])) + fZE16_32(fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].w[i] = fZE16_32(fGETUHALF(1, VuV.uw[i])) + fZE16_32(fGETUHALF(1, VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vsubuhw,"Vdd32=vsubuh(Vu32,Vv32)","Vdd32.w=vsub(Vu32.uh,Vv32.uh)",
+"Vector subtraction with widen into two vectors",
+    VddV.v[0].w[i] = fZE16_32(fGETUHALF(0, VuV.uw[i])) - fZE16_32(fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].w[i] = fZE16_32(fGETUHALF(1, VuV.uw[i])) - fZE16_32(fGETUHALF(1, VvV.uw[i])))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vaddhw_acc,"Vxx32+=vaddh(Vu32,Vv32)","Vxx32.w+=vadd(Vu32.h,Vv32.h)",
+"Vector addition with widen into two vectors",
+    VxxV.v[0].w[i] += fGETHALF(0, VuV.w[i]) + fGETHALF(0, VvV.w[i]);
+    VxxV.v[1].w[i] += fGETHALF(1, VuV.w[i]) + fGETHALF(1, VvV.w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vadduhw_acc,"Vxx32+=vadduh(Vu32,Vv32)","Vxx32.w+=vadd(Vu32.uh,Vv32.uh)",
+"Vector addition with widen into two vectors",
+    VxxV.v[0].w[i] += fGETUHALF(0, VuV.w[i]) + fGETUHALF(0, VvV.w[i]);
+    VxxV.v[1].w[i] += fGETUHALF(1, VuV.w[i]) + fGETUHALF(1, VvV.w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vaddubh_acc,"Vxx32+=vaddub(Vu32,Vv32)","Vxx32.h+=vadd(Vu32.ub,Vv32.ub)",
+"Vector addition with widen into two vectors",
+    VxxV.v[0].h[i] += fGETUBYTE(0, VuV.h[i]) + fGETUBYTE(0, VvV.h[i]);
+    VxxV.v[1].h[i] += fGETUBYTE(1, VuV.h[i]) + fGETUBYTE(1, VvV.h[i]))
+
+
+DEF_CVI_MAPPING(V6_vd0,  "Vd32=#0",  "Vd32=vxor(V31,V31)")
+DEF_CVI_MAPPING(V6_vdd0,  "Vdd32=#0",  "Vdd32.w=vsub(V31:30.w,V31:30.w)")
+
+
+/****************************
+*   Conditional
+****************************/
+
+#define CONDADDSUB(WIDTH,TAGEND,LHSYN,RHSYN,DESCR,LHBEH,RHBEH) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vadd##TAGEND##q,"if (Qv4."#TAGEND") "LHSYN"+="RHSYN,"if (Qv4) "LHSYN"+="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH+RHBEH,LHBEH)) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vsub##TAGEND##q,"if (Qv4."#TAGEND") "LHSYN"-="RHSYN,"if (Qv4) "LHSYN"-="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH-RHBEH,LHBEH)) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vadd##TAGEND##nq,"if (!Qv4."#TAGEND") "LHSYN"+="RHSYN,"if (!Qv4) "LHSYN"+="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH,LHBEH+RHBEH)) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vsub##TAGEND##nq,"if (!Qv4."#TAGEND") "LHSYN"-="RHSYN,"if (!Qv4) "LHSYN"-="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH,LHBEH-RHBEH)) \
+
+CONDADDSUB(8,b,"Vx32.b","Vu32.b","Conditional add/sub Byte",VxV.ub[i],VuV.ub[i])
+CONDADDSUB(16,h,"Vx32.h","Vu32.h","Conditional add/sub Half",VxV.h[i],VuV.h[i])
+CONDADDSUB(32,w,"Vx32.w","Vu32.w","Conditional add/sub Word",VxV.w[i],VuV.w[i])
+
+/*****************************************************
+ ABSOLUTE VALUES
+*****************************************************/
+// V65
+ITERATOR_INSN2_ANY_SLOT_NOV1(8,vabsb,        "Vd32=vabsb(Vu32)",     "Vd32.b=vabs(Vu32.b)",     "Vector absolute value of bytes",    VdV.b[i]  =  fABS(VuV.b[i]))
+ITERATOR_INSN2_ANY_SLOT_NOV1(8,vabsb_sat,    "Vd32=vabsb(Vu32):sat", "Vd32.b=vabs(Vu32.b):sat", "Vector absolute value of bytes",    VdV.b[i]  =  fVSATB(fABS(fSE8_16(VuV.b[i]))))
+
+
+ITERATOR_INSN2_ANY_SLOT(16,vabsh,        "Vd32=vabsh(Vu32)",     "Vd32.h=vabs(Vu32.h)",     "Vector absolute value of halfwords",    VdV.h[i]  =  fABS(VuV.h[i]))
+ITERATOR_INSN2_ANY_SLOT(16,vabsh_sat,    "Vd32=vabsh(Vu32):sat", "Vd32.h=vabs(Vu32.h):sat", "Vector absolute value of halfwords",    VdV.h[i]  =  fVSATH(fABS(fSE16_32(VuV.h[i]))))
+ITERATOR_INSN2_ANY_SLOT(32,vabsw,        "Vd32=vabsw(Vu32)",     "Vd32.w=vabs(Vu32.w)",     "Vector absolute value of words",        VdV.w[i]  =  fABS(VuV.w[i]))
+ITERATOR_INSN2_ANY_SLOT(32,vabsw_sat,    "Vd32=vabsw(Vu32):sat", "Vd32.w=vabs(Vu32.w):sat", "Vector absolute value of words",        VdV.w[i]  =  fVSATW(fABS(fSE32_64(VuV.w[i]))))
+
+
+DEF_CVI_MAPPING(V6_vabsub_alt,  "Vd32.ub=vabs(Vu32.b)",  "Vd32.b=vabs(Vu32.b)")
+DEF_CVI_MAPPING(V6_vabsuh_alt,  "Vd32.uh=vabs(Vu32.h)",  "Vd32.h=vabs(Vu32.h)")
+DEF_CVI_MAPPING(V6_vabsuw_alt,  "Vd32.uw=vabs(Vu32.w)",  "Vd32.w=vabs(Vu32.w)")
+
+
+
+
+/**************************************************************************
+ * MMVECTOR MULTIPLICATIONS
+ * ************************************************************************/
+
+
+/* Byte by Byte */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybv,"Vdd32=vmpyb(Vu32,Vv32)","Vdd32.h=vmpy(Vu32.b,Vv32.b)",
+"Vector absolute value of words",
+    VddV.v[0].h[i] =  fMPY8SS(fGETBYTE(0, VuV.h[i]), fGETBYTE(0, VvV.h[i]));
+    VddV.v[1].h[i] =  fMPY8SS(fGETBYTE(1, VuV.h[i]), fGETBYTE(1, VvV.h[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybv_acc,"Vxx32+=vmpyb(Vu32,Vv32)","Vxx32.h+=vmpy(Vu32.b,Vv32.b)",
+"Vector absolute value of words",
+    VxxV.v[0].h[i] +=  fMPY8SS(fGETBYTE(0, VuV.h[i]), fGETBYTE(0, VvV.h[i]));
+    VxxV.v[1].h[i] +=  fMPY8SS(fGETBYTE(1, VuV.h[i]), fGETBYTE(1, VvV.h[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyubv,"Vdd32=vmpyub(Vu32,Vv32)","Vdd32.uh=vmpy(Vu32.ub,Vv32.ub)",
+"Vector absolute value of words",
+    VddV.v[0].uh[i] =  fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE(0, VvV.uh[i]) );
+    VddV.v[1].uh[i] =  fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE(1, VvV.uh[i]) ))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyubv_acc,"Vxx32+=vmpyub(Vu32,Vv32)","Vxx32.uh+=vmpy(Vu32.ub,Vv32.ub)",
+"Vector absolute value of words",
+    VxxV.v[0].uh[i] +=  fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE(0, VvV.uh[i]) );
+    VxxV.v[1].uh[i] +=  fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE(1, VvV.uh[i]) ))
+
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybusv,"Vdd32=vmpybus(Vu32,Vv32)","Vdd32.h=vmpy(Vu32.ub,Vv32.b)",
+"Vector absolute value of words",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE(0, VvV.h[i]));
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE(1, VvV.h[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybusv_acc,"Vxx32+=vmpybus(Vu32,Vv32)","Vxx32.h+=vmpy(Vu32.ub,Vv32.b)",
+"Vector absolute value of words",
+    VxxV.v[0].h[i]  += fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE(0, VvV.h[i]));
+    VxxV.v[1].h[i]  += fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE(1, VvV.h[i])))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabusv,"Vdd32=vmpabus(Vuu32,Vvv32)","Vdd32.h=vmpa(Vuu32.ub,Vvv32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].h[i] = fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETBYTE(0, VvvV.v[0].uh[i])) + fMPY8US(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETBYTE(0, VvvV.v[1].uh[i]));
+    VddV.v[1].h[i] = fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETBYTE(1, VvvV.v[0].uh[i])) + fMPY8US(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETBYTE(1, VvvV.v[1].uh[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabuuv,"Vdd32=vmpabuu(Vuu32,Vvv32)","Vdd32.h=vmpa(Vuu32.ub,Vvv32.ub)",
+"Vertical Byte Multiply",
+    VddV.v[0].h[i] = fMPY8UU(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETUBYTE(0, VvvV.v[0].uh[i])) + fMPY8UU(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETUBYTE(0, VvvV.v[1].uh[i]));
+    VddV.v[1].h[i] = fMPY8UU(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETUBYTE(1, VvvV.v[0].uh[i])) + fMPY8UU(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETUBYTE(1, VvvV.v[1].uh[i])))
+
+
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhv,"Vdd32=vmpyh(Vu32,Vv32)","Vdd32.w=vmpy(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VddV.v[0].w[i] = fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, VvV.w[i]));
+    VddV.v[1].w[i] = fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, VvV.w[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhv_acc,"Vxx32+=vmpyh(Vu32,Vv32)","Vxx32.w+=vmpy(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, VvV.w[i]));
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, VvV.w[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuhv,"Vdd32=vmpyuh(Vu32,Vv32)","Vdd32.uw=vmpy(Vu32.uh,Vv32.uh)",
+"Vector by Vector Unsigned Halfword Multiply",
+    VddV.v[0].uw[i] = fMPY16UU(fGETUHALF(0, VuV.uw[i]), fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].uw[i] = fMPY16UU(fGETUHALF(1, VuV.uw[i]), fGETUHALF(1, VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuhv_acc,"Vxx32+=vmpyuh(Vu32,Vv32)","Vxx32.uw+=vmpy(Vu32.uh,Vv32.uh)",
+"Vector by Vector Unsigned Halfword Multiply",
+    VxxV.v[0].uw[i] += fMPY16UU(fGETUHALF(0, VuV.uw[i]), fGETUHALF(0, VvV.uw[i]));
+    VxxV.v[1].uw[i] += fMPY16UU(fGETUHALF(1, VuV.uw[i]), fGETUHALF(1, VvV.uw[i])))
+
+
+
+/* Vector by Vector */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyhvsrs,"Vd32=vmpyh(Vu32,Vv32):<<1:rnd:sat","Vd32.h=vmpy(Vu32.h,Vv32.h):<<1:rnd:sat",
+"Vector halfword multiply with round, shift, and sat16",
+    VdV.h[i] = fVSATH(fGETHALF(1,fVSAT(fROUND((fMPY16SS(VuV.h[i],VvV.h[i]    )<<1))))))
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhus, "Vdd32=vmpyhus(Vu32,Vv32)","Vdd32.w=vmpy(Vu32.h,Vv32.uh)",
+"Vector by Vector Halfword Multiply",
+    VddV.v[0].w[i] = fMPY16SU(fGETHALF(0, VuV.w[i]), fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].w[i] = fMPY16SU(fGETHALF(1, VuV.w[i]), fGETUHALF(1, VvV.uw[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhus_acc, "Vxx32+=vmpyhus(Vu32,Vv32)","Vxx32.w+=vmpy(Vu32.h,Vv32.uh)",
+"Vector by Vector Halfword Multiply",
+    VxxV.v[0].w[i] += fMPY16SU(fGETHALF(0, VuV.w[i]), fGETUHALF(0, VvV.uw[i]));
+    VxxV.v[1].w[i] += fMPY16SU(fGETHALF(1, VuV.w[i]), fGETUHALF(1, VvV.uw[i])))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyih,"Vd32=vmpyih(Vu32,Vv32)","Vd32.h=vmpyi(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VdV.h[i] = fMPY16SS(VuV.h[i], VvV.h[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyih_acc,"Vx32+=vmpyih(Vu32,Vv32)","Vx32.h+=vmpyi(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VxV.h[i] += fMPY16SS(VuV.h[i], VvV.h[i]))
+
+
+
+/* 32x32 high half / frac */
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyewuh,"Vd32=vmpyewuh(Vu32,Vv32)","Vd32.w=vmpye(Vu32.w,Vv32.uh)",
+"Vector by Vector Halfword Multiply",
+VdV.w[i] = fMPY3216SU(VuV.w[i], fGETUHALF(0, VvV.w[i])) >> 16)
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh,"Vd32=vmpyowh(Vu32,Vv32):<<1:sat","Vd32.w=vmpyo(Vu32.w,Vv32.h):<<1:sat",
+"Vector by Vector Halfword Multiply",
+VdV.w[i] = fVSATW((((fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i])) >> 14) + 0) >> 1)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_rnd,"Vd32=vmpyowh(Vu32,Vv32):<<1:rnd:sat","Vd32.w=vmpyo(Vu32.w,Vv32.h):<<1:rnd:sat",
+"Vector by Vector Halfword Multiply",
+VdV.w[i] = fVSATW((((fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i])) >> 14) + 1) >> 1)))
+
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(32,vmpyewuh_64,"Vdd32=vmpye(Vu32.w,Vv32.uh)",
+"Word times Halfword Multiply, 64-bit result",
+	fHIDE(size8s_t prod;)
+	prod = fMPY32SU(VuV.w[i],fGETUHALF(0,VvV.w[i]));
+	VddV.v[1].w[i] = prod >> 16;
+	VddV.v[0].w[i] = prod << 16)
+
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_64_acc,"Vxx32+=vmpyo(Vu32.w,Vv32.h)",
+"Word times Halfword Multiply, 64-bit result",
+	fHIDE(size8s_t prod;)
+	prod = fMPY32SS(VuV.w[i],fGETHALF(1,VvV.w[i]))  + fSE32_64(VxxV.v[1].w[i]);
+	VxxV.v[1].w[i] = prod >> 16;
+	fSETHALF(0, VxxV.v[0].w[i], VxxV.v[0].w[i] >> 16);
+	fSETHALF(1, VxxV.v[0].w[i], prod & 0x0000ffff))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_sacc,"Vx32+=vmpyowh(Vu32,Vv32):<<1:sat:shift","Vx32.w+=vmpyo(Vu32.w,Vv32.h):<<1:sat:shift",
+"Vector by Vector Halfword Multiply",
+IV1DEAD() VxV.w[i] = fVSATW(((((VxV.w[i] + fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i]))) >> 14) + 0) >> 1)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_rnd_sacc,"Vx32+=vmpyowh(Vu32,Vv32):<<1:rnd:sat:shift","Vx32.w+=vmpyo(Vu32.w,Vv32.h):<<1:rnd:sat:shift",
+"Vector by Vector Halfword Multiply",
+IV1DEAD() VxV.w[i] = fVSATW(((((VxV.w[i] + fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i]))) >> 14) + 1) >> 1)))
+
+/* For 32x32 integer / low half */
+
+ITERATOR_INSN_MPY_SLOT(32,vmpyieoh,"Vd32.w=vmpyieo(Vu32.h,Vv32.h)","Odd/Even multiply for 32x32 low half",
+	VdV.w[i] = (fGETHALF(0,VuV.w[i])*fGETHALF(1,VvV.w[i])) << 16)
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiewuh,"Vd32=vmpyiewuh(Vu32,Vv32)","Vd32.w=vmpyie(Vu32.w,Vv32.uh)",
+"Vector by Vector Word by Halfword Multiply",
+IV1DEAD()    VdV.w[i] = fMPY3216SU(VuV.w[i], fGETUHALF(0, VvV.w[i])) )
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiowh,"Vd32=vmpyiowh(Vu32,Vv32)","Vd32.w=vmpyio(Vu32.w,Vv32.h)",
+"Vector by Vector Word by Halfword Multiply",
+IV1DEAD()    VdV.w[i] = fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i])) )
+
+/* Add back these... */
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiewh_acc,"Vx32+=vmpyiewh(Vu32,Vv32)","Vx32.w+=vmpyie(Vu32.w,Vv32.h)",
+"Vector by Vector Word by Halfword Multiply",
+VxV.w[i] = VxV.w[i] + fMPY3216SS(VuV.w[i], fGETHALF(0, VvV.w[i])) )
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiewuh_acc,"Vx32+=vmpyiewuh(Vu32,Vv32)","Vx32.w+=vmpyie(Vu32.w,Vv32.uh)",
+"Vector by Vector Word by Halfword Multiply",
+VxV.w[i] = VxV.w[i] + fMPY3216SU(VuV.w[i], fGETUHALF(0, VvV.w[i])) )
+
+
+
+
+
+
+
+/* Vector by Scalar */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyub,"Vdd32=vmpyub(Vu32,Rt32)","Vdd32.uh=vmpy(Vu32.ub,Rt32.ub)",
+"Vector absolute value of words",
+    VddV.v[0].uh[i]  = fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE((2*i+0)%4, RtV));
+    VddV.v[1].uh[i]  = fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyub_acc,"Vxx32+=vmpyub(Vu32,Rt32)","Vxx32.uh+=vmpy(Vu32.ub,Rt32.ub)",
+"Vector absolute value of words",
+    VxxV.v[0].uh[i] += fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE((2*i+0)%4, RtV));
+    VxxV.v[1].uh[i] += fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE((2*i+1)%4, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybus,"Vdd32=vmpybus(Vu32,Rt32)","Vdd32.h=vmpy(Vu32.ub,Rt32.b)",
+"Vector absolute value of words",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i+0)%4, RtV));
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybus_acc,"Vxx32+=vmpybus(Vu32,Rt32)","Vxx32.h+=vmpy(Vu32.ub,Rt32.b)",
+"Vector absolute value of words",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i+0)%4, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabus,"Vdd32=vmpabus(Vuu32,Rt32)","Vdd32.h=vmpa(Vuu32.ub,Rt32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].h[i] = fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETBYTE(0, RtV)) + fMPY16SS(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETBYTE(1, RtV));
+    VddV.v[1].h[i] = fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETBYTE(2, RtV)) + fMPY16SS(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETBYTE(3, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabus_acc,"Vxx32+=vmpabus(Vuu32,Rt32)","Vxx32.h+=vmpa(Vuu32.ub,Rt32.b)",
+"Vertical Byte Multiply",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETBYTE(0, RtV)) + fMPY16SS(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETBYTE(1, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETBYTE(2, RtV)) + fMPY16SS(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETBYTE(3, RtV)))
+
+// V65
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(16,vmpabuu,"Vdd32=vmpabuu(Vuu32,Rt32)","Vdd32.h=vmpa(Vuu32.ub,Rt32.ub)",
+"Vertical Byte Multiply",
+    VddV.v[0].uh[i] = fMPY8UU(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETUBYTE(0, RtV)) + fMPY8UU(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETUBYTE(1, RtV));
+    VddV.v[1].uh[i] = fMPY8UU(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETUBYTE(2, RtV)) + fMPY8UU(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETUBYTE(3, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(16,vmpabuu_acc,"Vxx32+=vmpabuu(Vuu32,Rt32)","Vxx32.h+=vmpa(Vuu32.ub,Rt32.ub)",
+"Vertical Byte Multiply",
+    VxxV.v[0].uh[i] += fMPY8UU(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETUBYTE(0, RtV)) + fMPY8UU(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETUBYTE(1, RtV));
+    VxxV.v[1].uh[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETUBYTE(2, RtV)) + fMPY8UU(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETUBYTE(3, RtV)))
+
+
+
+
+/* Half by Byte */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpahb,"Vdd32=vmpahb(Vuu32,Rt32)","Vdd32.w=vmpa(Vuu32.h,Rt32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].w[i] = fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VddV.v[1].w[i] = fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16SS(fGETHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpahb_acc,"Vxx32+=vmpahb(Vuu32,Rt32)","Vxx32.w+=vmpa(Vuu32.h,Rt32.b)",
+"Vertical Byte Multiply",
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16SS(fGETHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+/* Half by Byte */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpauhb,"Vdd32=vmpauhb(Vuu32,Rt32)","Vdd32.w=vmpa(Vuu32.uh,Rt32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].w[i] = fMPY16US(fGETUHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16US(fGETUHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VddV.v[1].w[i] = fMPY16US(fGETUHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16US(fGETUHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpauhb_acc,"Vxx32+=vmpauhb(Vuu32,Rt32)","Vxx32.w+=vmpa(Vuu32.uh,Rt32.b)",
+"Vertical Byte Multiply",
+    VxxV.v[0].w[i] += fMPY16US(fGETUHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16US(fGETUHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VxxV.v[1].w[i] += fMPY16US(fGETUHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16US(fGETUHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+
+
+
+
+
+
+/* Half by Half */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyh,"Vdd32=vmpyh(Vu32,Rt32)","Vdd32.w=vmpy(Vu32.h,Rt32.h)",
+"Vector absolute value of words",
+    VddV.v[0].w[i] =  fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, RtV));
+    VddV.v[1].w[i] =  fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(32,vmpyh_acc,"Vxx32+=vmpyh(Vu32,Rt32)","Vxx32.w+=vmpy(Vu32.h,Rt32.h)",
+"Vector even halfwords with scalar lower halfword multiply with shift and sat32",
+    VxxV.v[0].w[i] =  fCAST8s(VxxV.v[0].w[i]) + fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, RtV));
+    VxxV.v[1].w[i] =  fCAST8s(VxxV.v[1].w[i]) + fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhsat_acc,"Vxx32+=vmpyh(Vu32,Rt32):sat","Vxx32.w+=vmpy(Vu32.h,Rt32.h):sat",
+"Vector even halfwords with scalar lower halfword multiply with shift and sat32",
+    VxxV.v[0].w[i] =  fVSATW(fCAST8s(VxxV.v[0].w[i]) + fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, RtV)));
+    VxxV.v[1].w[i] =  fVSATW(fCAST8s(VxxV.v[1].w[i]) + fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, RtV))))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhss,"Vd32=vmpyh(Vu32,Rt32):<<1:sat","Vd32.h=vmpy(Vu32.h,Rt32.h):<<1:sat",
+"Vector halfword by halfword multiply, shift by 1, and take upper 16 msb",
+          fSETHALF(0,VdV.w[i],fVSATH(fGETHALF(1,fVSAT((fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0,RtV))<<1)))));
+          fSETHALF(1,VdV.w[i],fVSATH(fGETHALF(1,fVSAT((fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1,RtV))<<1)))));
+)
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhsrs,"Vd32=vmpyh(Vu32,Rt32):<<1:rnd:sat","Vd32.h=vmpy(Vu32.h,Rt32.h):<<1:rnd:sat",
+"Vector halfword with scalar halfword multiply with round, shift, and sat16",
+       fSETHALF(0,VdV.w[i],fVSATH(fGETHALF(1,fVSAT(fROUND((fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0,RtV))<<1))))));
+       fSETHALF(1,VdV.w[i],fVSATH(fGETHALF(1,fVSAT(fROUND((fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1,RtV))<<1))))));
+)
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuh,"Vdd32=vmpyuh(Vu32,Rt32)","Vdd32.uw=vmpy(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VddV.v[0].uw[i] = fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV));
+    VddV.v[1].uw[i] = fMPY16UU(fGETUHALF(1, VuV.uw[i]),fGETUHALF(1,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuh_acc,"Vxx32+=vmpyuh(Vu32,Rt32)","Vxx32.uw+=vmpy(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VxxV.v[0].uw[i] += fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV));
+    VxxV.v[1].uw[i] += fMPY16UU(fGETUHALF(1, VuV.uw[i]),fGETUHALF(1,RtV)))
+
+
+
+
+/********************************************
+*  HALF BY BYTE
+********************************************/
+ITERATOR_INSN2_MPY_SLOT(16,vmpyihb,"Vd32=vmpyihb(Vu32,Rt32)","Vd32.h=vmpyi(Vu32.h,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VdV.h[i]  = fMPY16SS(VuV.h[i], fGETBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(16,vmpyihb_acc,"Vx32+=vmpyihb(Vu32,Rt32)","Vx32.h+=vmpyi(Vu32.h,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VxV.h[i] += fMPY16SS(VuV.h[i], fGETBYTE(i % 4, RtV) ))
+
+
+/********************************************
+*  WORD BY BYTE
+********************************************/
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwb,"Vd32=vmpyiwb(Vu32,Rt32)","Vd32.w=vmpyi(Vu32.w,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VdV.w[i]  = fMPY32SS(VuV.w[i], fGETBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwb_acc,"Vx32+=vmpyiwb(Vu32,Rt32)","Vx32.w+=vmpyi(Vu32.w,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VxV.w[i] += fMPY32SS(VuV.w[i], fGETBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwub,"Vd32=vmpyiwub(Vu32,Rt32)","Vd32.w=vmpyi(Vu32.w,Rt32.ub)",
+"Vector word by byte multiply, keep lower result",
+VdV.w[i]  = fMPY32SS(VuV.w[i], fGETUBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwub_acc,"Vx32+=vmpyiwub(Vu32,Rt32)","Vx32.w+=vmpyi(Vu32.w,Rt32.ub)",
+"Vector word by byte multiply, keep lower result",
+VxV.w[i] += fMPY32SS(VuV.w[i], fGETUBYTE(i % 4, RtV) ))
+
+
+/********************************************
+*  WORD BY HALF
+********************************************/
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiwh,"Vd32=vmpyiwh(Vu32,Rt32)","Vd32.w=vmpyi(Vu32.w,Rt32.h)",
+"Vector word by byte multiply, keep lower result",
+VdV.w[i]  = fMPY32SS(VuV.w[i], fGETHALF(i % 2, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiwh_acc,"Vx32+=vmpyiwh(Vu32,Rt32)","Vx32.w+=vmpyi(Vu32.w,Rt32.h)",
+"Vector word by byte multiply, keep lower result",
+VxV.w[i] += fMPY32SS(VuV.w[i], fGETHALF(i % 2, RtV)))
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+/**************************************************************************
+ * MMVECTOR LOGICAL OPERATIONS
+ * ************************************************************************/
+ITERATOR_INSN_ANY_SLOT(16,vand,"Vd32=vand(Vu32,Vv32)", "Vector Logical And", VdV.uh[i] = VuV.uh[i] & VvV.h[i])
+ITERATOR_INSN_ANY_SLOT(16,vor, "Vd32=vor(Vu32,Vv32)",  "Vector Logical Or", VdV.uh[i] = VuV.uh[i] | VvV.h[i])
+ITERATOR_INSN_ANY_SLOT(16,vxor,"Vd32=vxor(Vu32,Vv32)", "Vector Logical XOR",    VdV.uh[i] = VuV.uh[i] ^ VvV.h[i])
+ITERATOR_INSN_ANY_SLOT(16,vnot,"Vd32=vnot(Vu32)",     "Vector Logical NOT", VdV.uh[i] = ~VuV.uh[i])
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandqrt,
+"Vd32.ub=vand(Qu4.ub,Rt32.ub)", "Vd32=vand(Qu4,Rt32)", "Insert Predicate into Vector",
+    VdV.ub[i] = fGETQBIT(QuV,i) ? fGETUBYTE(i % 4, RtV) : 0)
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandqrt_acc,
+"Vx32.ub|=vand(Qu4.ub,Rt32.ub)", "Vx32|=vand(Qu4,Rt32)",  "Insert Predicate into Vector",
+    VxV.ub[i] |= (fGETQBIT(QuV,i)) ? fGETUBYTE(i % 4, RtV) : 0)
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandnqrt,
+"Vd32.ub=vand(!Qu4.ub,Rt32.ub)", "Vd32=vand(!Qu4,Rt32)", "Insert Predicate into Vector",
+    VdV.ub[i] = !fGETQBIT(QuV,i) ? fGETUBYTE(i % 4, RtV) : 0)
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandnqrt_acc,
+"Vx32.ub|=vand(!Qu4.ub,Rt32.ub)", "Vx32|=vand(!Qu4,Rt32)",  "Insert Predicate into Vector",
+    VxV.ub[i] |= !(fGETQBIT(QuV,i)) ? fGETUBYTE(i % 4, RtV) : 0)
+
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandvrt,
+"Qd4.ub=vand(Vu32.ub,Rt32.ub)", "Qd4=vand(Vu32,Rt32)", "Insert into Predicate",
+    fSETQBIT(QdV,i,((VuV.ub[i] & fGETUBYTE(i % 4, RtV)) != 0) ? 1 : 0))
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandvrt_acc,
+"Qx4.ub|=vand(Vu32.ub,Rt32.ub)", "Qx4|=vand(Vu32,Rt32)", "Insert into Predicate ",
+    fSETQBIT(QxV,i,fGETQBIT(QxV,i)|(((VuV.ub[i] & fGETUBYTE(i % 4, RtV)) != 0) ? 1 : 0)))
+
+ITERATOR_INSN_ANY_SLOT(8,vandvqv,"Vd32=vand(Qv4,Vu32)","Mask off bytes",
+VdV.b[i] = fGETQBIT(QvV,i) ? VuV.b[i] : 0)
+ITERATOR_INSN_ANY_SLOT(8,vandvnqv,"Vd32=vand(!Qv4,Vu32)","Mask off bytes",
+VdV.b[i] = !fGETQBIT(QvV,i) ? VuV.b[i] : 0)
+
+
+ /***************************************************
+ * Compare Vector with Vector
+ ***************************************************/
+#define VCMP(DEST, ASRC, ASRCOP, CMP, N, SRC, MASK, WIDTH)        \
+{ \
+       for(fHIDE(int) i = 0; i < fVBYTES(); i += WIDTH) { \
+		fSETQBITS(DEST,WIDTH,MASK,i,ASRC ASRCOP ((VuV.SRC[i/WIDTH] CMP VvV.SRC[i/WIDTH]) ? MASK : 0)); \
+    } \
+       }
+
+#define MMVEC_CMPEQMAP(T,T2,T3) \
+DEF_CVI_MAPPING(V6_MAP_eq##T,      "Qd4=vcmp.eq(Vu32." T2 ",Vv32." T2 ")", "Qd4=vcmp.eq(Vu32." T3 ",Vv32." T3 ")") \
+DEF_CVI_MAPPING(V6_MAP_eq##T##_and,"Qx4&=vcmp.eq(Vu32." T2 ",Vv32." T2 ")", "Qx4&=vcmp.eq(Vu32." T3 ",Vv32." T3 ")") \
+DEF_CVI_MAPPING(V6_MAP_eq##T##_ior,"Qx4|=vcmp.eq(Vu32." T2 ",Vv32." T2 ")", "Qx4|=vcmp.eq(Vu32." T3 ",Vv32." T3 ")") \
+DEF_CVI_MAPPING(V6_MAP_eq##T##_xor,"Qx4^=vcmp.eq(Vu32." T2 ",Vv32." T2 ")", "Qx4^=vcmp.eq(Vu32." T3 ",Vv32." T3 ")")
+
+#define MMVEC_CMPGT(TYPE,TYPE2,TYPE3,DESCR,N,MASK,WIDTH,SRC) \
+EXTINSN(V6_vgt##TYPE,       "Qd4=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE), DESCR" greater than", \
+	VCMP(QdV, , , >, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_vgt##TYPE##_and, "Qx4&=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE), DESCR" greater than with predicate-and", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), &, >, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_vgt##TYPE##_or,  "Qx4|=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE), DESCR" greater than with predicate-or", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), |, >, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_vgt##TYPE##_xor, "Qx4^=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE), DESCR" greater than with predicate-xor", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), ^, >, N, SRC, MASK, WIDTH))
+
+#define MMVEC_CMP(TYPE,TYPE2,TYPE3,DESCR,N,MASK, WIDTH, SRC)\
+MMVEC_CMPGT(TYPE,TYPE2,TYPE3,DESCR,N,MASK,WIDTH,SRC) \
+EXTINSN(V6_veq##TYPE,       "Qd4=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE), DESCR" equal to", \
+	VCMP(QdV, , , ==, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_veq##TYPE##_and, "Qx4&=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE), DESCR" equalto with predicate-and", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), &, ==, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_veq##TYPE##_or,  "Qx4|=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE), DESCR" equalto with predicate-or", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), |, ==, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_veq##TYPE##_xor, "Qx4^=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE), DESCR" equalto with predicate-xor", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), ^, ==, N, SRC, MASK, WIDTH))
+
+
+MMVEC_CMP(w,"w","","Vector Word Compare ", fVELEM(32), 0xF, 4, w)
+MMVEC_CMP(h,"h","","Vector Half Compare ", fVELEM(16), 0x3, 2, h)
+MMVEC_CMP(b,"b","","Vector Half Compare ", fVELEM(8),  0x1, 1, b)
+MMVEC_CMPGT(uw,"uw","","Vector Unsigned Half Compare ", fVELEM(32), 0xF, 4,uw)
+MMVEC_CMPGT(uh,"uh","","Vector Unsigned Half Compare ", fVELEM(16), 0x3, 2,uh)
+MMVEC_CMPGT(ub,"ub","","Vector Unsigned Byte Compare ", fVELEM(8),  0x1, 1,ub)
+
+MMVEC_CMPEQMAP(uw,"uw","w")
+MMVEC_CMPEQMAP(uh,"uh","h")
+MMVEC_CMPEQMAP(ub,"ub","b")
+
+
+
+/***************************************************
+* Predicate Operations
+***************************************************/
+
+EXTINSN(V6_pred_scalar2, "Qd4=vsetq(Rt32)",         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE),   "Set Vector Predicate ",
+{
+    fHIDE(int i;)
+    for(i = 0; i < fVBYTES(); i++) fSETQBIT(QdV,i,(i < (RtV & (fVBYTES()-1))) ? 1 : 0);
+})
+
+EXTINSN(V6_pred_scalar2v2, "Qd4=vsetq2(Rt32)",         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE),   "Set Vector Predicate ",
+{
+    fHIDE(int i;)
+    for(i = 0; i < fVBYTES(); i++) fSETQBIT(QdV,i,(i <= ((RtV-1) & (fVBYTES()-1))) ? 1 : 0);
+})
+
+
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, shuffeqw, "Qd4.h=vshuffe(Qs4.w,Qt4.w)","Shrink Predicate", fSETQBIT(QdV,i, (i & 2) ? fGETQBIT(QsV,i-2) : fGETQBIT(QtV,i) ) );
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, shuffeqh, "Qd4.b=vshuffe(Qs4.h,Qt4.h)","Shrink Predicate", fSETQBIT(QdV,i, (i & 1) ? fGETQBIT(QsV,i-1) : fGETQBIT(QtV,i) ) );
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_or, "Qd4=or(Qs4,Qt4)","Vector Predicate Or", fSETQBIT(QdV,i,fGETQBIT(QsV,i) || fGETQBIT(QtV,i) ) );
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_and, "Qd4=and(Qs4,Qt4)","Vector Predicate And", fSETQBIT(QdV,i,fGETQBIT(QsV,i) && fGETQBIT(QtV,i) ) );
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_xor, "Qd4=xor(Qs4,Qt4)","Vector Predicate Xor", fSETQBIT(QdV,i,fGETQBIT(QsV,i) ^ fGETQBIT(QtV,i) ) );
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_or_n, "Qd4=or(Qs4,!Qt4)","Vector Predicate Or with not", fSETQBIT(QdV,i,fGETQBIT(QsV,i) || !fGETQBIT(QtV,i) ) );
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_and_n, "Qd4=and(Qs4,!Qt4)","Vector Predicate And  with not", fSETQBIT(QdV,i,fGETQBIT(QsV,i) && !fGETQBIT(QtV,i) ) );
+ITERATOR_INSN_ANY_SLOT(8, pred_not, "Qd4=not(Qs4)","Vector Predicate Not", fSETQBIT(QdV,i,!fGETQBIT(QsV,i) ) );
+
+
+
+EXTINSN(V6_vcmov,  "if (Ps4) Vd32=Vu32",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE),   "Conditional Mov",
+{
+if (fLSBOLD(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VdV.ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+EXTINSN(V6_vncmov,  "if (!Ps4) Vd32=Vu32",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_NOTE_ANY_RESOURCE),   "Conditional Mov",
+{
+if (fLSBOLDNOT(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VdV.ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+EXTINSN(V6_vccombine,  "if (Ps4) Vdd32=vcombine(Vu32,Vv32)",	ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA_DV,A_NOTE_ANY2_RESOURCE),   "Conditional Combine",
+{
+if (fLSBOLD(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VddV.v[0].ub[i] = VvV.ub[i];
+		VddV.v[1].ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+EXTINSN(V6_vnccombine,  "if (!Ps4) Vdd32=vcombine(Vu32,Vv32)",	ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA_DV,A_NOTE_ANY2_RESOURCE),   "Conditional Combine",
+{
+if (fLSBOLDNOT(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VddV.v[0].ub[i] = VvV.ub[i];
+		VddV.v[1].ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+
+
+ITERATOR_INSN_ANY_SLOT(8,vmux,"Vd32=vmux(Qt4,Vu32,Vv32)",
+"Vector Select Element 8-bit",
+    VdV.ub[i] = fGETQBIT(QtV,i) ? VuV.ub[i] : VvV.ub[i])
+
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8,vswap,"Vdd32=vswap(Qt4,Vu32,Vv32)",
+"Vector Swap Element 8-bit",
+    VddV.v[0].ub[i] =  fGETQBIT(QtV,i) ? VuV.ub[i] : VvV.ub[i];
+	VddV.v[1].ub[i] = !fGETQBIT(QtV,i) ? VuV.ub[i] : VvV.ub[i])
+
+
+/***************************************************************************
+*
+*   MMVECTOR SORTING
+*
+****************************************************************************/
+
+#define MMVEC_SORT(TYPE,TYPE2,DESCR,ELEMENTSIZE,SRC)\
+ITERATOR_INSN2_ANY_SLOT(ELEMENTSIZE,vmax##TYPE, "Vd32=vmax" TYPE2 "(Vu32,Vv32)", "Vd32."#SRC"=vmax(Vu32."#SRC",Vv32."#SRC")", "Vector " DESCR " max", VdV.SRC[i] = (VuV.SRC[i] > VvV.SRC[i]) ? VuV.SRC[i] :  VvV.SRC[i])  \
+ITERATOR_INSN2_ANY_SLOT(ELEMENTSIZE,vmin##TYPE, "Vd32=vmin" TYPE2 "(Vu32,Vv32)", "Vd32."#SRC"=vmin(Vu32."#SRC",Vv32."#SRC")", "Vector " DESCR " min", VdV.SRC[i] = (VuV.SRC[i] < VvV.SRC[i]) ? VuV.SRC[i] :  VvV.SRC[i])
+
+MMVEC_SORT(b,"b", "signed byte",    8,  b);
+MMVEC_SORT(ub,"ub", "unsigned byte",    8,  ub);
+MMVEC_SORT(uh,"uh", "unsigned halfword",16, uh);
+MMVEC_SORT(h,   "h",    "halfword",         16, h);
+MMVEC_SORT(w,   "w",    "word",             32, w);
+
+
+
+
+
+
+
+
+
+/*************************************************************
+* SHUFFLES
+****************************************************************/
+
+ITERATOR_INSN2_ANY_SLOT(16,vsathub,"Vd32=vsathub(Vu32,Vv32)","Vd32.ub=vsat(Vu32.h,Vv32.h)",
+"Saturate and pack 32 halfwords to 32 unsigned bytes, and interleave them",
+    fSETBYTE(0, VdV.uh[i], fVSATUB(VvV.h[i]));
+    fSETBYTE(1, VdV.uh[i], fVSATUB(VuV.h[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vsatwh,"Vd32=vsatwh(Vu32,Vv32)","Vd32.h=vsat(Vu32.w,Vv32.w)",
+"Saturate and pack 16 words to 16 halfwords, and interleave them",
+    fSETHALF(0, VdV.w[i], fVSATH(VvV.w[i]));
+    fSETHALF(1, VdV.w[i], fVSATH(VuV.w[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vsatuwuh,"Vd32=vsatuwuh(Vu32,Vv32)","Vd32.uh=vsat(Vu32.uw,Vv32.uw)",
+"Saturate and pack 16 words to 16 halfwords, and interleave them",
+    fSETHALF(0, VdV.w[i], fVSATUH(VvV.uw[i]));
+    fSETHALF(1, VdV.w[i], fVSATUH(VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT(16,vshuffeb,"Vd32=vshuffeb(Vu32,Vv32)","Vd32.b=vshuffe(Vu32.b,Vv32.b)",
+"Shuffle half words with in a lane",
+    fSETBYTE(0, VdV.uh[i], fGETUBYTE(0, VvV.uh[i]));
+    fSETBYTE(1, VdV.uh[i], fGETUBYTE(0, VuV.uh[i])))
+
+ITERATOR_INSN2_ANY_SLOT(16,vshuffob,"Vd32=vshuffob(Vu32,Vv32)","Vd32.b=vshuffo(Vu32.b,Vv32.b)",
+"Shuffle half words with in a lane",
+    fSETBYTE(0, VdV.uh[i], fGETUBYTE(1, VvV.uh[i]));
+    fSETBYTE(1, VdV.uh[i], fGETUBYTE(1, VuV.uh[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vshufeh,"Vd32=vshuffeh(Vu32,Vv32)","Vd32.h=vshuffe(Vu32.h,Vv32.h)",
+"Shuffle half words with in a lane",
+    fSETHALF(0, VdV.uw[i], fGETUHALF(0, VvV.uw[i]));
+    fSETHALF(1, VdV.uw[i], fGETUHALF(0, VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vshufoh,"Vd32=vshuffoh(Vu32,Vv32)","Vd32.h=vshuffo(Vu32.h,Vv32.h)",
+"Shuffle half words with in a lane",
+    fSETHALF(0, VdV.uw[i], fGETUHALF(1, VvV.uw[i]));
+    fSETHALF(1, VdV.uw[i], fGETUHALF(1, VuV.uw[i])))
+
+
+
+
+/**************************************************************************
+* Double Vector Shuffles
+**************************************************************************/
+
+DEF_CVI_MAPPING(V6_vtran2x2_map, "vtrans2x2(Vy32,Vx32,Rt32)","vshuff(Vy32,Vx32,Rt32)")
+
+
+EXTINSN(V6_vshuff, "vshuff(Vy32,Vx32,Rt32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS,A_CVI_EARLY,A_NOTE_PERMUTE_RESOURCE,A_NOTE_SHIFT_RESOURCE),
+"2x2->2x2 transpose, for multiple data sizes, inplace",
+{
+	fHIDE(int offset;)
+	for (offset=1; offset<fVBYTES(); offset<<=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VyV.ub[k], VxV.ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+EXTINSN(V6_vshuffvdd, "Vdd32=vshuff(Vu32,Vv32,Rt8)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS,A_NOTE_PERMUTE_RESOURCE,A_NOTE_SHIFT_RESOURCE,A_NOTE_RT8),
+"2x2->2x2 transpose for multiple data sizes",
+{
+	fHIDE(int offset;)
+	VddV.v[0] = VvV;
+	VddV.v[1] = VuV;
+	for (offset=1; offset<fVBYTES(); offset<<=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VddV.v[1].ub[k], VddV.v[0].ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+EXTINSN(V6_vdeal, "vdeal(Vy32,Vx32,Rt32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS,A_CVI_EARLY,A_NOTE_PERMUTE_RESOURCE,A_NOTE_SHIFT_RESOURCE),
+" vector - vector deal - or deinterleave, for multiple data sizes, inplace",
+{
+	fHIDE(int offset;)
+	for (offset=fVBYTES()>>1; offset>0; offset>>=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VyV.ub[k], VxV.ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+EXTINSN(V6_vdealvdd, "Vdd32=vdeal(Vu32,Vv32,Rt8)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS,A_NOTE_PERMUTE_RESOURCE,A_NOTE_SHIFT_RESOURCE,A_NOTE_RT8),
+" vector - vector deal - or deinterleave, for multiple data sizes",
+{
+	fHIDE(int offset;)
+	VddV.v[0] = VvV;
+	VddV.v[1] = VuV;
+	for (offset=fVBYTES()>>1; offset>0; offset>>=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VddV.v[1].ub[k], VddV.v[0].ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+/**************************************************************************/
+
+
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(32,vshufoeh,"Vdd32=vshuffoeh(Vu32,Vv32)","Vdd32.h=vshuffoe(Vu32.h,Vv32.h)",
+"Vector Shuffle half words",
+    fSETHALF(0, VddV.v[0].uw[i], fGETUHALF(0, VvV.uw[i]));
+    fSETHALF(1, VddV.v[0].uw[i], fGETUHALF(0, VuV.uw[i]));
+    fSETHALF(0, VddV.v[1].uw[i], fGETUHALF(1, VvV.uw[i]));
+    fSETHALF(1, VddV.v[1].uw[i], fGETUHALF(1, VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(16,vshufoeb,"Vdd32=vshuffoeb(Vu32,Vv32)","Vdd32.b=vshuffoe(Vu32.b,Vv32.b)",
+"Vector Shuffle bytes",
+    fSETBYTE(0, VddV.v[0].uh[i], fGETUBYTE(0, VvV.uh[i]));
+    fSETBYTE(1, VddV.v[0].uh[i], fGETUBYTE(0, VuV.uh[i]));
+    fSETBYTE(0, VddV.v[1].uh[i], fGETUBYTE(1, VvV.uh[i]));
+    fSETBYTE(1, VddV.v[1].uh[i], fGETUBYTE(1, VuV.uh[i])))
+
+
+/***************************************************************
+* Deal
+***************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vdealh, "Vd32=vdealh(Vu32)", "Vd32.h=vdeal(Vu32.h)",
+"Deal Halfwords",
+    VdV.uh[i  ] = fGETUHALF(0, VuV.uw[i]);
+    VdV.uh[i+fVELEM(32)] = fGETUHALF(1, VuV.uw[i]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vdealb, "Vd32=vdealb(Vu32)", "Vd32.b=vdeal(Vu32.b)",
+"Deal Halfwords",
+    VdV.ub[i   ] = fGETUBYTE(0, VuV.uh[i]);
+    VdV.ub[i+fVELEM(16)] = fGETUBYTE(1, VuV.uh[i]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vdealb4w,  "Vd32=vdealb4w(Vu32,Vv32)", "Vd32.b=vdeale(Vu32.b,Vv32.b)",
+"Deal Two Vectors Bytes",
+    VdV.ub[0+i ] = fGETUBYTE(0, VvV.uw[i]);
+    VdV.ub[fVELEM(32)+i ] = fGETUBYTE(2, VvV.uw[i]);
+    VdV.ub[2*fVELEM(32)+i] = fGETUBYTE(0, VuV.uw[i]);
+    VdV.ub[3*fVELEM(32)+i] = fGETUBYTE(2, VuV.uw[i]))
+
+/***************************************************************
+* shuffle
+***************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vshuffh, "Vd32=vshuffh(Vu32)", "Vd32.h=vshuff(Vu32.h)",
+"Deal Halfwords",
+    fSETHALF(0, VdV.uw[i], VuV.uh[i]);
+    fSETHALF(1, VdV.uw[i], VuV.uh[i+fVELEM(32)]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vshuffb, "Vd32=vshuffb(Vu32)", "Vd32.b=vshuff(Vu32.b)",
+"Deal Halfwords",
+    fSETBYTE(0, VdV.uh[i], VuV.ub[i]);
+    fSETBYTE(1, VdV.uh[i], VuV.ub[i+fVELEM(16)]))
+
+
+
+
+
+/***********************************************************
+* INSERT AND EXTRACT
+*********************************************************/
+EXTINSN(V6_extractw, "Rd32=vextract(Vu32,Rs32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_RESTRICT_NOPACKET,A_CVI_EXTRACT,A_NOTE_NOPACKET,A_MEMLIKE,A_RESTRICT_SLOT0ONLY),
+"Extract an element from a vector to scalar",
+fHIDE(warn("RdN=%d VuN=%d RsN=%d RsV=0x%08x widx=%d",RdN,VuN,RsN,RsV,((RsV & (fVBYTES()-1)) >> 2));)
+RdV = VuV.uw[ (RsV & (fVBYTES()-1)) >> 2];
+fHIDE(warn("RdV=0x%08x",RdV);))
+
+DEF_CVI_MAPPING(V6_extractw_alt,"Rd32.w=vextract(Vu32,Rs32)","Rd32=vextract(Vu32,Rs32)")
+
+EXTINSN(V6_vinsertwr, "Vx32.w=vinsert(Rt32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX,A_CVI_LATE,A_NOTE_MPY_RESOURCE),
+"Insert Word Scalar into Vector",
+VxV.uw[0] = RtV;)
+
+
+
+
+ITERATOR_INSN_MPY_SLOT_LATE(32,lvsplatw, "Vd32=vsplat(Rt32)", "Replicates scalar accross words in vector", VdV.uw[i] = RtV)
+
+ITERATOR_INSN_MPY_SLOT_LATE(16,lvsplath, "Vd32.h=vsplat(Rt32)", "Replicates scalar accross halves in vector", VdV.uh[i] = RtV)
+
+ITERATOR_INSN_MPY_SLOT_LATE(8,lvsplatb, "Vd32.b=vsplat(Rt32)", "Replicates scalar accross bytes in vector", VdV.ub[i] = RtV)
+
+
+DEF_CVI_MAPPING(V6_vassignp,"Vdd32=Vuu32","Vdd32=vcombine(Vuu.H32,Vuu.L32)")
+
+ITERATOR_INSN_ANY_SLOT(32,vassign,"Vd32=Vu32","Copy a vector",VdV.w[i]=VuV.w[i])
+
+
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8,vcombine,"Vdd32=vcombine(Vu32,Vv32)",
+"Vector assign, Any two to Vector Pair",
+    VddV.v[0].ub[i] = VvV.ub[i];
+    VddV.v[1].ub[i] = VuV.ub[i])
+
+
+
+///////////////////////////////////////////////////////////////////////////
+
+
+/*********************************************************
+* GENERAL PERMUTE NETWORKS
+*********************************************************/
+
+
+EXTINSN(V6_vdelta, "Vd32=vdelta(Vu32,Vv32)",    ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE),
+"Reverse Benes Butterfly network ",
+{
+    fHIDE(int offset;)
+    fHIDE(int k;)
+    for (offset=fVBYTES(); (offset>>=1)>0; ) {
+        for (k = 0; k<fVBYTES(); k++) {
+            VdV.ub[k] = (VvV.ub[k]&offset) ? VuV.ub[k^offset] : VuV.ub[k];
+        }
+        for (k = 0; k<fVBYTES(); k++) {
+            VuV.ub[k] = VdV.ub[k];
+        }
+    }
+})
+
+
+EXTINSN(V6_vrdelta, "Vd32=vrdelta(Vu32,Vv32)",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP,A_NOTE_PERMUTE_RESOURCE),
+"Forward Benes Butterfly network ",
+{
+	fHIDE(int offset;)
+    fHIDE(int k;)
+    for (offset=1; offset<fVBYTES(); offset<<=1){
+        for (k = 0; k<fVBYTES(); k++) {
+            VdV.ub[k] = (VvV.ub[k]&offset) ? VuV.ub[k^offset] : VuV.ub[k];
+        }
+        for (k = 0; k<fVBYTES(); k++) {
+            VuV.ub[k] = VdV.ub[k];
+        }
+    }
+})
+
+
+
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vcl0w,"Vd32=vcl0w(Vu32)","Vd32.uw=vcl0(Vu32.uw)",         "Count Leading Zeros in Word",     VdV.uw[i]=fCL1_4(~VuV.uw[i]))
+ITERATOR_INSN2_SHIFT_SLOT(16,vcl0h,"Vd32=vcl0h(Vu32)","Vd32.uh=vcl0(Vu32.uh)",         "Count Leading Zeros in Word",    VdV.uh[i]=fCL1_2(~VuV.uh[i]))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vnormamtw,"Vd32=vnormamtw(Vu32)","Vd32.w=vnormamt(Vu32.w)","Norm Amount Word",
+VdV.w[i]=fMAX(fCL1_4(~VuV.w[i]),fCL1_4(VuV.w[i]))-1; fHIDE(IV1DEAD();))
+ITERATOR_INSN2_SHIFT_SLOT(16,vnormamth,"Vd32=vnormamth(Vu32)","Vd32.h=vnormamt(Vu32.h)","Norm Amount Halfword",
+VdV.h[i]=fMAX(fCL1_2(~VuV.h[i]),fCL1_2(VuV.h[i]))-1; fHIDE(IV1DEAD();))
+
+ITERATOR_INSN_SHIFT_SLOT_VV_LATE(32,vaddclbw,"Vd32.w=vadd(vclb(Vu32.w),Vv32.w)",
+"Count leading bits and add",
+VdV.w[i] = fMAX(fCL1_4(~VuV.w[i]),fCL1_4(VuV.w[i])) + VvV.w[i])
+
+ITERATOR_INSN_SHIFT_SLOT_VV_LATE(16,vaddclbh,"Vd32.h=vadd(vclb(Vu32.h),Vv32.h)",
+"Count leading bits and add",
+VdV.h[i] = fMAX(fCL1_2(~VuV.h[i]),fCL1_2(VuV.h[i])) + VvV.h[i])
+
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vpopcounth,"Vd32=vpopcounth(Vu32)","Vd32.h=vpopcount(Vu32.h)",   "Count Leading Zeros in Word",  VdV.uh[i]=fCOUNTONES_2(VuV.uh[i]))
+
+
+#define fHIST(INPUTVEC) \
+	fUARCH_NOTE_PUMP_4X(); \
+	fHIDE(int lane;) \
+	fHIDE(mmvector_t tmp;) \
+	fVFOREACH(128, lane) { \
+		for (fHIDE(int )i=0; i<128/8; ++i) { \
+			unsigned char value = INPUTVEC.ub[(128/8)*lane+i]; \
+			unsigned char regno = value>>3; \
+			unsigned char element = value & 7; \
+			READ_EXT_VREG(regno,tmp,0); \
+			tmp.uh[(128/16)*lane+(element)]++; \
+			WRITE_EXT_VREG(regno,tmp,EXT_NEW); \
+		} \
+	}
+
+#define fHISTQ(INPUTVEC,QVAL) \
+	fUARCH_NOTE_PUMP_4X(); \
+	fHIDE(int lane;) \
+	fHIDE(mmvector_t tmp;) \
+	fVFOREACH(128, lane) { \
+		for (fHIDE(int )i=0; i<128/8; ++i) { \
+			unsigned char value = INPUTVEC.ub[(128/8)*lane+i]; \
+			unsigned char regno = value>>3; \
+			unsigned char element = value & 7; \
+			READ_EXT_VREG(regno,tmp,0); \
+			if (fGETQBIT(QVAL,128/8*lane+i)) tmp.uh[(128/16)*lane+(element)]++; \
+			WRITE_EXT_VREG(regno,tmp,EXT_NEW); \
+		} \
+	}
+
+
+
+EXTINSN(V6_vhist, "vhist",ATTRIBS(A_EXTENSION,A_CVI,A_CVI_4SLOT,A_CVI_REQUIRES_TMPLOAD), "vhist instruction",{ fHIDE(mmvector_t inputVec;) inputVec=fTMPVDATA(); fHIST(inputVec); })
+EXTINSN(V6_vhistq, "vhist(Qv4)",ATTRIBS(A_EXTENSION,A_CVI,A_CVI_4SLOT,A_CVI_REQUIRES_TMPLOAD), "vhist instruction",{ fHIDE(mmvector_t inputVec;) inputVec=fTMPVDATA(); fHISTQ(inputVec,QvV); })
+
+#undef fHIST
+#undef fHISTQ
+
+
+/* **** WEIGHTED HISTOGRAM **** */
+
+
+#if 1
+#define WHIST(EL,MASK,BSHIFT,COND,SATF) \
+	fHIDE(unsigned int) bucket = fGETUBYTE(0,input.h[i]); \
+	fHIDE(unsigned int) weight = fGETUBYTE(1,input.h[i]); \
+	fHIDE(unsigned int) vindex = (bucket >> 3) & 0x1F; \
+	fHIDE(unsigned int) elindex = ((i>>BSHIFT) & (~MASK)) | ((bucket>>BSHIFT) & MASK); \
+	fHIDE(mmvector_t tmp;) \
+	READ_EXT_VREG(vindex,tmp,0); \
+	COND tmp.EL[elindex] = SATF(tmp.EL[elindex] + weight); \
+	WRITE_EXT_VREG(vindex,tmp,EXT_NEW); \
+	fUARCH_NOTE_PUMP_2X();
+
+ITERATOR_INSN_VHISTLIKE(16,vwhist256,"vwhist256","vector weighted histogram halfword counters", WHIST(uh,7,0,,))
+ITERATOR_INSN_VHISTLIKE(16,vwhist256q,"vwhist256(Qv4)","vector weighted histogram halfword counters", WHIST(uh,7,0,if (fGETQBIT(QvV,2*i)),))
+ITERATOR_INSN_VHISTLIKE(16,vwhist256_sat,"vwhist256:sat","vector weighted histogram halfword counters", WHIST(uh,7,0,,fVSATUH))
+ITERATOR_INSN_VHISTLIKE(16,vwhist256q_sat,"vwhist256(Qv4):sat","vector weighted histogram halfword counters", WHIST(uh,7,0,if (fGETQBIT(QvV,2*i)),fVSATUH))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128,"vwhist128","vector weighted histogram word counters", WHIST(uw,3,1,,))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128q,"vwhist128(Qv4)","vector weighted histogram word counters", WHIST(uw,3,1,if (fGETQBIT(QvV,2*i)),))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128m,"vwhist128(#u1)","vector weighted histogram word counters", WHIST(uw,3,1,if ((bucket & 1) == uiV),))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128qm,"vwhist128(Qv4,#u1)","vector weighted histogram word counters", WHIST(uw,3,1,if (((bucket & 1) == uiV) && fGETQBIT(QvV,2*i)),))
+
+
+#endif
+
+
+
+/* ******   lookup table instructions                          ***********  */
+
+/* Use low bits from idx to choose next-bigger elements from vector, then use LSB from idx to choose odd or even element */
+
+ITERATOR_INSN_PERMUTE_SLOT(8,vlutvvb,"Vd32.b=vlut32(Vu32.b,Vv32.b,Rt8)","vector-vector table lookup",
+fHIDE(fRT8NOTE())
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = RtV & 0x7;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VdV.b[i] = ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(8,vlutvvb_oracc,"Vx32.b|=vlut32(Vu32.b,Vv32.b,Rt8)","vector-vector table lookup",
+fHIDE(fRT8NOTE())
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = RtV & 0x7;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VxV.b[i] |= ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh,"Vdd32.h=vlut16(Vu32.b,Vv32.h,Rt8)","vector-vector table lookup",
+fHIDE(fRT8NOTE())
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = RtV & 0xF;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VddV.v[0].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VddV.v[1].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh_oracc,"Vxx32.h|=vlut16(Vu32.b,Vv32.h,Rt8)","vector-vector table lookup",
+fHIDE(fRT8NOTE())
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = fGETUBYTE(0,RtV) & 0xF;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VxxV.v[0].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VxxV.v[1].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT(8,vlutvvbi,"Vd32.b=vlut32(Vu32.b,Vv32.b,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0x7;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VdV.b[i] = ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(8,vlutvvb_oracci,"Vx32.b|=vlut32(Vu32.b,Vv32.b,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0x7;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VxV.b[i] |= ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwhi,"Vdd32.h=vlut16(Vu32.b,Vv32.h,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0xF;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VddV.v[0].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VddV.v[1].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh_oracci,"Vxx32.h|=vlut16(Vu32.b,Vv32.h,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0xF;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VxxV.v[0].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VxxV.v[1].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT(8,vlutvvb_nm,"Vd32.b=vlut32(Vu32.b,Vv32.b,Rt8):nomatch","vector-vector table lookup",
+fHIDE(fRT8NOTE())
+fHIDE(unsigned int idx;) fHIDE(int oddhalf;) fHIDE(int matchval;)
+    matchval = RtV & 0x7;
+    oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+    idx = VuV.ub[i];
+    idx = (idx&0x1F) | (matchval<<5);
+    VdV.b[i] = fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]))
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh_nm,"Vdd32.h=vlut16(Vu32.b,Vv32.h,Rt8):nomatch","vector-vector table lookup",
+fHIDE(fRT8NOTE())
+fHIDE(unsigned int idx;) fHIDE(int oddhalf;) fHIDE(int matchval;)
+    matchval = RtV & 0xF;
+    oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+    idx = fGETUBYTE(0,VuV.uh[i]);
+    idx = (idx&0x0F) | (matchval<<4);
+    VddV.v[0].h[i] = fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]);
+    idx = fGETUBYTE(1,VuV.uh[i]);
+    idx = (idx&0x0F) | (matchval<<4);
+    VddV.v[1].h[i] = fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]))
+
+
+
+
+/******************************************************************************
+NON LINEAR - V65
+ ******************************************************************************/
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vmpahhsat,"Vx32.h=vmpa(Vx32.h,Vu32.h,Rtt32.h):sat","piecewise linear approximation",
+    VxV.h[i]= fVSATH( ( ( fMPY16SS(VxV.h[i],VuV.h[i])<<1) + (fGETHALF(( (VuV.h[i]>>14)&0x3), RttV )<<15))>>16))
+
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vmpauhuhsat,"Vx32.h=vmpa(Vx32.h,Vu32.uh,Rtt32.uh):sat","piecewise linear approximation",
+    VxV.h[i]= fVSATH( (  fMPY16SU(VxV.h[i],VuV.uh[i]) + (fGETUHALF(((VuV.uh[i]>>14)&0x3), RttV )<<15))>>16))
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vmpsuhuhsat,"Vx32.h=vmps(Vx32.h,Vu32.uh,Rtt32.uh):sat","piecewise linear approximation",
+    VxV.h[i]= fVSATH( (  fMPY16SU(VxV.h[i],VuV.uh[i]) - (fGETUHALF(((VuV.uh[i]>>14)&0x3), RttV )<<15))>>16))
+
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vlut4,"Vd32.h=vlut4(Vu32.uh,Rtt32.h)","4 entry lookup table",
+    VdV.h[i]= fGETHALF(  ((VuV.h[i]>>14)&0x3), RttV ))
+
+
+
+/******************************************************************************
+V65
+ ******************************************************************************/
+
+ITERATOR_INSN_MPY_SLOT_NOV1(32,vmpyuhe,"Vd32.uw=vmpye(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VdV.uw[i] = fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV)))
+
+
+ITERATOR_INSN_MPY_SLOT_NOV1(32,vmpyuhe_acc,"Vx32.uw+=vmpye(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VxV.uw[i] += fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV)))
+
+
+
+
+/******************************************************************************
+ Vecror HI/LOW accessors
+ ******************************************************************************/
+DEF_CVI_MAPPING(V6_hi, "Vd32=hi(Vss32)","Vd32=Vss.H32")
+DEF_CVI_MAPPING(V6_lo, "Vd32=lo(Vss32)","Vd32=Vss.L32")
+
+
+
+
+EXTINSN(V6_vgathermw,  "vtmp.w=vgather(Rt32,Mu2,Vv32.w).w", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_EA_PAGECROSS,A_MEMSIZE_4B,A_MEMLIKE,A_CVI_GATHER_ADDR_4B,A_NOTE_ANY_RESOURCE), "Gather Words",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_GATHER_WORD(EA, VvV.uw[i], i,MuV);
+    }
+    fGATHER_FINISH()
+})
+EXTINSN(V6_vgathermh,  "vtmp.h=vgather(Rt32,Mu2,Vv32.h).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_EA_PAGECROSS,A_MEMSIZE_2B,A_MEMLIKE,A_CVI_GATHER_ADDR_2B,A_NOTE_ANY_RESOURCE), "Gather halfwords",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_GATHER_HALFWORD(EA, VvV.uh[i], i,MuV);
+    }
+    fGATHER_FINISH()
+})
+
+
+
+EXTINSN(V6_vgathermhw,  "vtmp.h=vgather(Rt32,Mu2,Vvv32.w).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA_DV,A_CVI_VM,A_CVI_TMP_DST,A_EA_PAGECROSS,A_MEMSIZE_2B,A_MEMLIKE,A_CVI_GATHER_ADDR_4B,A_NOTE_ANY_RESOURCE), "Gather halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+       for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_GATHER_HALFWORD_DV(EA, VvvV.v[j].uw[i], (2*i+j),i,j,MuV);
+        }
+    }
+     fGATHER_FINISH()
+})
+
+
+EXTINSN(V6_vgathermwq,  "if (Qs4) vtmp.w=vgather(Rt32,Mu2,Vv32.w).w", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_EA_PAGECROSS,A_MEMSIZE_4B,A_MEMLIKE,A_CVI_GATHER_ADDR_4B,A_NOTE_ANY_RESOURCE), "Gather Words",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_GATHER_WORDQ(EA, VvV.uw[i], i,QsV,MuV);
+    }
+    fGATHER_FINISH()
+})
+EXTINSN(V6_vgathermhq,  "if (Qs4) vtmp.h=vgather(Rt32,Mu2,Vv32.h).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_EA_PAGECROSS,A_MEMSIZE_2B,A_MEMLIKE,A_CVI_GATHER_ADDR_2B,A_NOTE_ANY_RESOURCE), "Gather halfwords",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_GATHER_HALFWORDQ(EA, VvV.uh[i], i,QsV,MuV);
+    }
+    fGATHER_FINISH()
+})
+
+
+
+EXTINSN(V6_vgathermhwq,  "if (Qs4) vtmp.h=vgather(Rt32,Mu2,Vvv32.w).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA_DV,A_CVI_VM,A_CVI_TMP_DST,A_EA_PAGECROSS,A_MEMSIZE_2B,A_MEMLIKE,A_CVI_GATHER_ADDR_4B,A_NOTE_ANY_RESOURCE), "Gather halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+       for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_GATHER_HALFWORDQ_DV(EA, VvvV.v[j].uw[i], (2*i+j),i,j,QsV,MuV);
+       }
+    }
+    fGATHER_FINISH()
+})
+
+
+
+EXTINSN(V6_vscattermw , "vscatter(Rt32,Mu2,Vv32.w).w=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_EA_PAGECROSS,A_MEMSIZE_4B,A_CVI_GATHER_ADDR_4B,A_MEMLIKE,A_NOTE_ANY_RESOURCE), "Scatter Words",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_WORD(EA, VvV.uw[i], VwV,i,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+
+EXTINSN(V6_vscattermh , "vscatter(Rt32,Mu2,Vv32.h).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_EA_PAGECROSS,A_MEMSIZE_2B,A_CVI_GATHER_ADDR_2B,A_MEMLIKE,A_NOTE_ANY_RESOURCE), "Scatter halfWords",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_HALFWORD(EA,VvV.uh[i],VwV,i,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+EXTINSN(V6_vscattermw_add,  "vscatter(Rt32,Mu2,Vv32.w).w+=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_EA_PAGECROSS,A_MEMSIZE_4B,A_CVI_GATHER_ADDR_4B,A_MEMLIKE,A_CVI_SCATTER_WORD_ACC,A_NOTE_ANY_RESOURCE), "Scatter Words-Add",
+{
+    fHIDE(int i;)
+    fHIDE(int ALIGNMENT=4;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = (RtV+fVALIGN(VvV.uw[i],ALIGNMENT));
+        fVLOG_VTCM_WORD_INCREMENT(EA,VvV.uw[i],VwV,i,ALIGNMENT,MuV);
+    }
+    fHIDE(fLOG_SCATTER_OP(4);)
+    fSCATTER_FINISH(1)
+})
+
+EXTINSN(V6_vscattermh_add,  "vscatter(Rt32,Mu2,Vv32.h).h+=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_EA_PAGECROSS,A_MEMSIZE_2B,A_CVI_GATHER_ADDR_2B,A_CVI_SCATTER_ACC,A_MEMLIKE,A_NOTE_ANY_RESOURCE), "Scatter halfword-Add",
+{
+    fHIDE(int i;)
+    fHIDE(int ALIGNMENT=2;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = (RtV+fVALIGN(VvV.uh[i],ALIGNMENT));
+        fVLOG_VTCM_HALFWORD_INCREMENT(EA,VvV.uh[i],VwV,i,ALIGNMENT,MuV);
+    }
+    fHIDE(fLOG_SCATTER_OP(2);)
+    fSCATTER_FINISH(1)
+})
+
+
+EXTINSN(V6_vscattermwq,  "if (Qs4) vscatter(Rt32,Mu2,Vv32.w).w=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_EA_PAGECROSS,A_MEMSIZE_4B,A_CVI_GATHER_ADDR_4B,A_MEMLIKE,A_NOTE_ANY_RESOURCE), "Scatter Words conditional",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_WORDQ(EA,VvV.uw[i], VwV,i,QsV,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+EXTINSN(V6_vscattermhq,  "if (Qs4) vscatter(Rt32,Mu2,Vv32.h).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_EA_PAGECROSS,A_MEMSIZE_2B,A_CVI_GATHER_ADDR_2B,A_MEMLIKE,A_NOTE_ANY_RESOURCE), "Scatter HalfWords conditional",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_HALFWORDQ(EA,VvV.uh[i],VwV,i,QsV,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+
+
+EXTINSN(V6_vscattermhw , "vscatter(Rt32,Mu2,Vvv32.w).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA_DV,A_CVI_VM,A_EA_PAGECROSS,A_MEMSIZE_2B,A_CVI_GATHER_ADDR_4B,A_MEMLIKE,A_NOTE_ANY2_RESOURCE), "Scatter Words",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_HALFWORD_DV(EA,VvvV.v[j].uw[i],VwV,(2*i+j),i,j,MuV);
+        }
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+
+EXTINSN(V6_vscattermhwq,  "if (Qs4) vscatter(Rt32,Mu2,Vvv32.w).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA_DV,A_CVI_VM,A_EA_PAGECROSS,A_MEMSIZE_2B,A_CVI_GATHER_ADDR_4B,A_MEMLIKE,A_NOTE_ANY2_RESOURCE), "Scatter halfwords conditional",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_HALFWORDQ_DV(EA,VvvV.v[j].uw[i],VwV,(2*i+j),QsV,i,j,MuV);
+        }
+    }
+    fSCATTER_FINISH(0)
+})
+
+EXTINSN(V6_vscattermhw_add,  "vscatter(Rt32,Mu2,Vvv32.w).h+=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA_DV,A_CVI_VM,A_EA_PAGECROSS,A_MEMSIZE_2B,A_CVI_GATHER_ADDR_4B,A_CVI_SCATTER_ACC,A_MEMLIKE,A_NOTE_ANY2_RESOURCE), "Scatter halfwords-add",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+    fHIDE(int ALIGNMENT=2;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for(j = 0; j < 2; j++) {
+             EA =  RtV + fVALIGN(VvvV.v[j].uw[i],ALIGNMENT);;
+             fVLOG_VTCM_HALFWORD_INCREMENT_DV(EA,VvvV.v[j].uw[i],VwV,(2*i+j),i,j,ALIGNMENT,MuV);
+        }
+    }
+    fHIDE(fLOG_SCATTER_OP(2);)
+    fSCATTER_FINISH(1)
+})
+DEF_CVI_MAPPING(V6_vscattermw_alt,  "vscatter(Rt32,Mu2,Vv32.w)=Vw32.w",  "vscatter(Rt32,Mu2,Vv32.w).w=Vw32")
+DEF_CVI_MAPPING(V6_vscattermwh_alt,  "vscatter(Rt32,Mu2,Vvv32.w)=Vw32.h",  "vscatter(Rt32,Mu2,Vvv32.w).h=Vw32")
+DEF_CVI_MAPPING(V6_vscattermh_alt,  "vscatter(Rt32,Mu2,Vv32.h)=Vw32.h",  "vscatter(Rt32,Mu2,Vv32.h).h=Vw32")
+
+DEF_CVI_MAPPING(V6_vscattermw_add_alt,  "vscatter(Rt32,Mu2,Vv32.w)+=Vw32.w",  "vscatter(Rt32,Mu2,Vv32.w).w+=Vw32")
+DEF_CVI_MAPPING(V6_vscattermwh_add_alt,  "vscatter(Rt32,Mu2,Vvv32.w)+=Vw32.h",  "vscatter(Rt32,Mu2,Vvv32.w).h+=Vw32")
+DEF_CVI_MAPPING(V6_vscattermh_add_alt,  "vscatter(Rt32,Mu2,Vv32.h)+=Vw32.h",  "vscatter(Rt32,Mu2,Vv32.h).h+=Vw32")
+
+DEF_CVI_MAPPING(V6_vscattermwq_alt,  "if (Qs4) vscatter(Rt32,Mu2,Vv32.w)=Vw32.w",  "if (Qs4) vscatter(Rt32,Mu2,Vv32.w).w=Vw32")
+DEF_CVI_MAPPING(V6_vscattermwhq_alt,  "if (Qs4) vscatter(Rt32,Mu2,Vvv32.w)=Vw32.h",  "if (Qs4) vscatter(Rt32,Mu2,Vvv32.w).h=Vw32")
+DEF_CVI_MAPPING(V6_vscattermhq_alt,  "if (Qs4) vscatter(Rt32,Mu2,Vv32.h)=Vw32.h",  "if (Qs4) vscatter(Rt32,Mu2,Vv32.h).h=Vw32")
+
+EXTINSN(V6_vprefixqb,"Vd32.b=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS,A_CVI_EARLY,A_NOTE_SHIFT_RESOURCE),  "parallel prefix sum of Q into byte",
+{
+    fHIDE(int i;)
+    fHIDE(size1u_t acc = 0;)
+    fVFOREACH(8, i) {
+        acc += fGETQBIT(QvV,i);
+        VdV.ub[i] = acc;
+    }
+    } )
+EXTINSN(V6_vprefixqh,"Vd32.h=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS,A_CVI_EARLY,A_NOTE_SHIFT_RESOURCE),  "parallel prefix sum of Q into halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(size2u_t acc = 0;)
+    fVFOREACH(16, i) {
+        acc += fGETQBIT(QvV,i*2+0);
+        acc += fGETQBIT(QvV,i*2+1);
+        VdV.uh[i] = acc;
+    }
+    } )
+EXTINSN(V6_vprefixqw,"Vd32.w=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS,A_CVI_EARLY,A_NOTE_SHIFT_RESOURCE),  "parallel prefix sum of Q into words",
+{
+    fHIDE(int i;)
+    fHIDE(size4u_t acc = 0;)
+    fVFOREACH(32, i) {
+        acc += fGETQBIT(QvV,i*4+0);
+        acc += fGETQBIT(QvV,i*4+1);
+        acc += fGETQBIT(QvV,i*4+2);
+        acc += fGETQBIT(QvV,i*4+3);
+        VdV.uw[i] = acc;
+    }
+    } )
+
+
+
+
+
+/******************************************************************************
+ DEBUG Vector/Register Printing
+ ******************************************************************************/
+
+#define PRINT_VU(TYPE, TYPE2, COUNT)\
+    int i;  \
+    size4u_t vec_len = fVBYTES();\
+    fprintf(stdout,"V%2d: ",VuN);  \
+    for (i=0;i<vec_len>>COUNT;i++) {         \
+        fprintf(stdout,TYPE2 " ", VuV.TYPE[i]); \
+    };  \
+    fprintf(stdout,"\\n");  \
+	fflush(stdout);\
+
+EXTINSN(V6_pv32,    "pv32(Vu32)",   ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a Vector",  {   PRINT_VU(uw, "%08x", 2);    })
+EXTINSN(V6_pv32d,   "pv32d(Vu32)",  ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a Vector",  {   PRINT_VU(w,  "%10d", 2);    })
+EXTINSN(V6_pv32du,  "pv32du(Vu32)", ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a Vector",  {   PRINT_VU(uw, "%10u", 2);    })
+EXTINSN(V6_pv64d,   "pv64d(Vu32)",  ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a Vector",  {   PRINT_VU(ud, "%lli", 4);    })
+EXTINSN(V6_pv16,    "pv16(Vu32)",   ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a Vector",  {   PRINT_VU(uh, "%04x", 1);   })
+EXTINSN(V6_pv16d,   "pv16d(Vu32)",  ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a Vector",  {   PRINT_VU(h,  "%5d",  1);   })
+EXTINSN(V6_pv8d,    "pv8d(Vu32)",   ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a Vector",  {   PRINT_VU(ub, "%3u",  0);   })
+EXTINSN(V6_pv8,     "pv8(Vu32)",    ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a Vector",  {   PRINT_VU(ub, "%02x", 0);   })
+EXTINSN(V6_preg,    "preg(Ru32)",   ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a scalar",  {   printf("R%02d=0x%08x\\n", RuN, RuV);})
+EXTINSN(V6_pregd,   "pregd(Ru32)",  ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a scalar",  {   printf("R%02d=%10d\\n", RuN, RuV); })
+EXTINSN(V6_pregf,   "pregf(Ru32)",  ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a scalar",  {   printf("R%02d=%f\\n", RuN, (float)RuV); })
+
+EXTINSN(V6_pz,   	"pz(Zu2)",  ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print a scalar",  {
+	fprintf(stdout,"Z%d:\\n", ZuN);
+	for(int m=0, l=0; l < fVBYTES()/4/8; l++)
+	{
+		fprintf(stdout,"\\t");
+		for(int k = 0; k < 8; k++,m++)
+			fprintf(stdout,"%x ", ZuV.w[m]);
+		fprintf(stdout,"\\n");
+	}
+	fprintf(stdout,"\\n");
+	fflush(stdout);
+})
+
+
+
+
+EXTINSN(V6_ppred,   "ppred(Qs4)",   ATTRIBS(A_EXTENSION,A_CVI,A_VDBG,A_FAKEINSN),"Print Predicates",
+    int j;
+    fprintf(stdout,"Q%d: [",QsN);
+    for (j = 0; j < fVBYTES()-1; j++){
+        fprintf(stdout,"%1x,", fGETQBIT(QsV,j));
+    }
+    fprintf(stdout,"%1x", fGETQBIT(QsV,j));
+    fprintf(stdout,"]\\n");
+	fflush(stdout);
+)
+
+
+
+
+#undef ATTR_VMEM
+#undef ATTR_VMEMU
+#undef ATTR_VMEM_NT
+
+#endif /* NO_MMVEC */
+
+#ifdef __SELF_DEF_EXTINSN
+#undef EXTINSN
+#undef __SELF_DEF_EXTINSN
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 58/67] Hexagon HVX import macro definitions
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (56 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 57/67] Hexagon HVX import semantics Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 59/67] Hexagon HVX semantics generator Taylor Simpson
                   ` (9 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Imported from the Hexagon architecture library
    imported/allext_macros.def       Top level macro include for all extensions
    imported/mmvec/macros.def        HVX macro definitions
The macro definition files specify instruction attributes that are applied
to each instruction that reverences the macro.

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/imported/allext_macros.def |   25 +
 target/hexagon/imported/mmvec/macros.def  | 1110 +++++++++++++++++++++++++++++
 2 files changed, 1135 insertions(+)
 create mode 100644 target/hexagon/imported/allext_macros.def
 create mode 100755 target/hexagon/imported/mmvec/macros.def

diff --git a/target/hexagon/imported/allext_macros.def b/target/hexagon/imported/allext_macros.def
new file mode 100644
index 0000000..e1f16eb
--- /dev/null
+++ b/target/hexagon/imported/allext_macros.def
@@ -0,0 +1,25 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Top level file for all instruction set extensions
+ */
+#define EXTNAME mmvec
+#define EXTSTR "mmvec"
+#include "mmvec/macros.def"
+#undef EXTNAME
+#undef EXTSTR
diff --git a/target/hexagon/imported/mmvec/macros.def b/target/hexagon/imported/mmvec/macros.def
new file mode 100755
index 0000000..fa0beb1
--- /dev/null
+++ b/target/hexagon/imported/mmvec/macros.def
@@ -0,0 +1,1110 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+DEF_MACRO(fDUMPQ,(STR,REG),
+	"dump REG",
+	"dump REG",
+	do {
+		printf(STR ":" #REG ": 0x%016llx\n",REG.ud[0]);
+	} while (0),
+	()
+)
+
+DEF_MACRO(fUSE_LOOKUP_ADDRESS_BY_REV,(PROC),
+	"",
+	"Use full VA address for lookup and exception based on REV ",
+	PROC->arch_proc_options->mmvec_use_full_va_for_lookup,
+	()
+)
+
+DEF_MACRO(fUSE_LOOKUP_ADDRESS,(),
+	"",
+	"Use full VA address for lookup and exception",
+	1,
+	()
+)
+
+DEF_MACRO(fRT8NOTE, (),
+	"",
+	"",
+	,
+	(A_NOTE_RT8)
+)
+
+DEF_MACRO(fNOTQ,(VAL),
+	"~VAL",
+	"~VAL",
+	/* Will break Visual Studio? */
+	({mmqreg_t _ret = {0}; int _i_; for (_i_ = 0; _i_ < fVECSIZE()/64; _i_++) _ret.ud[_i_] = ~VAL.ud[_i_]; _ret;}),
+	()
+)
+
+DEF_MACRO(fGETQBITS,(REG,WIDTH,MASK,BITNO),
+	"REG[BITNO+WIDTH-1:BITNO]",
+	"Get MASK bits at BITNO from REG",
+	((MASK) & (REG.w[(BITNO)>>5] >> ((BITNO) & 0x1f))),
+	()
+)
+
+DEF_MACRO(fGETQBIT,(REG,BITNO),
+	"REG[BITNO]",
+	"Get bit BITNO from REG",
+	fGETQBITS(REG,1,1,BITNO),
+	()
+)
+
+DEF_MACRO(fGENMASKW,(QREG,IDX),
+	"maskw(QREG,IDX)",
+	"Generate mask from QREG for word IDX",
+	(((fGETQBIT(QREG,(IDX*4+0)) ? 0xFF : 0x0) << 0)
+	|((fGETQBIT(QREG,(IDX*4+1)) ? 0xFF : 0x0) << 8)
+	|((fGETQBIT(QREG,(IDX*4+2)) ? 0xFF : 0x0) << 16)
+	|((fGETQBIT(QREG,(IDX*4+3)) ? 0xFF : 0x0) << 24)),
+	()
+)
+DEF_MACRO(fGET10BIT,(COE,VAL,POS),
+	"COE=(((((fGETUBYTE(3,VAL) >> (2 * POS)) & 3) << 8) | fGETUBYTE(POS,VAL)) << 6) >> 6;",
+	"Get 10-bit coefficient from current word value and byte position",
+	{
+		COE = (((((fGETUBYTE(3,VAL) >> (2 * POS)) & 3) << 8) | fGETUBYTE(POS,VAL)) << 6);
+		COE >>= 6;
+	},
+	()
+)
+
+DEF_MACRO(fVMAX,(X,Y),
+	"max(X,Y)",
+	"",
+	(X>Y) ? X : Y,
+	()
+)
+
+
+DEF_MACRO(fREAD_VEC,
+    (DST,IDX),
+    "DST=VREG[IDX]",   /* short desc */
+    "Read Vector IDX", /* long desc */
+	(DST = READ_VREG(fMODCIRCU((IDX),5))),
+    ()
+)
+
+DEF_MACRO(fGETNIBBLE,(IDX,SRC),
+    "SRC.s4[IDX]",
+    "Get nibble",
+    ( fSXTN(4,8,(SRC >> (4*IDX)) & 0xF) ),
+    ()
+)
+
+DEF_MACRO(fGETCRUMB,(IDX,SRC),
+    "SRC.s2[IDX]",
+    "Get 2bits",
+    ( fSXTN(2,8,(SRC >> (2*IDX)) & 0x3) ),
+    ()
+)
+
+DEF_MACRO(fGETCRUMB_SYMMETRIC,(IDX,SRC),
+    "SRC.s2[IDX] >= 0 ? (2-SRC.s2[IDX]) : SRC.s2[IDX]",
+    "Get 2bits",
+    ( (fGETCRUMB(IDX,SRC)>=0 ? (2-fGETCRUMB(IDX,SRC)) : fGETCRUMB(IDX,SRC) ) ),
+    ()
+)
+
+#define ZERO_OFFSET_2B +
+
+DEF_MACRO(fWRITE_VEC,
+    (IDX,VAR),
+    "VREG[IDX]=VAR",   /* short desc */
+    "Write Vector IDX", /* long desc */
+	(WRITE_VREG(fMODCIRCU((IDX),5),VAR)),
+    ()
+)
+
+DEF_MACRO(fGENMASKH,(QREG,IDX),
+	"maskh(QREG,IDX)",
+	"generate mask from QREG for halfword IDX",
+	(((fGETQBIT(QREG,(IDX*2+0)) ? 0xFF : 0x0) << 0)
+	|((fGETQBIT(QREG,(IDX*2+1)) ? 0xFF : 0x0) << 8)),
+	()
+)
+
+DEF_MACRO(fGETMASKW,(VREG,QREG,IDX),
+	"VREG.w[IDX] & fGENMASKW(QREG,IDX)",
+	"Mask word IDX from VREG using QREG",
+	(VREG.w[IDX] & fGENMASKW((QREG),IDX)),
+	()
+)
+
+DEF_MACRO(fGETMASKH,(VREG,QREG,IDX),
+	"VREG.h[IDX] & fGENMASKH(QREG,IDX)",
+	"Mask word IDX from VREG using QREG",
+	(VREG.h[IDX] & fGENMASKH((QREG),IDX)),
+	()
+)
+
+DEF_MACRO(fCONDMASK8,(QREG,IDX,YESVAL,NOVAL),
+	"QREG.IDX ? YESVAL : NOVAL",
+	"QREG.IDX ? YESVAL : NOVAL",
+	(fGETQBIT(QREG,IDX) ? (YESVAL) : (NOVAL)),
+	()
+)
+
+DEF_MACRO(fCONDMASK16,(QREG,IDX,YESVAL,NOVAL),
+	"select_bytes(QREG,IDX,YESVAL,NOVAL)",
+	"select_bytes(QREG,IDX,YESVAL,NOVAL)",
+	((fGENMASKH(QREG,IDX) & (YESVAL)) | (fGENMASKH(fNOTQ(QREG),IDX) & (NOVAL))),
+	()
+)
+
+DEF_MACRO(fCONDMASK32,(QREG,IDX,YESVAL,NOVAL),
+	"select_bytes(QREG,IDX,YESVAL,NOVAL)",
+	"select_bytes(QREG,IDX,YESVAL,NOVAL)",
+	((fGENMASKW(QREG,IDX) & (YESVAL)) | (fGENMASKW(fNOTQ(QREG),IDX) & (NOVAL))),
+	()
+)
+
+
+DEF_MACRO(fSETQBITS,(REG,WIDTH,MASK,BITNO,VAL),
+	"REG[BITNO+WIDTH-1:BITNO] = VAL",
+	"Put bits into REG",
+	do {
+		size4u_t __TMP = (VAL);
+		REG.w[(BITNO)>>5] &= ~((MASK) << ((BITNO) & 0x1f));
+		REG.w[(BITNO)>>5] |= (((__TMP) & (MASK)) << ((BITNO) & 0x1f));
+	} while (0),
+	()
+)
+
+DEF_MACRO(fSETQBIT,(REG,BITNO,VAL),
+	"REG[BITNO]=VAL",
+	"Put bit into REG",
+	fSETQBITS(REG,1,1,BITNO,VAL),
+	()
+)
+
+DEF_MACRO(fVBYTES,(),
+	"VWIDTH",
+	"Number of bytes in a vector",
+	(fVECSIZE()),
+	()
+)
+
+DEF_MACRO(fVHALVES,(),
+	"VWIDTH/2",
+	"Number of halves in a vector",
+	(fVECSIZE()/2),
+	()
+)
+
+DEF_MACRO(fVWORDS,(),
+	"VWIDTH/2",
+	"Number of words in a vector",
+	(fVECSIZE()/4),
+	()
+)
+
+DEF_MACRO(fVDWORDS,(),
+	"VWIDTH/8",
+	"Number of double words in a vector",
+	(fVECSIZE()/8),
+	()
+)
+
+DEF_MACRO(fVALIGN, (ADDR, LOG2_ALIGNMENT),
+    "ADDR = ADDR & ~(LOG2_ALIGNMENT-1)",
+    "Align to Element Size",
+    ( ADDR = ADDR & ~(LOG2_ALIGNMENT-1)),
+    ()
+)
+
+DEF_MACRO(fVLASTBYTE, (ADDR, LOG2_ALIGNMENT),
+    "ADDR = ADDR | (LOG2_ALIGNMENT-1)",
+    "Set LSB of length to last byte",
+    ( ADDR = ADDR | (LOG2_ALIGNMENT-1)),
+    ()
+)
+
+
+DEF_MACRO(fVELEM, (WIDTH),
+    "VBITS/WIDTH",
+    "Number of WIDTH-bit elements in a vector",
+    ((fVECSIZE()*8)/WIDTH),
+    ()
+)
+
+DEF_MACRO(fVECLOGSIZE,(),
+    "log2(VECTOR_SIZE)",
+    "Log base 2 of the number of bytes in a vector",
+    (mmvec_current_veclogsize(thread)),
+    ()
+)
+
+DEF_MACRO(fVBUF_IDX,(EA),
+	"(EA >> log2(VECTOR_SIZE)) & 0xFF",
+	"(EA >> log2(VECTOR_SIZE)) & 0xFF",
+	(((EA) >> fVECLOGSIZE()) & 0xFF),
+	(A_FAKEINSN)
+)
+
+DEF_MACRO(fREAD_VBUF,(IDX,WIDX),
+	"vbuf[IDX].w[WIDX]",
+	"vbuf[IDX].w[WIDX]",
+	READ_VBUF(IDX,WIDX),
+	(A_FAKEINSN)
+)
+
+DEF_MACRO(fLOG_VBUF,(IDX,VAL,WIDX),
+	"vbuf[IDX].w[WIDX] = VAL",
+	"vbuf[IDX].w[WIDX] = VAL",
+	LOG_VBUF(IDX,VAL,WIDX),
+	(A_FAKEINSN)
+)
+
+DEF_MACRO(fVECSIZE,(),
+    "VBYTES",
+    "Number of bytes in a vector currently",
+    (1<<fVECLOGSIZE()),
+    ()
+)
+
+DEF_MACRO(fSWAPB,(A, B),
+    "SWAP(A,B)",
+    "Swap bytes",
+    {
+		size1u_t tmp = A;
+		A = B;
+		B = tmp;
+	},
+    /* NOTHING */
+)
+
+DEF_MACRO(
+	fVZERO,(),
+	"0",
+	"0",
+	mmvec_zero_vector(),
+	()
+)
+
+DEF_MACRO(
+    fNEWVREG,(VNUM),
+    "VNUM.new",
+    "Register value produced in this packet",
+    ((THREAD2STRUCT->VRegs_updated & (((VRegMask)1)<<VNUM)) ? THREAD2STRUCT->future_VRegs[VNUM] : mmvec_zero_vector()),
+    (A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY)
+)
+
+DEF_MACRO(
+	fV_AL_CHECK,
+	(EA,MASK),
+	"",
+	"",
+	if ((EA) & (MASK)) {
+		warn("aligning misaligned vector. PC=%08x EA=%08x",thread->Regs[REG_PC],(EA));
+	},
+	()
+)
+DEF_MACRO(fSCATTER_INIT, ( REGION_START, LENGTH, ELEMENT_SIZE),
+    "",
+    "",
+    {
+    mem_vector_scatter_init(thread, insn,   REGION_START, LENGTH, ELEMENT_SIZE);
+	if (EXCEPTION_DETECTED) return;
+    },
+    (A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST,A_RESTRICT_SLOT0ONLY)
+)
+
+DEF_MACRO(fGATHER_INIT, ( REGION_START, LENGTH, ELEMENT_SIZE),
+    "",
+    "",
+    {
+    mem_vector_gather_init(thread, insn,   REGION_START, LENGTH, ELEMENT_SIZE);
+	if (EXCEPTION_DETECTED) return;
+    },
+    (A_LOAD,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST,A_RESTRICT_SLOT1ONLY)
+)
+
+DEF_MACRO(fSCATTER_FINISH, (OP),
+    "",
+    "",
+    {
+	if (EXCEPTION_DETECTED) return;
+    mem_vector_scatter_finish(thread, insn, OP);
+    },
+    ()
+)
+
+DEF_MACRO(fGATHER_FINISH, (),
+    "",
+    "",
+    {
+	if (EXCEPTION_DETECTED) return;
+    mem_vector_gather_finish(thread, insn);
+    },
+    ()
+)
+
+
+DEF_MACRO(CHECK_VTCM_PAGE, (FLAG, BASE, LENGTH, OFFSET, ALIGNMENT),
+    "FLAG=((BASE+OFFSET) < (BASE+LENGTH))",
+    "FLAG=((BASE+OFFSET) < (BASE+LENGTH))",
+     {
+        int slot = insn->slot;
+        paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        pa = pa & ~(ALIGNMENT-1);
+        FLAG = (pa < (thread->mem_access[slot].paddr+LENGTH));
+     },
+    ()
+)
+DEF_MACRO(COUNT_OUT_OF_BOUNDS, (FLAG, SIZE),
+    " ",
+    "",
+     {
+        if (!FLAG)
+        {
+               THREAD2STRUCT->vtcm_log.oob_access += SIZE;
+               warn("Scatter/Gather out of bounds of region");
+        }
+     },
+    ()
+)
+
+DEF_MACRO(fLOG_SCATTER_OP, (SIZE),
+    "  ",
+    "  ",
+    {
+        // Log the size and indicate that the extension ext.c file needs to increment right before memory write
+        THREAD2STRUCT->vtcm_log.op = 1;
+        THREAD2STRUCT->vtcm_log.op_size = SIZE;
+    },
+    ()
+)
+
+
+
+DEF_MACRO(fVLOG_VTCM_WORD_INCREMENT, (EA,OFFSET,INC,IDX,ALIGNMENT,LEN),
+    "if (RtV <= EA <= RtV + LEN) *EA += INC.uw[IDX] ",
+    "if (RtV <= EA <= RtV + LEN) *EA += INC.uw[IDX] ",
+    {
+        int slot = insn->slot;
+        int log_bank = 0;
+        int log_byte =0;
+        paddr_t pa = thread->mem_access[slot].paddr+(OFFSET & ~(ALIGNMENT-1));
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        for(int i0 = 0; i0 < 4; i0++)
+        {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high));
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,INC.ub[4*IDX+i0],4*IDX+i0);
+        }
+        { LOG_VTCM_BANK(pa, log_bank, IDX); }
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD_INCREMENT, (EA,OFFSET,INC,IDX,ALIGNMENT,LEN),
+    "if (RtV <= EA <= RtV + LEN) *EA += INC.uh[IDX] ",
+    "if (RtV <= EA <= RtV + LEN) *EA += INC.uh[IDX] ",
+    {
+        int slot = insn->slot;
+        int log_bank = 0;
+        int log_byte = 0;
+        paddr_t pa = thread->mem_access[slot].paddr+(OFFSET & ~(ALIGNMENT-1));
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        for(int i0 = 0; i0 < 2; i0++) {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high));
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,INC.ub[2*IDX+i0],2*IDX+i0);
+        }
+        { LOG_VTCM_BANK(pa, log_bank,IDX); }
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD_INCREMENT_DV, (EA,OFFSET,INC,IDX,IDX2,IDX_H,ALIGNMENT,LEN),
+    "if (RtV <= EA <= RtV + LEN) *EA += IN.uw[IDX2].uh[IDX_H] ",
+    "if (RtV <= EA <= RtV + LEN) *EA += IN.uw[IDX2].uh[IDX_H] ",
+    {
+        int slot = insn->slot;
+        int log_bank = 0;
+        int log_byte = 0;
+        paddr_t pa = thread->mem_access[slot].paddr+(OFFSET & ~(ALIGNMENT-1));
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        for(int i0 = 0; i0 < 2; i0++) {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high));
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,INC.ub[2*IDX+i0],2*IDX+i0);
+        }
+        { LOG_VTCM_BANK(pa, log_bank,(2*IDX2+IDX_H));}
+    },
+    ()
+)
+
+
+
+DEF_MACRO(GATHER_FUNCTION, (EA,OFFSET,IDX, LEN, ELEMENT_SIZE, BANK_IDX, QVAL),
+"",
+"",
+{
+        int slot = insn->slot;
+        int i0;
+        paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        int log_bank = 0;
+        int log_byte = 0;
+        for(i0 = 0; i0 < ELEMENT_SIZE; i0++)
+        {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high)) && QVAL;
+            log_bank |= (log_byte<<i0);
+            size1u_t B  = sim_mem_read1(thread->system_ptr, thread->threadId, thread->mem_access[slot].paddr+OFFSET+i0);
+            THREAD2STRUCT->tmp_VRegs[0].ub[ELEMENT_SIZE*IDX+i0] = B;
+            LOG_VTCM_BYTE(pa+i0,log_byte,B,ELEMENT_SIZE*IDX+i0);
+        }
+        LOG_VTCM_BANK(pa, log_bank,BANK_IDX);
+},
+()
+)
+
+
+
+DEF_MACRO(fVLOG_VTCM_GATHER_WORD, (EA,OFFSET,IDX, LEN),
+    "if (RtV <= EA <= RtV + LEN) TEMP.uw[IDX] = *EA ",
+    "if (RtV <= EA <= RtV + LEN) TEMP.uw[IDX] = *EA ",
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 4, IDX, 1);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORD, (EA,OFFSET,IDX, LEN),
+    " if (RtV <= EA <= RtV + LEN) TEMP.uh[IDX] = *EA ",
+    " if (RtV <= EA <= RtV + LEN) TEMP.uh[IDX] = *EA ",
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, IDX, 1);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORD_DV, (EA,OFFSET,IDX,IDX2,IDX_H, LEN),
+    "if (RtV <= EA <= RtV + LEN) TEMP.uw[IDX2].uh[IDX_H] = *EA ",
+    "if (RtV <= EA <= RtV + LEN) TEMP.uw[IDX2].uh[IDX_H] = *EA ",
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), 1);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_WORDQ, (EA,OFFSET,IDX, Q, LEN),
+    " if ( (RtV <= EA <= RtV + LEN) & Q) TEMP.uw[IDX] = *EA ",
+    " if ( (RtV <= EA <= RtV + LEN) & Q) TEMP.uw[IDX] = *EA ",
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 4, IDX, fGETQBIT(QsV,4*IDX+i0));
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORDQ, (EA,OFFSET,IDX, Q, LEN),
+    " if ( (RtV <= EA <= RtV + LEN) & Q) TEMP.uh[IDX] = *EA ",
+    " if ( (RtV <= EA <= RtV + LEN) & Q) TEMP.uh[IDX] = *EA ",
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, IDX, fGETQBIT(QsV,2*IDX+i0));
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORDQ_DV, (EA,OFFSET,IDX,IDX2,IDX_H, Q, LEN),
+    " if ( (RtV <= EA <= RtV + LEN) & Q) TEMP.uw[IDX2].uh[IDX_H] = *EA ",
+    " if ( (RtV <= EA <= RtV + LEN) & Q) TEMP.uw[IDX2].uh[IDX_H] = *EA ",
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), fGETQBIT(QsV,2*IDX+i0));
+    },
+    ()
+)
+
+
+DEF_MACRO(DEBUG_LOG_ADDR, (OFFSET),
+    "  ",
+    "  ",
+    {
+
+        if (thread->processor_ptr->arch_proc_options->mmvec_network_addr_log2)
+        {
+
+            int slot = insn->slot;
+            paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        }
+    },
+    ()
+)
+
+
+
+
+
+
+
+DEF_MACRO(SCATTER_OP_WRITE_TO_MEM, (TYPE),
+    " Read, accumulate, and write to VTCM",
+    "  ",
+    {
+        for (int i = 0; i < mmvecx->vtcm_log.size; i+=sizeof(TYPE))
+        {
+            if ( mmvecx->vtcm_log.mask.ub[i] != 0) {
+                TYPE dst = 0;
+                TYPE inc = 0;
+                for(int j = 0; j < sizeof(TYPE); j++) {
+                    dst |= (sim_mem_read1(thread->system_ptr, thread->threadId, mmvecx->vtcm_log.pa[i+j]) << (8*j));
+                    inc |= mmvecx->vtcm_log.data.ub[j+i] << (8*j);
+
+                    mmvecx->vtcm_log.mask.ub[j+i] = 0;
+                    mmvecx->vtcm_log.data.ub[j+i] = 0;
+                    mmvecx->vtcm_log.offsets.ub[j+i] = 0;
+                }
+                dst += inc;
+                for(int j = 0; j < sizeof(TYPE); j++) {
+                    sim_mem_write1(thread->system_ptr,thread->threadId, mmvecx->vtcm_log.pa[i+j], (dst >> (8*j))& 0xFF );
+                }
+        }
+
+    }
+    },
+    ()
+)
+
+DEF_MACRO(SCATTER_FUNCTION, (EA,OFFSET,IDX, LEN, ELEMENT_SIZE, BANK_IDX, QVAL, IN),
+"",
+"",
+{
+        int slot = insn->slot;
+        int i0;
+        paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        int log_bank = 0;
+        int log_byte = 0;
+        for(i0 = 0; i0 < ELEMENT_SIZE; i0++) {
+            log_byte = ((OFFSET>=0)&&((pa+i0)<=pa_high)) && QVAL;
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,IN.ub[ELEMENT_SIZE*IDX+i0],ELEMENT_SIZE*IDX+i0);
+        }
+        LOG_VTCM_BANK(pa, log_bank,BANK_IDX);
+
+},
+()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD, (EA,OFFSET,IN,IDX, LEN),
+    "if (RtV <= EA <= RtV + LEN) *EA = IN.uh[IDX] ",
+    "if (RtV <= EA <= RtV + LEN) *EA = IN.uh[IDX] ",
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, IDX, 1, IN);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_WORD, (EA,OFFSET,IN,IDX,LEN),
+    "if (RtV <= EA <= RtV + LEN) *EA = IN.uw[IDX] ",
+    "if (RtV <= EA <= RtV + LEN) *EA = IN.uw[IDX] ",
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 4, IDX, 1, IN);
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORDQ, (EA,OFFSET,IN,IDX,Q,LEN),
+    " if ( (RtV <= EA <= RtV + LEN) & Q) *EA = IN.uh[IDX] ",
+    " if ( (RtV <= EA <= RtV + LEN) & Q) *EA = IN.uh[IDX] ",
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, IDX, fGETQBIT(QsV,2*IDX+i0), IN);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_WORDQ, (EA,OFFSET,IN,IDX,Q,LEN),
+    " if ( (RtV <= EA <= RtV + LEN) & Q) *EA = IN.uw[IDX] ",
+    " if ( (RtV <= EA <= RtV + LEN) & Q) *EA = IN.uw[IDX] ",
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 4, IDX, fGETQBIT(QsV,4*IDX+i0), IN);
+    },
+    ()
+)
+
+
+
+
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD_DV, (EA,OFFSET,IN,IDX,IDX2,IDX_H, LEN),
+    "if (RtV <= EA <= RtV + LEN) *EA = IN.hw[IDX2].uh[IDX_H] ",
+    "if (RtV <= EA <= RtV + LEN) *EA = IN.hw[IDX2].uh[IDX_H] ",
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), 1, IN);
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORDQ_DV, (EA,OFFSET,IN,IDX,Q,IDX2,IDX_H, LEN),
+    " if ( (RtV <= EA <= RtV + LEN) & Q) *EA = IN.hw[IDX2].uh[IDX_H] ",
+    " if ( (RtV <= EA <= RtV + LEN) & Q) *EA = IN.hw[IDX2].uh[IDX_H] ",
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), fGETQBIT(QsV,2*IDX+i0), IN);
+    },
+    ()
+)
+
+
+
+
+
+
+DEF_MACRO(fSTORERELEASE, (EA,TYPE),
+	"char* addr = EA&~(ALIGNMENT-1); Zero Byte Store Release (Non-blocking Sync)",
+	"Zero Byte Store Release (Sync)",
+    {
+        fV_AL_CHECK(EA,fVECSIZE()-1);
+
+        mem_store_release(thread, insn, fVECSIZE(), EA&~(fVECSIZE()-1), EA, TYPE, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+	(A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fVFETCH_AL, (EA),
+    "Prefetch vector into L2 cache at EA",
+    "Prefetch vector into L2 cache at EA",
+    {
+    fV_AL_CHECK(EA,fVECSIZE()-1);
+    mem_fetch_vector(thread, insn, EA&~(fVECSIZE()-1), insn->slot, fVECSIZE());
+    },
+    (A_LOAD,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST,A_RESTRICT_NOSLOT1_STORE)
+)
+
+
+DEF_MACRO(fLOADMMV_AL, (EA, ALIGNMENT, LEN, DST),
+    "char* addr = EA&~(ALIGNMENT-1); for (i=0; i<LEN; ++i) DST[i] = addr[i]",
+    "Load LEN bytes from memory at EA (forced alignment) to DST.",
+    {
+    fV_AL_CHECK(EA,ALIGNMENT-1);
+	thread->last_pkt->double_access_vec = 0;
+    mem_load_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &DST.ub[0], LEN, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_LOAD,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST,A_RESTRICT_NOSLOT1_STORE)
+)
+
+DEF_MACRO(fLOADMMV, (EA, DST),
+	"DST = *(EA&~(ALIGNMENT-1))",
+	"Load vector from memory at EA (forced alignment) to DST.",
+	fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST),
+	()
+)
+
+DEF_MACRO(fLOADMMVQ, (EA,DST,QVAL),
+	"DST = vmux(QVAL,*(EA&~(ALIGNMENT-1)),0)",
+	"Load vector from memory at EA (forced alignment) to DST.",
+	do {
+		int __i;
+		fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		fVFOREACH(8,__i) if (!fGETQBIT(QVAL,__i)) DST.b[__i] = 0;
+	} while (0),
+	()
+)
+
+DEF_MACRO(fLOADMMVNQ, (EA,DST,QVAL),
+	"DST = vmux(QVAL,0,*(EA&~(ALIGNMENT-1)))",
+	"Load vector from memory at EA (forced alignment) to DST.",
+	do {
+		int __i;
+		fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		fVFOREACH(8,__i) if (fGETQBIT(QVAL,__i)) DST.b[__i] = 0;
+	} while (0),
+	()
+)
+
+DEF_MACRO(fLOADMMVU_AL, (EA, ALIGNMENT, LEN, DST),
+    "char* addr = EA; for (i=0; i<LEN; ++i) DST[i] = addr[i]",
+    "Load LEN bytes from memory at EA (unaligned) to DST.",
+    {
+    size4u_t size2 = (EA)&(ALIGNMENT-1);
+    size4u_t size1 = LEN-size2;
+	thread->last_pkt->double_access_vec = 1;
+    mem_load_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(), /* slot */ 1, size2, &DST.ub[size1], size2, fUSE_LOOKUP_ADDRESS());
+    mem_load_vector_oddva(thread, insn, EA, EA,/* slot */ 0, size1, &DST.ub[0], size1, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_LOAD,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST,A_RESTRICT_NOSLOT1_STORE)
+)
+
+DEF_MACRO(fLOADMMVU, (EA, DST),
+	"DST = *EA",
+	"Load vector from memory at EA (unaligned) to DST.",
+	{
+		/* if address happens to be aligned, only do aligned load */
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+            thread->last_pkt->pkt_has_vmemu_access = 0;
+			thread->last_pkt->double_access = 0;
+
+			fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		} else {
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			thread->last_pkt->double_access = 1;
+
+			fLOADMMVU_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		}
+	},
+	()
+)
+
+DEF_MACRO(fSTOREMMV_AL, (EA, ALIGNMENT, LEN, SRC),
+    "char* addr = EA&~(ALIGNMENT-1); for (i=0; i<LEN; ++i) addr[i] = SRC[i]",
+    "Store LEN bytes from SRC into memory at EA (forced alignment).",
+    {
+    fV_AL_CHECK(EA,ALIGNMENT-1);
+    mem_store_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &SRC.ub[0], 0, 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fSTOREMMV, (EA, SRC),
+	"*(EA&~(ALIGNMENT-1)) = SRC",
+	"Store vector SRC to memory at EA (unaligned).",
+	fSTOREMMV_AL(EA,fVECSIZE(),fVECSIZE(),SRC),
+	()
+)
+
+DEF_MACRO(fSTOREMMVQ_AL, (EA, ALIGNMENT, LEN, SRC, MASK),
+    "char* addr = EA&~(ALIGNMENT-1); for (i=0; i<LEN; ++i) if (MASK[i]) addr[i] = SRC[i]",
+    "Store LEN bytes from SRC into memory at EA (forced alignment).",
+    do {
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+	mem_store_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &SRC.ub[0], &maskvec.ub[0], 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    } while (0),
+    (A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fSTOREMMVQ, (EA, SRC, MASK),
+	"*(EA&~(ALIGNMENT-1)) = SRC",
+	"Masked store vector SRC to memory at EA (forced alignment).",
+	fSTOREMMVQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK),
+	()
+)
+
+DEF_MACRO(fSTOREMMVNQ_AL, (EA, ALIGNMENT, LEN, SRC, MASK),
+    "char* addr = EA&~(ALIGNMENT-1); for (i=0; i<LEN; ++i) if (!MASK[i]) addr[i] = SRC[i]",
+    "Store LEN bytes from SRC into memory at EA (forced alignment).",
+    {
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+        fV_AL_CHECK(EA,ALIGNMENT-1);
+	mem_store_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &SRC.ub[0], &maskvec.ub[0], 1, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fSTOREMMVNQ, (EA, SRC, MASK),
+	"*(EA&~(ALIGNMENT-1)) = SRC",
+	"Masked negated store vector SRC to memory at EA (forced alignment).",
+	fSTOREMMVNQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK),
+	()
+)
+
+DEF_MACRO(fSTOREMMVU_AL, (EA, ALIGNMENT, LEN, SRC),
+    "char* addr = EA; for (i=0; i<LEN; ++i) addr[i] = SRC[i]",
+    "Store LEN bytes from SRC into memory at EA (unaligned).",
+    {
+    size4u_t size1 = ALIGNMENT-((EA)&(ALIGNMENT-1));
+    size4u_t size2;
+    if (size1>LEN) size1 = LEN;
+    size2 = LEN-size1;
+    mem_store_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(), /* slot */ 1, size2, &SRC.ub[size1], 0, 0, fUSE_LOOKUP_ADDRESS());
+    mem_store_vector_oddva(thread, insn, EA, EA, /* slot */ 0, size1, &SRC.ub[0], 0, 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fSTOREMMVU, (EA, SRC),
+	"*EA = SRC",
+	"Store vector SRC to memory at EA (unaligned).",
+	{
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+			thread->last_pkt->double_access = 0;
+			fSTOREMMV_AL(EA,fVECSIZE(),fVECSIZE(),SRC);
+		} else {
+			thread->last_pkt->double_access = 1;
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			fSTOREMMVU_AL(EA,fVECSIZE(),fVECSIZE(),SRC);
+		}
+	},
+	()
+)
+
+DEF_MACRO(fSTOREMMVQU_AL, (EA, ALIGNMENT, LEN, SRC, MASK),
+    "char* addr = EA; for (i=0; i<LEN; ++i) if (MASK[i]) addr[i] = SRC[i]",
+    "Store LEN bytes from SRC into memory at EA (unaligned).",
+    {
+	size4u_t size1 = ALIGNMENT-((EA)&(ALIGNMENT-1));
+	size4u_t size2;
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+	if (size1>LEN) size1 = LEN;
+	size2 = LEN-size1;
+	mem_store_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(),/* slot */ 1, size2, &SRC.ub[size1], &maskvec.ub[size1], 0, fUSE_LOOKUP_ADDRESS());
+	mem_store_vector_oddva(thread, insn, EA, /* slot */ 0, size1, &SRC.ub[0], &maskvec.ub[0], 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fSTOREMMVQU, (EA, SRC, MASK),
+	"*EA = SRC",
+	"Store vector SRC to memory at EA (unaligned).",
+	{
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+			thread->last_pkt->double_access = 0;
+			fSTOREMMVQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		} else {
+			thread->last_pkt->double_access = 1;
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			fSTOREMMVQU_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		}
+	},
+	()
+)
+
+DEF_MACRO(fSTOREMMVNQU_AL, (EA, ALIGNMENT, LEN, SRC, MASK),
+    "char* addr = EA; for (i=0; i<LEN; ++i) if (!MASK[i]) addr[i] = SRC[i]",
+    "Store LEN bytes from SRC into memory at EA (unaligned).",
+    {
+	size4u_t size1 = ALIGNMENT-((EA)&(ALIGNMENT-1));
+	size4u_t size2;
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+	if (size1>LEN) size1 = LEN;
+	size2 = LEN-size1;
+	mem_store_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(), /* slot */ 1, size2, &SRC.ub[size1], &maskvec.ub[size1], 1, fUSE_LOOKUP_ADDRESS());
+	mem_store_vector_oddva(thread, insn, EA, EA, /* slot */ 0, size1, &SRC.ub[0], &maskvec.ub[0], 1, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE,A_RESTRICT_SINGLE_MEM_FIRST)
+)
+
+DEF_MACRO(fSTOREMMVNQU, (EA, SRC, MASK),
+	"*EA = SRC",
+	"Store vector SRC to memory at EA (unaligned).",
+	{
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+			thread->last_pkt->double_access = 0;
+			fSTOREMMVNQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		} else {
+			thread->last_pkt->double_access = 1;
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			fSTOREMMVNQU_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		}
+	},
+	()
+)
+
+
+
+
+DEF_MACRO(fVFOREACH,(WIDTH, VAR),
+    "for (VAR = 0; VAR < VELEM(WIDTH); VAR++)",
+    "For VAR in each WIDTH-bit vector index",
+    for (VAR = 0; VAR < fVELEM(WIDTH); VAR++),
+    /* NOTHING */
+)
+
+DEF_MACRO(fVARRAY_ELEMENT_ACCESS, (ARRAY, TYPE, INDEX),
+    "ARRAY.TYPE[INDEX]",
+    "Access element of type TYPE at position INDEX of flattened ARRAY",
+    ARRAY.v[(INDEX) / (fVECSIZE()/(sizeof(ARRAY.TYPE[0])))].TYPE[(INDEX) % (fVECSIZE()/(sizeof(ARRAY.TYPE[0])))],
+    ()
+)
+
+DEF_MACRO(fVNEWCANCEL,(REGNUM),
+	"Ignore current value for register REGNUM",
+	"Ignore current value for register REGNUM",
+	do { THREAD2STRUCT->VRegs_select &= ~(1<<(REGNUM)); } while (0),
+	()
+)
+
+DEF_MACRO(fTMPVDATA,(),
+	"Data from .tmp load",
+	"Data from .tmp load and clear tmp status",
+	mmvec_vtmp_data(thread),
+	(A_CVI,A_CVI_REQUIRES_TMPLOAD)
+)
+
+DEF_MACRO(fVSATDW, (U,V),
+    "usat_32(U:V)",
+    "Use 32-bits of U as MSW and 32-bits of V as LSW and saturate the resultant 64-bits to 32 bits",
+    fVSATW( ( ( ((long long)U)<<32 ) | fZXTN(32,64,V) ) ),
+    /* attribs */
+)
+
+DEF_MACRO(fVASL_SATHI, (U,V),
+    "uasl_sathi(U:V)",
+    "Use 32-bits of U as MSW and 32-bits of V as LSW, left shift by 1 and saturate the result and take high word",
+    fVSATW(((U)<<1) | ((V)>>31)),
+    /* attribs */
+)
+
+DEF_MACRO(fVUADDSAT,(WIDTH,U,V),
+	"usat_##WIDTH(U+V)",
+	"Add WIDTH-bit values U and V with saturation",
+	fVSATUN( WIDTH, fZXTN(WIDTH, 2*WIDTH, U)  + fZXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVSADDSAT,(WIDTH,U,V),
+	"sat_##WIDTH(U+V)",
+	"Add WIDTH-bit values U and V with saturation",
+	fVSATN(  WIDTH, fSXTN(WIDTH, 2*WIDTH, U)  + fSXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVUSUBSAT,(WIDTH,U,V),
+	"usat_##WIDTH(U-V)",
+	"sub WIDTH-bit values U and V with saturation",
+	fVSATUN( WIDTH, fZXTN(WIDTH, 2*WIDTH, U)  - fZXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVSSUBSAT,(WIDTH,U,V),
+	"sat_##WIDTH(U-V)",
+	"sub WIDTH-bit values U and V with saturation",
+	fVSATN(  WIDTH, fSXTN(WIDTH, 2*WIDTH, U)  - fSXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGU,(WIDTH,U,V),
+	"(U+V)/2",
+	"average WIDTH-bit values U and V with saturation",
+	((fZXTN(WIDTH, 2*WIDTH, U) + fZXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGURND,(WIDTH,U,V),
+	"(U+V+1)/2",
+	"average WIDTH-bit values U and V with saturation",
+	((fZXTN(WIDTH, 2*WIDTH, U) + fZXTN(WIDTH, 2*WIDTH, V)+1)>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGU,(WIDTH,U,V),
+	"(U-V)/2",
+	"average WIDTH-bit values U and V with saturation",
+	((fZXTN(WIDTH, 2*WIDTH, U) - fZXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGURNDSAT,(WIDTH,U,V),
+	"(U-V+1)/2",
+	"average WIDTH-bit values U and V with saturation",
+	fVSATUN(WIDTH,((fZXTN(WIDTH, 2*WIDTH, U) - fZXTN(WIDTH, 2*WIDTH, V)+1)>>1)),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGS,(WIDTH,U,V),
+	"(U+V)/2",
+	"average WIDTH-bit values U and V with saturation",
+	((fSXTN(WIDTH, 2*WIDTH, U) + fSXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGSRND,(WIDTH,U,V),
+	"(U+V+1)/2",
+	"average WIDTH-bit values U and V with saturation",
+	((fSXTN(WIDTH, 2*WIDTH, U) + fSXTN(WIDTH, 2*WIDTH, V)+1)>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGS,(WIDTH,U,V),
+	"(U-V)/2",
+	"average WIDTH-bit values U and V with saturation",
+	((fSXTN(WIDTH, 2*WIDTH, U) - fSXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGSRND,(WIDTH,U,V),
+	"(U-V+1)/2",
+	"average WIDTH-bit values U and negative V followed by rounding",
+	((fSXTN(WIDTH, 2*WIDTH, U) - fSXTN(WIDTH, 2*WIDTH, V)+1)>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGSRNDSAT,(WIDTH,U,V),
+	"(U-V+1)/2",
+	"average WIDTH-bit values U and V with saturation",
+	fVSATN(WIDTH,((fSXTN(WIDTH, 2*WIDTH, U) - fSXTN(WIDTH, 2*WIDTH, V)+1)>>1)),
+	/* attribs */
+)
+
+
+DEF_MACRO(fVNOROUND,(VAL,SHAMT),
+	"VAL",
+	"VAL",
+	VAL,
+	/* NOTHING */
+)
+DEF_MACRO(fVNOSAT,(VAL),
+	"VAL",
+	"VAL",
+	VAL,
+	/* NOTHING */
+)
+
+DEF_MACRO(fVROUND,(VAL,SHAMT),
+	"VAL + (1<<(SHAMT-1))",
+	"VAL + RNDBIT",
+	((VAL) + (((SHAMT)>0)?(1LL<<((SHAMT)-1)):0)),
+	/* NOTHING */
+)
+
+DEF_MACRO(fCARRY_FROM_ADD32,(A,B,C),
+	"carry_from(A,B,C)",
+	"carry_from(A,B,C)",
+	(((fZXTN(32,64,A)+fZXTN(32,64,B)+C) >> 32) & 1),
+	/* NOTHING */
+)
+
+DEF_MACRO(fUARCH_NOTE_PUMP_4X,(),
+	"",
+	"",
+	,
+	(A_CVI_PUMP_4X)
+)
+
+DEF_MACRO(fUARCH_NOTE_PUMP_2X,(),
+	"",
+	"",
+	,
+	(A_CVI_PUMP_2X)
+)
+
+DEF_MACRO(fVDOCHKPAGECROSS,(BASE,SUM),
+	"",
+	"",
+	if (UNLIKELY(thread->timing_on)) {
+		thread->mem_access[slot].check_page_crosses = 1;
+		thread->mem_access[slot].page_cross_base = BASE;
+		thread->mem_access[slot].page_cross_sum = SUM;
+	},
+	(A_EA_PAGECROSS)
+)
+
+
+
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 59/67] Hexagon HVX semantics generator
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (57 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 58/67] Hexagon HVX import macro definitions Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 60/67] Hexagon HVX instruction decoding Taylor Simpson
                   ` (8 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Add HVX support to the semantics generator

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_semantics.c |   9 +++
 target/hexagon/do_qemu.py      | 175 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 171 insertions(+), 13 deletions(-)

diff --git a/target/hexagon/gen_semantics.c b/target/hexagon/gen_semantics.c
index 65d04fa..f5ee3ae 100644
--- a/target/hexagon/gen_semantics.c
+++ b/target/hexagon/gen_semantics.c
@@ -87,6 +87,15 @@ int main(int argc, char *argv[])
 #include "imported/macros.def"
 #undef DEF_MACRO
 
+/*
+ * Process the macros for HVX
+ */
+#define DEF_MACRO(MNAME, PARAMS, SDESC, LDESC, BEH, ATTRS) \
+    fprintf(outfile, "MACROATTRIB(\"%s\",\"\"\"%s\"\"\",\"%s\",\"%s\")\n", \
+            #MNAME, STRINGIZE(BEH), STRINGIZE(ATTRS), EXTSTR);
+#include "imported/allext_macros.def"
+#undef DEF_MACRO
+
     fclose(outfile);
     return 0;
 }
diff --git a/target/hexagon/do_qemu.py b/target/hexagon/do_qemu.py
index 32543c8..1306e6e 100755
--- a/target/hexagon/do_qemu.py
+++ b/target/hexagon/do_qemu.py
@@ -107,6 +107,16 @@ def SEMANTICS(tag, beh, sem):
     attribdict[tag] = set()
     tags.append(tag)        # dicts have no order, this is for order
 
+def EXT_SEMANTICS(ext, tag, beh, sem):
+    #print tag,beh,sem
+    extnames[ext] = True
+    extdict[tag] = ext
+    behdict[tag] = beh
+    semdict[tag] = sem
+    attribdict[tag] = set()
+    tags.append(tag)        # dicts have no order, this is for order
+
+
 def ATTRIBUTES(tag, attribstring):
     attribstring = \
         attribstring.replace("ATTRIBS","").replace("(","").replace(")","")
@@ -170,6 +180,9 @@ def compute_tag_immediates(tag):
 ##          P                predicate register
 ##          R                GPR register
 ##          M                modifier register
+##          Q                HVX predicate vector
+##          V                HVX vector register
+##          O                HVX new vector register
 ##      regid can be one of the following
 ##          d, e             destination register
 ##          dd               destination register pair
@@ -201,6 +214,9 @@ def is_readwrite(regid):
 def is_scalar_reg(regtype):
     return regtype in "RPC"
 
+def is_hvx_reg(regtype):
+    return regtype in "VQ"
+
 def is_old_val(regtype, regid, tag):
     return regtype+regid+'V' in semdict[tag]
 
@@ -211,7 +227,8 @@ tagimms = dict(zip(tags, list(map(compute_tag_immediates, tags))))
 
 def need_slot(tag):
     if ('A_CONDEXEC' in attribdict[tag] or
-        'A_STORE' in attribdict[tag]):
+        'A_STORE' in attribdict[tag] or
+        'A_CVI' in attribdict[tag]):
         return 1
     else:
         return 0
@@ -297,19 +314,32 @@ def gen_helper_prototype(f, tag, regs, imms):
             f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
 
         ## Generate the qemu DEF_HELPER type for each result
+        ## Iterate over this list twice
+        ## - Emit the scalar result
+        ## - Emit the vector result
         i=0
         for regtype,regid,toss,numregs in regs:
             if (is_written(regid)):
-                gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
+                if (not is_hvx_reg(regtype)):
+                    gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
 
         ## Put the env between the outputs and inputs
         f.write(', env' )
         i += 1
 
+        # Second pass
+        for regtype,regid,toss,numregs in regs:
+            if (is_written(regid)):
+                if (is_hvx_reg(regtype)):
+                    gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
+                    i += 1
+
         ## Generate the qemu type for each input operand (regs and immediates)
         for regtype,regid,toss,numregs in regs:
             if (is_read(regid)):
+                if (is_hvx_reg(regtype) and is_readwrite(regid)):
+                    continue
                 gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
         for immlett,bits,immshift in imms:
@@ -437,11 +467,33 @@ def genptr_dst_write(f,regtype, regid):
     macro = "WRITE_%sREG_%s" % (regtype, regid)
     f.write("%s(%s%sN, %s%sV);\n" % (macro, regtype, regid, regtype, regid))
 
+def genptr_dst_write_ext(f, regtype, regid, newv="0"):
+    macro = "WRITE_%sREG_%s" % (regtype, regid)
+    f.write("%s(%s%sN, %s%sV,%s);\n" % \
+            (macro, regtype, regid, regtype, regid, newv))
+
 def genptr_dst_write_opn(f,regtype, regid, tag):
     if (is_pair(regid)):
-        genptr_dst_write(f, regtype, regid)
+        if (is_hvx_reg(regtype)):
+            if ('A_CVI_TMP' in attribdict[tag] or
+                'A_CVI_TMP_DST' in attribdict[tag]):
+                genptr_dst_write_ext(f, regtype, regid, "EXT_TMP")
+            else:
+                genptr_dst_write_ext(f, regtype, regid)
+        else:
+            genptr_dst_write(f, regtype, regid)
     elif (is_single(regid)):
-        genptr_dst_write(f, regtype, regid)
+        if (is_hvx_reg(regtype)):
+            if 'A_CVI_NEW' in attribdict[tag]:
+                genptr_dst_write_ext(f, regtype, regid, "EXT_NEW")
+            elif 'A_CVI_TMP' in attribdict[tag]:
+                genptr_dst_write_ext(f, regtype, regid, "EXT_TMP")
+            elif 'A_CVI_TMP_DST' in attribdict[tag]:
+                genptr_dst_write_ext(f, regtype, regid, "EXT_TMP")
+            else:
+                genptr_dst_write_ext(f, regtype, regid, "EXT_DFL")
+        else:
+            genptr_dst_write(f, regtype, regid)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
@@ -504,13 +556,23 @@ def gen_tcg_func(f, tag, regs, imms):
     ## If there is a scalar result, it is the return type
     for regtype,regid,toss,numregs in regs:
         if (is_written(regid)):
+            if (is_hvx_reg(regtype)):
+                continue
             gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
             i += 1
     if (i > 0): f.write(", ")
     f.write("cpu_env")
     i=1
     for regtype,regid,toss,numregs in regs:
+        if (is_written(regid)):
+            if (not is_hvx_reg(regtype)):
+                continue
+            gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
+            i += 1
+    for regtype,regid,toss,numregs in regs:
         if (is_read(regid)):
+            if (is_hvx_reg(regtype) and is_readwrite(regid)):
+                continue
             gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
             i += 1
     for immlett,bits,immshift in imms:
@@ -573,12 +635,26 @@ def gen_helper_arg_pair(f,regtype,regid,regno):
     if regno >= 0 : f.write(", ")
     f.write("int64_t %s%sV" % (regtype,regid))
 
+def gen_helper_arg_ext(f,regtype,regid,regno):
+    if regno > 0 : f.write(", ")
+    f.write("void *%s%sV_void" % (regtype,regid))
+
+def gen_helper_arg_ext_pair(f,regtype,regid,regno):
+    if regno > 0 : f.write(", ")
+    f.write("void *%s%sV_void" % (regtype,regid))
+
 def gen_helper_arg_opn(f,regtype,regid,i):
     if (is_pair(regid)):
-        gen_helper_arg_pair(f,regtype,regid,i)
+        if (is_hvx_reg(regtype)):
+            gen_helper_arg_ext_pair(f,regtype,regid,i)
+        else:
+            gen_helper_arg_pair(f,regtype,regid,i)
     elif (is_single(regid)):
         if is_old_val(regtype, regid, tag):
-            gen_helper_arg(f,regtype,regid,i)
+            if (is_hvx_reg(regtype)):
+                gen_helper_arg_ext(f,regtype,regid,i)
+            else:
+                gen_helper_arg(f,regtype,regid,i)
         elif is_new_val(regtype, regid, tag):
             gen_helper_arg_new(f,regtype,regid,i)
         else:
@@ -597,25 +673,61 @@ def gen_helper_dest_decl_pair(f,regtype,regid,regno,subfield=""):
     f.write("int64_t %s%sV%s = 0;\n" % \
         (regtype,regid,subfield))
 
+def gen_helper_dest_decl_ext(f,regtype,regid):
+    f.write("/* %s%sV is *(mmvector_t*)(%s%sV_void) */\n" % \
+        (regtype,regid,regtype,regid))
+
+def gen_helper_dest_decl_ext_pair(f,regtype,regid,regno):
+    f.write("/* %s%sV is *(mmvector_pair_t*))%s%sV_void) */\n" % \
+        (regtype,regid,regtype, regid))
+
 def gen_helper_dest_decl_opn(f,regtype,regid,i):
     if (is_pair(regid)):
-        gen_helper_dest_decl_pair(f,regtype,regid,i)
+        if (is_hvx_reg(regtype)):
+            gen_helper_dest_decl_ext_pair(f,regtype,regid, i)
+        else:
+            gen_helper_dest_decl_pair(f,regtype,regid,i)
     elif (is_single(regid)):
-        gen_helper_dest_decl(f,regtype,regid,i)
+        if (is_hvx_reg(regtype)):
+            gen_helper_dest_decl_ext(f,regtype,regid)
+        else:
+            gen_helper_dest_decl(f,regtype,regid,i)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
+def gen_helper_src_var_ext(f,regtype,regid):
+    f.write("/* %s%sV is *(mmvector_t*)(%s%sV_void) */\n" % \
+        (regtype,regid,regtype,regid))
+
+def gen_helper_src_var_ext_pair(f,regtype,regid,regno):
+    f.write("/* %s%sV%s is *(mmvector_pair_t*)(%s%sV%s_void) */\n" % \
+        (regtype,regid,regno,regtype,regid,regno))
+
 def gen_helper_return(f,regtype,regid,regno):
     f.write("return %s%sV;\n" % (regtype,regid))
 
 def gen_helper_return_pair(f,regtype,regid,regno):
     f.write("return %s%sV;\n" % (regtype,regid))
 
+def gen_helper_dst_write_ext(f,regtype,regid):
+    f.write("/* %s%sV is *(mmvector_t*)%s%sV_void */\n" % \
+        (regtype,regid,regtype,regid))
+
+def gen_helper_dst_write_ext_pair(f,regtype,regid):
+    f.write("/* %s%sV is *(mmvector_pair_t*)%s%sV_void */\n" % \
+        (regtype,regid, regtype,regid))
+
 def gen_helper_return_opn(f, regtype, regid, i):
     if (is_pair(regid)):
-        gen_helper_return_pair(f,regtype,regid,i)
+        if (is_hvx_reg(regtype)):
+            gen_helper_dst_write_ext_pair(f,regtype,regid)
+        else:
+            gen_helper_return_pair(f,regtype,regid,i)
     elif (is_single(regid)):
-        gen_helper_return(f,regtype,regid,i)
+        if (is_hvx_reg(regtype)):
+            gen_helper_dst_write_ext(f,regtype,regid)
+        else:
+            gen_helper_return(f,regtype,regid,i)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
@@ -654,14 +766,20 @@ def gen_helper_definition(f, tag, regs, imms):
                 % (tag, tag))
     else:
         ## The return type of the function is the type of the destination
-        ## register
+        ## register (if scalar)
         i=0
         for regtype,regid,toss,numregs in regs:
             if (is_written(regid)):
                 if (is_pair(regid)):
-                    gen_helper_return_type_pair(f,regtype,regid,i)
+                    if (is_hvx_reg(regtype)):
+                        continue
+                    else:
+                        gen_helper_return_type_pair(f,regtype,regid,i)
                 elif (is_single(regid)):
-                    gen_helper_return_type(f,regtype,regid,i)
+                    if (is_hvx_reg(regtype)):
+                            continue
+                    else:
+                        gen_helper_return_type(f,regtype,regid,i)
                 else:
                     print("Bad register parse: ",regtype,regid,toss,numregs)
             i += 1
@@ -670,10 +788,30 @@ def gen_helper_definition(f, tag, regs, imms):
             f.write("void")
         f.write(" HELPER(%s)(CPUHexagonState *env" % tag)
 
+        ## Arguments include the vector destination operands
         i = 1
+        for regtype,regid,toss,numregs in regs:
+            if (is_written(regid)):
+                if (is_pair(regid)):
+                    if (is_hvx_reg(regtype)):
+                        gen_helper_arg_ext_pair(f,regtype,regid,i)
+                    else:
+                        continue
+                elif (is_single(regid)):
+                    if (is_hvx_reg(regtype)):
+                        gen_helper_arg_ext(f,regtype,regid,i)
+                    else:
+                        # This is the return value of the function
+                        continue
+                else:
+                    print("Bad register parse: ",regtype,regid,toss,numregs)
+                i += 1
+
         ## Arguments to the helper function are the source regs and immediates
         for regtype,regid,toss,numregs in regs:
             if (is_read(regid)):
+                if (is_hvx_reg(regtype) and is_readwrite(regid)):
+                    continue
                 gen_helper_arg_opn(f,regtype,regid,i)
                 i += 1
         for immlett,bits,immshift in imms:
@@ -696,6 +834,17 @@ def gen_helper_definition(f, tag, regs, imms):
                 gen_helper_dest_decl_opn(f,regtype,regid,i)
             i += 1
 
+        for regtype,regid,toss,numregs in regs:
+            if (is_read(regid)):
+                if (is_pair(regid)):
+                    if (is_hvx_reg(regtype)):
+                        gen_helper_src_var_ext_pair(f,regtype,regid,i)
+                elif (is_single(regid)):
+                    if (is_hvx_reg(regtype)):
+                        gen_helper_src_var_ext(f,regtype,regid)
+                else:
+                    print("Bad register parse: ",regtype,regid,toss,numregs)
+
         if 'A_FPOP' in attribdict[tag]:
             f.write('fFPOP_START();\n');
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 60/67] Hexagon HVX instruction decoding
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (58 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 59/67] Hexagon HVX semantics generator Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 61/67] Hexagon HVX instruction utility functions Taylor Simpson
                   ` (7 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/mmvec/decode_ext_mmvec.h |  24 ++
 target/hexagon/decode.c                 |  23 +-
 target/hexagon/mmvec/decode_ext_mmvec.c | 670 ++++++++++++++++++++++++++++++++
 target/hexagon/q6v_decode.c             |  14 +
 4 files changed, 729 insertions(+), 2 deletions(-)
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c

diff --git a/target/hexagon/mmvec/decode_ext_mmvec.h b/target/hexagon/mmvec/decode_ext_mmvec.h
new file mode 100644
index 0000000..67d3ead
--- /dev/null
+++ b/target/hexagon/mmvec/decode_ext_mmvec.h
@@ -0,0 +1,24 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_DECODE_EXT_MMVEC_H
+#define HEXAGON_DECODE_EXT_MMVEC_H
+
+extern int mmvec_ext_decode_checks(packet_t *pkt);
+extern const char *mmvec_ext_decode_find_iclass_slots(int opcode);
+
+#endif
diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index 2201c23..fcf5e75 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -24,6 +24,8 @@
 #include "insn.h"
 #include "macros.h"
 #include "printinsn.h"
+#include "mmvec/mmvec.h"
+#include "mmvec/decode_ext_mmvec.h"
 
 enum {
     EXT_IDX_noext = 0,
@@ -148,6 +150,9 @@ static void decode_ext_init(void)
     for (i = EXT_IDX_noext; i < EXT_IDX_noext_AFTER; i++) {
         ext_trees[i] = &dectree_table_DECODE_EXT_EXT_noext;
     }
+    for (i = EXT_IDX_mmvec; i < EXT_IDX_mmvec_AFTER; i++) {
+        ext_trees[i] = &dectree_table_DECODE_EXT_EXT_mmvec;
+    }
 }
 
 typedef struct {
@@ -445,6 +450,9 @@ static int decode_set_insn_attr_fields(packet_t *pkt)
 
         if (GET_ATTRIB(opcode, A_STORE)) {
             pkt->insn[i].is_store = 1;
+            if (GET_ATTRIB(opcode, A_VMEM)) {
+                pkt->insn[i].is_vmem_st = 1;
+            }
 
             if (pkt->insn[i].slot == 0) {
                 pkt->pkt_has_store_s0 = 1;
@@ -457,6 +465,9 @@ static int decode_set_insn_attr_fields(packet_t *pkt)
         }
         if (GET_ATTRIB(opcode, A_LOAD)) {
             pkt->insn[i].is_load = 1;
+            if (GET_ATTRIB(opcode, A_VMEM))  {
+                pkt->insn[i].is_vmem_ld = 1;
+            }
 
             if (pkt->insn[i].slot == 0) {
                 pkt->pkt_has_load_s0 = 1;
@@ -464,6 +475,10 @@ static int decode_set_insn_attr_fields(packet_t *pkt)
                 pkt->pkt_has_load_s1 = 1;
             }
         }
+        if (GET_ATTRIB(opcode, A_CVI_GATHER) ||
+            GET_ATTRIB(opcode, A_CVI_SCATTER)) {
+            pkt->insn[i].is_scatgath = 1;
+        }
         if (GET_ATTRIB(opcode, A_MEMOP)) {
             pkt->insn[i].is_memop = 1;
         }
@@ -737,8 +752,12 @@ static int decode_remove_extenders(packet_t *packet)
 static const char *
 get_valid_slot_str(const packet_t *pkt, unsigned int slot)
 {
-    return find_iclass_slots(pkt->insn[slot].opcode,
-                             pkt->insn[slot].iclass);
+    if (GET_ATTRIB(pkt->insn[slot].opcode, A_EXTENSION)) {
+        return mmvec_ext_decode_find_iclass_slots(pkt->insn[slot].opcode);
+    } else {
+        return find_iclass_slots(pkt->insn[slot].opcode,
+                                 pkt->insn[slot].iclass);
+    }
 }
 
 #include "q6v_decode.c"
diff --git a/target/hexagon/mmvec/decode_ext_mmvec.c b/target/hexagon/mmvec/decode_ext_mmvec.c
new file mode 100644
index 0000000..7a74602
--- /dev/null
+++ b/target/hexagon/mmvec/decode_ext_mmvec.c
@@ -0,0 +1,670 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "decode.h"
+#include "opcodes.h"
+#include "insn.h"
+#include "printinsn.h"
+#include "mmvec/mmvec.h"
+#include "mmvec/decode_ext_mmvec.h"
+
+typedef enum hvx_resource {
+    HVX_RESOURCE_LOAD    = 0,
+    HVX_RESOURCE_STORE   = 1,
+    HVX_RESOURCE_PERM    = 2,
+    HVX_RESOURCE_SHIFT   = 3,
+    HVX_RESOURCE_MPY0    = 4,
+    HVX_RESOURCE_MPY1    = 5,
+    HVX_RESOURCE_ZR      = 6,
+    HVX_RESOURCE_ZW      = 7
+} hvx_resource_t;
+
+#define FREE    1
+#define USED    0
+
+static int
+check_dv_instruction(hvx_resource_t *resources, int *ilist,
+                     int num_insn, packet_t *packet, unsigned int attribute,
+                     hvx_resource_t resource0, hvx_resource_t resource1)
+{
+
+    int current_insn = 0;
+    /* Loop on vector instruction count */
+    for (current_insn = 0; current_insn < num_insn; current_insn++) {
+        /* valid instruction */
+        if (ilist[current_insn] > -1) {
+            int opcode = packet->insn[ilist[current_insn]].opcode;
+            if (GET_ATTRIB(opcode, attribute)) {
+                /* Needs two available resources */
+                if ((resources[resource0] + resources[resource1]) == 2 * FREE) {
+                    resources[resource0] = USED;
+                    resources[resource1] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource0);
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource1);
+                } else {
+                    g_assert_not_reached();
+                }
+                ilist[current_insn] = -1;     /* Remove Instruction */
+            }
+        }
+    }
+    return 0;
+}
+
+/* Double Vector instructions that can use any one of specific or both pairs */
+static int
+check_dv_instruction2(hvx_resource_t *resources, int *ilist,
+                      int num_insn, packet_t *packet, unsigned int attribute,
+                      hvx_resource_t resource0, hvx_resource_t resource1,
+                      hvx_resource_t resource2, hvx_resource_t resource3)
+{
+
+    int current_insn = 0;
+    /* Loop on vector instruction count */
+    for (current_insn = 0; current_insn < num_insn; current_insn++) {
+        /* valid instruction */
+        if (ilist[current_insn] > -1) {
+            int opcode = packet->insn[ilist[current_insn]].opcode;
+            if (GET_ATTRIB(opcode, attribute)) {
+                /* Needs two available resources */
+                if ((resources[resource0] + resources[resource1]) == 2 * FREE) {
+                    resources[resource0] = USED;
+                    resources[resource1] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                       (1 << resource0);
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                       (1 << resource1);
+                } else if ((resources[resource2] + resources[resource3]) ==
+                           2 * FREE) {
+                    resources[resource2] = USED;
+                    resources[resource3] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource0);
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource1);
+                } else {
+                    g_assert_not_reached();
+                }
+                ilist[current_insn] = -1;     /* Remove Instruction */
+            }
+        }
+    }
+    return 0;
+}
+
+static int
+check_umem_instruction(hvx_resource_t *resources, int *ilist,
+                       int num_insn, packet_t *packet)
+{
+
+    int current_insn = 0;
+    /* Loop on vector instruction count */
+    for (current_insn = 0; current_insn < num_insn; current_insn++) {
+        /* valid instruction */
+        if (ilist[current_insn] > -1) {
+            int opcode = packet->insn[ilist[current_insn]].opcode;
+            /* check attribute */
+            if (GET_ATTRIB(opcode, A_CVI_VP) && GET_ATTRIB(opcode, A_CVI_VM)) {
+                /* Needs three available resources, both mem and permute */
+                if ((resources[HVX_RESOURCE_LOAD] +
+                     resources[HVX_RESOURCE_STORE] +
+                     resources[HVX_RESOURCE_PERM]) == 3 * FREE) {
+                    resources[HVX_RESOURCE_LOAD] = USED;
+                    resources[HVX_RESOURCE_STORE] = USED;
+                    resources[HVX_RESOURCE_PERM] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << HVX_RESOURCE_LOAD);
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << HVX_RESOURCE_STORE);
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << HVX_RESOURCE_PERM);
+                } else {
+                    g_assert_not_reached();
+                }
+
+                ilist[current_insn] = -1;     /* Remove Instruction */
+            }
+        }
+    }
+    return 0;
+}
+
+
+/* Memory instructions */
+static int
+check_mem_instruction(hvx_resource_t *resources, int *ilist,
+                      int num_insn, packet_t *packet)
+{
+
+    int current_insn = 0;
+    /* Loop on vector instruction count */
+    for (current_insn = 0; current_insn < num_insn; current_insn++) {
+        /* valid instruction */
+        if (ilist[current_insn] > -1) {
+            int opcode = packet->insn[ilist[current_insn]].opcode;
+            if (GET_ATTRIB(opcode, A_CVI_VM)) {
+                if (GET_ATTRIB(opcode, A_LOAD)) {
+                    if (resources[HVX_RESOURCE_LOAD] == FREE) {
+                        resources[HVX_RESOURCE_LOAD] = USED;
+                        packet->insn[ilist[current_insn]].hvx_resource |=
+                            (1 << HVX_RESOURCE_LOAD);
+                    } else {
+                        g_assert_not_reached();
+                    }
+
+                } else if (GET_ATTRIB(opcode, A_STORE)) {
+                    if (resources[HVX_RESOURCE_STORE] == FREE) {
+                        resources[HVX_RESOURCE_STORE] = USED;
+                        packet->insn[ilist[current_insn]].hvx_resource |=
+                            (1 << HVX_RESOURCE_STORE);
+                    } else {
+                        g_assert_not_reached();
+                    }
+                } else {
+                    g_assert_not_reached();
+                }
+
+                /* Not a load temp and not a store new */
+                if (!(GET_ATTRIB(opcode, A_CVI_TMP) &&
+                      GET_ATTRIB(opcode, A_LOAD)) &&
+                    !(GET_ATTRIB(opcode, A_CVI_NEW) &&
+                      GET_ATTRIB(opcode, A_STORE))) {
+                    /* Grab any one of the other resources */
+                    if (resources[HVX_RESOURCE_PERM] == FREE) {
+                        resources[HVX_RESOURCE_PERM] = USED;
+                        packet->insn[ilist[current_insn]].hvx_resource |=
+                            (1 << HVX_RESOURCE_PERM);
+                    } else if (resources[HVX_RESOURCE_SHIFT] == FREE) {
+                        resources[HVX_RESOURCE_SHIFT] = USED;
+                        packet->insn[ilist[current_insn]].hvx_resource |=
+                            (1 << HVX_RESOURCE_SHIFT);
+                    } else if (resources[HVX_RESOURCE_MPY0] == FREE) {
+                        resources[HVX_RESOURCE_MPY0] = USED;
+                        packet->insn[ilist[current_insn]].hvx_resource |=
+                            (1 << HVX_RESOURCE_MPY0);
+                    } else if (resources[HVX_RESOURCE_MPY1] == FREE) {
+                        resources[HVX_RESOURCE_MPY1] = USED;
+                        packet->insn[ilist[current_insn]].hvx_resource |=
+                            (1 << HVX_RESOURCE_MPY1);
+                    } else {
+                        g_assert_not_reached();
+                    }
+                }
+                ilist[current_insn] = -1;     /* Remove Instruction */
+            }
+        }
+    }
+    return 0;
+}
+
+/*
+ * Single Vector instructions that can use one, two, or four resources
+ * Insert instruction into one possible resource
+ */
+static int
+check_instruction1(hvx_resource_t *resources, int *ilist,
+                   int num_insn, packet_t *packet, unsigned int attribute,
+                   hvx_resource_t resource0)
+{
+
+    int current_insn = 0;
+    /* Loop on vector instruction count */
+    for (current_insn = 0; current_insn < num_insn; current_insn++) {
+        /* valid instruction */
+        if (ilist[current_insn] > -1) {
+            int opcode = packet->insn[ilist[current_insn]].opcode;
+            if (GET_ATTRIB(opcode, attribute)) {
+                /* Needs two available resources */
+                if (resources[resource0] == FREE) {
+                    resources[resource0] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource0);
+                } else {
+                    g_assert_not_reached();
+                }
+
+                ilist[current_insn] = -1;     /* Remove Instruction */
+            }
+        }
+    }
+    return 0;
+}
+
+/* Insert instruction into one of two possible resource2 */
+static int
+check_instruction2(hvx_resource_t *resources, int *ilist,
+                   int num_insn, packet_t *packet, unsigned int attribute,
+                   hvx_resource_t resource0, hvx_resource_t resource1)
+{
+
+    int current_insn = 0;
+    /* Loop on vector instruction count */
+    for (current_insn = 0; current_insn < num_insn; current_insn++) {
+        /* valid instruction */
+        if (ilist[current_insn] > -1) {
+            int opcode = packet->insn[ilist[current_insn]].opcode;
+            if (GET_ATTRIB(opcode, attribute)) {
+                /* Needs one of two possible available resources */
+                if (resources[resource0] == FREE) {
+                    resources[resource0] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource0);
+                } else if (resources[resource1] == FREE)  {
+                    resources[resource1] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource1);
+                } else {
+                    g_assert_not_reached();
+                }
+
+                ilist[current_insn] = -1;     /* Remove Instruction */
+            }
+        }
+    }
+    return 0;
+}
+
+/* Insert instruction into one of 4 four possible resource */
+static int
+check_instruction4(hvx_resource_t *resources, int *ilist,
+                   int num_insn, packet_t *packet, unsigned int attribute,
+                   hvx_resource_t resource0, hvx_resource_t resource1,
+                   hvx_resource_t resource2, hvx_resource_t resource3)
+{
+
+    int current_insn = 0;
+    /* Loop on vector instruction count */
+    for (current_insn = 0; current_insn < num_insn; current_insn++) {
+        /* valid instruction */
+        if (ilist[current_insn] > -1) {
+            int opcode = packet->insn[ilist[current_insn]].opcode;
+            if (GET_ATTRIB(opcode, attribute)) {
+                /* Needs one of four available resources */
+                if (resources[resource0] == FREE) {
+                    resources[resource0] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource0);
+                } else if (resources[resource1] == FREE) {
+                    resources[resource1] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource1);
+                } else if (resources[resource2] == FREE) {
+                    resources[resource2] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource2);
+                } else if (resources[resource3] == FREE) {
+                    resources[resource3] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << resource3);
+                } else {
+                    g_assert_not_reached();
+                }
+
+                ilist[current_insn] = -1;     /* Remove Instruction */
+            }
+        }
+    }
+    return 0;
+}
+
+static int
+check_4res_instruction(hvx_resource_t *resources, int *ilist,
+                       int num_insn, packet_t *packet)
+{
+
+    int current_insn = 0;
+    for (current_insn = 0; current_insn < num_insn; current_insn++) {
+        if (ilist[current_insn] > -1) {
+            int opcode = packet->insn[ilist[current_insn]].opcode;
+            if (GET_ATTRIB(opcode, A_CVI_4SLOT)) {
+                int available_resource =
+                        resources[HVX_RESOURCE_PERM]
+                        + resources[HVX_RESOURCE_SHIFT]
+                        + resources[HVX_RESOURCE_MPY0]
+                        + resources[HVX_RESOURCE_MPY1];
+
+                if (available_resource == 4 * FREE) {
+                    resources[HVX_RESOURCE_PERM] = USED;
+                    resources[HVX_RESOURCE_SHIFT] = USED;
+                    resources[HVX_RESOURCE_MPY0] = USED;
+                    resources[HVX_RESOURCE_MPY1] = USED;
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << HVX_RESOURCE_PERM);
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << HVX_RESOURCE_SHIFT);
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << HVX_RESOURCE_MPY0);
+                    packet->insn[ilist[current_insn]].hvx_resource |=
+                        (1 << HVX_RESOURCE_MPY1);
+                } else {
+                    g_assert_not_reached();
+                }
+
+                ilist[current_insn] = -1;     /* Remove Instruction */
+            }
+        }
+
+    }
+    return 0;
+}
+
+
+static int
+decode_populate_cvi_resources(packet_t *packet)
+{
+
+    int i, num_insn = 0;
+    int vlist[4] = {-1, -1, -1, -1};
+    hvx_resource_t hvx_resources[8] = {
+        FREE,
+        FREE,
+        FREE,
+        FREE,
+        FREE,
+        FREE,
+        FREE,
+        FREE
+    };    /* All Available */
+    int errors = 0;
+
+
+    /* Count Vector instructions and check for deprecated ones */
+    for (num_insn = 0, i = 0; i < packet->num_insns; i++) {
+        if (GET_ATTRIB(packet->insn[i].opcode, A_CVI)) {
+            vlist[num_insn++] = i;
+        }
+    }
+
+    /* Instructions that consume all vector resources */
+    errors += check_4res_instruction(hvx_resources, vlist, num_insn, packet);
+    /* Insert Unaligned Memory Access */
+    errors += check_umem_instruction(hvx_resources, vlist, num_insn, packet);
+
+    /* double vector permute Consumes both permute and shift resources */
+    errors += check_dv_instruction(hvx_resources, vlist, num_insn,
+                                   packet, A_CVI_VP_VS,
+                                   HVX_RESOURCE_SHIFT, HVX_RESOURCE_PERM);
+    /* Single vector permute can only go to permute resource */
+    errors += check_instruction1(hvx_resources, vlist, num_insn,
+                                 packet, A_CVI_VP, HVX_RESOURCE_PERM);
+    /* Single vector permute can only go to permute resource */
+    errors += check_instruction1(hvx_resources, vlist, num_insn,
+                                 packet, A_CVI_VS, HVX_RESOURCE_SHIFT);
+
+    /* Try to insert double vector multiply */
+    errors += check_dv_instruction(hvx_resources, vlist, num_insn,
+                                   packet, A_CVI_VX_DV,
+                                   HVX_RESOURCE_MPY0, HVX_RESOURCE_MPY1);
+    /* Try to insert double capacity mult */
+    errors += check_dv_instruction2(hvx_resources, vlist, num_insn,
+                                    packet, A_CVI_VS_VX,
+                                    HVX_RESOURCE_SHIFT, HVX_RESOURCE_MPY0,
+                                    HVX_RESOURCE_PERM, HVX_RESOURCE_MPY1);
+    /* Single vector permute can only go to permute resource */
+    errors += check_instruction2(hvx_resources, vlist, num_insn,
+                                 packet, A_CVI_VX,
+                                 HVX_RESOURCE_MPY0, HVX_RESOURCE_MPY1);
+    /* Try to insert double vector alu */
+    errors += check_dv_instruction2(hvx_resources, vlist, num_insn,
+                                    packet, A_CVI_VA_DV,
+                                    HVX_RESOURCE_SHIFT, HVX_RESOURCE_PERM,
+                                    HVX_RESOURCE_MPY0, HVX_RESOURCE_MPY1);
+
+    errors += check_mem_instruction(hvx_resources, vlist, num_insn, packet);
+    /* single vector alu can go on any of the 4 pipes */
+    errors += check_instruction4(hvx_resources, vlist, num_insn,
+                                 packet, A_CVI_VA,
+                                 HVX_RESOURCE_SHIFT, HVX_RESOURCE_PERM,
+                                 HVX_RESOURCE_MPY0, HVX_RESOURCE_MPY1);
+
+    return errors;
+}
+
+
+
+static int
+check_new_value(packet_t *packet)
+{
+    /* .New Value for a MMVector Store */
+    int i, j;
+    const char *reginfo;
+    const char *destletters;
+    const char *dststr = NULL;
+    size2u_t def_opcode;
+    char letter;
+    int def_regnum;
+
+    for (i = 1; i < packet->num_insns; i++) {
+        size2u_t use_opcode = packet->insn[i].opcode;
+        if (GET_ATTRIB(use_opcode, A_DOTNEWVALUE) &&
+            GET_ATTRIB(use_opcode, A_CVI) &&
+            GET_ATTRIB(use_opcode, A_STORE)) {
+            int use_regidx = strchr(opcode_reginfo[use_opcode], 's') -
+                opcode_reginfo[use_opcode];
+            /*
+             * What's encoded at the N-field is the offset to who's producing
+             * the value.
+             * Shift off the LSB which indicates odd/even register.
+             */
+            int def_off = ((packet->insn[i].regno[use_regidx]) >> 1);
+            int def_oreg = packet->insn[i].regno[use_regidx] & 1;
+            int def_idx = -1;
+            for (j = i - 1; (j >= 0) && (def_off >= 0); j--) {
+                if (!GET_ATTRIB(packet->insn[j].opcode, A_CVI)) {
+                    continue;
+                }
+                def_off--;
+                if (def_off == 0) {
+                    def_idx = j;
+                    break;
+                }
+            }
+            /*
+             * Check for a badly encoded N-field which points to an instruction
+             * out-of-range
+             */
+            if ((def_off != 0) || (def_idx < 0) ||
+                (def_idx > (packet->num_insns - 1))) {
+                g_assert_not_reached();
+            }
+            /* previous insn is the producer */
+            def_opcode = packet->insn[def_idx].opcode;
+            reginfo = opcode_reginfo[def_opcode];
+            destletters = "dexy";
+            for (j = 0; (letter = destletters[j]) != 0; j++) {
+                dststr = strchr(reginfo, letter);
+                if (dststr != NULL) {
+                    break;
+                }
+            }
+            if ((dststr == NULL)  && GET_ATTRIB(def_opcode, A_CVI_GATHER)) {
+                def_regnum = 0;
+                packet->insn[i].regno[use_regidx] = def_oreg;
+                packet->insn[i].new_value_producer_slot =
+                    packet->insn[def_idx].slot;
+            } else {
+                if (dststr == NULL) {
+                    /* still not there, we have a bad packet */
+                    g_assert_not_reached();
+                }
+                def_regnum = packet->insn[def_idx].regno[dststr - reginfo];
+                /* Now patch up the consumer with the register number */
+                packet->insn[i].regno[use_regidx] = def_regnum ^ def_oreg;
+                /* special case for (Vx,Vy) */
+                dststr = strchr(reginfo, 'y');
+                if (def_oreg && strchr(reginfo, 'x') && dststr) {
+                    def_regnum = packet->insn[def_idx].regno[dststr - reginfo];
+                    packet->insn[i].regno[use_regidx] = def_regnum;
+                }
+                /*
+                 * We need to remember who produces this value to later
+                 * check if it was dynamically cancelled
+                 */
+                packet->insn[i].new_value_producer_slot =
+                    packet->insn[def_idx].slot;
+            }
+        }
+    }
+    return 0;
+}
+
+/*
+ * We don't want to reorder slot1/slot0 with respect to each other.
+ * So in our shuffling, we don't want to move the .cur / .tmp vmem earlier
+ * Instead, we should move the producing instruction later
+ * But the producing instruction might feed a .new store!
+ * So we may need to move that even later.
+ */
+
+static void
+decode_mmvec_move_cvi_to_end(packet_t *packet, int max)
+{
+    int i;
+    for (i = 0; i < max; i++) {
+        if (GET_ATTRIB(packet->insn[i].opcode, A_CVI)) {
+            int last_inst = packet->num_insns - 1;
+
+            /*
+             * If the last instruction is an endloop, move to the one before it
+             * Keep endloop as the last thing always
+             */
+            if ((packet->insn[last_inst].opcode == J2_endloop0) ||
+                (packet->insn[last_inst].opcode == J2_endloop1) ||
+                (packet->insn[last_inst].opcode == J2_endloop01)) {
+                last_inst--;
+            }
+
+            decode_send_insn_to(packet, i, last_inst);
+            max--;
+            i--;    /* Retry this index now that packet has rotated */
+        }
+    }
+}
+
+static int
+decode_shuffle_for_execution_vops(packet_t *packet)
+{
+    /*
+     * Sort for V.new = VMEM()
+     * Right now we need to make sure that the vload occurs before the
+     * permute instruction or VPVX ops
+     */
+    int i;
+    for (i = 0; i < packet->num_insns; i++) {
+        if (GET_ATTRIB(packet->insn[i].opcode, A_LOAD) &&
+            (GET_ATTRIB(packet->insn[i].opcode, A_CVI_NEW) ||
+             GET_ATTRIB(packet->insn[i].opcode, A_CVI_TMP))) {
+            /*
+             * Find prior consuming vector instructions
+             * Move to end of packet
+             */
+            decode_mmvec_move_cvi_to_end(packet, i);
+            break;
+        }
+    }
+    for (i = 0; i < packet->num_insns - 1; i++) {
+        if (GET_ATTRIB(packet->insn[i].opcode, A_STORE) &&
+            GET_ATTRIB(packet->insn[i].opcode, A_CVI_NEW) &&
+            !GET_ATTRIB(packet->insn[i].opcode, A_CVI_SCATTER_RELEASE)) {
+            /* decode_send_insn_to the end */
+            int last_inst = packet->num_insns - 1;
+
+            /*
+             * If the last instruction is an endloop, move to the one before it
+             * Keep endloop as the last thing always
+             */
+            if ((packet->insn[last_inst].opcode == J2_endloop0) ||
+                (packet->insn[last_inst].opcode == J2_endloop1) ||
+                (packet->insn[last_inst].opcode == J2_endloop01)) {
+                last_inst--;
+            }
+
+            decode_send_insn_to(packet, i, last_inst);
+            break;
+        }
+    }
+    return 0;
+}
+
+/* Collect stats on HVX packet */
+static void
+decode_hvx_packet_contents(packet_t *pkt) {
+    pkt->pkt_hvx_va = 0;
+    pkt->pkt_hvx_vx = 0;
+    pkt->pkt_hvx_vp = 0;
+    pkt->pkt_hvx_vs = 0;
+    pkt->pkt_hvx_all = 0;
+    pkt->pkt_hvx_none = 0;
+    pkt->pkt_has_hvx = 0;
+
+    for (int i = 0; i < pkt->num_insns; i++) {
+        pkt->pkt_hvx_va += GET_ATTRIB(pkt->insn[i].opcode, A_CVI_VA);
+        pkt->pkt_hvx_vx += GET_ATTRIB(pkt->insn[i].opcode, A_CVI_VX);
+        pkt->pkt_hvx_vp += GET_ATTRIB(pkt->insn[i].opcode, A_CVI_VP);
+        pkt->pkt_hvx_vs += GET_ATTRIB(pkt->insn[i].opcode, A_CVI_VS);
+        pkt->pkt_hvx_none += GET_ATTRIB(pkt->insn[i].opcode, A_CVI_TMP);
+        pkt->pkt_hvx_all += GET_ATTRIB(pkt->insn[i].opcode, A_CVI_4SLOT);
+        pkt->pkt_hvx_va += 2 * GET_ATTRIB(pkt->insn[i].opcode, A_CVI_VA_DV);
+        pkt->pkt_hvx_vx += 2 * GET_ATTRIB(pkt->insn[i].opcode, A_CVI_VX_DV);
+        pkt->pkt_hvx_vp += GET_ATTRIB(pkt->insn[i].opcode, A_CVI_VP_VS);
+        pkt->pkt_hvx_vs += GET_ATTRIB(pkt->insn[i].opcode, A_CVI_VP_VS);
+        pkt->pkt_has_hvx |= pkt->insn[i].hvx_resource ? 1 : 0;
+    }
+}
+
+/*
+ * Public Functions
+ */
+
+const char *
+mmvec_ext_decode_find_iclass_slots(int opcode)
+{
+    if (GET_ATTRIB(opcode, A_CVI_VM)) {
+        if (GET_ATTRIB(opcode, A_RESTRICT_SLOT0ONLY)) {
+            return "0";
+        } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT1ONLY)) {
+            return "1";
+        }
+        return "01";
+    } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT2ONLY)) {
+        return "2";
+    } else if (GET_ATTRIB(opcode, A_CVI_VX)) {
+        return "23";
+    } else if (GET_ATTRIB(opcode, A_CVI_VS_VX)) {
+        return "23";
+    } else {
+        return "0123";
+    }
+}
+
+int mmvec_ext_decode_checks(packet_t *packet)
+{
+    int errors = 0;
+    errors += check_new_value(packet);
+    errors += decode_populate_cvi_resources(packet);
+    errors += decode_shuffle_for_execution_vops(packet);
+
+    if (errors == 0) {
+        decode_hvx_packet_contents(packet);
+    }
+
+    return errors;
+}
+
diff --git a/target/hexagon/q6v_decode.c b/target/hexagon/q6v_decode.c
index f2b1548..2d253db 100644
--- a/target/hexagon/q6v_decode.c
+++ b/target/hexagon/q6v_decode.c
@@ -161,6 +161,13 @@ decode_insns_tablewalk(insn_t *insn, dectree_table_t *table, size4u_t encoding)
         }
         decode_op(insn, opc, encoding);
         return 1;
+    } else if (table->table[i].type == DECTREE_EXTSPACE) {
+        size4u_t active_ext;
+        /*
+         * For now, HVX will be the only coproc
+         */
+        active_ext = 4;
+        return decode_insns_tablewalk(insn, ext_trees[active_ext], encoding);
     } else {
         return 0;
     }
@@ -356,10 +363,13 @@ static int do_decode_packet(int max_words, const size4u_t *words, packet_t *pkt)
             pkt->pkt_has_payload = 1;
         }
     }
+    pkt->pkt_has_extension = 0;
     pkt->pkt_has_initloop = 0;
     pkt->pkt_has_initloop0 = 0;
     pkt->pkt_has_initloop1 = 0;
     for (i = 0; i < num_insns; i++) {
+        pkt->pkt_has_extension |=
+            GET_ATTRIB(pkt->insn[i].opcode, A_EXTENSION);
         pkt->pkt_has_initloop0 |=
             GET_ATTRIB(pkt->insn[i].opcode, A_HWLOOP0_SETUP);
         pkt->pkt_has_initloop1 |=
@@ -391,6 +401,10 @@ static int do_decode_packet(int max_words, const size4u_t *words, packet_t *pkt)
     errors += decode_set_slot_number(pkt);
     errors += decode_fill_newvalue_regno(pkt);
 
+    if (pkt->pkt_has_extension) {
+        errors += mmvec_ext_decode_checks(pkt);
+    }
+
     errors += decode_shuffle_for_execution(pkt);
     errors += decode_split_cmpjump(pkt);
     errors += decode_set_insn_attr_fields(pkt);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 61/67] Hexagon HVX instruction utility functions
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (59 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 60/67] Hexagon HVX instruction decoding Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 62/67] Hexagon HVX macros to interface with the generator Taylor Simpson
                   ` (6 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Functions to support scatter/gather

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/mmvec/system_ext_mmvec.h |  38 +++++
 target/hexagon/mmvec/system_ext_mmvec.c | 263 ++++++++++++++++++++++++++++++++
 2 files changed, 301 insertions(+)
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.c

diff --git a/target/hexagon/mmvec/system_ext_mmvec.h b/target/hexagon/mmvec/system_ext_mmvec.h
new file mode 100644
index 0000000..4f35966
--- /dev/null
+++ b/target/hexagon/mmvec/system_ext_mmvec.h
@@ -0,0 +1,38 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_SYSTEM_EXT_MMVEC_H
+#define HEXAGON_SYSTEM_EXT_MMVEC_H
+
+extern void mem_load_vector_oddva(CPUHexagonState *env, vaddr_t vaddr,
+                           vaddr_t lookup_vaddr, int slot, int size,
+                           size1u_t *data, int use_full_va);
+extern void mem_store_vector_oddva(CPUHexagonState *env, vaddr_t vaddr,
+                            vaddr_t lookup_vaddr, int slot, int size,
+                            size1u_t *data, size1u_t* mask, unsigned invert,
+                            int use_full_va);
+extern void mem_vector_scatter_init(CPUHexagonState *env, int slot,
+                                    vaddr_t base_vaddr, int length,
+                                    int element_size);
+extern void mem_vector_scatter_finish(CPUHexagonState *env, int slot, int op);
+extern void mem_vector_gather_finish(CPUHexagonState *env, int slot);
+extern void mem_vector_gather_init(CPUHexagonState *env, int slot,
+                                   vaddr_t base_vaddr, int length,
+                                   int element_size);
+
+
+#endif
diff --git a/target/hexagon/mmvec/system_ext_mmvec.c b/target/hexagon/mmvec/system_ext_mmvec.c
new file mode 100644
index 0000000..e0df96e
--- /dev/null
+++ b/target/hexagon/mmvec/system_ext_mmvec.c
@@ -0,0 +1,263 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "opcodes.h"
+#include "insn.h"
+#include "mmvec/macros.h"
+#include "qemu.h"
+
+#define TYPE_LOAD 'L'
+#define TYPE_STORE 'S'
+#define TYPE_FETCH 'F'
+#define TYPE_ICINVA 'I'
+
+enum mem_access_types {
+    access_type_INVALID = 0,
+    access_type_unknown = 1,
+    access_type_load = 2,
+    access_type_store = 3,
+    access_type_fetch = 4,
+    access_type_dczeroa = 5,
+    access_type_dccleana = 6,
+    access_type_dcinva = 7,
+    access_type_dccleaninva = 8,
+    access_type_icinva = 9,
+    access_type_ictagr = 10,
+    access_type_ictagw = 11,
+    access_type_icdatar = 12,
+    access_type_dcfetch = 13,
+    access_type_l2fetch = 14,
+    access_type_l2cleanidx = 15,
+    access_type_l2cleaninvidx = 16,
+    access_type_l2tagr = 17,
+    access_type_l2tagw = 18,
+    access_type_dccleanidx = 19,
+    access_type_dcinvidx = 20,
+    access_type_dccleaninvidx = 21,
+    access_type_dctagr = 22,
+    access_type_dctagw = 23,
+    access_type_k0unlock = 24,
+    access_type_l2locka = 25,
+    access_type_l2unlocka = 26,
+    access_type_l2kill = 27,
+    access_type_l2gclean = 28,
+    access_type_l2gcleaninv = 29,
+    access_type_l2gunlock = 30,
+    access_type_synch = 31,
+    access_type_isync = 32,
+    access_type_pause = 33,
+    access_type_load_phys = 34,
+    access_type_load_locked = 35,
+    access_type_store_conditional = 36,
+    access_type_barrier = 37,
+    access_type_memcpy_load = 39,
+    access_type_memcpy_store = 40,
+
+    NUM_CORE_ACCESS_TYPES
+};
+
+enum ext_mem_access_types {
+    access_type_vload = NUM_CORE_ACCESS_TYPES,
+    access_type_vstore,
+    access_type_vload_nt,
+    access_type_vstore_nt,
+    access_type_vgather_load,
+    access_type_vscatter_store,
+    access_type_vscatter_release,
+    access_type_vgather_release,
+    access_type_vfetch,
+    NUM_EXT_ACCESS_TYPES
+};
+
+static inline
+target_ulong mem_init_access(CPUHexagonState *env, int slot, size4u_t vaddr,
+                             int width, enum mem_access_types mtype,
+                             int type_for_xlate)
+{
+#ifdef CONFIG_USER_ONLY
+    /* Nothing to do for Linux user mode in qemu */
+    return vaddr;
+#else
+#error System mode not yet implemented for Hexagon
+#endif
+}
+
+static inline int check_gather_store(CPUHexagonState *env)
+{
+    /* First check to see if temp vreg has been updated */
+    int check  = env->gather_issued;
+    check &= env->is_gather_store_insn;
+
+    /* In case we don't have store, suppress gather */
+    if (!check) {
+        env->gather_issued = 0;
+        env->vtcm_pending = 0;   /* Suppress any gather writes to memory */
+    }
+    return check;
+}
+
+void mem_store_vector_oddva(CPUHexagonState *env, vaddr_t vaddr,
+                            vaddr_t lookup_vaddr, int slot, int size,
+                            size1u_t *data, size1u_t *mask, unsigned invert,
+                            int use_full_va)
+{
+    int i;
+
+    if (!use_full_va) {
+        lookup_vaddr = vaddr;
+    }
+
+    if (!size) {
+        return;
+    }
+
+    int is_gather_store = check_gather_store(env);
+    if (is_gather_store) {
+        memcpy(data, &env->tmp_VRegs[0].ub[0], size);
+        env->VRegs_updated_tmp = 0;
+        env->gather_issued = 0;
+    }
+
+    /*
+     * If it's a gather store update store data from temporary register
+     * And clear flag
+     */
+    env->vstore_pending[slot] = 1;
+    env->vstore[slot].va   = vaddr;
+    env->vstore[slot].size = size;
+    memcpy(&env->vstore[slot].data.ub[0], data, size);
+    if (!mask) {
+        memset(&env->vstore[slot].mask.ub[0], invert ? 0 : -1, size);
+    } else if (invert) {
+        for (i = 0; i < size; i++) {
+            env->vstore[slot].mask.ub[i] = !mask[i];
+        }
+    } else {
+        memcpy(&env->vstore[slot].mask.ub[0], mask, size);
+    }
+    /* On a gather store, overwrite the store mask to emulate dropped gathers */
+    if (is_gather_store) {
+        memcpy(&env->vstore[slot].mask.ub[0], &env->vtcm_log.mask.ub[0], size);
+    }
+    for (i = 0; i < size; i++) {
+        env->mem_access[slot].cdata[i] = data[i];
+    }
+}
+
+void mem_load_vector_oddva(CPUHexagonState *env, vaddr_t vaddr,
+                           vaddr_t lookup_vaddr, int slot, int size,
+                           size1u_t *data, int use_full_va)
+{
+    int i;
+
+    if (!use_full_va) {
+        lookup_vaddr = vaddr;
+    }
+
+    if (!size) {
+        return;
+    }
+
+    for (i = 0; i < size; i++) {
+        get_user_u8(data[i], vaddr);
+        vaddr++;
+    }
+}
+
+void mem_vector_scatter_init(CPUHexagonState *env, int slot, vaddr_t base_vaddr,
+                             int length, int element_size)
+{
+    enum ext_mem_access_types access_type = access_type_vscatter_store;
+    int i;
+
+    /* Translation for Store Address on Slot 1 - maybe any slot? */
+    mem_init_access(env, slot, base_vaddr, 1, access_type, TYPE_STORE);
+    mem_access_info_t *maptr = &env->mem_access[slot];
+    if (EXCEPTION_DETECTED) {
+        return;
+    }
+
+    maptr->range = length;
+
+    for (i = 0; i < fVECSIZE(); i++) {
+        env->vtcm_log.offsets.ub[i] = 0; /* Mark invalid */
+        env->vtcm_log.data.ub[i] = 0;
+        env->vtcm_log.mask.ub[i] = 0;
+    }
+    env->vtcm_log.va_base = base_vaddr;
+
+    env->vtcm_pending = 1;
+    env->vtcm_log.oob_access = 0;
+    env->vtcm_log.op = 0;
+    env->vtcm_log.op_size = 0;
+    return;
+}
+
+void mem_vector_gather_init(CPUHexagonState *env, int slot, vaddr_t base_vaddr,
+                            int length, int element_size)
+{
+    enum ext_mem_access_types access_type = access_type_vgather_load;
+    int i;
+
+    mem_init_access(env, slot, base_vaddr, 1,  access_type, TYPE_LOAD);
+    mem_access_info_t *maptr = &env->mem_access[slot];
+
+    if (EXCEPTION_DETECTED) {
+        return;
+    }
+
+    maptr->range = length;
+
+    for (i = 0; i < 2 * fVECSIZE(); i++) {
+        env->vtcm_log.offsets.ub[i] = 0x0;
+    }
+    for (i = 0; i < fVECSIZE(); i++) {
+        env->vtcm_log.data.ub[i] = 0;
+        env->vtcm_log.mask.ub[i] = 0;
+        env->vtcm_log.va[i] = 0;
+        env->tmp_VRegs[0].ub[i] = 0;
+    }
+    env->vtcm_log.oob_access = 0;
+    env->vtcm_log.op = 0;
+    env->vtcm_log.op_size = 0;
+
+    env->vtcm_log.va_base = base_vaddr;
+
+    /*
+     * Temp Reg gets updated
+     * This allows Store .new to grab the correct result
+     */
+    env->VRegs_updated_tmp = 1;
+    env->gather_issued = 1;
+
+    return;
+}
+
+void mem_vector_scatter_finish(CPUHexagonState *env, int slot, int op)
+{
+    env->store_pending[slot] = 0;
+    env->vstore_pending[slot] = 0;
+    env->vtcm_log.size = fVECSIZE();
+
+    memcpy(env->mem_access[slot].cdata, &env->vtcm_log.offsets.ub[0], 256);
+}
+
+void mem_vector_gather_finish(CPUHexagonState *env, int slot)
+{
+    memcpy(env->mem_access[slot].cdata, &env->vtcm_log.offsets.ub[0], 256);
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 62/67] Hexagon HVX macros to interface with the generator
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (60 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 61/67] Hexagon HVX instruction utility functions Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:43 ` [RFC PATCH v2 63/67] Hexagon HVX macros referenced in instruction semantics Taylor Simpson
                   ` (5 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Various forms of declare, read, write, free for HVX operands

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/mmvec/macros.h | 262 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 262 insertions(+)
 create mode 100644 target/hexagon/mmvec/macros.h

diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
new file mode 100644
index 0000000..be80bbd
--- /dev/null
+++ b/target/hexagon/mmvec/macros.h
@@ -0,0 +1,262 @@
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_MMVEC_MACROS_H
+#define HEXAGON_MMVEC_MACROS_H
+
+#include "mmvec/system_ext_mmvec.h"
+
+#ifdef QEMU_GENERATE
+#else
+#define VdV      (*(mmvector_t *)(VdV_void))
+#define VsV      (*(mmvector_t *)(VsV_void))
+#define VuV      (*(mmvector_t *)(VuV_void))
+#define VvV      (*(mmvector_t *)(VvV_void))
+#define VwV      (*(mmvector_t *)(VwV_void))
+#define VxV      (*(mmvector_t *)(VxV_void))
+#define VyV      (*(mmvector_t *)(VyV_void))
+
+#define VddV     (*(mmvector_pair_t *)(VddV_void))
+#define VuuV     (*(mmvector_pair_t *)(VuuV_void))
+#define VvvV     (*(mmvector_pair_t *)(VvvV_void))
+#define VxxV     (*(mmvector_pair_t *)(VxxV_void))
+
+#define QeV      (*(mmqreg_t *)(QeV_void))
+#define QdV      (*(mmqreg_t *)(QdV_void))
+#define QsV      (*(mmqreg_t *)(QsV_void))
+#define QtV      (*(mmqreg_t *)(QtV_void))
+#define QuV      (*(mmqreg_t *)(QuV_void))
+#define QvV      (*(mmqreg_t *)(QvV_void))
+#define QxV      (*(mmqreg_t *)(QxV_void))
+#endif
+
+#ifdef QEMU_GENERATE
+#define DECL_VREG(VAR, NUM, X, OFF) \
+    TCGv_ptr VAR = tcg_temp_local_new_ptr(); \
+    size1u_t NUM = REGNO(X) + OFF; \
+    do { \
+        uint32_t offset = new_temp_vreg_offset(ctx, 1); \
+        tcg_gen_addi_ptr(VAR, cpu_env, offset); \
+    } while (0)
+
+/*
+ * Certain instructions appear to have readonly operands, but
+ * in reality they do not.
+ *     vdelta instructions overwrite their VuV operand
+ */
+static bool readonly_ok(insn_t *insn)
+{
+    uint32_t opcode = insn->opcode;
+    if (opcode == V6_vdelta ||
+        opcode == V6_vrdelta) {
+        return false;
+    }
+    return true;
+}
+
+#define DECL_VREG_READONLY(VAR, NUM, X, OFF) \
+    TCGv_ptr VAR = tcg_temp_local_new_ptr(); \
+    size1u_t NUM = REGNO(X) + OFF; \
+    if (!readonly_ok(insn)) { \
+        uint32_t offset = new_temp_vreg_offset(ctx, 1); \
+        tcg_gen_addi_ptr(VAR, cpu_env, offset); \
+    }
+
+#define DECL_VREG_d(VAR, NUM, X, OFF) \
+    DECL_VREG(VAR, NUM, X, OFF)
+#define DECL_VREG_s(VAR, NUM, X, OFF) \
+    DECL_VREG_READONLY(VAR, NUM, X, OFF)
+#define DECL_VREG_u(VAR, NUM, X, OFF) \
+    DECL_VREG_READONLY(VAR, NUM, X, OFF)
+#define DECL_VREG_v(VAR, NUM, X, OFF) \
+    DECL_VREG_READONLY(VAR, NUM, X, OFF)
+#define DECL_VREG_w(VAR, NUM, X, OFF) \
+    DECL_VREG_READONLY(VAR, NUM, X, OFF)
+#define DECL_VREG_x(VAR, NUM, X, OFF) \
+    DECL_VREG(VAR, NUM, X, OFF)
+#define DECL_VREG_y(VAR, NUM, X, OFF) \
+    DECL_VREG(VAR, NUM, X, OFF)
+
+#define DECL_VREG_PAIR(VAR, NUM, X, OFF) \
+    TCGv_ptr VAR = tcg_temp_local_new_ptr(); \
+    size1u_t NUM = REGNO(X) + OFF; \
+    do { \
+        uint32_t offset = new_temp_vreg_offset(ctx, 2); \
+        tcg_gen_addi_ptr(VAR, cpu_env, offset); \
+    } while (0)
+
+#define DECL_VREG_dd(VAR, NUM, X, OFF) \
+    DECL_VREG_PAIR(VAR, NUM, X, OFF)
+#define DECL_VREG_uu(VAR, NUM, X, OFF) \
+    DECL_VREG_PAIR(VAR, NUM, X, OFF)
+#define DECL_VREG_vv(VAR, NUM, X, OFF) \
+    DECL_VREG_PAIR(VAR, NUM, X, OFF)
+#define DECL_VREG_xx(VAR, NUM, X, OFF) \
+    DECL_VREG_PAIR(VAR, NUM, X, OFF)
+
+#define DECL_QREG(VAR, NUM, X, OFF) \
+    TCGv_ptr VAR = tcg_temp_local_new_ptr(); \
+    size1u_t NUM = REGNO(X) + OFF; \
+    do { \
+        uint32_t __offset = new_temp_qreg_offset(ctx); \
+        tcg_gen_addi_ptr(VAR, cpu_env, __offset); \
+    } while (0)
+
+#define DECL_QREG_d(VAR, NUM, X, OFF) \
+    DECL_QREG(VAR, NUM, X, OFF)
+#define DECL_QREG_e(VAR, NUM, X, OFF) \
+    DECL_QREG(VAR, NUM, X, OFF)
+#define DECL_QREG_s(VAR, NUM, X, OFF) \
+    DECL_QREG(VAR, NUM, X, OFF)
+#define DECL_QREG_t(VAR, NUM, X, OFF) \
+    DECL_QREG(VAR, NUM, X, OFF)
+#define DECL_QREG_u(VAR, NUM, X, OFF) \
+    DECL_QREG(VAR, NUM, X, OFF)
+#define DECL_QREG_v(VAR, NUM, X, OFF) \
+    DECL_QREG(VAR, NUM, X, OFF)
+#define DECL_QREG_x(VAR, NUM, X, OFF) \
+    DECL_QREG(VAR, NUM, X, OFF)
+
+#define FREE_VREG(VAR)          tcg_temp_free_ptr(VAR)
+#define FREE_VREG_d(VAR)        FREE_VREG(VAR)
+#define FREE_VREG_s(VAR)        FREE_VREG(VAR)
+#define FREE_VREG_u(VAR)        FREE_VREG(VAR)
+#define FREE_VREG_v(VAR)        FREE_VREG(VAR)
+#define FREE_VREG_w(VAR)        FREE_VREG(VAR)
+#define FREE_VREG_x(VAR)        FREE_VREG(VAR)
+#define FREE_VREG_y(VAR)        FREE_VREG(VAR)
+
+#define FREE_VREG_PAIR(VAR)     tcg_temp_free_ptr(VAR)
+#define FREE_VREG_dd(VAR)       FREE_VREG_PAIR(VAR)
+#define FREE_VREG_uu(VAR)       FREE_VREG_PAIR(VAR)
+#define FREE_VREG_vv(VAR)       FREE_VREG_PAIR(VAR)
+#define FREE_VREG_xx(VAR)       FREE_VREG_PAIR(VAR)
+
+#define FREE_QREG(VAR)          tcg_temp_free_ptr(VAR)
+#define FREE_QREG_d(VAR)        FREE_QREG(VAR)
+#define FREE_QREG_e(VAR)        FREE_QREG(VAR)
+#define FREE_QREG_s(VAR)        FREE_QREG(VAR)
+#define FREE_QREG_t(VAR)        FREE_QREG(VAR)
+#define FREE_QREG_u(VAR)        FREE_QREG(VAR)
+#define FREE_QREG_v(VAR)        FREE_QREG(VAR)
+#define FREE_QREG_x(VAR)        FREE_QREG(VAR)
+
+#define READ_VREG(VAR, NUM) \
+    gen_read_vreg(VAR, NUM, 0)
+#define READ_VREG_READONLY(VAR, NUM) \
+    do { \
+        if (readonly_ok(insn)) { \
+            gen_read_vreg_readonly(VAR, NUM, 0); \
+        } else { \
+            gen_read_vreg(VAR, NUM, 0); \
+        } \
+    } while (0)
+
+#define READ_VREG_s(VAR, NUM)    READ_VREG_READONLY(VAR, NUM)
+#define READ_VREG_u(VAR, NUM)    READ_VREG_READONLY(VAR, NUM)
+#define READ_VREG_v(VAR, NUM)    READ_VREG_READONLY(VAR, NUM)
+#define READ_VREG_w(VAR, NUM)    READ_VREG_READONLY(VAR, NUM)
+#define READ_VREG_x(VAR, NUM)    READ_VREG(VAR, NUM)
+#define READ_VREG_y(VAR, NUM)    READ_VREG(VAR, NUM)
+
+#define READ_VREG_PAIR(VAR, NUM) \
+    gen_read_vreg_pair(VAR, NUM, 0)
+#define READ_VREG_uu(VAR, NUM)   READ_VREG_PAIR(VAR, NUM)
+#define READ_VREG_vv(VAR, NUM)   READ_VREG_PAIR(VAR, NUM)
+#define READ_VREG_xx(VAR, NUM)   READ_VREG_PAIR(VAR, NUM)
+
+#define READ_QREG(VAR, NUM) \
+    gen_read_qreg(VAR, NUM, 0)
+#define READ_QREG_s(VAR, NUM)     READ_QREG(VAR, NUM)
+#define READ_QREG_t(VAR, NUM)     READ_QREG(VAR, NUM)
+#define READ_QREG_u(VAR, NUM)     READ_QREG(VAR, NUM)
+#define READ_QREG_v(VAR, NUM)     READ_QREG(VAR, NUM)
+#define READ_QREG_x(VAR, NUM)     READ_QREG(VAR, NUM)
+
+#define DECL_NEW_OREG(TYPE, NAME, NUM, X, OFF) \
+    TYPE NAME; \
+    int NUM = REGNO(X) + OFF
+
+#define READ_NEW_OREG(tmp, i) (tmp = tcg_const_tl(i))
+
+#define FREE_NEW_OREG(NAME) \
+    tcg_temp_free(NAME)
+
+#define LOG_VREG_WRITE(NUM, VAR, VNEW) \
+    do { \
+        int is_predicated = GET_ATTRIB(insn->opcode, A_CONDEXEC); \
+        gen_log_vreg_write(VAR, NUM, VNEW, insn->slot); \
+        ctx_log_vreg_write(ctx, (NUM), is_predicated); \
+    } while (0)
+
+#define LOG_VREG_WRITE_PAIR(NUM, VAR, VNEW) \
+    do { \
+        int is_predicated = GET_ATTRIB(insn->opcode, A_CONDEXEC); \
+        gen_log_vreg_write_pair(VAR, NUM, VNEW, insn->slot); \
+        ctx_log_vreg_write(ctx, (NUM) ^ 0, is_predicated); \
+        ctx_log_vreg_write(ctx, (NUM) ^ 1, is_predicated); \
+    } while (0)
+
+#define LOG_QREG_WRITE(NUM, VAR, VNEW) \
+    do { \
+        int is_predicated = GET_ATTRIB(insn->opcode, A_CONDEXEC); \
+        gen_log_qreg_write(VAR, NUM, VNEW, insn->slot); \
+        ctx_log_qreg_write(ctx, (NUM), is_predicated); \
+    } while (0)
+#else
+#define NEW_WRITTEN(NUM) ((env->VRegs_select >> (NUM)) & 1)
+#define TMP_WRITTEN(NUM) ((env->VRegs_updated_tmp >> (NUM)) & 1)
+
+#define LOG_VREG_WRITE_FUNC(X) \
+    _Generic((X), void * : log_vreg_write, mmvector_t : log_mmvector_write)
+#define LOG_VREG_WRITE(NUM, VAR, VNEW) \
+    LOG_VREG_WRITE_FUNC(VAR)(env, NUM, VAR, VNEW, slot)
+
+#define READ_EXT_VREG(NUM, VAR, VTMP) \
+    do { \
+        VAR = ((NEW_WRITTEN(NUM)) ? env->future_VRegs[NUM] \
+                                  : env->VRegs[NUM]); \
+        VAR = ((TMP_WRITTEN(NUM)) ? env->tmp_VRegs[NUM] : VAR); \
+        if (VTMP == EXT_TMP) { \
+            if (env->VRegs_updated & ((VRegMask)1) << (NUM)) { \
+                VAR = env->future_VRegs[NUM]; \
+                env->VRegs_updated ^= ((VRegMask)1) << (NUM); \
+            } \
+        } \
+    } while (0)
+
+#define READ_EXT_VREG_PAIR(NUM, VAR, VTMP) \
+    do { \
+        READ_EXT_VREG((NUM) ^ 0, VAR.v[0], VTMP); \
+        READ_EXT_VREG((NUM) ^ 1, VAR.v[1], VTMP) \
+    } while (0)
+#endif
+
+#define WRITE_EXT_VREG(NUM, VAR, VNEW)   LOG_VREG_WRITE(NUM, VAR, VNEW)
+#define WRITE_VREG_d(NUM, VAR, VNEW)     LOG_VREG_WRITE(NUM, VAR, VNEW)
+#define WRITE_VREG_x(NUM, VAR, VNEW)     LOG_VREG_WRITE(NUM, VAR, VNEW)
+#define WRITE_VREG_y(NUM, VAR, VNEW)     LOG_VREG_WRITE(NUM, VAR, VNEW)
+
+#define WRITE_VREG_dd(NUM, VAR, VNEW)    LOG_VREG_WRITE_PAIR(NUM, VAR, VNEW)
+#define WRITE_VREG_xx(NUM, VAR, VNEW)    LOG_VREG_WRITE_PAIR(NUM, VAR, VNEW)
+#define WRITE_VREG_yy(NUM, VAR, VNEW)    LOG_VREG_WRITE_PAIR(NUM, VAR, VNEW)
+
+#define WRITE_QREG_d(NUM, VAR, VNEW)     LOG_QREG_WRITE(NUM, VAR, VNEW)
+#define WRITE_QREG_e(NUM, VAR, VNEW)     LOG_QREG_WRITE(NUM, VAR, VNEW)
+#define WRITE_QREG_x(NUM, VAR, VNEW)     LOG_QREG_WRITE(NUM, VAR, VNEW)
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 63/67] Hexagon HVX macros referenced in instruction semantics
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (61 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 62/67] Hexagon HVX macros to interface with the generator Taylor Simpson
@ 2020-02-28 16:43 ` Taylor Simpson
  2020-02-28 16:44 ` [RFC PATCH v2 64/67] Hexagon HVX helper to commit vector stores (masked and scatter/gather) Taylor Simpson
                   ` (4 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/mmvec/macros.h | 436 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 436 insertions(+)

diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
index be80bbd..c63a00a 100644
--- a/target/hexagon/mmvec/macros.h
+++ b/target/hexagon/mmvec/macros.h
@@ -259,4 +259,440 @@ static bool readonly_ok(insn_t *insn)
 #define WRITE_QREG_e(NUM, VAR, VNEW)     LOG_QREG_WRITE(NUM, VAR, VNEW)
 #define WRITE_QREG_x(NUM, VAR, VNEW)     LOG_QREG_WRITE(NUM, VAR, VNEW)
 
+#define LOG_VTCM_BYTE(VA, MASK, VAL, IDX) \
+    do { \
+        env->vtcm_log.data.ub[IDX] = (VAL); \
+        env->vtcm_log.mask.ub[IDX] = (MASK); \
+        env->vtcm_log.va[IDX] = (VA); \
+    } while (0)
+
+/* VTCM Banks */
+#define LOG_VTCM_BANK(VAL, MASK, IDX) \
+    do { \
+        env->vtcm_log.offsets.uh[IDX]  = (VAL & 0xFFF); \
+        env->vtcm_log.offsets.uh[IDX] |= ((MASK & 0xF) << 12) ; \
+    } while (0)
+
+#define fUSE_LOOKUP_ADDRESS_BY_REV(PROC) true
+#define fUSE_LOOKUP_ADDRESS() 1
+#define fRT8NOTE()
+#define fNOTQ(VAL) \
+    ({ \
+        mmqreg_t _ret;  \
+        int _i_;  \
+        for (_i_ = 0; _i_ < fVECSIZE() / 64; _i_++) { \
+            _ret.ud[_i_] = ~VAL.ud[_i_]; \
+        } \
+        _ret;\
+     })
+#define fGETQBITS(REG, WIDTH, MASK, BITNO) \
+    ((MASK) & (REG.w[(BITNO) >> 5] >> ((BITNO) & 0x1f)))
+#define fGETQBIT(REG, BITNO) fGETQBITS(REG, 1, 1, BITNO)
+#define fGENMASKW(QREG, IDX) \
+    (((fGETQBIT(QREG, (IDX * 4 + 0)) ? 0xFF : 0x0) << 0)  | \
+     ((fGETQBIT(QREG, (IDX * 4 + 1)) ? 0xFF : 0x0) << 8)  | \
+     ((fGETQBIT(QREG, (IDX * 4 + 2)) ? 0xFF : 0x0) << 16) | \
+     ((fGETQBIT(QREG, (IDX * 4 + 3)) ? 0xFF : 0x0) << 24))
+#define fGETNIBBLE(IDX, SRC) (fSXTN(4, 8, (SRC >> (4 * IDX)) & 0xF))
+#define fGETCRUMB(IDX, SRC) (fSXTN(2, 8, (SRC >> (2 * IDX)) & 0x3))
+#define fGETCRUMB_SYMMETRIC(IDX, SRC) \
+    ((fGETCRUMB(IDX, SRC) >= 0 ? (2 - fGETCRUMB(IDX, SRC)) \
+                               : fGETCRUMB(IDX, SRC)))
+#define fGENMASKH(QREG, IDX) \
+    (((fGETQBIT(QREG, (IDX * 2 + 0)) ? 0xFF : 0x0) << 0) | \
+     ((fGETQBIT(QREG, (IDX * 2 + 1)) ? 0xFF : 0x0) << 8))
+#define fGETMASKW(VREG, QREG, IDX) (VREG.w[IDX] & fGENMASKW((QREG), IDX))
+#define fGETMASKH(VREG, QREG, IDX) (VREG.h[IDX] & fGENMASKH((QREG), IDX))
+#define fCONDMASK8(QREG, IDX, YESVAL, NOVAL) \
+    (fGETQBIT(QREG, IDX) ? (YESVAL) : (NOVAL))
+#define fCONDMASK16(QREG, IDX, YESVAL, NOVAL) \
+    ((fGENMASKH(QREG, IDX) & (YESVAL)) | \
+     (fGENMASKH(fNOTQ(QREG), IDX) & (NOVAL)))
+#define fCONDMASK32(QREG, IDX, YESVAL, NOVAL) \
+    ((fGENMASKW(QREG, IDX) & (YESVAL)) | \
+     (fGENMASKW(fNOTQ(QREG), IDX) & (NOVAL)))
+#define fSETQBITS(REG, WIDTH, MASK, BITNO, VAL) \
+    do { \
+        size4u_t __TMP = (VAL); \
+        REG.w[(BITNO) >> 5] &= ~((MASK) << ((BITNO) & 0x1f)); \
+        REG.w[(BITNO) >> 5] |= (((__TMP) & (MASK)) << ((BITNO) & 0x1f)); \
+    } while (0)
+#define fSETQBIT(REG, BITNO, VAL) fSETQBITS(REG, 1, 1, BITNO, VAL)
+#define fVBYTES() (fVECSIZE())
+#define fVALIGN(ADDR, LOG2_ALIGNMENT) (ADDR = ADDR & ~(LOG2_ALIGNMENT - 1))
+#define fVLASTBYTE(ADDR, LOG2_ALIGNMENT) (ADDR = ADDR | (LOG2_ALIGNMENT - 1))
+#define fVELEM(WIDTH) ((fVECSIZE() * 8) / WIDTH)
+#define fVECLOGSIZE() (7)
+#define fVECSIZE() (1 << fVECLOGSIZE())
+#define fSWAPB(A, B) do { size1u_t tmp = A; A = B; B = tmp; } while (0)
+static inline mmvector_t mmvec_zero_vector(void)
+{
+    mmvector_t ret;
+    memset(&ret, 0, sizeof(ret));
+    return ret;
+}
+#define fVZERO() mmvec_zero_vector()
+#define fNEWVREG(VNUM) \
+    ((env->VRegs_updated & (((VRegMask)1) << VNUM)) ? env->future_VRegs[VNUM] \
+                                                    : mmvec_zero_vector())
+#define fV_AL_CHECK(EA, MASK) \
+    if ((EA) & (MASK)) { \
+        warn("aligning misaligned vector. EA=%08x", (EA)); \
+    }
+#define fSCATTER_INIT(REGION_START, LENGTH, ELEMENT_SIZE) \
+    do { \
+        mem_vector_scatter_init(env, slot, REGION_START, LENGTH, ELEMENT_SIZE);\
+        if (EXCEPTION_DETECTED) { \
+            return; \
+        } \
+    } while (0)
+#define fGATHER_INIT(REGION_START, LENGTH, ELEMENT_SIZE) \
+    do { \
+        mem_vector_gather_init(env, slot, REGION_START, LENGTH, ELEMENT_SIZE); \
+        if (EXCEPTION_DETECTED) { \
+            return; \
+        } \
+    } while (0)
+#define fSCATTER_FINISH(OP) \
+    do { \
+        if (EXCEPTION_DETECTED) { \
+            return; \
+        } \
+        mem_vector_scatter_finish(env, slot, OP); \
+    } while (0);
+#define fGATHER_FINISH() \
+    do { \
+        if (EXCEPTION_DETECTED) { \
+            return; \
+        } \
+        mem_vector_gather_finish(env, slot); \
+    } while (0);
+#define fLOG_SCATTER_OP(SIZE) \
+    do { \
+        env->vtcm_log.op = 1; \
+        env->vtcm_log.op_size = SIZE; \
+    } while (0)
+#define fVLOG_VTCM_WORD_INCREMENT(EA, OFFSET, INC, IDX, ALIGNMENT, LEN) \
+    do { \
+        int log_byte = 0; \
+        vaddr_t va = EA; \
+        vaddr_t va_high = EA + LEN; \
+        for (int i0 = 0; i0 < 4; i0++) { \
+            log_byte = (va + i0) <= va_high; \
+            LOG_VTCM_BYTE(va + i0, log_byte, INC. ub[4 * IDX + i0], \
+                          4 * IDX + i0); \
+        } \
+    } while (0)
+#define fVLOG_VTCM_HALFWORD_INCREMENT(EA, OFFSET, INC, IDX, ALIGNMENT, LEN) \
+    do { \
+        int log_byte = 0; \
+        vaddr_t va = EA; \
+        vaddr_t va_high = EA + LEN; \
+        for (int i0 = 0; i0 < 2; i0++) { \
+            log_byte = (va + i0) <= va_high; \
+            LOG_VTCM_BYTE(va + i0, log_byte, INC.ub[2 * IDX + i0], \
+                          2 * IDX + i0); \
+        } \
+    } while (0)
+
+#define fVLOG_VTCM_HALFWORD_INCREMENT_DV(EA, OFFSET, INC, IDX, IDX2, IDX_H, \
+                                         ALIGNMENT, LEN) \
+    do { \
+        int log_byte = 0; \
+        vaddr_t va = EA; \
+        vaddr_t va_high = EA + LEN; \
+        for (int i0 = 0; i0 < 2; i0++) { \
+            log_byte = (va + i0) <= va_high; \
+            LOG_VTCM_BYTE(va + i0, log_byte, INC.ub[2 * IDX + i0], \
+                          2 * IDX + i0); \
+        } \
+    } while (0)
+
+/* NOTE - Will this always be tmp_VRegs[0]; */
+#define GATHER_FUNCTION(EA, OFFSET, IDX, LEN, ELEMENT_SIZE, BANK_IDX, QVAL) \
+    do { \
+        int i0; \
+        vaddr_t va = EA; \
+        vaddr_t va_high = EA + LEN; \
+        int log_bank = 0; \
+        int log_byte = 0; \
+        for (i0 = 0; i0 < ELEMENT_SIZE; i0++) { \
+            log_byte = ((va + i0) <= va_high) && QVAL; \
+            log_bank |= (log_byte << i0); \
+            size1u_t B; \
+            get_user_u8(B, EA + i0); \
+            env->tmp_VRegs[0].ub[ELEMENT_SIZE * IDX + i0] = B; \
+            LOG_VTCM_BYTE(va + i0, log_byte, B, ELEMENT_SIZE * IDX + i0); \
+        } \
+        LOG_VTCM_BANK(va, log_bank, BANK_IDX); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_WORD(EA, OFFSET, IDX, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, 1); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORD(EA, OFFSET, IDX, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, 1); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORD_DV(EA, OFFSET, IDX, IDX2, IDX_H, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, (2 * IDX2 + IDX_H), 1); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_WORDQ(EA, OFFSET, IDX, Q, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, \
+                        fGETQBIT(QsV, 4 * IDX + i0)); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORDQ(EA, OFFSET, IDX, Q, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, \
+                        fGETQBIT(QsV, 2 * IDX + i0)); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORDQ_DV(EA, OFFSET, IDX, IDX2, IDX_H, Q, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, (2 * IDX2 + IDX_H), \
+                        fGETQBIT(QsV, 2 * IDX + i0)); \
+    } while (0)
+#define SCATTER_OP_WRITE_TO_MEM(TYPE) \
+    do { \
+        for (int i = 0; i < env->vtcm_log.size; i += sizeof(TYPE)) { \
+            if (env->vtcm_log.mask.ub[i] != 0) { \
+                TYPE dst = 0; \
+                TYPE inc = 0; \
+                for (int j = 0; j < sizeof(TYPE); j++) { \
+                    size1u_t val; \
+                    get_user_u8(val, env->vtcm_log.va[i + j]); \
+                    dst |= val << (8 * j); \
+                    inc |= env->vtcm_log.data.ub[j + i] << (8 * j); \
+                    env->vtcm_log.mask.ub[j + i] = 0; \
+                    env->vtcm_log.data.ub[j + i] = 0; \
+                    env->vtcm_log.offsets.ub[j + i] = 0; \
+                } \
+                dst += inc; \
+                for (int j = 0; j < sizeof(TYPE); j++) { \
+                    put_user_u8((dst >> (8 * j)) & 0xFF, \
+                        env->vtcm_log.va[i + j]);  \
+                } \
+            } \
+        } \
+    } while (0)
+#define SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, ELEM_SIZE, BANK_IDX, QVAL, IN) \
+    do { \
+        int i0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        int log_bank = 0; \
+        int log_byte = 0; \
+        for (i0 = 0; i0 < ELEM_SIZE; i0++) { \
+            log_byte = ((va + i0) <= va_high) && QVAL; \
+            log_bank |= (log_byte << i0); \
+            LOG_VTCM_BYTE(va + i0, log_byte, IN.ub[ELEM_SIZE * IDX + i0], \
+                          ELEM_SIZE * IDX + i0); \
+        } \
+        LOG_VTCM_BANK(va, log_bank, BANK_IDX); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORD(EA, OFFSET, IN, IDX, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, 1, IN); \
+    } while (0)
+#define fVLOG_VTCM_WORD(EA, OFFSET, IN, IDX, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, 1, IN); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORDQ(EA, OFFSET, IN, IDX, Q, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, \
+                         fGETQBIT(QsV, 2 * IDX + i0), IN); \
+    } while (0)
+#define fVLOG_VTCM_WORDQ(EA, OFFSET, IN, IDX, Q, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, \
+                         fGETQBIT(QsV, 4 * IDX + i0), IN); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORD_DV(EA, OFFSET, IN, IDX, IDX2, IDX_H, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, \
+                         (2 * IDX2 + IDX_H), 1, IN); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORDQ_DV(EA, OFFSET, IN, IDX, Q, IDX2, IDX_H, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, (2 * IDX2 + IDX_H), \
+                         fGETQBIT(QsV, 2 * IDX + i0), IN); \
+    } while (0)
+#define fSTORERELEASE(EA, TYPE) \
+    do { \
+        fV_AL_CHECK(EA, fVECSIZE() - 1); \
+    } while (0)
+#define fLOADMMV_AL(EA, ALIGNMENT, LEN, DST) \
+    do { \
+        fV_AL_CHECK(EA, ALIGNMENT - 1); \
+        mem_load_vector_oddva(env, EA & ~(ALIGNMENT - 1), EA, slot, LEN, \
+                              &DST.ub[0], fUSE_LOOKUP_ADDRESS_BY_REV()); \
+    } while (0)
+#define fLOADMMV(EA, DST) fLOADMMV_AL(EA, fVECSIZE(), fVECSIZE(), DST)
+#define fLOADMMVU_AL(EA, ALIGNMENT, LEN, DST) \
+    do { \
+        size4u_t size2 = (EA) & (ALIGNMENT - 1); \
+        size4u_t size1 = LEN - size2; \
+        mem_load_vector_oddva(env, EA + size1, EA + fVECSIZE(), 1, size2, \
+                              &DST.ub[size1], fUSE_LOOKUP_ADDRESS()); \
+        mem_load_vector_oddva(env, EA, EA, 0, size1, &DST.ub[0], \
+                              fUSE_LOOKUP_ADDRESS_BY_REV()); \
+    } while (0)
+#define fLOADMMVU(EA, DST) \
+    do { \
+        if ((EA & (fVECSIZE() - 1)) == 0) { \
+            fLOADMMV_AL(EA, fVECSIZE(), fVECSIZE(), DST); \
+        } else { \
+            fLOADMMVU_AL(EA, fVECSIZE(), fVECSIZE(), DST); \
+        } \
+    } while (0)
+#define fSTOREMMV_AL(EA, ALIGNMENT, LEN, SRC) \
+    do  { \
+        fV_AL_CHECK(EA, ALIGNMENT - 1); \
+        mem_store_vector_oddva(env, EA & ~(ALIGNMENT - 1), EA, slot, LEN, \
+                               &SRC.ub[0], 0, 0, \
+                               fUSE_LOOKUP_ADDRESS_BY_REV()); \
+    } while (0)
+#define fSTOREMMV(EA, SRC) fSTOREMMV_AL(EA, fVECSIZE(), fVECSIZE(), SRC)
+#define fSTOREMMVQ_AL(EA, ALIGNMENT, LEN, SRC, MASK) \
+    do { \
+        mmvector_t maskvec; \
+        int i; \
+        for (i = 0; i < fVECSIZE(); i++) { \
+            maskvec.ub[i] = fGETQBIT(MASK, i); \
+        } \
+        mem_store_vector_oddva(env, EA & ~(ALIGNMENT - 1), EA, slot, LEN, \
+                               &SRC.ub[0], &maskvec.ub[0], 0, \
+                               fUSE_LOOKUP_ADDRESS_BY_REV()); \
+    } while (0)
+#define fSTOREMMVQ(EA, SRC, MASK) \
+    fSTOREMMVQ_AL(EA, fVECSIZE(), fVECSIZE(), SRC, MASK)
+#define fSTOREMMVNQ_AL(EA, ALIGNMENT, LEN, SRC, MASK) \
+    do { \
+        mmvector_t maskvec; \
+        int i; \
+        for (i = 0; i < fVECSIZE(); i++) { \
+            maskvec.ub[i] = fGETQBIT(MASK, i); \
+        } \
+        fV_AL_CHECK(EA, ALIGNMENT - 1); \
+        mem_store_vector_oddva(env, EA & ~(ALIGNMENT - 1), EA, slot, LEN, \
+                               &SRC.ub[0], &maskvec.ub[0], 1, \
+                               fUSE_LOOKUP_ADDRESS_BY_REV()); \
+    } while (0)
+#define fSTOREMMVNQ(EA, SRC, MASK) \
+    fSTOREMMVNQ_AL(EA, fVECSIZE(), fVECSIZE(), SRC, MASK)
+#define fSTOREMMVU_AL(EA, ALIGNMENT, LEN, SRC) \
+    do { \
+        size4u_t size1 = ALIGNMENT - ((EA) & (ALIGNMENT - 1)); \
+        size4u_t size2; \
+        if (size1 > LEN) { \
+            size1 = LEN; \
+        } \
+        size2 = LEN - size1; \
+        mem_store_vector_oddva(env, EA + size1, EA + fVECSIZE(), 1, size2, \
+                               &SRC.ub[size1], 0, 0, \
+                               fUSE_LOOKUP_ADDRESS()); \
+        mem_store_vector_oddva(env, EA, EA, 0, size1, &SRC.ub[0], 0, 0, \
+                               fUSE_LOOKUP_ADDRESS_BY_REV()); \
+    } while (0)
+#define fSTOREMMVU(EA, SRC) \
+    do { \
+        if ((EA & (fVECSIZE() - 1)) == 0) { \
+            fSTOREMMV_AL(EA, fVECSIZE(), fVECSIZE(), SRC); \
+        } else { \
+            fSTOREMMVU_AL(EA, fVECSIZE(), fVECSIZE(), SRC); \
+        } \
+    } while (0)
+#define fSTOREMMVQU_AL(EA, ALIGNMENT, LEN, SRC, MASK) \
+    do { \
+        size4u_t size1 = ALIGNMENT - ((EA) & (ALIGNMENT - 1)); \
+        size4u_t size2; \
+        mmvector_t maskvec; \
+        int i; \
+        for (i = 0; i < fVECSIZE(); i++) { \
+            maskvec.ub[i] = fGETQBIT(MASK, i); \
+        } \
+        if (size1 > LEN) { \
+            size1 = LEN; \
+        } \
+        size2 = LEN - size1; \
+        mem_store_vector_oddva(env, EA + size1, EA + fVECSIZE(), 1, size2, \
+                               &SRC.ub[size1], &maskvec.ub[size1], 0, \
+                               fUSE_LOOKUP_ADDRESS()); \
+        mem_store_vector_oddva(env, EA, 0, size1, &SRC.ub[0], &maskvec.ub[0], \
+                               0, fUSE_LOOKUP_ADDRESS_BY_REV()); \
+    } while (0)
+#define fSTOREMMVNQU_AL(EA, ALIGNMENT, LEN, SRC, MASK) \
+    do { \
+        size4u_t size1 = ALIGNMENT - ((EA) & (ALIGNMENT - 1)); \
+        size4u_t size2; \
+        mmvector_t maskvec; \
+        int i; \
+        for (i = 0; i < fVECSIZE(); i++) { \
+            maskvec.ub[i] = fGETQBIT(MASK, i); \
+        } \
+        if (size1 > LEN) { \
+            size1 = LEN; \
+        } \
+        size2 = LEN - size1; \
+        mem_store_vector_oddva(env, EA + size1, EA + fVECSIZE(), 1, size2, \
+                               &SRC.ub[size1], &maskvec.ub[size1], 1, \
+                               fUSE_LOOKUP_ADDRESS()); \
+        mem_store_vector_oddva(env, EA, EA, 0, size1, &SRC.ub[0], \
+                               &maskvec.ub[0], 1, \
+                               fUSE_LOOKUP_ADDRESS_BY_REV()); \
+    } while (0)
+#define fVFOREACH(WIDTH, VAR) for (VAR = 0; VAR < fVELEM(WIDTH); VAR++)
+#define fVARRAY_ELEMENT_ACCESS(ARRAY, TYPE, INDEX) \
+    ARRAY.v[(INDEX) / (fVECSIZE() / (sizeof(ARRAY.TYPE[0])))].TYPE[(INDEX) % \
+    (fVECSIZE() / (sizeof(ARRAY.TYPE[0])))]
+/* Grabs the .tmp data, wherever it is, and clears the .tmp status */
+/* Used for vhist */
+static inline mmvector_t mmvec_vtmp_data(void)
+{
+    mmvector_t ret;
+    g_assert_not_reached();
+    return ret;
+}
+#define fTMPVDATA() mmvec_vtmp_data()
+#define fVSATDW(U, V) fVSATW(((((long long)U) << 32) | fZXTN(32, 64, V)))
+#define fVASL_SATHI(U, V) fVSATW(((U) << 1) | ((V) >> 31))
+#define fVUADDSAT(WIDTH, U, V) \
+    fVSATUN(WIDTH, fZXTN(WIDTH, 2 * WIDTH, U) + fZXTN(WIDTH, 2 * WIDTH, V))
+#define fVSADDSAT(WIDTH, U, V) \
+    fVSATN(WIDTH, fSXTN(WIDTH, 2 * WIDTH, U) + fSXTN(WIDTH, 2 * WIDTH, V))
+#define fVUSUBSAT(WIDTH, U, V) \
+    fVSATUN(WIDTH, fZXTN(WIDTH, 2 * WIDTH, U) - fZXTN(WIDTH, 2 * WIDTH, V))
+#define fVSSUBSAT(WIDTH, U, V) \
+    fVSATN(WIDTH, fSXTN(WIDTH, 2 * WIDTH, U) - fSXTN(WIDTH, 2 * WIDTH, V))
+#define fVAVGU(WIDTH, U, V) \
+    ((fZXTN(WIDTH, 2 * WIDTH, U) + fZXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVAVGURND(WIDTH, U, V) \
+    ((fZXTN(WIDTH, 2 * WIDTH, U) + fZXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1)
+#define fVNAVGU(WIDTH, U, V) \
+    ((fZXTN(WIDTH, 2 * WIDTH, U) - fZXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVNAVGURNDSAT(WIDTH, U, V) \
+    fVSATUN(WIDTH, ((fZXTN(WIDTH, 2 * WIDTH, U) - \
+                     fZXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1))
+#define fVAVGS(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) + fSXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVAVGSRND(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) + fSXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1)
+#define fVNAVGS(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) - fSXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVNAVGSRND(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) - fSXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1)
+#define fVNAVGSRNDSAT(WIDTH, U, V) \
+    fVSATN(WIDTH, ((fSXTN(WIDTH, 2 * WIDTH, U) - \
+                    fSXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1))
+#define fVNOROUND(VAL, SHAMT) VAL
+#define fVNOSAT(VAL) VAL
+#define fVROUND(VAL, SHAMT) \
+    ((VAL) + (((SHAMT) > 0) ? (1LL << ((SHAMT) - 1)) : 0))
+#define fCARRY_FROM_ADD32(A, B, C) \
+    (((fZXTN(32, 64, A) + fZXTN(32, 64, B) + C) >> 32) & 1)
+#define fUARCH_NOTE_PUMP_4X()
+#define fUARCH_NOTE_PUMP_2X()
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 64/67] Hexagon HVX helper to commit vector stores (masked and scatter/gather)
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (62 preceding siblings ...)
  2020-02-28 16:43 ` [RFC PATCH v2 63/67] Hexagon HVX macros referenced in instruction semantics Taylor Simpson
@ 2020-02-28 16:44 ` Taylor Simpson
  2020-02-28 16:44 ` [RFC PATCH v2 65/67] Hexagon HVX TCG generation Taylor Simpson
                   ` (3 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper.h    |  1 +
 target/hexagon/op_helper.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index 8558da8..64e798b 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -21,6 +21,7 @@ DEF_HELPER_2(raise_exception, noreturn, env, i32)
 DEF_HELPER_1(debug_start_packet, void, env)
 DEF_HELPER_2(new_value, s32, env, int)
 DEF_HELPER_3(debug_check_store_width, void, env, int, int)
+DEF_HELPER_1(commit_hvx_stores, void, env)
 DEF_HELPER_3(debug_commit_end, void, env, int, int)
 DEF_HELPER_3(sfrecipa_val, s32, env, s32, s32)
 DEF_HELPER_3(sfrecipa_pred, s32, env, s32, s32)
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index d944d03..72250cc 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -26,6 +26,8 @@
 #include "arch.h"
 #include "fma_emu.h"
 #include "conv_emu.h"
+#include "mmvec/mmvec.h"
+#include "mmvec/macros.h"
 
 #if COUNT_HEX_HELPERS
 #include "opcodes.h"
@@ -199,6 +201,51 @@ void HELPER(debug_check_store_width)(CPUHexagonState *env, int slot, int check)
     }
 }
 
+void HELPER(commit_hvx_stores)(CPUHexagonState *env)
+{
+    int i;
+
+    /* Normal (possibly masked) vector store */
+    for (i = 0; i < VSTORES_MAX; i++) {
+        if (env->vstore_pending[i]) {
+            env->vstore_pending[i] = 0;
+            target_ulong va = env->vstore[i].va;
+            int size = env->vstore[i].size;
+            for (int j = 0; j < size; j++) {
+                if (env->vstore[i].mask.ub[j]) {
+                    put_user_u8(env->vstore[i].data.ub[j], va + j);
+                }
+            }
+        }
+    }
+
+    /* Scatter store */
+    if (env->vtcm_pending) {
+        env->vtcm_pending = 0;
+        if (env->vtcm_log.op) {
+            /* Need to perform the scatter read/modify/write at commit time */
+            if (env->vtcm_log.op_size == 2) {
+                SCATTER_OP_WRITE_TO_MEM(size2u_t);
+            } else if (env->vtcm_log.op_size == 4) {
+                /* Word Scatter += */
+                SCATTER_OP_WRITE_TO_MEM(size4u_t);
+            } else {
+                g_assert_not_reached();
+            }
+        } else {
+            for (int i = 0; i < env->vtcm_log.size; i++) {
+                if (env->vtcm_log.mask.ub[i] != 0) {
+                    put_user_u8(env->vtcm_log.data.ub[i], env->vtcm_log.va[i]);
+                    env->vtcm_log.mask.ub[i] = 0;
+                    env->vtcm_log.data.ub[i] = 0;
+                    env->vtcm_log.offsets.ub[i] = 0;
+                }
+
+            }
+        }
+    }
+}
+
 static void print_store(CPUHexagonState *env, int slot)
 {
     if (!(env->slot_cancelled & (1 << slot))) {
@@ -415,6 +462,34 @@ void HELPER(debug_value_i64)(CPUHexagonState *env, int64_t value)
     HEX_DEBUG_LOG("value = 0x%lx\n", value);
 }
 
+/* Log a write to HVX vector */
+static inline void log_vreg_write(CPUHexagonState *env, int num, void *var,
+                                      int vnew, uint32_t slot)
+{
+    HEX_DEBUG_LOG("log_vreg_write[%d]", num);
+    if (env->slot_cancelled & (1 << slot)) {
+        HEX_DEBUG_LOG(" CANCELLED");
+    }
+    HEX_DEBUG_LOG("\n");
+
+    if (!(env->slot_cancelled & (1 << slot))) {
+        VRegMask regnum_mask = ((VRegMask)1) << num;
+        env->VRegs_updated |=      (vnew != EXT_TMP) ? regnum_mask : 0;
+        env->VRegs_select |=       (vnew == EXT_NEW) ? regnum_mask : 0;
+        env->VRegs_updated_tmp  |= (vnew == EXT_TMP) ? regnum_mask : 0;
+        env->future_VRegs[num] = *(mmvector_t *)var;
+        if (vnew == EXT_TMP) {
+            env->tmp_VRegs[num] = env->future_VRegs[num];
+        }
+    }
+}
+
+static inline void log_mmvector_write(CPUHexagonState *env, int num,
+                                      mmvector_t var, int vnew, uint32_t slot)
+{
+    log_vreg_write(env, num, &var, vnew, slot);
+}
+
 static void cancel_slot(CPUHexagonState *env, uint32_t slot)
 {
     HEX_DEBUG_LOG("Slot %d cancelled\n", slot);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 65/67] Hexagon HVX TCG generation
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (63 preceding siblings ...)
  2020-02-28 16:44 ` [RFC PATCH v2 64/67] Hexagon HVX helper to commit vector stores (masked and scatter/gather) Taylor Simpson
@ 2020-02-28 16:44 ` Taylor Simpson
  2020-02-28 16:44 ` [RFC PATCH v2 66/67] Hexagon HVX translation Taylor Simpson
                   ` (2 subsequent siblings)
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/genptr_helpers.h | 202 ++++++++++++++++++++++++++++++++++++++++
 target/hexagon/genptr.c         |   1 +
 2 files changed, 203 insertions(+)

diff --git a/target/hexagon/genptr_helpers.h b/target/hexagon/genptr_helpers.h
index e342f29..84479f8 100644
--- a/target/hexagon/genptr_helpers.h
+++ b/target/hexagon/genptr_helpers.h
@@ -844,4 +844,206 @@ static inline void gen_lshiftr_4_4u(TCGv dst, TCGv src, int32_t shift_amt)
     }
 }
 
+static inline uint32_t new_temp_vreg_offset(DisasContext *ctx, int num)
+{
+    uint32_t offset =
+        offsetof(CPUHexagonState, temp_vregs[ctx->ctx_temp_vregs_idx]);
+
+    HEX_DEBUG_LOG("new_temp_vreg_offset: %d\n", ctx->ctx_temp_vregs_idx);
+    g_assert(ctx->ctx_temp_vregs_idx + num - 1 < TEMP_VECTORS_MAX);
+    ctx->ctx_temp_vregs_idx += num;
+    return offset;
+}
+
+static inline uint32_t new_temp_qreg_offset(DisasContext *ctx)
+{
+    uint32_t offset =
+        offsetof(CPUHexagonState, temp_qregs[ctx->ctx_temp_qregs_idx]);
+
+    HEX_DEBUG_LOG("new_temp_qreg_offset: %d\n", ctx->ctx_temp_qregs_idx);
+    g_assert(ctx->ctx_temp_qregs_idx < TEMP_VECTORS_MAX);
+    ctx->ctx_temp_qregs_idx++;
+    return offset;
+}
+
+static inline void gen_read_qreg(TCGv_ptr var, int num, int vtmp)
+{
+    uint32_t offset = offsetof(CPUHexagonState, QRegs[(num)]);
+    TCGv_ptr src = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(src, cpu_env, offset);
+    gen_memcpy(var, src, sizeof(mmqreg_t));
+    tcg_temp_free_ptr(src);
+}
+
+static inline void gen_read_vreg_ptr_src(TCGv_ptr ptr_src, int num, int vtmp)
+{
+    TCGv zero = tcg_const_tl(0);
+    TCGv offset_future =
+        tcg_const_tl(offsetof(CPUHexagonState, future_VRegs[num]));
+    TCGv offset_vregs =
+        tcg_const_tl(offsetof(CPUHexagonState, VRegs[num]));
+    TCGv offset_tmp_vregs =
+        tcg_const_tl(offsetof(CPUHexagonState, tmp_VRegs[num]));
+    TCGv offset = tcg_temp_new();
+    TCGv_ptr offset_ptr = tcg_temp_new_ptr();
+    TCGv new_written = tcg_temp_new();
+    TCGv tmp_written = tcg_temp_new();
+
+    /*
+     *  new_written = (hex_VRegs_select >> num) & 1;
+     *  offset = new_written ? offset_future, offset_vregs;
+     */
+    tcg_gen_shri_tl(new_written, hex_VRegs_select, num);
+    tcg_gen_andi_tl(new_written, new_written, 1);
+    tcg_gen_movcond_tl(TCG_COND_NE, offset, new_written, zero,
+                       offset_future, offset_vregs);
+
+    /*
+     * tmp_written = (hex_VRegs_updated_tmp >> num) & 1;
+     * if (tmp_written) offset = offset_tmp_vregs;
+     */
+    tcg_gen_shri_tl(tmp_written, hex_VRegs_updated_tmp, num);
+    tcg_gen_andi_tl(tmp_written, tmp_written, 1);
+    tcg_gen_movcond_tl(TCG_COND_NE, offset, tmp_written, zero,
+                       offset_tmp_vregs, offset);
+
+    if (vtmp == EXT_TMP) {
+        TCGv vregs_updated = tcg_temp_new();
+        TCGv temp = tcg_temp_new();
+
+        /*
+         * vregs_updated = hex_VRegs_updates & (1 << num);
+         * if (vregs_updated) {
+         *     offset = offset_future;
+         *     hex_VRegs_updated ^= (1 << num);
+         * }
+         */
+        tcg_gen_andi_tl(vregs_updated, hex_VRegs_updated, 1 << num);
+        tcg_gen_movcond_tl(TCG_COND_NE, offset, vregs_updated, zero,
+                           offset_future, offset);
+        tcg_gen_xori_tl(temp, hex_VRegs_updated, 1 << num);
+        tcg_gen_movcond_tl(TCG_COND_NE, hex_VRegs_updated, vregs_updated, zero,
+                           temp, hex_VRegs_updated);
+
+        tcg_temp_free(vregs_updated);
+        tcg_temp_free(temp);
+    }
+
+    tcg_gen_ext_i32_ptr(offset_ptr, offset);
+    tcg_gen_add_ptr(ptr_src, cpu_env, offset_ptr);
+
+    tcg_temp_free(zero);
+    tcg_temp_free(offset_future);
+    tcg_temp_free(offset_vregs);
+    tcg_temp_free(offset_tmp_vregs);
+    tcg_temp_free(offset);
+    tcg_temp_free_ptr(offset_ptr);
+    tcg_temp_free(new_written);
+    tcg_temp_free(tmp_written);
+}
+
+static inline void gen_read_vreg_readonly(TCGv_ptr var, int num, int vtmp)
+{
+    TCGv_ptr ptr_src = tcg_temp_new_ptr();
+    gen_read_vreg_ptr_src(ptr_src, num, vtmp);
+    tcg_gen_addi_ptr(var, ptr_src, 0);
+    tcg_temp_free_ptr(ptr_src);
+}
+
+static inline void gen_read_vreg(TCGv_ptr var, int num, int vtmp)
+{
+    TCGv_ptr ptr_src = tcg_temp_new_ptr();
+    gen_read_vreg_ptr_src(ptr_src, num, vtmp);
+    gen_memcpy(var, ptr_src, sizeof(mmvector_t));
+    tcg_temp_free_ptr(ptr_src);
+}
+
+static inline void gen_read_vreg_pair(TCGv_ptr var, int num, int vtmp)
+{
+    TCGv_ptr v0 = tcg_temp_new_ptr();
+    TCGv_ptr v1 = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(v0, var, offsetof(mmvector_pair_t, v[0]));
+    gen_read_vreg(v0, num ^ 0, vtmp);
+    tcg_gen_addi_ptr(v1, var, offsetof(mmvector_pair_t, v[1]));
+    gen_read_vreg(v1, num ^ 1, vtmp);
+    tcg_temp_free_ptr(v0);
+    tcg_temp_free_ptr(v1);
+}
+
+static inline void gen_log_vreg_write(TCGv_ptr var, int num, int vnew,
+                                      int slot_num)
+{
+    TCGv cancelled = tcg_temp_local_new();
+    TCGLabel *label_end = gen_new_label();
+
+    /* Don't do anything if the slot was cancelled */
+    gen_slot_cancelled_check(cancelled, slot_num);
+    tcg_gen_brcondi_tl(TCG_COND_NE, cancelled, 0, label_end);
+    {
+        TCGv mask = tcg_const_tl(1 << num);
+        TCGv_ptr dst = tcg_temp_new_ptr();
+        if (vnew != EXT_TMP) {
+            tcg_gen_or_tl(hex_VRegs_updated, hex_VRegs_updated, mask);
+        }
+        if (vnew == EXT_NEW) {
+            tcg_gen_or_tl(hex_VRegs_select, hex_VRegs_select, mask);
+        }
+        if (vnew == EXT_TMP) {
+            tcg_gen_or_tl(hex_VRegs_updated_tmp, hex_VRegs_updated_tmp, mask);
+        }
+        tcg_gen_addi_ptr(dst, cpu_env,
+                         offsetof(CPUHexagonState, future_VRegs[num]));
+        gen_memcpy(dst, var, sizeof(mmvector_t));
+        if (vnew == EXT_TMP) {
+            TCGv_ptr src = tcg_temp_new_ptr();
+            tcg_gen_addi_ptr(dst, cpu_env,
+                             offsetof(CPUHexagonState, tmp_VRegs[num]));
+            tcg_gen_addi_ptr(src, cpu_env,
+                             offsetof(CPUHexagonState, future_VRegs[num]));
+            gen_memcpy(dst, src, sizeof(mmvector_t));
+            tcg_temp_free_ptr(src);
+        }
+        tcg_temp_free(mask);
+        tcg_temp_free_ptr(dst);
+    }
+    gen_set_label(label_end);
+
+    tcg_temp_free(cancelled);
+}
+
+static inline void gen_log_vreg_write_pair(TCGv_ptr var, int num, int vnew,
+                                           int slot_num)
+{
+    TCGv_ptr v0 = tcg_temp_local_new_ptr();
+    TCGv_ptr v1 = tcg_temp_local_new_ptr();
+    tcg_gen_addi_ptr(v0, var, offsetof(mmvector_pair_t, v[0]));
+    gen_log_vreg_write(v0, num ^ 0, vnew, slot_num);
+    tcg_gen_addi_ptr(v1, var, offsetof(mmvector_pair_t, v[1]));
+    gen_log_vreg_write(v1, num ^ 1, vnew, slot_num);
+    tcg_temp_free_ptr(v0);
+    tcg_temp_free_ptr(v1);
+}
+
+static inline void gen_log_qreg_write(TCGv_ptr var, int num, int vnew,
+                                          int slot_num)
+{
+    TCGv cancelled = tcg_temp_local_new();
+    TCGLabel *label_end = gen_new_label();
+
+    /* Don't do anything if the slot was cancelled */
+    gen_slot_cancelled_check(cancelled, slot_num);
+    tcg_gen_brcondi_tl(TCG_COND_NE, cancelled, 0, label_end);
+    {
+        TCGv_ptr dst = tcg_temp_new_ptr();
+        tcg_gen_addi_ptr(dst, cpu_env,
+                         offsetof(CPUHexagonState, future_QRegs[num]));
+        gen_memcpy(dst, var, sizeof(mmqreg_t));
+        tcg_gen_ori_tl(hex_QRegs_updated, hex_QRegs_updated, 1 << num);
+        tcg_temp_free_ptr(dst);
+    }
+    gen_set_label(label_end);
+
+    tcg_temp_free(cancelled);
+}
+
 #endif
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 6b9cc0d..588360d 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -25,6 +25,7 @@
 #include "opcodes.h"
 #include "translate.h"
 #include "macros.h"
+#include "mmvec/macros.h"
 #include "genptr_helpers.h"
 #include "helper_overrides.h"
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 66/67] Hexagon HVX translation
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (64 preceding siblings ...)
  2020-02-28 16:44 ` [RFC PATCH v2 65/67] Hexagon HVX TCG generation Taylor Simpson
@ 2020-02-28 16:44 ` Taylor Simpson
  2020-02-28 16:44 ` [RFC PATCH v2 67/67] Hexagon HVX build infrastructure Taylor Simpson
  2020-03-25 21:13 ` [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Changes to packet semantics to support HVX

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/translate.h |  30 ++++++++
 target/hexagon/translate.c | 188 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 218 insertions(+)

diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 65b721e..481dbbd 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -32,6 +32,14 @@ typedef struct DisasContext {
     int ctx_preg_log[PRED_WRITES_MAX];
     int ctx_preg_log_idx;
     uint8_t ctx_store_width[STORES_MAX];
+    int ctx_temp_vregs_idx;
+    int ctx_temp_qregs_idx;
+    int ctx_vreg_log[NUM_VREGS];
+    int ctx_vreg_is_predicated[NUM_VREGS];
+    int ctx_vreg_log_idx;
+    int ctx_qreg_log[NUM_QREGS];
+    int ctx_qreg_is_predicated[NUM_QREGS];
+    int ctx_qreg_log_idx;
 } DisasContext;
 
 static inline void ctx_log_reg_write(DisasContext *ctx, int rnum)
@@ -54,6 +62,22 @@ static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
     ctx->ctx_preg_log_idx++;
 }
 
+static inline void ctx_log_vreg_write(DisasContext *ctx,
+                                      int rnum, int is_predicated)
+{
+    ctx->ctx_vreg_log[ctx->ctx_vreg_log_idx] = rnum;
+    ctx->ctx_vreg_is_predicated[ctx->ctx_vreg_log_idx] = is_predicated;
+    ctx->ctx_vreg_log_idx++;
+}
+
+static inline void ctx_log_qreg_write(DisasContext *ctx,
+                                      int rnum, int is_predicated)
+{
+    ctx->ctx_qreg_log[ctx->ctx_qreg_log_idx] = rnum;
+    ctx->ctx_qreg_is_predicated[ctx->ctx_qreg_log_idx] = is_predicated;
+    ctx->ctx_qreg_log_idx++;
+}
+
 extern TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_pred[NUM_PREGS];
 extern TCGv hex_next_PC;
@@ -74,9 +98,15 @@ extern TCGv llsc_val;
 extern TCGv_i64 llsc_val_i64;
 extern TCGv hex_is_gather_store_insn;
 extern TCGv hex_gather_issued;
+extern TCGv hex_VRegs_updated_tmp;
+extern TCGv hex_VRegs_updated;
+extern TCGv hex_VRegs_select;
+extern TCGv hex_QRegs_updated;
 
 void hexagon_translate_init(void);
 extern void gen_exception(int excp);
 extern void gen_exception_debug(void);
 
+extern void gen_memcpy(TCGv_ptr dest, TCGv_ptr src, size_t n);
+
 #endif
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 57ab294..f5c135b 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -51,6 +51,10 @@ TCGv llsc_val;
 TCGv_i64 llsc_val_i64;
 TCGv hex_is_gather_store_insn;
 TCGv hex_gather_issued;
+TCGv hex_VRegs_updated_tmp;
+TCGv hex_VRegs_updated;
+TCGv hex_VRegs_select;
+TCGv hex_QRegs_updated;
 
 static const char * const hexagon_prednames[] = {
   "p0", "p1", "p2", "p3"
@@ -137,6 +141,10 @@ static void gen_start_packet(DisasContext *ctx, packet_t *pkt)
     /* Clear out the disassembly context */
     ctx->ctx_reg_log_idx = 0;
     ctx->ctx_preg_log_idx = 0;
+    ctx->ctx_temp_vregs_idx = 0;
+    ctx->ctx_temp_qregs_idx = 0;
+    ctx->ctx_vreg_log_idx = 0;
+    ctx->ctx_qreg_log_idx = 0;
     for (i = 0; i < STORES_MAX; i++) {
         ctx->ctx_store_width[i] = 0;
     }
@@ -155,6 +163,15 @@ static void gen_start_packet(DisasContext *ctx, packet_t *pkt)
         tcg_gen_movi_tl(hex_next_PC, next_PC);
     }
     tcg_gen_movi_tl(hex_pred_written, 0);
+
+    if (pkt->pkt_has_hvx) {
+        tcg_gen_movi_tl(hex_VRegs_updated_tmp, 0);
+        tcg_gen_movi_tl(hex_VRegs_updated, 0);
+        tcg_gen_movi_tl(hex_VRegs_select, 0);
+        tcg_gen_movi_tl(hex_QRegs_updated, 0);
+        tcg_gen_movi_tl(hex_is_gather_store_insn, 0);
+        tcg_gen_movi_tl(hex_gather_issued, 0);
+    }
 }
 
 static int is_gather_store_insn(insn_t *insn)
@@ -445,10 +462,163 @@ static bool process_change_of_flow(DisasContext *ctx, packet_t *pkt)
     return false;
 }
 
+void gen_memcpy(TCGv_ptr dest, TCGv_ptr src, size_t n)
+{
+    TCGv_ptr d = tcg_temp_new_ptr();
+    TCGv_ptr s = tcg_temp_new_ptr();
+    int i;
+
+    tcg_gen_addi_ptr(d, dest, 0);
+    tcg_gen_addi_ptr(s, src, 0);
+    if (n % 8 == 0) {
+        TCGv_i64 temp = tcg_temp_new_i64();
+        for (i = 0; i < n / 8; i++) {
+            tcg_gen_ld_i64(temp, s, 0);
+            tcg_gen_st_i64(temp, d, 0);
+            tcg_gen_addi_ptr(s, s, 8);
+            tcg_gen_addi_ptr(d, d, 8);
+        }
+        tcg_temp_free_i64(temp);
+    } else if (n % 4 == 0) {
+        TCGv temp = tcg_temp_new();
+        for (i = 0; i < n / 4; i++) {
+            tcg_gen_ld32u_tl(temp, s, 0);
+            tcg_gen_st32_tl(temp, d, 0);
+            tcg_gen_addi_ptr(s, s, 4);
+            tcg_gen_addi_ptr(d, d, 4);
+        }
+        tcg_temp_free(temp);
+    } else if (n % 2 == 0) {
+        TCGv temp = tcg_temp_new();
+        for (i = 0; i < n / 2; i++) {
+            tcg_gen_ld16u_tl(temp, s, 0);
+            tcg_gen_st16_tl(temp, d, 0);
+            tcg_gen_addi_ptr(s, s, 2);
+            tcg_gen_addi_ptr(d, d, 2);
+        }
+        tcg_temp_free(temp);
+    } else {
+        TCGv temp = tcg_temp_new();
+        for (i = 0; i < n; i++) {
+            tcg_gen_ld8u_tl(temp, s, 0);
+            tcg_gen_st8_tl(temp, d, 0);
+            tcg_gen_addi_ptr(s, s, 1);
+            tcg_gen_addi_ptr(d, d, 1);
+        }
+        tcg_temp_free(temp);
+    }
+
+    tcg_temp_free_ptr(d);
+    tcg_temp_free_ptr(s);
+}
+
+static inline void gen_vec_copy(intptr_t dstoff, intptr_t srcoff, size_t size)
+{
+    TCGv_ptr src = tcg_temp_new_ptr();
+    TCGv_ptr dst = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(src, cpu_env, srcoff);
+    tcg_gen_addi_ptr(dst, cpu_env, dstoff);
+    gen_memcpy(dst, src, size);
+    tcg_temp_free_ptr(src);
+    tcg_temp_free_ptr(dst);
+}
+
+static inline bool pkt_has_hvx_store(packet_t *pkt)
+{
+    int i;
+    for (i = 0; i < pkt->num_insns; i++) {
+        int opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(opcode, A_CVI) && GET_ATTRIB(opcode, A_STORE)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static void gen_commit_hvx(DisasContext *ctx, packet_t *pkt)
+{
+    int i;
+
+    /*
+     *    for (i = 0; i < ctx->ctx_vreg_log_idx; i++) {
+     *        int rnum = ctx->ctx_vreg_log[i];
+     *        if (ctx->ctx_vreg_is_predicated[i]) {
+     *            if (env->VRegs_updated & (1 << rnum)) {
+     *                env->VRegs[rnum] = env->future_VRegs[rnum];
+     *            }
+     *        } else {
+     *            env->VRegs[rnum] = env->future_VRegs[rnum];
+     *        }
+     *    }
+     */
+    for (i = 0; i < ctx->ctx_vreg_log_idx; i++) {
+        int rnum = ctx->ctx_vreg_log[i];
+        int is_predicated = ctx->ctx_vreg_is_predicated[i];
+        intptr_t dstoff = offsetof(CPUHexagonState, VRegs[rnum]);
+        intptr_t srcoff = offsetof(CPUHexagonState, future_VRegs[rnum]);
+        size_t size = sizeof(mmvector_t);
+
+        if (is_predicated) {
+            TCGv cmp = tcg_temp_local_new();
+            TCGLabel *label_skip = gen_new_label();
+
+            tcg_gen_andi_tl(cmp, hex_VRegs_updated, 1 << rnum);
+            tcg_gen_brcondi_tl(TCG_COND_EQ, cmp, 0, label_skip);
+            {
+                gen_vec_copy(dstoff, srcoff, size);
+            }
+            gen_set_label(label_skip);
+            tcg_temp_free(cmp);
+        } else {
+            gen_vec_copy(dstoff, srcoff, size);
+        }
+    }
+
+    /*
+     *    for (i = 0; i < ctx-_ctx_qreg_log_idx; i++) {
+     *        int rnum = ctx->ctx_qreg_log[i];
+     *        if (ctx->ctx_qreg_is_predicated[i]) {
+     *            if (env->QRegs_updated) & (1 << rnum)) {
+     *                env->QRegs[rnum] = env->future_QRegs[rnum];
+     *            }
+     *        } else {
+     *            env->QRegs[rnum] = env->future_QRegs[rnum];
+     *        }
+     *    }
+     */
+    for (i = 0; i < ctx->ctx_qreg_log_idx; i++) {
+        int rnum = ctx->ctx_qreg_log[i];
+        int is_predicated = ctx->ctx_qreg_is_predicated[i];
+        intptr_t dstoff = offsetof(CPUHexagonState, QRegs[rnum]);
+        intptr_t srcoff = offsetof(CPUHexagonState, future_QRegs[rnum]);
+        size_t size = sizeof(mmqreg_t);
+
+        if (is_predicated) {
+            TCGv cmp = tcg_temp_local_new();
+            TCGLabel *label_skip = gen_new_label();
+
+            tcg_gen_andi_tl(cmp, hex_QRegs_updated, 1 << rnum);
+            tcg_gen_brcondi_tl(TCG_COND_EQ, cmp, 0, label_skip);
+            {
+                gen_vec_copy(dstoff, srcoff, size);
+            }
+            gen_set_label(label_skip);
+            tcg_temp_free(cmp);
+        } else {
+            gen_vec_copy(dstoff, srcoff, size);
+        }
+    }
+
+    if (pkt_has_hvx_store(pkt)) {
+        gen_helper_commit_hvx_stores(cpu_env);
+    }
+}
+
 static void gen_exec_counters(packet_t *pkt)
 {
     int num_insns = pkt->num_insns;
     int num_real_insns = 0;
+    int num_hvx_insns = 0;
     int i;
 
     for (i = 0; i < num_insns; i++) {
@@ -457,6 +627,9 @@ static void gen_exec_counters(packet_t *pkt)
             !GET_ATTRIB(pkt->insn[i].opcode, A_IT_NOP)) {
             num_real_insns++;
         }
+        if (pkt->insn[i].hvx_resource) {
+            num_hvx_insns++;
+        }
     }
 
     tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_PKT_CNT],
@@ -465,6 +638,10 @@ static void gen_exec_counters(packet_t *pkt)
         tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_INSN_CNT],
                         hex_gpr[HEX_REG_QEMU_INSN_CNT], num_real_insns);
     }
+    if (num_hvx_insns) {
+        tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_HVX_CNT],
+                        hex_gpr[HEX_REG_QEMU_HVX_CNT], num_hvx_insns);
+    }
 }
 
 static void gen_commit_packet(DisasContext *ctx, packet_t *pkt)
@@ -476,6 +653,9 @@ static void gen_commit_packet(DisasContext *ctx, packet_t *pkt)
     process_store_log(ctx, pkt);
     process_dczeroa(ctx, pkt);
     end_tb |= process_change_of_flow(ctx, pkt);
+    if (pkt->pkt_has_hvx) {
+        gen_commit_hvx(ctx, pkt);
+    }
     gen_exec_counters(pkt);
 #if HEX_DEBUG
     {
@@ -702,6 +882,14 @@ void hexagon_translate_init(void)
         "is_gather_store_insn");
     hex_gather_issued = tcg_global_mem_new(cpu_env,
         offsetof(CPUHexagonState, gather_issued), "gather_issued");
+    hex_VRegs_updated_tmp = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, VRegs_updated_tmp), "VRegs_updated_tmp");
+    hex_VRegs_updated = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, VRegs_updated), "VRegs_updated");
+    hex_VRegs_select = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, VRegs_select), "VRegs_select");
+    hex_QRegs_updated = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, QRegs_updated), "QRegs_updated");
     for (i = 0; i < STORES_MAX; i++) {
         sprintf(store_addr_names[i], "store_addr_%d", i);
         hex_store_addr[i] = tcg_global_mem_new(cpu_env,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCH v2 67/67] Hexagon HVX build infrastructure
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (65 preceding siblings ...)
  2020-02-28 16:44 ` [RFC PATCH v2 66/67] Hexagon HVX translation Taylor Simpson
@ 2020-02-28 16:44 ` Taylor Simpson
  2020-03-25 21:13 ` [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
  67 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-02-28 16:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: riku.voipio, richard.henderson, laurent, Taylor Simpson, philmd,
	aleksandar.m.mail

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/Makefile.objs | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/target/hexagon/Makefile.objs b/target/hexagon/Makefile.objs
index be0d08f..d18b41f 100644
--- a/target/hexagon/Makefile.objs
+++ b/target/hexagon/Makefile.objs
@@ -28,7 +28,9 @@ obj-y += \
     printinsn.o \
     arch.o \
     fma_emu.o \
-    conv_emu.o
+    conv_emu.o \
+    mmvec/decode_ext_mmvec.o \
+    mmvec/system_ext_mmvec.o
 
 #
 #  Step 1
@@ -47,10 +49,14 @@ IDEF_FILES = \
     $(SRC_PATH)/target/hexagon/imported/mpy.idef \
     $(SRC_PATH)/target/hexagon/imported/shift.idef \
     $(SRC_PATH)/target/hexagon/imported/subinsns.idef \
-    $(SRC_PATH)/target/hexagon/imported/system.idef
+    $(SRC_PATH)/target/hexagon/imported/system.idef \
+    $(SRC_PATH)/target/hexagon/imported/allext.idef \
+    $(SRC_PATH)/target/hexagon/imported/mmvec/ext.idef
 DEF_FILES = \
     $(SRC_PATH)/target/hexagon/imported/allidefs.def \
-    $(SRC_PATH)/target/hexagon/imported/macros.def
+    $(SRC_PATH)/target/hexagon/imported/macros.def \
+    $(SRC_PATH)/target/hexagon/imported/allext_macros.def \
+    $(SRC_PATH)/target/hexagon/imported/mmvec/macros.def
 
 $(GEN_SEMANTICS): $(GEN_SEMANTICS_SRC) $(IDEF_FILES) $(DEF_FILES)
 	$(CC) $(CFLAGS) $(GEN_SEMANTICS_SRC) -o $(GEN_SEMANTICS)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* RE: [RFC PATCH v2 00/67] Hexagon patch series
  2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
                   ` (66 preceding siblings ...)
  2020-02-28 16:44 ` [RFC PATCH v2 67/67] Hexagon HVX build infrastructure Taylor Simpson
@ 2020-03-25 21:13 ` Taylor Simpson
  2020-04-30 20:53   ` Taylor Simpson
  67 siblings, 1 reply; 72+ messages in thread
From: Taylor Simpson @ 2020-03-25 21:13 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel
  Cc: philmd, riku.voipio, richard.henderson, laurent, aleksandar.m.mail

I know everyone is heads-down working on the 5.0 release, and I realize this is a lot of code.  However, I would really appreciate some feedback on it.  Is there anything I can do to make it easier to review?

Thanks,
Taylor


> -----Original Message-----
> From: Taylor Simpson <tsimpson@quicinc.com>
> Sent: Friday, February 28, 2020 10:43 AM
> To: qemu-devel@nongnu.org
> Cc: richard.henderson@linaro.org; philmd@redhat.com; laurent@vivier.eu;
> riku.voipio@iki.fi; aleksandar.m.mail@gmail.com; Taylor Simpson
> <tsimpson@quicinc.com>
> Subject: [RFC PATCH v2 00/67] Hexagon patch series
>
> This series adds support for the Hexagon processor with Linux user support
>
> See patch 02/67 Hexagon README for detailed information.
>
> The patches up to and including "Hexagon build infractructure" implement
> the
> base Hexagon core and the remainder add HVX.  Once the build
> infrastructure
> patch is applied, you can build and qemu will execute non-HVX Hexagon
> programs.
>
> We have a parallel effort to make the Hexagon Linux toolchain publically
> available.
>
>
> *** Testing ***
>
> The port passes the following tests
>     Directed unit tests will contributed when the Hexagon toolchain is available
>     MUSL libc test suite (good coverage of Linux system calls)
>     https://git.musl-libc.org/cgit/libc-testsuite/
>     Internal compiler intrinsics test suite (good coverage of instructions)
>     Hexagon machine learning library unit tests
>     TODO - pull these from the CAF repo
>     make check-tcg TIMEOUT=60
>
> *** Known checkpatch issues ***
>
> The following are known checkpatch errors in the series
>     include/disas/dis-asm.h             space prohibited
>         (Follow convention of other targets on prior lines)
>     target/hexagon/reg_fields.h         Complex macro
>     target/hexagon/attribs.h            Complex macro
>     target/hexagon/decode.c             Complex macro
>     target/hexagon/q6v_decode.c         Macro needs do - while
>     target/hexagon/printinsn.c          Macro needs do - while
>     target/hexagon/gen_semantics.c      Suspicious ; after while (0)
>     target/hexagon/gen_dectree_import.c Complex macro
>     target/hexagon/gen_dectree_import.c Suspicious ; after while (0)
>     target/hexagon/opcodes.c            Complex macro
>     target/hexagon/iclass.h             Complex macro
>     scripts/qemu-binfmt-conf.sh         Line over 90 characters
>     target/hexagon/mmvec/macros.h       Suspicious ; after while (0)
>
> The following are known checkpatch warnings in the series
>     target/hexagon/fma_emu.c            Comments inside macro definition
>     scripts/qemu-binfmt-conf.sh         Line over 80 characters
>
> *** Changes in v2 ***
> - Use scripts/git.orderfile
> - Create a README with the code overview in patch 0001
> - Change #define's in hex_regs.h to an enum
> - Replace hard coded disassembly buffer length (1028) with #define
> - Move Hexagon architecture types patch earlier in series
> - Replace #include standard header files with #include "qemu/osdep.h"
> - Prefix all header file #ifndef's with HEXAGON_
> - Update python version to python3
> - #include "tcg/tcg.h" in genptr_helpers.h
> - Change target/hexagon/Makefile.objs to support out-of-tree build
> - Updated copyright to include year 2020
> - Bug fixes
>     Fix some problems with HEX_DEBUG output
>     Fix bug in circular addressing
> - Optimizations to reduce the amount of TCG code generated
>     Change pred_written from an array to a bit mask
>     Optimize readonly vector registers
>     Conditionally call gen_helper_commit_hvx_stores
>
> Taylor Simpson (67):
>   Hexagon Maintainers
>   Hexagon README
>   Hexagon ELF Machine Definition
>   Hexagon CPU Scalar Core Definition
>   Hexagon register names
>   Hexagon Disassembler
>   Hexagon CPU Scalar Core Helpers
>   Hexagon GDB Stub
>   Hexagon architecture types
>   Hexagon instruction and packet types
>   Hexagon register fields
>   Hexagon instruction attributes
>   Hexagon register map
>   Hexagon instruction/packet decode
>   Hexagon instruction printing
>   Hexagon arch import - instruction semantics definitions
>   Hexagon arch import - macro definitions
>   Hexagon arch import - instruction encoding
>   Hexagon instruction class definitions
>   Hexagon instruction utility functions
>   Hexagon generator phase 1 - C preprocessor for semantics
>   Hexagon generator phase 2 - qemu_def_generated.h
>   Hexagon generator phase 2 - qemu_wrap_generated.h
>   Hexagon generator phase 2 - opcodes_def_generated.h
>   Hexagon generator phase 2 - op_attribs_generated.h
>   Hexagon generator phase 2 - op_regs_generated.h
>   Hexagon generator phase 2 - printinsn-generated.h
>   Hexagon generator phase 3 - C preprocessor for decode tree
>   Hexagon generater phase 4 - Decode tree
>   Hexagon opcode data structures
>   Hexagon macros to interface with the generator
>   Hexagon macros referenced in instruction semantics
>   Hexagon instruction classes
>   Hexagon TCG generation helpers - step 1
>   Hexagon TCG generation helpers - step 2
>   Hexagon TCG generation helpers - step 3
>   Hexagon TCG generation helpers - step 4
>   Hexagon TCG generation helpers - step 5
>   Hexagon TCG generation - step 01
>   Hexagon TCG generation - step 02
>   Hexagon TCG generation - step 03
>   Hexagon TCG generation - step 04
>   Hexagon TCG generation - step 05
>   Hexagon TCG generation - step 06
>   Hexagon TCG generation - step 07
>   Hexagon TCG generation - step 08
>   Hexagon TCG generation - step 09
>   Hexagon TCG generation - step 10
>   Hexagon TCG generation - step 11
>   Hexagon TCG generation - step 12
>   Hexagon translation
>   Hexagon Linux user emulation
>   Hexagon build infrastructure
>   Hexagon - Add Hexagon Vector eXtensions (HVX) to core definition
>   Hexagon HVX support in gdbstub
>   Hexagon HVX import instruction encodings
>   Hexagon HVX import semantics
>   Hexagon HVX import macro definitions
>   Hexagon HVX semantics generator
>   Hexagon HVX instruction decoding
>   Hexagon HVX instruction utility functions
>   Hexagon HVX macros to interface with the generator
>   Hexagon HVX macros referenced in instruction semantics
>   Hexagon HVX helper to commit vector stores (masked and scatter/gather)
>   Hexagon HVX TCG generation
>   Hexagon HVX translation
>   Hexagon HVX build infrastructure
>
>  configure                                    |    9 +
>  default-configs/hexagon-linux-user.mak       |    1 +
>  include/disas/dis-asm.h                      |    1 +
>  include/elf.h                                |    2 +
>  linux-user/hexagon/sockbits.h                |   18 +
>  linux-user/hexagon/syscall_nr.h              |  346 +++
>  linux-user/hexagon/target_cpu.h              |   44 +
>  linux-user/hexagon/target_elf.h              |   38 +
>  linux-user/hexagon/target_fcntl.h            |   18 +
>  linux-user/hexagon/target_signal.h           |   34 +
>  linux-user/hexagon/target_structs.h          |   46 +
>  linux-user/hexagon/target_syscall.h          |   32 +
>  linux-user/hexagon/termbits.h                |   18 +
>  linux-user/syscall_defs.h                    |   33 +
>  target/hexagon/arch.h                        |   62 +
>  target/hexagon/attribs.h                     |   32 +
>  target/hexagon/attribs_def.h                 |  404 +++
>  target/hexagon/conv_emu.h                    |   50 +
>  target/hexagon/cpu-param.h                   |   26 +
>  target/hexagon/cpu.h                         |  207 ++
>  target/hexagon/cpu_bits.h                    |   37 +
>  target/hexagon/decode.h                      |   39 +
>  target/hexagon/fma_emu.h                     |   30 +
>  target/hexagon/genptr.h                      |   25 +
>  target/hexagon/genptr_helpers.h              | 1049 +++++++
>  target/hexagon/helper.h                      |   38 +
>  target/hexagon/helper_overrides.h            | 1850 ++++++++++++
>  target/hexagon/hex_arch_types.h              |   42 +
>  target/hexagon/hex_regs.h                    |   99 +
>  target/hexagon/iclass.h                      |   46 +
>  target/hexagon/insn.h                        |  149 +
>  target/hexagon/internal.h                    |   54 +
>  target/hexagon/macros.h                      | 1474 ++++++++++
>  target/hexagon/mmvec/decode_ext_mmvec.h      |   24 +
>  target/hexagon/mmvec/macros.h                |  698 +++++
>  target/hexagon/mmvec/mmvec.h                 |   87 +
>  target/hexagon/mmvec/system_ext_mmvec.h      |   38 +
>  target/hexagon/opcodes.h                     |   67 +
>  target/hexagon/printinsn.h                   |   26 +
>  target/hexagon/reg_fields.h                  |   40 +
>  target/hexagon/reg_fields_def.h              |  109 +
>  target/hexagon/regmap.h                      |   38 +
>  target/hexagon/translate.h                   |  112 +
>  disas/hexagon.c                              |   62 +
>  linux-user/elfload.c                         |   16 +
>  linux-user/hexagon/cpu_loop.c                |  173 ++
>  linux-user/hexagon/signal.c                  |  276 ++
>  linux-user/syscall.c                         |    2 +
>  target/hexagon/arch.c                        |  663 +++++
>  target/hexagon/conv_emu.c                    |  369 +++
>  target/hexagon/cpu.c                         |  374 +++
>  target/hexagon/decode.c                      |  788 +++++
>  target/hexagon/fma_emu.c                     |  916 ++++++
>  target/hexagon/gdbstub.c                     |  111 +
>  target/hexagon/gen_dectree_import.c          |  205 ++
>  target/hexagon/gen_semantics.c               |  101 +
>  target/hexagon/genptr.c                      |   61 +
>  target/hexagon/iclass.c                      |  107 +
>  target/hexagon/mmvec/decode_ext_mmvec.c      |  670 +++++
>  target/hexagon/mmvec/system_ext_mmvec.c      |  263 ++
>  target/hexagon/op_helper.c                   |  509 ++++
>  target/hexagon/opcodes.c                     |  217 ++
>  target/hexagon/printinsn.c                   |   91 +
>  target/hexagon/q6v_decode.c                  |  416 +++
>  target/hexagon/reg_fields.c                  |   28 +
>  target/hexagon/translate.c                   |  916 ++++++
>  MAINTAINERS                                  |    8 +
>  disas/Makefile.objs                          |    1 +
>  scripts/qemu-binfmt-conf.sh                  |    6 +-
>  target/hexagon/Makefile.objs                 |  127 +
>  target/hexagon/README                        |  296 ++
>  target/hexagon/dectree.py                    |  353 +++
>  target/hexagon/do_qemu.py                    | 1194 ++++++++
>  target/hexagon/imported/allext.idef          |   25 +
>  target/hexagon/imported/allext_macros.def    |   25 +
>  target/hexagon/imported/allextenc.def        |   20 +
>  target/hexagon/imported/allidefs.def         |   92 +
>  target/hexagon/imported/alu.idef             | 1335 +++++++++
>  target/hexagon/imported/branch.idef          |  344 +++
>  target/hexagon/imported/compare.idef         |  639 +++++
>  target/hexagon/imported/encode.def           |  126 +
>  target/hexagon/imported/encode_pp.def        | 2283 +++++++++++++++
>  target/hexagon/imported/encode_subinsn.def   |  150 +
>  target/hexagon/imported/float.idef           |  498 ++++
>  target/hexagon/imported/iclass.def           |   52 +
>  target/hexagon/imported/ldst.idef            |  421 +++
>  target/hexagon/imported/macros.def           | 3970
> ++++++++++++++++++++++++++
>  target/hexagon/imported/mmvec/encode_ext.def |  830 ++++++
>  target/hexagon/imported/mmvec/ext.idef       | 2780
> ++++++++++++++++++
>  target/hexagon/imported/mmvec/macros.def     | 1110 +++++++
>  target/hexagon/imported/mpy.idef             | 1269 ++++++++
>  target/hexagon/imported/shift.idef           | 1211 ++++++++
>  target/hexagon/imported/subinsns.idef        |  152 +
>  target/hexagon/imported/system.idef          |  302 ++
>  tests/tcg/configure.sh                       |    4 +-
>  tests/tcg/hexagon/float_convs.ref            |  748 +++++
>  tests/tcg/hexagon/float_madds.ref            |  768 +++++
>  97 files changed, 36063 insertions(+), 2 deletions(-)
>  create mode 100644 default-configs/hexagon-linux-user.mak
>  create mode 100644 linux-user/hexagon/sockbits.h
>  create mode 100644 linux-user/hexagon/syscall_nr.h
>  create mode 100644 linux-user/hexagon/target_cpu.h
>  create mode 100644 linux-user/hexagon/target_elf.h
>  create mode 100644 linux-user/hexagon/target_fcntl.h
>  create mode 100644 linux-user/hexagon/target_signal.h
>  create mode 100644 linux-user/hexagon/target_structs.h
>  create mode 100644 linux-user/hexagon/target_syscall.h
>  create mode 100644 linux-user/hexagon/termbits.h
>  create mode 100644 target/hexagon/arch.h
>  create mode 100644 target/hexagon/attribs.h
>  create mode 100644 target/hexagon/attribs_def.h
>  create mode 100644 target/hexagon/conv_emu.h
>  create mode 100644 target/hexagon/cpu-param.h
>  create mode 100644 target/hexagon/cpu.h
>  create mode 100644 target/hexagon/cpu_bits.h
>  create mode 100644 target/hexagon/decode.h
>  create mode 100644 target/hexagon/fma_emu.h
>  create mode 100644 target/hexagon/genptr.h
>  create mode 100644 target/hexagon/genptr_helpers.h
>  create mode 100644 target/hexagon/helper.h
>  create mode 100644 target/hexagon/helper_overrides.h
>  create mode 100644 target/hexagon/hex_arch_types.h
>  create mode 100644 target/hexagon/hex_regs.h
>  create mode 100644 target/hexagon/iclass.h
>  create mode 100644 target/hexagon/insn.h
>  create mode 100644 target/hexagon/internal.h
>  create mode 100644 target/hexagon/macros.h
>  create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
>  create mode 100644 target/hexagon/mmvec/macros.h
>  create mode 100644 target/hexagon/mmvec/mmvec.h
>  create mode 100644 target/hexagon/mmvec/system_ext_mmvec.h
>  create mode 100644 target/hexagon/opcodes.h
>  create mode 100644 target/hexagon/printinsn.h
>  create mode 100644 target/hexagon/reg_fields.h
>  create mode 100644 target/hexagon/reg_fields_def.h
>  create mode 100644 target/hexagon/regmap.h
>  create mode 100644 target/hexagon/translate.h
>  create mode 100644 disas/hexagon.c
>  create mode 100644 linux-user/hexagon/cpu_loop.c
>  create mode 100644 linux-user/hexagon/signal.c
>  create mode 100644 target/hexagon/arch.c
>  create mode 100644 target/hexagon/conv_emu.c
>  create mode 100644 target/hexagon/cpu.c
>  create mode 100644 target/hexagon/decode.c
>  create mode 100644 target/hexagon/fma_emu.c
>  create mode 100644 target/hexagon/gdbstub.c
>  create mode 100644 target/hexagon/gen_dectree_import.c
>  create mode 100644 target/hexagon/gen_semantics.c
>  create mode 100644 target/hexagon/genptr.c
>  create mode 100644 target/hexagon/iclass.c
>  create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c
>  create mode 100644 target/hexagon/mmvec/system_ext_mmvec.c
>  create mode 100644 target/hexagon/op_helper.c
>  create mode 100644 target/hexagon/opcodes.c
>  create mode 100644 target/hexagon/printinsn.c
>  create mode 100644 target/hexagon/q6v_decode.c
>  create mode 100644 target/hexagon/reg_fields.c
>  create mode 100644 target/hexagon/translate.c
>  create mode 100644 target/hexagon/Makefile.objs
>  create mode 100644 target/hexagon/README
>  create mode 100755 target/hexagon/dectree.py
>  create mode 100755 target/hexagon/do_qemu.py
>  create mode 100644 target/hexagon/imported/allext.idef
>  create mode 100644 target/hexagon/imported/allext_macros.def
>  create mode 100644 target/hexagon/imported/allextenc.def
>  create mode 100644 target/hexagon/imported/allidefs.def
>  create mode 100644 target/hexagon/imported/alu.idef
>  create mode 100644 target/hexagon/imported/branch.idef
>  create mode 100644 target/hexagon/imported/compare.idef
>  create mode 100644 target/hexagon/imported/encode.def
>  create mode 100644 target/hexagon/imported/encode_pp.def
>  create mode 100644 target/hexagon/imported/encode_subinsn.def
>  create mode 100644 target/hexagon/imported/float.idef
>  create mode 100644 target/hexagon/imported/iclass.def
>  create mode 100644 target/hexagon/imported/ldst.idef
>  create mode 100755 target/hexagon/imported/macros.def
>  create mode 100644 target/hexagon/imported/mmvec/encode_ext.def
>  create mode 100644 target/hexagon/imported/mmvec/ext.idef
>  create mode 100755 target/hexagon/imported/mmvec/macros.def
>  create mode 100644 target/hexagon/imported/mpy.idef
>  create mode 100644 target/hexagon/imported/shift.idef
>  create mode 100644 target/hexagon/imported/subinsns.idef
>  create mode 100644 target/hexagon/imported/system.idef
>  create mode 100644 tests/tcg/hexagon/float_convs.ref
>  create mode 100644 tests/tcg/hexagon/float_madds.ref
>
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [RFC PATCH v2 20/67] Hexagon instruction utility functions
  2020-02-28 16:43 ` [RFC PATCH v2 20/67] Hexagon instruction utility functions Taylor Simpson
@ 2020-04-09 18:53   ` Brian Cain
  2020-04-09 20:22     ` Taylor Simpson
  0 siblings, 1 reply; 72+ messages in thread
From: Brian Cain @ 2020-04-09 18:53 UTC (permalink / raw)
  To: 'Taylor Simpson', qemu-devel
  Cc: philmd, riku.voipio, richard.henderson, laurent, aleksandar.m.mail

> -----Original Message-----
> From: Qemu-devel <qemu-devel-
> bounces+bcain=codeaurora.org@nongnu.org> On Behalf Of Taylor Simpson
> Sent: Friday, February 28, 2020 10:43 AM
> To: qemu-devel@nongnu.org
> Cc: riku.voipio@iki.fi; richard.henderson@linaro.org; laurent@vivier.eu;
> Taylor Simpson <tsimpson@quicinc.com>; philmd@redhat.com;
> aleksandar.m.mail@gmail.com
> Subject: [RFC PATCH v2 20/67] Hexagon instruction utility functions
...
> +int arch_sf_invsqrt_common(size4s_t *Rs, size4s_t *Rd, int *adjust)
> +{
...
> +    } else if (r_class == FP_INFINITE) {
> +        /* EJP: or put Inf in num fixup? */
> +        RsV = fSFINFVAL(-1);
> +        RdV = fSFINFVAL(-1);
> +    } else if (r_class == FP_ZERO) {
> +        /* EJP: or put zero in num fixup? */
> +        RsV = RsV;
> +        RdV = fSFONEVAL(0);
...

This "RsV = RsV" looks like a logic error?  Presumably it's safe to remove -- unless there's some other field that should get initialized here?  PeV maybe?


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [RFC PATCH v2 20/67] Hexagon instruction utility functions
  2020-04-09 18:53   ` Brian Cain
@ 2020-04-09 20:22     ` Taylor Simpson
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-04-09 20:22 UTC (permalink / raw)
  To: bcain, qemu-devel
  Cc: philmd, riku.voipio, richard.henderson, laurent, aleksandar.m.mail



> -----Original Message-----
> From: bcain=codeaurora.org@mg.codeaurora.org
> <bcain=codeaurora.org@mg.codeaurora.org> On Behalf Of Brian Cain
> Sent: Thursday, April 9, 2020 1:53 PM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: riku.voipio@iki.fi; richard.henderson@linaro.org; laurent@vivier.eu;
> philmd@redhat.com; aleksandar.m.mail@gmail.com
> Subject: RE: [RFC PATCH v2 20/67] Hexagon instruction utility functions
>
> > -----Original Message-----
> > From: Qemu-devel <qemu-devel-
> > bounces+bcain=codeaurora.org@nongnu.org> On Behalf Of Taylor
> Simpson
> > Sent: Friday, February 28, 2020 10:43 AM
> > To: qemu-devel@nongnu.org
> > Cc: riku.voipio@iki.fi; richard.henderson@linaro.org; laurent@vivier.eu;
> > Taylor Simpson <tsimpson@quicinc.com>; philmd@redhat.com;
> > aleksandar.m.mail@gmail.com
> > Subject: [RFC PATCH v2 20/67] Hexagon instruction utility functions
> ...
> > +int arch_sf_invsqrt_common(size4s_t *Rs, size4s_t *Rd, int *adjust)
> > +{
> ...
> > +    } else if (r_class == FP_INFINITE) {
> > +        /* EJP: or put Inf in num fixup? */
> > +        RsV = fSFINFVAL(-1);
> > +        RdV = fSFINFVAL(-1);
> > +    } else if (r_class == FP_ZERO) {
> > +        /* EJP: or put zero in num fixup? */
> > +        RsV = RsV;
> > +        RdV = fSFONEVAL(0);
> ...
>
> This "RsV = RsV" looks like a logic error?  Presumably it's safe to remove --
> unless there's some other field that should get initialized here?  PeV maybe?

Probably a copy/paste error.  I will remove it.

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [RFC PATCH v2 00/67] Hexagon patch series
  2020-03-25 21:13 ` [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
@ 2020-04-30 20:53   ` Taylor Simpson
  0 siblings, 0 replies; 72+ messages in thread
From: Taylor Simpson @ 2020-04-30 20:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: philmd, riku.voipio, richard.henderson, laurent, aleksandar.m.mail

Ping?


> -----Original Message-----
> From: Taylor Simpson <tsimpson@quicinc.com>
> Sent: Wednesday, March 25, 2020 4:14 PM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: richard.henderson@linaro.org; philmd@redhat.com; laurent@vivier.eu;
> riku.voipio@iki.fi; aleksandar.m.mail@gmail.com
> Subject: RE: [RFC PATCH v2 00/67] Hexagon patch series
>
> I know everyone is heads-down working on the 5.0 release, and I realize this
> is a lot of code.  However, I would really appreciate some feedback on it.  Is
> there anything I can do to make it easier to review?
>
> Thanks,
> Taylor
>
>
> > -----Original Message-----
> > From: Taylor Simpson <tsimpson@quicinc.com>
> > Sent: Friday, February 28, 2020 10:43 AM
> > To: qemu-devel@nongnu.org
> > Cc: richard.henderson@linaro.org; philmd@redhat.com; laurent@vivier.eu;
> > riku.voipio@iki.fi; aleksandar.m.mail@gmail.com; Taylor Simpson
> > <tsimpson@quicinc.com>
> > Subject: [RFC PATCH v2 00/67] Hexagon patch series
> >
> > This series adds support for the Hexagon processor with Linux user support
> >
> > See patch 02/67 Hexagon README for detailed information.
> >
> > The patches up to and including "Hexagon build infractructure" implement
> > the
> > base Hexagon core and the remainder add HVX.  Once the build
> > infrastructure
> > patch is applied, you can build and qemu will execute non-HVX Hexagon
> > programs.
> >
> > We have a parallel effort to make the Hexagon Linux toolchain publically
> > available.
> >
> >
> > *** Testing ***
> >
> > The port passes the following tests
> >     Directed unit tests will contributed when the Hexagon toolchain is
> available
> >     MUSL libc test suite (good coverage of Linux system calls)
> >     https://git.musl-libc.org/cgit/libc-testsuite/
> >     Internal compiler intrinsics test suite (good coverage of instructions)
> >     Hexagon machine learning library unit tests
> >     TODO - pull these from the CAF repo
> >     make check-tcg TIMEOUT=60
> >
> > *** Known checkpatch issues ***
> >
> > The following are known checkpatch errors in the series
> >     include/disas/dis-asm.h             space prohibited
> >         (Follow convention of other targets on prior lines)
> >     target/hexagon/reg_fields.h         Complex macro
> >     target/hexagon/attribs.h            Complex macro
> >     target/hexagon/decode.c             Complex macro
> >     target/hexagon/q6v_decode.c         Macro needs do - while
> >     target/hexagon/printinsn.c          Macro needs do - while
> >     target/hexagon/gen_semantics.c      Suspicious ; after while (0)
> >     target/hexagon/gen_dectree_import.c Complex macro
> >     target/hexagon/gen_dectree_import.c Suspicious ; after while (0)
> >     target/hexagon/opcodes.c            Complex macro
> >     target/hexagon/iclass.h             Complex macro
> >     scripts/qemu-binfmt-conf.sh         Line over 90 characters
> >     target/hexagon/mmvec/macros.h       Suspicious ; after while (0)
> >
> > The following are known checkpatch warnings in the series
> >     target/hexagon/fma_emu.c            Comments inside macro definition
> >     scripts/qemu-binfmt-conf.sh         Line over 80 characters
> >
> > *** Changes in v2 ***
> > - Use scripts/git.orderfile
> > - Create a README with the code overview in patch 0001
> > - Change #define's in hex_regs.h to an enum
> > - Replace hard coded disassembly buffer length (1028) with #define
> > - Move Hexagon architecture types patch earlier in series
> > - Replace #include standard header files with #include "qemu/osdep.h"
> > - Prefix all header file #ifndef's with HEXAGON_
> > - Update python version to python3
> > - #include "tcg/tcg.h" in genptr_helpers.h
> > - Change target/hexagon/Makefile.objs to support out-of-tree build
> > - Updated copyright to include year 2020
> > - Bug fixes
> >     Fix some problems with HEX_DEBUG output
> >     Fix bug in circular addressing
> > - Optimizations to reduce the amount of TCG code generated
> >     Change pred_written from an array to a bit mask
> >     Optimize readonly vector registers
> >     Conditionally call gen_helper_commit_hvx_stores
> >
> > Taylor Simpson (67):
> >   Hexagon Maintainers
> >   Hexagon README
> >   Hexagon ELF Machine Definition
> >   Hexagon CPU Scalar Core Definition
> >   Hexagon register names
> >   Hexagon Disassembler
> >   Hexagon CPU Scalar Core Helpers
> >   Hexagon GDB Stub
> >   Hexagon architecture types
> >   Hexagon instruction and packet types
> >   Hexagon register fields
> >   Hexagon instruction attributes
> >   Hexagon register map
> >   Hexagon instruction/packet decode
> >   Hexagon instruction printing
> >   Hexagon arch import - instruction semantics definitions
> >   Hexagon arch import - macro definitions
> >   Hexagon arch import - instruction encoding
> >   Hexagon instruction class definitions
> >   Hexagon instruction utility functions
> >   Hexagon generator phase 1 - C preprocessor for semantics
> >   Hexagon generator phase 2 - qemu_def_generated.h
> >   Hexagon generator phase 2 - qemu_wrap_generated.h
> >   Hexagon generator phase 2 - opcodes_def_generated.h
> >   Hexagon generator phase 2 - op_attribs_generated.h
> >   Hexagon generator phase 2 - op_regs_generated.h
> >   Hexagon generator phase 2 - printinsn-generated.h
> >   Hexagon generator phase 3 - C preprocessor for decode tree
> >   Hexagon generater phase 4 - Decode tree
> >   Hexagon opcode data structures
> >   Hexagon macros to interface with the generator
> >   Hexagon macros referenced in instruction semantics
> >   Hexagon instruction classes
> >   Hexagon TCG generation helpers - step 1
> >   Hexagon TCG generation helpers - step 2
> >   Hexagon TCG generation helpers - step 3
> >   Hexagon TCG generation helpers - step 4
> >   Hexagon TCG generation helpers - step 5
> >   Hexagon TCG generation - step 01
> >   Hexagon TCG generation - step 02
> >   Hexagon TCG generation - step 03
> >   Hexagon TCG generation - step 04
> >   Hexagon TCG generation - step 05
> >   Hexagon TCG generation - step 06
> >   Hexagon TCG generation - step 07
> >   Hexagon TCG generation - step 08
> >   Hexagon TCG generation - step 09
> >   Hexagon TCG generation - step 10
> >   Hexagon TCG generation - step 11
> >   Hexagon TCG generation - step 12
> >   Hexagon translation
> >   Hexagon Linux user emulation
> >   Hexagon build infrastructure
> >   Hexagon - Add Hexagon Vector eXtensions (HVX) to core definition
> >   Hexagon HVX support in gdbstub
> >   Hexagon HVX import instruction encodings
> >   Hexagon HVX import semantics
> >   Hexagon HVX import macro definitions
> >   Hexagon HVX semantics generator
> >   Hexagon HVX instruction decoding
> >   Hexagon HVX instruction utility functions
> >   Hexagon HVX macros to interface with the generator
> >   Hexagon HVX macros referenced in instruction semantics
> >   Hexagon HVX helper to commit vector stores (masked and scatter/gather)
> >   Hexagon HVX TCG generation
> >   Hexagon HVX translation
> >   Hexagon HVX build infrastructure
> >
> >  configure                                    |    9 +
> >  default-configs/hexagon-linux-user.mak       |    1 +
> >  include/disas/dis-asm.h                      |    1 +
> >  include/elf.h                                |    2 +
> >  linux-user/hexagon/sockbits.h                |   18 +
> >  linux-user/hexagon/syscall_nr.h              |  346 +++
> >  linux-user/hexagon/target_cpu.h              |   44 +
> >  linux-user/hexagon/target_elf.h              |   38 +
> >  linux-user/hexagon/target_fcntl.h            |   18 +
> >  linux-user/hexagon/target_signal.h           |   34 +
> >  linux-user/hexagon/target_structs.h          |   46 +
> >  linux-user/hexagon/target_syscall.h          |   32 +
> >  linux-user/hexagon/termbits.h                |   18 +
> >  linux-user/syscall_defs.h                    |   33 +
> >  target/hexagon/arch.h                        |   62 +
> >  target/hexagon/attribs.h                     |   32 +
> >  target/hexagon/attribs_def.h                 |  404 +++
> >  target/hexagon/conv_emu.h                    |   50 +
> >  target/hexagon/cpu-param.h                   |   26 +
> >  target/hexagon/cpu.h                         |  207 ++
> >  target/hexagon/cpu_bits.h                    |   37 +
> >  target/hexagon/decode.h                      |   39 +
> >  target/hexagon/fma_emu.h                     |   30 +
> >  target/hexagon/genptr.h                      |   25 +
> >  target/hexagon/genptr_helpers.h              | 1049 +++++++
> >  target/hexagon/helper.h                      |   38 +
> >  target/hexagon/helper_overrides.h            | 1850 ++++++++++++
> >  target/hexagon/hex_arch_types.h              |   42 +
> >  target/hexagon/hex_regs.h                    |   99 +
> >  target/hexagon/iclass.h                      |   46 +
> >  target/hexagon/insn.h                        |  149 +
> >  target/hexagon/internal.h                    |   54 +
> >  target/hexagon/macros.h                      | 1474 ++++++++++
> >  target/hexagon/mmvec/decode_ext_mmvec.h      |   24 +
> >  target/hexagon/mmvec/macros.h                |  698 +++++
> >  target/hexagon/mmvec/mmvec.h                 |   87 +
> >  target/hexagon/mmvec/system_ext_mmvec.h      |   38 +
> >  target/hexagon/opcodes.h                     |   67 +
> >  target/hexagon/printinsn.h                   |   26 +
> >  target/hexagon/reg_fields.h                  |   40 +
> >  target/hexagon/reg_fields_def.h              |  109 +
> >  target/hexagon/regmap.h                      |   38 +
> >  target/hexagon/translate.h                   |  112 +
> >  disas/hexagon.c                              |   62 +
> >  linux-user/elfload.c                         |   16 +
> >  linux-user/hexagon/cpu_loop.c                |  173 ++
> >  linux-user/hexagon/signal.c                  |  276 ++
> >  linux-user/syscall.c                         |    2 +
> >  target/hexagon/arch.c                        |  663 +++++
> >  target/hexagon/conv_emu.c                    |  369 +++
> >  target/hexagon/cpu.c                         |  374 +++
> >  target/hexagon/decode.c                      |  788 +++++
> >  target/hexagon/fma_emu.c                     |  916 ++++++
> >  target/hexagon/gdbstub.c                     |  111 +
> >  target/hexagon/gen_dectree_import.c          |  205 ++
> >  target/hexagon/gen_semantics.c               |  101 +
> >  target/hexagon/genptr.c                      |   61 +
> >  target/hexagon/iclass.c                      |  107 +
> >  target/hexagon/mmvec/decode_ext_mmvec.c      |  670 +++++
> >  target/hexagon/mmvec/system_ext_mmvec.c      |  263 ++
> >  target/hexagon/op_helper.c                   |  509 ++++
> >  target/hexagon/opcodes.c                     |  217 ++
> >  target/hexagon/printinsn.c                   |   91 +
> >  target/hexagon/q6v_decode.c                  |  416 +++
> >  target/hexagon/reg_fields.c                  |   28 +
> >  target/hexagon/translate.c                   |  916 ++++++
> >  MAINTAINERS                                  |    8 +
> >  disas/Makefile.objs                          |    1 +
> >  scripts/qemu-binfmt-conf.sh                  |    6 +-
> >  target/hexagon/Makefile.objs                 |  127 +
> >  target/hexagon/README                        |  296 ++
> >  target/hexagon/dectree.py                    |  353 +++
> >  target/hexagon/do_qemu.py                    | 1194 ++++++++
> >  target/hexagon/imported/allext.idef          |   25 +
> >  target/hexagon/imported/allext_macros.def    |   25 +
> >  target/hexagon/imported/allextenc.def        |   20 +
> >  target/hexagon/imported/allidefs.def         |   92 +
> >  target/hexagon/imported/alu.idef             | 1335 +++++++++
> >  target/hexagon/imported/branch.idef          |  344 +++
> >  target/hexagon/imported/compare.idef         |  639 +++++
> >  target/hexagon/imported/encode.def           |  126 +
> >  target/hexagon/imported/encode_pp.def        | 2283 +++++++++++++++
> >  target/hexagon/imported/encode_subinsn.def   |  150 +
> >  target/hexagon/imported/float.idef           |  498 ++++
> >  target/hexagon/imported/iclass.def           |   52 +
> >  target/hexagon/imported/ldst.idef            |  421 +++
> >  target/hexagon/imported/macros.def           | 3970
> > ++++++++++++++++++++++++++
> >  target/hexagon/imported/mmvec/encode_ext.def |  830 ++++++
> >  target/hexagon/imported/mmvec/ext.idef       | 2780
> > ++++++++++++++++++
> >  target/hexagon/imported/mmvec/macros.def     | 1110 +++++++
> >  target/hexagon/imported/mpy.idef             | 1269 ++++++++
> >  target/hexagon/imported/shift.idef           | 1211 ++++++++
> >  target/hexagon/imported/subinsns.idef        |  152 +
> >  target/hexagon/imported/system.idef          |  302 ++
> >  tests/tcg/configure.sh                       |    4 +-
> >  tests/tcg/hexagon/float_convs.ref            |  748 +++++
> >  tests/tcg/hexagon/float_madds.ref            |  768 +++++
> >  97 files changed, 36063 insertions(+), 2 deletions(-)
> >  create mode 100644 default-configs/hexagon-linux-user.mak
> >  create mode 100644 linux-user/hexagon/sockbits.h
> >  create mode 100644 linux-user/hexagon/syscall_nr.h
> >  create mode 100644 linux-user/hexagon/target_cpu.h
> >  create mode 100644 linux-user/hexagon/target_elf.h
> >  create mode 100644 linux-user/hexagon/target_fcntl.h
> >  create mode 100644 linux-user/hexagon/target_signal.h
> >  create mode 100644 linux-user/hexagon/target_structs.h
> >  create mode 100644 linux-user/hexagon/target_syscall.h
> >  create mode 100644 linux-user/hexagon/termbits.h
> >  create mode 100644 target/hexagon/arch.h
> >  create mode 100644 target/hexagon/attribs.h
> >  create mode 100644 target/hexagon/attribs_def.h
> >  create mode 100644 target/hexagon/conv_emu.h
> >  create mode 100644 target/hexagon/cpu-param.h
> >  create mode 100644 target/hexagon/cpu.h
> >  create mode 100644 target/hexagon/cpu_bits.h
> >  create mode 100644 target/hexagon/decode.h
> >  create mode 100644 target/hexagon/fma_emu.h
> >  create mode 100644 target/hexagon/genptr.h
> >  create mode 100644 target/hexagon/genptr_helpers.h
> >  create mode 100644 target/hexagon/helper.h
> >  create mode 100644 target/hexagon/helper_overrides.h
> >  create mode 100644 target/hexagon/hex_arch_types.h
> >  create mode 100644 target/hexagon/hex_regs.h
> >  create mode 100644 target/hexagon/iclass.h
> >  create mode 100644 target/hexagon/insn.h
> >  create mode 100644 target/hexagon/internal.h
> >  create mode 100644 target/hexagon/macros.h
> >  create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
> >  create mode 100644 target/hexagon/mmvec/macros.h
> >  create mode 100644 target/hexagon/mmvec/mmvec.h
> >  create mode 100644 target/hexagon/mmvec/system_ext_mmvec.h
> >  create mode 100644 target/hexagon/opcodes.h
> >  create mode 100644 target/hexagon/printinsn.h
> >  create mode 100644 target/hexagon/reg_fields.h
> >  create mode 100644 target/hexagon/reg_fields_def.h
> >  create mode 100644 target/hexagon/regmap.h
> >  create mode 100644 target/hexagon/translate.h
> >  create mode 100644 disas/hexagon.c
> >  create mode 100644 linux-user/hexagon/cpu_loop.c
> >  create mode 100644 linux-user/hexagon/signal.c
> >  create mode 100644 target/hexagon/arch.c
> >  create mode 100644 target/hexagon/conv_emu.c
> >  create mode 100644 target/hexagon/cpu.c
> >  create mode 100644 target/hexagon/decode.c
> >  create mode 100644 target/hexagon/fma_emu.c
> >  create mode 100644 target/hexagon/gdbstub.c
> >  create mode 100644 target/hexagon/gen_dectree_import.c
> >  create mode 100644 target/hexagon/gen_semantics.c
> >  create mode 100644 target/hexagon/genptr.c
> >  create mode 100644 target/hexagon/iclass.c
> >  create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c
> >  create mode 100644 target/hexagon/mmvec/system_ext_mmvec.c
> >  create mode 100644 target/hexagon/op_helper.c
> >  create mode 100644 target/hexagon/opcodes.c
> >  create mode 100644 target/hexagon/printinsn.c
> >  create mode 100644 target/hexagon/q6v_decode.c
> >  create mode 100644 target/hexagon/reg_fields.c
> >  create mode 100644 target/hexagon/translate.c
> >  create mode 100644 target/hexagon/Makefile.objs
> >  create mode 100644 target/hexagon/README
> >  create mode 100755 target/hexagon/dectree.py
> >  create mode 100755 target/hexagon/do_qemu.py
> >  create mode 100644 target/hexagon/imported/allext.idef
> >  create mode 100644 target/hexagon/imported/allext_macros.def
> >  create mode 100644 target/hexagon/imported/allextenc.def
> >  create mode 100644 target/hexagon/imported/allidefs.def
> >  create mode 100644 target/hexagon/imported/alu.idef
> >  create mode 100644 target/hexagon/imported/branch.idef
> >  create mode 100644 target/hexagon/imported/compare.idef
> >  create mode 100644 target/hexagon/imported/encode.def
> >  create mode 100644 target/hexagon/imported/encode_pp.def
> >  create mode 100644 target/hexagon/imported/encode_subinsn.def
> >  create mode 100644 target/hexagon/imported/float.idef
> >  create mode 100644 target/hexagon/imported/iclass.def
> >  create mode 100644 target/hexagon/imported/ldst.idef
> >  create mode 100755 target/hexagon/imported/macros.def
> >  create mode 100644 target/hexagon/imported/mmvec/encode_ext.def
> >  create mode 100644 target/hexagon/imported/mmvec/ext.idef
> >  create mode 100755 target/hexagon/imported/mmvec/macros.def
> >  create mode 100644 target/hexagon/imported/mpy.idef
> >  create mode 100644 target/hexagon/imported/shift.idef
> >  create mode 100644 target/hexagon/imported/subinsns.idef
> >  create mode 100644 target/hexagon/imported/system.idef
> >  create mode 100644 tests/tcg/hexagon/float_convs.ref
> >  create mode 100644 tests/tcg/hexagon/float_madds.ref
> >
> > --
> > 2.7.4


^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2020-04-30 20:56 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28 16:42 [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
2020-02-28 16:42 ` [RFC PATCH v2 01/67] Hexagon Maintainers Taylor Simpson
2020-02-28 16:42 ` [RFC PATCH v2 02/67] Hexagon README Taylor Simpson
2020-02-28 16:42 ` [RFC PATCH v2 03/67] Hexagon ELF Machine Definition Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 04/67] Hexagon CPU Scalar Core Definition Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 05/67] Hexagon register names Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 06/67] Hexagon Disassembler Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 07/67] Hexagon CPU Scalar Core Helpers Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 08/67] Hexagon GDB Stub Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 09/67] Hexagon architecture types Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 10/67] Hexagon instruction and packet types Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 11/67] Hexagon register fields Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 12/67] Hexagon instruction attributes Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 13/67] Hexagon register map Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 14/67] Hexagon instruction/packet decode Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 15/67] Hexagon instruction printing Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 16/67] Hexagon arch import - instruction semantics definitions Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 17/67] Hexagon arch import - macro definitions Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 18/67] Hexagon arch import - instruction encoding Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 19/67] Hexagon instruction class definitions Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 20/67] Hexagon instruction utility functions Taylor Simpson
2020-04-09 18:53   ` Brian Cain
2020-04-09 20:22     ` Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 21/67] Hexagon generator phase 1 - C preprocessor for semantics Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 22/67] Hexagon generator phase 2 - qemu_def_generated.h Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 23/67] Hexagon generator phase 2 - qemu_wrap_generated.h Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 24/67] Hexagon generator phase 2 - opcodes_def_generated.h Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 25/67] Hexagon generator phase 2 - op_attribs_generated.h Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 26/67] Hexagon generator phase 2 - op_regs_generated.h Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 27/67] Hexagon generator phase 2 - printinsn-generated.h Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 28/67] Hexagon generator phase 3 - C preprocessor for decode tree Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 29/67] Hexagon generater phase 4 - Decode tree Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 30/67] Hexagon opcode data structures Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 31/67] Hexagon macros to interface with the generator Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 32/67] Hexagon macros referenced in instruction semantics Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 33/67] Hexagon instruction classes Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 34/67] Hexagon TCG generation helpers - step 1 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 35/67] Hexagon TCG generation helpers - step 2 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 36/67] Hexagon TCG generation helpers - step 3 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 37/67] Hexagon TCG generation helpers - step 4 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 38/67] Hexagon TCG generation helpers - step 5 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 39/67] Hexagon TCG generation - step 01 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 40/67] Hexagon TCG generation - step 02 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 41/67] Hexagon TCG generation - step 03 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 42/67] Hexagon TCG generation - step 04 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 43/67] Hexagon TCG generation - step 05 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 44/67] Hexagon TCG generation - step 06 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 45/67] Hexagon TCG generation - step 07 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 46/67] Hexagon TCG generation - step 08 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 47/67] Hexagon TCG generation - step 09 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 48/67] Hexagon TCG generation - step 10 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 49/67] Hexagon TCG generation - step 11 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 50/67] Hexagon TCG generation - step 12 Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 51/67] Hexagon translation Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 52/67] Hexagon Linux user emulation Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 53/67] Hexagon build infrastructure Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 54/67] Hexagon - Add Hexagon Vector eXtensions (HVX) to core definition Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 55/67] Hexagon HVX support in gdbstub Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 56/67] Hexagon HVX import instruction encodings Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 57/67] Hexagon HVX import semantics Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 58/67] Hexagon HVX import macro definitions Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 59/67] Hexagon HVX semantics generator Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 60/67] Hexagon HVX instruction decoding Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 61/67] Hexagon HVX instruction utility functions Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 62/67] Hexagon HVX macros to interface with the generator Taylor Simpson
2020-02-28 16:43 ` [RFC PATCH v2 63/67] Hexagon HVX macros referenced in instruction semantics Taylor Simpson
2020-02-28 16:44 ` [RFC PATCH v2 64/67] Hexagon HVX helper to commit vector stores (masked and scatter/gather) Taylor Simpson
2020-02-28 16:44 ` [RFC PATCH v2 65/67] Hexagon HVX TCG generation Taylor Simpson
2020-02-28 16:44 ` [RFC PATCH v2 66/67] Hexagon HVX translation Taylor Simpson
2020-02-28 16:44 ` [RFC PATCH v2 67/67] Hexagon HVX build infrastructure Taylor Simpson
2020-03-25 21:13 ` [RFC PATCH v2 00/67] Hexagon patch series Taylor Simpson
2020-04-30 20:53   ` Taylor Simpson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).