qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/20] Hexagon HVX (target/hexagon) patch series
@ 2021-07-05 23:34 Taylor Simpson
  2021-07-05 23:34 ` [PATCH 01/20] Hexagon HVX (target/hexagon) README Taylor Simpson
                   ` (19 more replies)
  0 siblings, 20 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

This series adds support for the Hexagon Vector eXtensions (HVX)

These instructions are documented here
https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual

Hexagon HVX is a wide vector engine with 128 byte vectors.

See patch 01 Hexagon HVX README for more information.

*** Known checkpatch issues ***

The following are known checkpatch errors in the series
    target/hexagon/gen_semantics.c    Suspicious ; after while (0)
    tests/tcg/hexagon/hvx_misc.c      Spaces around operator in macro invocation


Taylor Simpson (20):
  Hexagon HVX (target/hexagon) README
  Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to
    core
  Hexagon HVX (target/hexagon) register names
  Hexagon HVX (target/hexagon) support in gdbstub
  Hexagon HVX (target/hexagon) instruction attributes
  Hexagon HVX (target/hexagon) macros
  Hexagon HVX (target/hexagon) import macro definitions
  Hexagon HVX (target/hexagon) semantics generator
  Hexagon HVX (target/hexagon) semantics generator - part 2
  Hexagon HVX (target/hexagon) C preprocessor for decode tree
  Hexagon HVX (target/hexagon) instruction utility functions
  Hexagon HVX (target/hexagon) helper functions
  Hexagon HVX (target/hexagon) TCG generation
  Hexagon HVX (target/hexagon) import semantics
  Hexagon HVX (target/hexagon) instruction decoding
  Hexagon HVX (target/hexagon) import instruction encodings
  Hexagon HVX (tests/tcg/hexagon) vector_add_int test
  Hexagon HVX (tests/tcg/hexagon) hvx_misc test
  Hexagon HVX (tests/tcg/hexagon) scatter_gather test
  Hexagon HVX (tests/tcg/hexagon) histogram test

 target/hexagon/arch.h                        |    1 +
 target/hexagon/cpu.h                         |   35 +-
 target/hexagon/helper.h                      |    1 +
 target/hexagon/hex_arch_types.h              |    5 +
 target/hexagon/hex_regs.h                    |    1 +
 target/hexagon/insn.h                        |    3 +
 target/hexagon/internal.h                    |    3 +
 target/hexagon/macros.h                      |   22 +
 target/hexagon/mmvec/decode_ext_mmvec.h      |   24 +
 target/hexagon/mmvec/macros.h                |  478 +++++
 target/hexagon/mmvec/mmvec.h                 |   83 +
 target/hexagon/mmvec/system_ext_mmvec.h      |   35 +
 target/hexagon/translate.h                   |   54 +
 tests/tcg/hexagon/hvx_histogram_input.h      |  717 +++++++
 tests/tcg/hexagon/hvx_histogram_row.h        |   24 +
 target/hexagon/attribs_def.h.inc             |   22 +
 target/hexagon/arch.c                        |    9 +
 target/hexagon/cpu.c                         |   72 +-
 target/hexagon/decode.c                      |   28 +-
 target/hexagon/gdbstub.c                     |   66 +
 target/hexagon/gen_dectree_import.c          |   13 +
 target/hexagon/gen_semantics.c               |   33 +
 target/hexagon/genptr.c                      |  111 ++
 target/hexagon/mmvec/decode_ext_mmvec.c      |  235 +++
 target/hexagon/mmvec/system_ext_mmvec.c      |  119 ++
 target/hexagon/op_helper.c                   |   74 +-
 target/hexagon/translate.c                   |  199 ++
 tests/tcg/hexagon/hvx_histogram.c            |   88 +
 tests/tcg/hexagon/hvx_misc.c                 |  331 ++++
 tests/tcg/hexagon/scatter_gather.c           | 1011 ++++++++++
 tests/tcg/hexagon/vector_add_int.c           |   61 +
 target/hexagon/README                        |   83 +-
 target/hexagon/gen_helper_funcs.py           |  111 +-
 target/hexagon/gen_helper_protos.py          |   16 +-
 target/hexagon/gen_tcg_funcs.py              |  254 ++-
 target/hexagon/hex_common.py                 |    9 +-
 target/hexagon/imported/allext.idef          |   25 +
 target/hexagon/imported/allext_macros.def    |   25 +
 target/hexagon/imported/allextenc.def        |   20 +
 target/hexagon/imported/allidefs.def         |    1 +
 target/hexagon/imported/encode.def           |    1 +
 target/hexagon/imported/macros.def           |   88 +
 target/hexagon/imported/mmvec/encode_ext.def |  794 ++++++++
 target/hexagon/imported/mmvec/ext.idef       | 2606 ++++++++++++++++++++++++++
 target/hexagon/imported/mmvec/macros.def     |  842 +++++++++
 target/hexagon/meson.build                   |    2 +
 tests/tcg/hexagon/Makefile.target            |   12 +
 tests/tcg/hexagon/hvx_histogram_row.S        |  294 +++
 48 files changed, 9110 insertions(+), 31 deletions(-)
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/macros.h
 create mode 100644 target/hexagon/mmvec/mmvec.h
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.h
 create mode 100644 tests/tcg/hexagon/hvx_histogram_input.h
 create mode 100644 tests/tcg/hexagon/hvx_histogram_row.h
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.c
 create mode 100644 tests/tcg/hexagon/hvx_histogram.c
 create mode 100644 tests/tcg/hexagon/hvx_misc.c
 create mode 100644 tests/tcg/hexagon/scatter_gather.c
 create mode 100644 tests/tcg/hexagon/vector_add_int.c
 create mode 100644 target/hexagon/imported/allext.idef
 create mode 100644 target/hexagon/imported/allext_macros.def
 create mode 100644 target/hexagon/imported/allextenc.def
 create mode 100644 target/hexagon/imported/mmvec/encode_ext.def
 create mode 100644 target/hexagon/imported/mmvec/ext.idef
 create mode 100755 target/hexagon/imported/mmvec/macros.def
 create mode 100644 tests/tcg/hexagon/hvx_histogram_row.S

-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-12  8:16   ` Rob Landley
  2021-07-05 23:34 ` [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core Taylor Simpson
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/README | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 82 insertions(+), 1 deletion(-)

diff --git a/target/hexagon/README b/target/hexagon/README
index b0b2435..9a57802 100644
--- a/target/hexagon/README
+++ b/target/hexagon/README
@@ -1,9 +1,13 @@
 Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
-processor(DSP).
+processor(DSP).  We also support Hexagon Vector eXtensions (HVX).  HVX
+is a wide vector coprocessor designed for high performance computer vision,
+image processing, machine learning, and other workloads.
 
 The following versions of the Hexagon core are supported
     Scalar core: v67
     https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual
+    HVX extension: v66
+    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual
 
 We presented an overview of the project at the 2019 KVM Forum.
     https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center
@@ -124,6 +128,73 @@ There are also cases where we brute force the TCG code generation.
 Instructions with multiple definitions are examples.  These require special
 handling because qemu helpers can only return a single value.
 
+For HVX vectors, the generator behaves slightly differently.  The wide vectors
+won't fit in a TCGv or TCGv_i64, so we pass TCGv_ptr variables to pass the
+address to helper functions.  Here's an example for an HVX vector-add-word
+istruction.
+    static void generate_V6_vaddw(
+                    CPUHexagonState *env,
+                    DisasContext *ctx,
+                    Insn *insn,
+                    Packet *pkt)
+    {
+        const int VdN = insn->regno[0];
+        const intptr_t VdV_off =
+            offsetof(CPUHexagonState,
+                     future_VRegs[VdN]);
+        TCGv_ptr VdV = tcg_temp_local_new_ptr();
+        tcg_gen_addi_ptr(VdV, cpu_env, VdV_off);
+        const int VuN = insn->regno[1];
+        const intptr_t VuV_off =
+            vreg_src_off(ctx, VuN);
+        TCGv_ptr VuV = tcg_temp_local_new_ptr();
+        const int VvN = insn->regno[2];
+        const intptr_t VvV_off =
+            vreg_src_off(ctx, VvN);
+        TCGv_ptr VvV = tcg_temp_local_new_ptr();
+        tcg_gen_addi_ptr(VuV, cpu_env, VuV_off);
+        tcg_gen_addi_ptr(VvV, cpu_env, VvV_off);
+        TCGv slot = tcg_const_tl(insn->slot);
+        gen_helper_V6_vaddw(cpu_env, VdV, VuV, VvV, slot);
+        tcg_temp_free(slot);
+        gen_log_vreg_write(VdV_off, VdN, EXT_DFL, insn->slot, false, pkt->pkt_has_vhist);
+        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
+        tcg_temp_free_ptr(VdV);
+        tcg_temp_free_ptr(VuV);
+        tcg_temp_free_ptr(VvV);
+    }
+
+Notice that we also generate a variable named <operand>_off for each operand of
+the instruction.  This makes it easy to override the instruction semantics with
+functions from tcg-op-gved.h.  Here's the override for this instruction.
+    #define fGEN_TCG_V6_vaddw(SHORTCODE) \
+        tcg_gen_gvec_add(MO_32, VdV_off, VuV_off, VvV_off, \
+                         sizeof(MMVector), sizeof(MMVector))
+
+Finally, we notice that the override doesn't use the TCGv_ptr variables, so
+we don't generate them when an override is present.  Here is what we generate
+when the override is present.
+    static void generate_V6_vaddw(
+                    CPUHexagonState *env,
+                    DisasContext *ctx,
+                    Insn *insn,
+                    Packet *pkt)
+    {
+        const int VdN = insn->regno[0];
+        const intptr_t VdV_off =
+            offsetof(CPUHexagonState,
+                     future_VRegs[VdN]);
+        const int VuN = insn->regno[1];
+        const intptr_t VuV_off =
+            vreg_src_off(ctx, VuN);
+        const int VvN = insn->regno[2];
+        const intptr_t VvV_off =
+            vreg_src_off(ctx, VvN);
+        fGEN_TCG_V6_vaddw({ fHIDE(int i;) fVFOREACH(32, i) { VdV.w[i] = VuV.w[i] + VvV.w[i] ; } });
+        gen_log_vreg_write(VdV_off, VdN, EXT_DFL, insn->slot, false, pkt->pkt_has_vhist);
+        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
+    }
+
 In addition to instruction semantics, we use a generator to create the decode
 tree.  This generation is also a two step process.  The first step is to run
 target/hexagon/gen_dectree_import.c to produce
@@ -140,6 +211,7 @@ runtime information for each thread and contains stuff like the GPR and
 predicate registers.
 
 macros.h
+mmvec/macros.h
 
 The Hexagon arch lib relies heavily on macros for the instruction semantics.
 This is a great advantage for qemu because we can override them for different
@@ -203,6 +275,15 @@ During runtime, the following fields in CPUHexagonState (see cpu.h) are used
     pred_written          boolean indicating if predicate was written
     mem_log_stores        record of the stores (indexed by slot)
 
+For Hexagon Vector eXtensions (HVX), the following fields are used
+    future_VRegs
+    tmp_VRegs
+    future_ZRegs
+    ZRegs_updated
+    VRegs_updated_tmp
+    VRegs_updated
+    VRegs_select
+
 *** Debugging ***
 
 You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
  2021-07-05 23:34 ` [PATCH 01/20] Hexagon HVX (target/hexagon) README Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-25 13:08   ` Richard Henderson
  2021-07-05 23:34 ` [PATCH 03/20] Hexagon HVX (target/hexagon) register names Taylor Simpson
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

HVX is a set of wide vector instructions.  Machine state includes
    vector registers (VRegs)
    vector predicate registers (QRegs)
    temporary registers for intermediate values
    store buffer (masked stores and scatter/gather)

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/cpu.h            | 35 ++++++++++++++++-
 target/hexagon/hex_arch_types.h |  5 +++
 target/hexagon/insn.h           |  3 ++
 target/hexagon/internal.h       |  3 ++
 target/hexagon/mmvec/mmvec.h    | 83 +++++++++++++++++++++++++++++++++++++++++
 target/hexagon/cpu.c            | 72 ++++++++++++++++++++++++++++++++++-
 6 files changed, 198 insertions(+), 3 deletions(-)
 create mode 100644 target/hexagon/mmvec/mmvec.h

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 2855dd3..0b377c3 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -26,6 +26,7 @@ typedef struct CPUHexagonState CPUHexagonState;
 #include "qemu-common.h"
 #include "exec/cpu-defs.h"
 #include "hex_regs.h"
+#include "mmvec/mmvec.h"
 
 #define NUM_PREGS 4
 #define TOTAL_PER_THREAD_REGS 64
@@ -34,6 +35,7 @@ typedef struct CPUHexagonState CPUHexagonState;
 #define STORES_MAX 2
 #define REG_WRITES_MAX 32
 #define PRED_WRITES_MAX 5                   /* 4 insns + endloop */
+#define VSTORES_MAX 2
 
 #define TYPE_HEXAGON_CPU "hexagon-cpu"
 
@@ -52,6 +54,13 @@ typedef struct {
     uint64_t data64;
 } MemLog;
 
+typedef struct {
+    target_ulong va;
+    int size;
+    MMVector mask QEMU_ALIGNED(16);
+    MMVector data QEMU_ALIGNED(16);
+} VStoreLog;
+
 #define EXEC_STATUS_OK          0x0000
 #define EXEC_STATUS_STOP        0x0002
 #define EXEC_STATUS_REPLAY      0x0010
@@ -98,7 +107,31 @@ struct CPUHexagonState {
     uint64_t     llsc_val_i64;
 
     target_ulong is_gather_store_insn;
-    target_ulong gather_issued;
+    bool gather_issued;
+
+    MMVector VRegs[NUM_VREGS] QEMU_ALIGNED(16);
+    MMVector future_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
+    MMVector tmp_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
+
+    VRegMask VRegs_updated_tmp;
+    VRegMask VRegs_updated;
+    VRegMask VRegs_select;
+
+    MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
+    MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
+    QRegMask QRegs_updated;
+
+    /* Temporaries used within instructions */
+    MMVector zero_vector QEMU_ALIGNED(16);
+    MMVectorPair VddV QEMU_ALIGNED(16),
+                 VuuV QEMU_ALIGNED(16),
+                 VvvV QEMU_ALIGNED(16),
+                 VxxV QEMU_ALIGNED(16);
+
+    VStoreLog vstore[VSTORES_MAX];
+    target_ulong vstore_pending[VSTORES_MAX];
+    bool vtcm_pending;
+    VTCMStoreLog vtcm_log;
 };
 
 #define HEXAGON_CPU_CLASS(klass) \
diff --git a/target/hexagon/hex_arch_types.h b/target/hexagon/hex_arch_types.h
index d721e1f..78ad607 100644
--- a/target/hexagon/hex_arch_types.h
+++ b/target/hexagon/hex_arch_types.h
@@ -19,6 +19,7 @@
 #define HEXAGON_ARCH_TYPES_H
 
 #include "qemu/osdep.h"
+#include "mmvec/mmvec.h"
 #include "qemu/int128.h"
 
 /*
@@ -35,4 +36,8 @@ typedef uint64_t    size8u_t;
 typedef int64_t     size8s_t;
 typedef Int128      size16s_t;
 
+typedef MMVector          mmvector_t;
+typedef MMVectorPair      mmvector_pair_t;
+typedef MMQReg            mmqret_t;
+
 #endif
diff --git a/target/hexagon/insn.h b/target/hexagon/insn.h
index 2e34591..3872f90 100644
--- a/target/hexagon/insn.h
+++ b/target/hexagon/insn.h
@@ -67,6 +67,9 @@ struct Packet {
     bool pkt_has_store_s0;
     bool pkt_has_store_s1;
 
+    bool pkt_has_hvx;
+    bool pkt_has_vhist;
+
     Insn insn[INSTRUCTIONS_MAX];
 };
 
diff --git a/target/hexagon/internal.h b/target/hexagon/internal.h
index 6b20aff..82ac304 100644
--- a/target/hexagon/internal.h
+++ b/target/hexagon/internal.h
@@ -31,6 +31,9 @@
 
 int hexagon_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int hexagon_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+
+void hexagon_debug_vreg(CPUHexagonState *env, int regnum);
+void hexagon_debug_qreg(CPUHexagonState *env, int regnum);
 void hexagon_debug(CPUHexagonState *env);
 
 extern const char * const hexagon_regnames[TOTAL_PER_THREAD_REGS];
diff --git a/target/hexagon/mmvec/mmvec.h b/target/hexagon/mmvec/mmvec.h
new file mode 100644
index 0000000..061ccce
--- /dev/null
+++ b/target/hexagon/mmvec/mmvec.h
@@ -0,0 +1,83 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_MMVEC_H
+#define HEXAGON_MMVEC_H
+
+#define MAX_VEC_SIZE_LOGBYTES 7
+#define MAX_VEC_SIZE_BYTES  (1 << MAX_VEC_SIZE_LOGBYTES)
+
+#define NUM_VREGS           32
+#define NUM_QREGS           4
+
+typedef uint32_t VRegMask; /* at least NUM_VREGS bits */
+typedef uint32_t QRegMask; /* at least NUM_QREGS bits */
+
+#define VECTOR_SIZE_BYTE    (fVECSIZE())
+
+typedef union {
+    uint64_t ud[MAX_VEC_SIZE_BYTES / 8];
+    int64_t   d[MAX_VEC_SIZE_BYTES / 8];
+    uint32_t uw[MAX_VEC_SIZE_BYTES / 4];
+    int32_t   w[MAX_VEC_SIZE_BYTES / 4];
+    uint16_t uh[MAX_VEC_SIZE_BYTES / 2];
+    int16_t   h[MAX_VEC_SIZE_BYTES / 2];
+    uint8_t  ub[MAX_VEC_SIZE_BYTES / 1];
+    int8_t    b[MAX_VEC_SIZE_BYTES / 1];
+} MMVector;
+
+typedef union {
+    uint64_t ud[2 * MAX_VEC_SIZE_BYTES / 8];
+    int64_t   d[2 * MAX_VEC_SIZE_BYTES / 8];
+    uint32_t uw[2 * MAX_VEC_SIZE_BYTES / 4];
+    int32_t   w[2 * MAX_VEC_SIZE_BYTES / 4];
+    uint16_t uh[2 * MAX_VEC_SIZE_BYTES / 2];
+    int16_t   h[2 * MAX_VEC_SIZE_BYTES / 2];
+    uint8_t  ub[2 * MAX_VEC_SIZE_BYTES / 1];
+    int8_t    b[2 * MAX_VEC_SIZE_BYTES / 1];
+    MMVector v[2];
+} MMVectorPair;
+
+typedef union {
+    uint64_t ud[MAX_VEC_SIZE_BYTES / 8 / 8];
+    int64_t   d[MAX_VEC_SIZE_BYTES / 8 / 8];
+    uint32_t uw[MAX_VEC_SIZE_BYTES / 4 / 8];
+    int32_t   w[MAX_VEC_SIZE_BYTES / 4 / 8];
+    uint16_t uh[MAX_VEC_SIZE_BYTES / 2 / 8];
+    int16_t   h[MAX_VEC_SIZE_BYTES / 2 / 8];
+    uint8_t  ub[MAX_VEC_SIZE_BYTES / 1 / 8];
+    int8_t    b[MAX_VEC_SIZE_BYTES / 1 / 8];
+} MMQReg;
+
+typedef struct {
+    MMVector data;
+    MMVector mask;
+    int size;
+    target_ulong va[MAX_VEC_SIZE_BYTES];
+    bool op;
+    int op_size;
+} VTCMStoreLog;
+
+
+/* Types of vector register assignment */
+typedef enum {
+    EXT_DFL,      /* Default */
+    EXT_NEW,      /* New - value used in the same packet */
+    EXT_TMP       /* Temp - value used but not stored to register */
+} VRegWriteType;
+
+#endif
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index 3338365..4c7eaff 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -59,7 +59,7 @@ const char * const hexagon_regnames[TOTAL_PER_THREAD_REGS] = {
   "r24", "r25", "r26", "r27", "r28",  "r29", "r30", "r31",
   "sa0", "lc0", "sa1", "lc1", "p3_0", "c5",  "m0",  "m1",
   "usr", "pc",  "ugp", "gp",  "cs0",  "cs1", "c14", "c15",
-  "c16", "c17", "c18", "c19", "pkt_cnt",  "insn_cnt", "c22", "c23",
+  "c16", "c17", "c18", "c19", "pkt_cnt",  "insn_cnt", "hvx_cnt", "c23",
   "c24", "c25", "c26", "c27", "c28",  "c29", "c30", "c31",
 };
 
@@ -113,6 +113,57 @@ static void print_reg(FILE *f, CPUHexagonState *env, int regnum)
                  hexagon_regnames[regnum], value);
 }
 
+static void print_vreg(FILE *f, CPUHexagonState *env, int regnum)
+{
+    bool nonzero_found = false;
+    int i;
+    qemu_fprintf(f, "  v%d = ( ", regnum);
+    qemu_fprintf(f, "0x%02x", env->VRegs[regnum].ub[MAX_VEC_SIZE_BYTES - 1]);
+    for (i = MAX_VEC_SIZE_BYTES - 2; i >= 0; i--) {
+        if (env->VRegs[regnum].ub[i] != 0) {
+            nonzero_found = true;
+            break;
+        }
+    }
+    if (nonzero_found) {
+        for (i = MAX_VEC_SIZE_BYTES - 2; i >= 0; i--) {
+            qemu_fprintf(f, ", 0x%02x", env->VRegs[regnum].ub[i]);
+        }
+    }
+    qemu_fprintf(f, " )\n");
+}
+
+void hexagon_debug_vreg(CPUHexagonState *env, int regnum)
+{
+    print_vreg(stdout, env, regnum);
+}
+
+static void print_qreg(FILE *f, CPUHexagonState *env, int regnum)
+{
+    bool nonzero_found = false;
+    int i;
+    qemu_fprintf(f, "  q%d = ( ", regnum);
+    qemu_fprintf(f, "0x%02x",
+                 env->QRegs[regnum].ub[MAX_VEC_SIZE_BYTES / 8 - 1]);
+    for (i = MAX_VEC_SIZE_BYTES / 8 - 2; i >= 0; i--) {
+        if (env->QRegs[regnum].ub[i] != 0) {
+            nonzero_found = true;
+            break;
+        }
+    }
+    if (nonzero_found) {
+        for (i = MAX_VEC_SIZE_BYTES / 8 - 2; i >= 0; i--) {
+            qemu_fprintf(f, ", 0x%02x", env->QRegs[regnum].ub[i]);
+        }
+    }
+    qemu_fprintf(f, " )\n");
+}
+
+void hexagon_debug_qreg(CPUHexagonState *env, int regnum)
+{
+    print_qreg(stdout, env, regnum);
+}
+
 static void hexagon_dump(CPUHexagonState *env, FILE *f)
 {
     HexagonCPU *cpu = env_archcpu(env);
@@ -159,6 +210,22 @@ static void hexagon_dump(CPUHexagonState *env, FILE *f)
     print_reg(f, env, HEX_REG_CS1);
 #endif
     qemu_fprintf(f, "}\n");
+
+/*
+ * The HVX register dump takes up a ton of space in the log
+ * Don't print it unless it is needed
+ */
+#define DUMP_HVX 0
+#if DUMP_HVX
+    qemu_fprintf(f, "Vector Registers = {\n");
+    for (int i = 0; i < NUM_VREGS; i++) {
+        print_vreg(f, env, i);
+    }
+    for (int i = 0; i < NUM_QREGS; i++) {
+        print_qreg(f, env, i);
+    }
+    qemu_fprintf(f, "}\n");
+#endif
 }
 
 static void hexagon_dump_state(CPUState *cs, FILE *f, int flags)
@@ -211,6 +278,7 @@ static void hexagon_cpu_reset(DeviceState *dev)
 
     set_default_nan_mode(1, &env->fp_status);
     set_float_detect_tininess(float_tininess_before_rounding, &env->fp_status);
+    memset(&env->zero_vector, 0, sizeof(MMVector));
 }
 
 static void hexagon_cpu_disas_set_info(CPUState *s, disassemble_info *info)
@@ -292,7 +360,7 @@ static void hexagon_cpu_class_init(ObjectClass *c, void *data)
     cc->set_pc = hexagon_cpu_set_pc;
     cc->gdb_read_register = hexagon_gdb_read_register;
     cc->gdb_write_register = hexagon_gdb_write_register;
-    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
+    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS + NUM_QREGS;
     cc->gdb_stop_before_watchpoint = true;
     cc->disas_set_info = hexagon_cpu_disas_set_info;
     cc->tcg_ops = &hexagon_tcg_ops;
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 03/20] Hexagon HVX (target/hexagon) register names
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
  2021-07-05 23:34 ` [PATCH 01/20] Hexagon HVX (target/hexagon) README Taylor Simpson
  2021-07-05 23:34 ` [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-25 13:10   ` Richard Henderson
  2021-07-05 23:34 ` [PATCH 04/20] Hexagon HVX (target/hexagon) support in gdbstub Taylor Simpson
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/hex_regs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/hexagon/hex_regs.h b/target/hexagon/hex_regs.h
index f291911..e1b3149 100644
--- a/target/hexagon/hex_regs.h
+++ b/target/hexagon/hex_regs.h
@@ -76,6 +76,7 @@ enum {
     /* Use reserved control registers for qemu execution counts */
     HEX_REG_QEMU_PKT_CNT      = 52,
     HEX_REG_QEMU_INSN_CNT     = 53,
+    HEX_REG_QEMU_HVX_CNT      = 54,
     HEX_REG_UTIMERLO          = 62,
     HEX_REG_UTIMERHI          = 63,
 };
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 04/20] Hexagon HVX (target/hexagon) support in gdbstub
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (2 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 03/20] Hexagon HVX (target/hexagon) register names Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 05/20] Hexagon HVX (target/hexagon) instruction attributes Taylor Simpson
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gdbstub.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/target/hexagon/gdbstub.c b/target/hexagon/gdbstub.c
index 9c8c04c..b804e97 100644
--- a/target/hexagon/gdbstub.c
+++ b/target/hexagon/gdbstub.c
@@ -21,6 +21,30 @@
 #include "cpu.h"
 #include "internal.h"
 
+static int gdb_get_vreg(CPUHexagonState *env, GByteArray *mem_buf, int n)
+{
+    int total = 0;
+    int i;
+    for (i = 0; i < MAX_VEC_SIZE_BYTES / sizeof(uint32_t); i++) {
+        total += gdb_get_regl(mem_buf, env->VRegs[n].uw[i]);
+        mem_buf += sizeof(uint32_t);
+    }
+    return total;
+}
+
+static int gdb_get_qreg(CPUHexagonState *env, GByteArray *mem_buf, int n)
+{
+    int total = 0;
+    int i;
+    for (i = 0;
+         i < MAX_VEC_SIZE_BYTES / sizeof(uint32_t) / BITS_PER_BYTE;
+         i++) {
+        total += gdb_get_regl(mem_buf, env->QRegs[n].uw[i]);
+        mem_buf += sizeof(uint32_t);
+    }
+    return total;
+}
+
 int hexagon_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
 {
     HexagonCPU *cpu = HEXAGON_CPU(cs);
@@ -29,10 +53,42 @@ int hexagon_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
     if (n < TOTAL_PER_THREAD_REGS) {
         return gdb_get_regl(mem_buf, env->gpr[n]);
     }
+    n -= TOTAL_PER_THREAD_REGS;
+
+    if (n < NUM_VREGS) {
+        return gdb_get_vreg(env, mem_buf, n);
+    }
+    n -= NUM_VREGS;
+
+    if (n < NUM_QREGS) {
+        return gdb_get_qreg(env, mem_buf, n);
+    }
 
     g_assert_not_reached();
 }
 
+static int gdb_put_vreg(CPUHexagonState *env, uint8_t *mem_buf, int n)
+{
+    int i;
+    for (i = 0; i < MAX_VEC_SIZE_BYTES / sizeof(uint32_t); i++) {
+        env->VRegs[n].uw[i] = ldtul_p(mem_buf);
+        mem_buf += sizeof(uint32_t);
+    }
+    return MAX_VEC_SIZE_BYTES;
+}
+
+static int gdb_put_qreg(CPUHexagonState *env, uint8_t *mem_buf, int n)
+{
+    int i;
+    for (i = 0;
+         i < MAX_VEC_SIZE_BYTES / sizeof(uint32_t) / BITS_PER_BYTE;
+         i++) {
+        env->QRegs[n].uw[i] = ldtul_p(mem_buf);
+        mem_buf += sizeof(uint32_t);
+    }
+    return MAX_VEC_SIZE_BYTES / BITS_PER_BYTE;
+}
+
 int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 {
     HexagonCPU *cpu = HEXAGON_CPU(cs);
@@ -42,6 +98,16 @@ int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
         env->gpr[n] = ldtul_p(mem_buf);
         return sizeof(target_ulong);
     }
+    n -= TOTAL_PER_THREAD_REGS;
+
+    if (n < NUM_VREGS) {
+        return gdb_put_vreg(env, mem_buf, n);
+    }
+    n -= NUM_VREGS;
+
+    if (n < NUM_QREGS) {
+        return gdb_put_qreg(env, mem_buf, n);
+    }
 
     g_assert_not_reached();
 }
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 05/20] Hexagon HVX (target/hexagon) instruction attributes
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (3 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 04/20] Hexagon HVX (target/hexagon) support in gdbstub Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 06/20] Hexagon HVX (target/hexagon) macros Taylor Simpson
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/attribs_def.h.inc | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 3815509..4138a7a 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -41,6 +41,27 @@ DEF_ATTRIB(STORE, "Stores to memory", "", "")
 DEF_ATTRIB(MEMLIKE, "Memory-like instruction", "", "")
 DEF_ATTRIB(MEMLIKE_PACKET_RULES, "follows Memory-like packet rules", "", "")
 
+/* V6 Vector attributes */
+DEF_ATTRIB(CVI, "Executes on the HVX extension", "", "")
+
+DEF_ATTRIB(CVI_NEW, "New value memory instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VM, "Memory instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VP, "Permute instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VP_VS, "Double vector permute/shft insn executes on HVX", "", "")
+DEF_ATTRIB(CVI_VX, "Multiply instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VX_DV, "Double vector multiply insn executes on HVX", "", "")
+DEF_ATTRIB(CVI_VS, "Shift instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VS_VX, "Permute/shift and multiply insn executes on HVX", "", "")
+DEF_ATTRIB(CVI_VA, "ALU instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VA_DV, "Double vector alu instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_4SLOT, "Consumes all the vector execution resources", "", "")
+DEF_ATTRIB(CVI_TMP, "Transient Memory Load not written to register", "", "")
+DEF_ATTRIB(CVI_GATHER, "CVI Gather operation", "", "")
+DEF_ATTRIB(CVI_SCATTER, "CVI Scatter operation", "", "")
+DEF_ATTRIB(CVI_SCATTER_RELEASE, "CVI Store Release for scatter", "", "")
+DEF_ATTRIB(CVI_TMP_DST, "CVI instruction that doesn't write a register", "", "")
+DEF_ATTRIB(CVI_SLOT23, "Can execute in slot 2 or slot 3 (HVX)", "", "")
+
 
 /* Change-of-flow attributes */
 DEF_ATTRIB(JUMP, "Jump-type instruction", "", "")
@@ -86,6 +107,7 @@ DEF_ATTRIB(HWLOOP1_END, "Ends HW loop1", "", "")
 DEF_ATTRIB(DCZEROA, "dczeroa type", "", "")
 DEF_ATTRIB(ICFLUSHOP, "icflush op type", "", "")
 DEF_ATTRIB(DCFLUSHOP, "dcflush op type", "", "")
+DEF_ATTRIB(L2FLUSHOP, "l2flush op type", "", "")
 DEF_ATTRIB(DCFETCH, "dcfetch type", "", "")
 
 DEF_ATTRIB(L2FETCH, "Instruction is l2fetch type", "", "")
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 06/20] Hexagon HVX (target/hexagon) macros
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (4 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 05/20] Hexagon HVX (target/hexagon) instruction attributes Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-25 13:13   ` Richard Henderson
  2021-07-05 23:34 ` [PATCH 07/20] Hexagon HVX (target/hexagon) import macro definitions Taylor Simpson
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

macros to interface with the generator
macros referenced in instruction semantics

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/macros.h       |  22 ++
 target/hexagon/mmvec/macros.h | 478 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 500 insertions(+)
 create mode 100644 target/hexagon/mmvec/macros.h

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 094b8da..852183f 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -269,6 +269,10 @@ static inline void gen_pred_cancel(TCGv pred, int slot_num)
 
 #define fNEWREG_ST(VAL) (VAL)
 
+#define fVSATUVALN(N, VAL) \
+    ({ \
+        (((int)(VAL)) < 0) ? 0 : ((1LL << (N)) - 1); \
+    })
 #define fSATUVALN(N, VAL) \
     ({ \
         fSET_OVERFLOW(); \
@@ -279,10 +283,16 @@ static inline void gen_pred_cancel(TCGv pred, int slot_num)
         fSET_OVERFLOW(); \
         ((VAL) < 0) ? (-(1LL << ((N) - 1))) : ((1LL << ((N) - 1)) - 1); \
     })
+#define fVSATVALN(N, VAL) \
+    ({ \
+        ((VAL) < 0) ? (-(1LL << ((N) - 1))) : ((1LL << ((N) - 1)) - 1); \
+    })
 #define fZXTN(N, M, VAL) (((N) != 0) ? extract64((VAL), 0, (N)) : 0LL)
 #define fSXTN(N, M, VAL) (((N) != 0) ? sextract64((VAL), 0, (N)) : 0LL)
 #define fSATN(N, VAL) \
     ((fSXTN(N, 64, VAL) == (VAL)) ? (VAL) : fSATVALN(N, VAL))
+#define fVSATN(N, VAL) \
+    ((fSXTN(N, 64, VAL) == (VAL)) ? (VAL) : fVSATVALN(N, VAL))
 #define fADDSAT64(DST, A, B) \
     do { \
         uint64_t __a = fCAST8u(A); \
@@ -305,12 +315,18 @@ static inline void gen_pred_cancel(TCGv pred, int slot_num)
             DST = __sum; \
         } \
     } while (0)
+#define fVSATUN(N, VAL) \
+    ((fZXTN(N, 64, VAL) == (VAL)) ? (VAL) : fVSATUVALN(N, VAL))
 #define fSATUN(N, VAL) \
     ((fZXTN(N, 64, VAL) == (VAL)) ? (VAL) : fSATUVALN(N, VAL))
 #define fSATH(VAL) (fSATN(16, VAL))
 #define fSATUH(VAL) (fSATUN(16, VAL))
+#define fVSATH(VAL) (fVSATN(16, VAL))
+#define fVSATUH(VAL) (fVSATUN(16, VAL))
 #define fSATUB(VAL) (fSATUN(8, VAL))
 #define fSATB(VAL) (fSATN(8, VAL))
+#define fVSATUB(VAL) (fVSATUN(8, VAL))
+#define fVSATB(VAL) (fVSATN(8, VAL))
 #define fIMMEXT(IMM) (IMM = IMM)
 #define fMUST_IMMEXT(IMM) fIMMEXT(IMM)
 
@@ -417,6 +433,8 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #define fCAST4s(A) ((int32_t)(A))
 #define fCAST8u(A) ((uint64_t)(A))
 #define fCAST8s(A) ((int64_t)(A))
+#define fCAST2_2s(A) ((int16_t)(A))
+#define fCAST2_2u(A) ((uint16_t)(A))
 #define fCAST4_4s(A) ((int32_t)(A))
 #define fCAST4_4u(A) ((uint32_t)(A))
 #define fCAST4_8s(A) ((int64_t)((int32_t)(A)))
@@ -514,7 +532,9 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #define fPM_M(REG, MVAL)    do { REG = REG + (MVAL); } while (0)
 #endif
 #define fSCALE(N, A) (((int64_t)(A)) << N)
+#define fVSATW(A) fVSATN(32, ((long long)A))
 #define fSATW(A) fSATN(32, ((long long)A))
+#define fVSAT(A) fVSATN(32, (A))
 #define fSAT(A) fSATN(32, (A))
 #define fSAT_ORIG_SHL(A, ORIG_REG) \
     ((((int32_t)((fSAT(A)) ^ ((int32_t)(ORIG_REG)))) < 0) \
@@ -651,12 +671,14 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
             fSETBIT(j, DST, VAL); \
         } \
     } while (0)
+#define fCOUNTONES_2(VAL) ctpop16(VAL)
 #define fCOUNTONES_4(VAL) ctpop32(VAL)
 #define fCOUNTONES_8(VAL) ctpop64(VAL)
 #define fBREV_8(VAL) revbit64(VAL)
 #define fBREV_4(VAL) revbit32(VAL)
 #define fCL1_8(VAL) clo64(VAL)
 #define fCL1_4(VAL) clo32(VAL)
+#define fCL1_2(VAL) count_leading_ones_2(VAL)
 #define fINTERLEAVE(ODD, EVEN) interleave(ODD, EVEN)
 #define fDEINTERLEAVE(MIXED) deinterleave(MIXED)
 #define fHIDE(A) A
diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
new file mode 100644
index 0000000..52a5b87
--- /dev/null
+++ b/target/hexagon/mmvec/macros.h
@@ -0,0 +1,478 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_MMVEC_MACROS_H
+#define HEXAGON_MMVEC_MACROS_H
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "arch.h"
+#include "mmvec/system_ext_mmvec.h"
+
+#ifndef QEMU_GENERATE
+#define VdV      (*(MMVector *)(VdV_void))
+#define VsV      (*(MMVector *)(VsV_void))
+#define VuV      (*(MMVector *)(VuV_void))
+#define VvV      (*(MMVector *)(VvV_void))
+#define VwV      (*(MMVector *)(VwV_void))
+#define VxV      (*(MMVector *)(VxV_void))
+#define VyV      (*(MMVector *)(VyV_void))
+
+#define VddV     (*(MMVectorPair *)(VddV_void))
+#define VuuV     (*(MMVectorPair *)(VuuV_void))
+#define VvvV     (*(MMVectorPair *)(VvvV_void))
+#define VxxV     (*(MMVectorPair *)(VxxV_void))
+
+#define QeV      (*(MMQReg *)(QeV_void))
+#define QdV      (*(MMQReg *)(QdV_void))
+#define QsV      (*(MMQReg *)(QsV_void))
+#define QtV      (*(MMQReg *)(QtV_void))
+#define QuV      (*(MMQReg *)(QuV_void))
+#define QvV      (*(MMQReg *)(QvV_void))
+#define QxV      (*(MMQReg *)(QxV_void))
+#endif
+
+#define NEW_WRITTEN(NUM) ((env->VRegs_select >> (NUM)) & 1)
+#define TMP_WRITTEN(NUM) ((env->VRegs_updated_tmp >> (NUM)) & 1)
+
+#define TYPE_VOID(X)   __builtin_types_compatible_p(typeof(X), void *)
+#define TYPE_MMVECTOR(X)   __builtin_types_compatible_p(typeof(X), MMVector)
+
+#define LOG_VREG_WRITE_FUNC(X) \
+    __builtin_choose_expr(TYPE_VOID(X), \
+        log_vreg_write, \
+        __builtin_choose_expr(TYPE_MMVECTOR(X), \
+            log_mmvector_write, (void)0))
+#define LOG_VREG_WRITE(NUM, VAR, VNEW) \
+    LOG_VREG_WRITE_FUNC(VAR)(env, NUM, VAR, VNEW)
+
+#define READ_EXT_VREG(NUM, VAR, VTMP) \
+    do { \
+        VAR = ((NEW_WRITTEN(NUM)) ? env->future_VRegs[NUM] \
+                                  : env->VRegs[NUM]); \
+        VAR = ((TMP_WRITTEN(NUM)) ? env->tmp_VRegs[NUM] : VAR); \
+        if (VTMP == EXT_TMP) { \
+            if (env->VRegs_updated & ((VRegMask)1) << (NUM)) { \
+                VAR = env->future_VRegs[NUM]; \
+                env->VRegs_updated ^= ((VRegMask)1) << (NUM); \
+            } \
+        } \
+    } while (0)
+
+
+#define WRITE_EXT_VREG(NUM, VAR, VNEW)   LOG_VREG_WRITE(NUM, VAR, VNEW)
+
+#define LOG_VTCM_BYTE(VA, MASK, VAL, IDX) \
+    do { \
+        env->vtcm_log.data.ub[IDX] = (VAL); \
+        env->vtcm_log.mask.ub[IDX] = (MASK); \
+        env->vtcm_log.va[IDX] = (VA); \
+    } while (0)
+
+#define fNOTQ(VAL) \
+    ({ \
+        MMQReg _ret;  \
+        int _i_;  \
+        for (_i_ = 0; _i_ < fVECSIZE() / 64; _i_++) { \
+            _ret.ud[_i_] = ~VAL.ud[_i_]; \
+        } \
+        _ret;\
+     })
+#define fGETQBITS(REG, WIDTH, MASK, BITNO) \
+    ((MASK) & (REG.w[(BITNO) >> 5] >> ((BITNO) & 0x1f)))
+#define fGETQBIT(REG, BITNO) fGETQBITS(REG, 1, 1, BITNO)
+#define fGENMASKW(QREG, IDX) \
+    (((fGETQBIT(QREG, (IDX * 4 + 0)) ? 0xFF : 0x0) << 0)  | \
+     ((fGETQBIT(QREG, (IDX * 4 + 1)) ? 0xFF : 0x0) << 8)  | \
+     ((fGETQBIT(QREG, (IDX * 4 + 2)) ? 0xFF : 0x0) << 16) | \
+     ((fGETQBIT(QREG, (IDX * 4 + 3)) ? 0xFF : 0x0) << 24))
+#define fGETNIBBLE(IDX, SRC) (fSXTN(4, 8, (SRC >> (4 * IDX)) & 0xF))
+#define fGETCRUMB(IDX, SRC) (fSXTN(2, 8, (SRC >> (2 * IDX)) & 0x3))
+#define fGETCRUMB_SYMMETRIC(IDX, SRC) \
+    ((fGETCRUMB(IDX, SRC) >= 0 ? (2 - fGETCRUMB(IDX, SRC)) \
+                               : fGETCRUMB(IDX, SRC)))
+#define fGENMASKH(QREG, IDX) \
+    (((fGETQBIT(QREG, (IDX * 2 + 0)) ? 0xFF : 0x0) << 0) | \
+     ((fGETQBIT(QREG, (IDX * 2 + 1)) ? 0xFF : 0x0) << 8))
+#define fGETMASKW(VREG, QREG, IDX) (VREG.w[IDX] & fGENMASKW((QREG), IDX))
+#define fGETMASKH(VREG, QREG, IDX) (VREG.h[IDX] & fGENMASKH((QREG), IDX))
+#define fCONDMASK8(QREG, IDX, YESVAL, NOVAL) \
+    (fGETQBIT(QREG, IDX) ? (YESVAL) : (NOVAL))
+#define fCONDMASK16(QREG, IDX, YESVAL, NOVAL) \
+    ((fGENMASKH(QREG, IDX) & (YESVAL)) | \
+     (fGENMASKH(fNOTQ(QREG), IDX) & (NOVAL)))
+#define fCONDMASK32(QREG, IDX, YESVAL, NOVAL) \
+    ((fGENMASKW(QREG, IDX) & (YESVAL)) | \
+     (fGENMASKW(fNOTQ(QREG), IDX) & (NOVAL)))
+#define fSETQBITS(REG, WIDTH, MASK, BITNO, VAL) \
+    do { \
+        uint32_t __TMP = (VAL); \
+        REG.w[(BITNO) >> 5] &= ~((MASK) << ((BITNO) & 0x1f)); \
+        REG.w[(BITNO) >> 5] |= (((__TMP) & (MASK)) << ((BITNO) & 0x1f)); \
+    } while (0)
+#define fSETQBIT(REG, BITNO, VAL) fSETQBITS(REG, 1, 1, BITNO, VAL)
+#define fVBYTES() (fVECSIZE())
+#define fVALIGN(ADDR, LOG2_ALIGNMENT) (ADDR = ADDR & ~(LOG2_ALIGNMENT - 1))
+#define fVLASTBYTE(ADDR, LOG2_ALIGNMENT) (ADDR = ADDR | (LOG2_ALIGNMENT - 1))
+#define fVELEM(WIDTH) ((fVECSIZE() * 8) / WIDTH)
+#define fVECLOGSIZE() (7)
+#define fVECSIZE() (1 << fVECLOGSIZE())
+#define fSWAPB(A, B) do { uint8_t tmp = A; A = B; B = tmp; } while (0)
+static inline MMVector mmvec_zero_vector(void)
+{
+    MMVector ret;
+    memset(&ret, 0, sizeof(ret));
+    return ret;
+}
+#define fVZERO() mmvec_zero_vector()
+#define fNEWVREG(VNUM) \
+    ((env->VRegs_updated & (((VRegMask)1) << VNUM)) ? env->future_VRegs[VNUM] \
+                                                    : mmvec_zero_vector())
+#define fV_AL_CHECK(EA, MASK) \
+    if ((EA) & (MASK)) { \
+        warn("aligning misaligned vector. EA=%08x", (EA)); \
+    }
+#define fSCATTER_INIT(REGION_START, LENGTH, ELEMENT_SIZE) \
+    mem_vector_scatter_init(env, slot, REGION_START, LENGTH, ELEMENT_SIZE)
+#define fGATHER_INIT(REGION_START, LENGTH, ELEMENT_SIZE) \
+    mem_vector_gather_init(env, slot, REGION_START, LENGTH, ELEMENT_SIZE)
+#define fSCATTER_FINISH(OP)
+#define fGATHER_FINISH()
+#define fLOG_SCATTER_OP(SIZE) \
+    do { \
+        env->vtcm_log.op = true; \
+        env->vtcm_log.op_size = SIZE; \
+    } while (0)
+#define fVLOG_VTCM_WORD_INCREMENT(EA, OFFSET, INC, IDX, ALIGNMENT, LEN) \
+    do { \
+        int log_byte = 0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        for (int i0 = 0; i0 < 4; i0++) { \
+            log_byte = (va + i0) <= va_high; \
+            LOG_VTCM_BYTE(va + i0, log_byte, INC. ub[4 * IDX + i0], \
+                          4 * IDX + i0); \
+        } \
+    } while (0)
+#define fVLOG_VTCM_HALFWORD_INCREMENT(EA, OFFSET, INC, IDX, ALIGNMENT, LEN) \
+    do { \
+        int log_byte = 0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        for (int i0 = 0; i0 < 2; i0++) { \
+            log_byte = (va + i0) <= va_high; \
+            LOG_VTCM_BYTE(va + i0, log_byte, INC.ub[2 * IDX + i0], \
+                          2 * IDX + i0); \
+        } \
+    } while (0)
+
+#define fVLOG_VTCM_HALFWORD_INCREMENT_DV(EA, OFFSET, INC, IDX, IDX2, IDX_H, \
+                                         ALIGNMENT, LEN) \
+    do { \
+        int log_byte = 0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        for (int i0 = 0; i0 < 2; i0++) { \
+            log_byte = (va + i0) <= va_high; \
+            LOG_VTCM_BYTE(va + i0, log_byte, INC.ub[2 * IDX + i0], \
+                          2 * IDX + i0); \
+        } \
+    } while (0)
+
+/* NOTE - Will this always be tmp_VRegs[0]; */
+#define GATHER_FUNCTION(EA, OFFSET, IDX, LEN, ELEMENT_SIZE, BANK_IDX, QVAL) \
+    do { \
+        int i0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        int log_bank = 0; \
+        int log_byte = 0; \
+        for (i0 = 0; i0 < ELEMENT_SIZE; i0++) { \
+            log_byte = ((va + i0) <= va_high) && QVAL; \
+            log_bank |= (log_byte << i0); \
+            uint8_t B; \
+            get_user_u8(B, EA + i0); \
+            env->tmp_VRegs[0].ub[ELEMENT_SIZE * IDX + i0] = B; \
+            LOG_VTCM_BYTE(va + i0, log_byte, B, ELEMENT_SIZE * IDX + i0); \
+        } \
+    } while (0)
+#define fVLOG_VTCM_GATHER_WORD(EA, OFFSET, IDX, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, 1); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORD(EA, OFFSET, IDX, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, 1); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORD_DV(EA, OFFSET, IDX, IDX2, IDX_H, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, (2 * IDX2 + IDX_H), 1); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_WORDQ(EA, OFFSET, IDX, Q, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, \
+                        fGETQBIT(QsV, 4 * IDX + i0)); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORDQ(EA, OFFSET, IDX, Q, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, \
+                        fGETQBIT(QsV, 2 * IDX + i0)); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORDQ_DV(EA, OFFSET, IDX, IDX2, IDX_H, Q, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, (2 * IDX2 + IDX_H), \
+                        fGETQBIT(QsV, 2 * IDX + i0)); \
+    } while (0)
+#define SCATTER_OP_WRITE_TO_MEM(TYPE) \
+    do { \
+        for (int i = 0; i < env->vtcm_log.size; i += sizeof(TYPE)) { \
+            if (env->vtcm_log.mask.ub[i] != 0) { \
+                TYPE dst = 0; \
+                TYPE inc = 0; \
+                for (int j = 0; j < sizeof(TYPE); j++) { \
+                    uint8_t val; \
+                    get_user_u8(val, env->vtcm_log.va[i + j]); \
+                    dst |= val << (8 * j); \
+                    inc |= env->vtcm_log.data.ub[j + i] << (8 * j); \
+                    env->vtcm_log.mask.ub[j + i] = 0; \
+                    env->vtcm_log.data.ub[j + i] = 0; \
+                } \
+                dst += inc; \
+                for (int j = 0; j < sizeof(TYPE); j++) { \
+                    put_user_u8((dst >> (8 * j)) & 0xFF, \
+                        env->vtcm_log.va[i + j]);  \
+                } \
+            } \
+        } \
+    } while (0)
+#define SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, ELEM_SIZE, BANK_IDX, QVAL, IN) \
+    do { \
+        int i0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        int log_bank = 0; \
+        int log_byte = 0; \
+        for (i0 = 0; i0 < ELEM_SIZE; i0++) { \
+            log_byte = ((va + i0) <= va_high) && QVAL; \
+            log_bank |= (log_byte << i0); \
+            LOG_VTCM_BYTE(va + i0, log_byte, IN.ub[ELEM_SIZE * IDX + i0], \
+                          ELEM_SIZE * IDX + i0); \
+        } \
+    } while (0)
+#define fVLOG_VTCM_HALFWORD(EA, OFFSET, IN, IDX, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, 1, IN); \
+    } while (0)
+#define fVLOG_VTCM_WORD(EA, OFFSET, IN, IDX, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, 1, IN); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORDQ(EA, OFFSET, IN, IDX, Q, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, \
+                         fGETQBIT(QsV, 2 * IDX + i0), IN); \
+    } while (0)
+#define fVLOG_VTCM_WORDQ(EA, OFFSET, IN, IDX, Q, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, \
+                         fGETQBIT(QsV, 4 * IDX + i0), IN); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORD_DV(EA, OFFSET, IN, IDX, IDX2, IDX_H, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, \
+                         (2 * IDX2 + IDX_H), 1, IN); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORDQ_DV(EA, OFFSET, IN, IDX, Q, IDX2, IDX_H, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, (2 * IDX2 + IDX_H), \
+                         fGETQBIT(QsV, 2 * IDX + i0), IN); \
+    } while (0)
+#define fSTORERELEASE(EA, TYPE) \
+    do { \
+        fV_AL_CHECK(EA, fVECSIZE() - 1); \
+    } while (0)
+#define fLOADMMV_AL(EA, ALIGNMENT, LEN, DST) \
+    do { \
+        fV_AL_CHECK(EA, ALIGNMENT - 1); \
+        mem_load_vector(env, EA & ~(ALIGNMENT - 1), LEN, &DST.ub[0]); \
+    } while (0)
+#define fLOADMMV(EA, DST) fLOADMMV_AL(EA, fVECSIZE(), fVECSIZE(), DST)
+#define fLOADMMVU_AL(EA, ALIGNMENT, LEN, DST) \
+    do { \
+        uint32_t size2 = (EA) & (ALIGNMENT - 1); \
+        uint32_t size1 = LEN - size2; \
+        mem_load_vector(env, EA + size1, size2, &DST.ub[size1]); \
+        mem_load_vector(env, EA, size1, &DST.ub[0]); \
+    } while (0)
+#define fLOADMMVU(EA, DST) \
+    do { \
+        if ((EA & (fVECSIZE() - 1)) == 0) { \
+            fLOADMMV_AL(EA, fVECSIZE(), fVECSIZE(), DST); \
+        } else { \
+            fLOADMMVU_AL(EA, fVECSIZE(), fVECSIZE(), DST); \
+        } \
+    } while (0)
+#define fSTOREMMV_AL(EA, ALIGNMENT, LEN, SRC) \
+    do  { \
+        fV_AL_CHECK(EA, ALIGNMENT - 1); \
+        mem_store_vector(env, EA & ~(ALIGNMENT - 1), slot, LEN, \
+                         &SRC.ub[0], NULL, false); \
+    } while (0)
+#define fSTOREMMV(EA, SRC) fSTOREMMV_AL(EA, fVECSIZE(), fVECSIZE(), SRC)
+#define fSTOREMMVQ_AL(EA, ALIGNMENT, LEN, SRC, MASK) \
+    do { \
+        MMVector maskvec; \
+        int i; \
+        for (i = 0; i < fVECSIZE(); i++) { \
+            maskvec.ub[i] = fGETQBIT(MASK, i); \
+        } \
+        mem_store_vector(env, EA & ~(ALIGNMENT - 1), slot, LEN, \
+                         &SRC.ub[0], &maskvec.ub[0], false); \
+    } while (0)
+#define fSTOREMMVQ(EA, SRC, MASK) \
+    fSTOREMMVQ_AL(EA, fVECSIZE(), fVECSIZE(), SRC, MASK)
+#define fSTOREMMVNQ_AL(EA, ALIGNMENT, LEN, SRC, MASK) \
+    do { \
+        MMVector maskvec; \
+        int i; \
+        for (i = 0; i < fVECSIZE(); i++) { \
+            maskvec.ub[i] = fGETQBIT(MASK, i); \
+        } \
+        fV_AL_CHECK(EA, ALIGNMENT - 1); \
+        mem_store_vector(env, EA & ~(ALIGNMENT - 1), slot, LEN, \
+                         &SRC.ub[0], &maskvec.ub[0], true); \
+    } while (0)
+#define fSTOREMMVNQ(EA, SRC, MASK) \
+    fSTOREMMVNQ_AL(EA, fVECSIZE(), fVECSIZE(), SRC, MASK)
+#define fSTOREMMVU_AL(EA, ALIGNMENT, LEN, SRC) \
+    do { \
+        uint32_t size1 = ALIGNMENT - ((EA) & (ALIGNMENT - 1)); \
+        uint32_t size2; \
+        if (size1 > LEN) { \
+            size1 = LEN; \
+        } \
+        size2 = LEN - size1; \
+        mem_store_vector(env, EA + size1, 1, size2, \
+                         &SRC.ub[size1], NULL, false); \
+        mem_store_vector(env, EA, 0, size1, &SRC.ub[0], NULL, false); \
+    } while (0)
+#define fSTOREMMVU(EA, SRC) \
+    do { \
+        if ((EA & (fVECSIZE() - 1)) == 0) { \
+            fSTOREMMV_AL(EA, fVECSIZE(), fVECSIZE(), SRC); \
+        } else { \
+            fSTOREMMVU_AL(EA, fVECSIZE(), fVECSIZE(), SRC); \
+        } \
+    } while (0)
+#define fSTOREMMVQU_AL(EA, ALIGNMENT, LEN, SRC, MASK) \
+    do { \
+        uint32_t size1 = ALIGNMENT - ((EA) & (ALIGNMENT - 1)); \
+        uint32_t size2; \
+        MMVector maskvec; \
+        int i; \
+        for (i = 0; i < fVECSIZE(); i++) { \
+            maskvec.ub[i] = fGETQBIT(MASK, i); \
+        } \
+        if (size1 > LEN) { \
+            size1 = LEN; \
+        } \
+        size2 = LEN - size1; \
+        mem_store_vector(env, EA + size1, 1, size2, \
+                         &SRC.ub[size1], &maskvec.ub[size1], false); \
+        mem_store_vector(env, EA, size1, &SRC.ub[0], &maskvec.ub[0], false); \
+    } while (0)
+#define fSTOREMMVNQU_AL(EA, ALIGNMENT, LEN, SRC, MASK) \
+    do { \
+        uint32_t size1 = ALIGNMENT - ((EA) & (ALIGNMENT - 1)); \
+        uint32_t size2; \
+        MMVector maskvec; \
+        int i; \
+        for (i = 0; i < fVECSIZE(); i++) { \
+            maskvec.ub[i] = fGETQBIT(MASK, i); \
+        } \
+        if (size1 > LEN) { \
+            size1 = LEN; \
+        } \
+        size2 = LEN - size1; \
+        mem_store_vector(env, EA + size1, 1, size2, \
+                         &SRC.ub[size1], &maskvec.ub[size1], true); \
+        mem_store_vector(env, EA, 0, size1, &SRC.ub[0], \
+                         &maskvec.ub[0], true); \
+    } while (0)
+#define fVFOREACH(WIDTH, VAR) for (VAR = 0; VAR < fVELEM(WIDTH); VAR++)
+#define fVARRAY_ELEMENT_ACCESS(ARRAY, TYPE, INDEX) \
+    ARRAY.v[(INDEX) / (fVECSIZE() / (sizeof(ARRAY.TYPE[0])))].TYPE[(INDEX) % \
+    (fVECSIZE() / (sizeof(ARRAY.TYPE[0])))]
+
+#ifndef QEMU_GENERATE
+/* Grabs the .tmp data, wherever it is, and clears the .tmp status */
+/* Used for vhist */
+static inline MMVector mmvec_vtmp_data(CPUHexagonState *env)
+{
+    VRegMask vsel = env->VRegs_updated_tmp;
+    MMVector ret;
+    int idx = clo32(~revbit32(vsel));
+    if (vsel == 0) {
+        printf("[UNDEFINED] no .tmp load when implicitly required...");
+    }
+    ret = env->tmp_VRegs[idx];
+    env->VRegs_updated_tmp = 0;
+    return ret;
+}
+#define fTMPVDATA() mmvec_vtmp_data(env)
+#endif
+#define fVSATDW(U, V) fVSATW(((((long long)U) << 32) | fZXTN(32, 64, V)))
+#define fVASL_SATHI(U, V) fVSATW(((U) << 1) | ((V) >> 31))
+#define fVUADDSAT(WIDTH, U, V) \
+    fVSATUN(WIDTH, fZXTN(WIDTH, 2 * WIDTH, U) + fZXTN(WIDTH, 2 * WIDTH, V))
+#define fVSADDSAT(WIDTH, U, V) \
+    fVSATN(WIDTH, fSXTN(WIDTH, 2 * WIDTH, U) + fSXTN(WIDTH, 2 * WIDTH, V))
+#define fVUSUBSAT(WIDTH, U, V) \
+    fVSATUN(WIDTH, fZXTN(WIDTH, 2 * WIDTH, U) - fZXTN(WIDTH, 2 * WIDTH, V))
+#define fVSSUBSAT(WIDTH, U, V) \
+    fVSATN(WIDTH, fSXTN(WIDTH, 2 * WIDTH, U) - fSXTN(WIDTH, 2 * WIDTH, V))
+#define fVAVGU(WIDTH, U, V) \
+    ((fZXTN(WIDTH, 2 * WIDTH, U) + fZXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVAVGURND(WIDTH, U, V) \
+    ((fZXTN(WIDTH, 2 * WIDTH, U) + fZXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1)
+#define fVNAVGU(WIDTH, U, V) \
+    ((fZXTN(WIDTH, 2 * WIDTH, U) - fZXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVNAVGURNDSAT(WIDTH, U, V) \
+    fVSATUN(WIDTH, ((fZXTN(WIDTH, 2 * WIDTH, U) - \
+                     fZXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1))
+#define fVAVGS(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) + fSXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVAVGSRND(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) + fSXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1)
+#define fVNAVGS(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) - fSXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVNAVGSRND(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) - fSXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1)
+#define fVNAVGSRNDSAT(WIDTH, U, V) \
+    fVSATN(WIDTH, ((fSXTN(WIDTH, 2 * WIDTH, U) - \
+                    fSXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1))
+#define fVNOROUND(VAL, SHAMT) VAL
+#define fVNOSAT(VAL) VAL
+#define fVROUND(VAL, SHAMT) \
+    ((VAL) + (((SHAMT) > 0) ? (1LL << ((SHAMT) - 1)) : 0))
+#define fCARRY_FROM_ADD32(A, B, C) \
+    (((fZXTN(32, 64, A) + fZXTN(32, 64, B) + C) >> 32) & 1)
+#define fUARCH_NOTE_PUMP_4X()
+#define fUARCH_NOTE_PUMP_2X()
+
+#define IV1DEAD()
+#endif
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 07/20] Hexagon HVX (target/hexagon) import macro definitions
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (5 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 06/20] Hexagon HVX (target/hexagon) macros Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 08/20] Hexagon HVX (target/hexagon) semantics generator Taylor Simpson
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Imported from the Hexagon architecture library
    imported/allext_macros.def       Top level macro include for all extensions
    imported/macros.def              Scalar core macros (some HVX here)
    imported/mmvec/macros.def        HVX macro definitions
The macro definition files specify instruction attributes that are applied
to each instruction that reverences the macro.

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/imported/allext_macros.def |  25 +
 target/hexagon/imported/macros.def        |  88 ++++
 target/hexagon/imported/mmvec/macros.def  | 842 ++++++++++++++++++++++++++++++
 3 files changed, 955 insertions(+)
 create mode 100644 target/hexagon/imported/allext_macros.def
 create mode 100755 target/hexagon/imported/mmvec/macros.def

diff --git a/target/hexagon/imported/allext_macros.def b/target/hexagon/imported/allext_macros.def
new file mode 100644
index 0000000..9c91199
--- /dev/null
+++ b/target/hexagon/imported/allext_macros.def
@@ -0,0 +1,25 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Top level file for all instruction set extensions
+ */
+#define EXTNAME mmvec
+#define EXTSTR "mmvec"
+#include "mmvec/macros.def"
+#undef EXTNAME
+#undef EXTSTR
diff --git a/target/hexagon/imported/macros.def b/target/hexagon/imported/macros.def
index 32ed3bf..e23f915 100755
--- a/target/hexagon/imported/macros.def
+++ b/target/hexagon/imported/macros.def
@@ -177,6 +177,12 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSATUVALN,
+    ({ ((VAL) < 0) ? 0 : ((1LL<<(N))-1);}),
+    ()
+)
+
+DEF_MACRO(
     fSATUVALN,
     ({fSET_OVERFLOW(); ((VAL) < 0) ? 0 : ((1LL<<(N))-1);}),
     ()
@@ -189,6 +195,12 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSATVALN,
+    ({((VAL) < 0) ? (-(1LL<<((N)-1))) : ((1LL<<((N)-1))-1);}),
+    ()
+)
+
+DEF_MACRO(
     fZXTN, /* macro name */
     ((VAL) & ((1LL<<(N))-1)),
     /* attribs */
@@ -205,6 +217,11 @@ DEF_MACRO(
     ((fSXTN(N,64,VAL) == (VAL)) ? (VAL) : fSATVALN(N,VAL)),
     ()
 )
+DEF_MACRO(
+    fVSATN,
+    ((fSXTN(N,64,VAL) == (VAL)) ? (VAL) : fVSATVALN(N,VAL)),
+    ()
+)
 
 DEF_MACRO(
     fADDSAT64,
@@ -235,6 +252,12 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSATUN,
+    ((fZXTN(N,64,VAL) == (VAL)) ? (VAL) : fVSATUVALN(N,VAL)),
+    ()
+)
+
+DEF_MACRO(
     fSATUN,
     ((fZXTN(N,64,VAL) == (VAL)) ? (VAL) : fSATUVALN(N,VAL)),
     ()
@@ -254,6 +277,19 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSATH,
+    (fVSATN(16,VAL)),
+    ()
+)
+
+DEF_MACRO(
+    fVSATUH,
+    (fVSATUN(16,VAL)),
+    ()
+)
+
+
+DEF_MACRO(
     fSATUB,
     (fSATUN(8,VAL)),
     ()
@@ -265,6 +301,20 @@ DEF_MACRO(
 )
 
 
+DEF_MACRO(
+    fVSATUB,
+    (fVSATUN(8,VAL)),
+    ()
+)
+DEF_MACRO(
+    fVSATB,
+    (fVSATN(8,VAL)),
+    ()
+)
+
+
+
+
 /*************************************/
 /* immediate extension               */
 /*************************************/
@@ -557,6 +607,18 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fCAST2_2s, /* macro name */
+    ((size2s_t)(A)),
+    /* optional attributes */
+)
+
+DEF_MACRO(
+    fCAST2_2u, /* macro name */
+    ((size2u_t)(A)),
+    /* optional attributes */
+)
+
+DEF_MACRO(
     fCAST4_4s, /* macro name */
     ((size4s_t)(A)),
     /* optional attributes */
@@ -876,6 +938,11 @@ DEF_MACRO(
     (((size8s_t)(A))<<N),
     /* optional attributes */
 )
+DEF_MACRO(
+    fVSATW, /* saturating to 32-bits*/
+    fVSATN(32,((long long)A)),
+    ()
+)
 
 DEF_MACRO(
     fSATW, /* saturating to 32-bits*/
@@ -884,6 +951,12 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSAT, /* saturating to 32-bits*/
+    fVSATN(32,(A)),
+    ()
+)
+
+DEF_MACRO(
     fSAT, /* saturating to 32-bits*/
     fSATN(32,(A)),
     ()
@@ -1389,6 +1462,11 @@ DEF_MACRO(fSETBITS,
 /*************************************/
 /* Used for parity, etc........      */
 /*************************************/
+DEF_MACRO(fCOUNTONES_2,
+    count_ones_2(VAL),
+    /* nothing */
+)
+
 DEF_MACRO(fCOUNTONES_4,
     count_ones_4(VAL),
     /* nothing */
@@ -1419,6 +1497,11 @@ DEF_MACRO(fCL1_4,
     /* nothing */
 )
 
+DEF_MACRO(fCL1_2,
+    count_leading_ones_2(VAL),
+    /* nothing */
+)
+
 DEF_MACRO(fINTERLEAVE,
     interleave(ODD,EVEN),
     /* nothing */
@@ -1576,3 +1659,8 @@ DEF_MACRO(fBRANCH_SPECULATE_STALL,
     },
     ()
 )
+
+DEF_MACRO(IV1DEAD,
+    ,
+    ()
+)
diff --git a/target/hexagon/imported/mmvec/macros.def b/target/hexagon/imported/mmvec/macros.def
new file mode 100755
index 0000000..7e5438a
--- /dev/null
+++ b/target/hexagon/imported/mmvec/macros.def
@@ -0,0 +1,842 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+DEF_MACRO(fDUMPQ,
+	do {
+		printf(STR ":" #REG ": 0x%016llx\n",REG.ud[0]);
+	} while (0),
+	()
+)
+
+DEF_MACRO(fUSE_LOOKUP_ADDRESS_BY_REV,
+	PROC->arch_proc_options->mmvec_use_full_va_for_lookup,
+	()
+)
+
+DEF_MACRO(fUSE_LOOKUP_ADDRESS,
+	1,
+	()
+)
+
+DEF_MACRO(fNOTQ,
+	({mmqreg_t _ret = {0}; int _i_; for (_i_ = 0; _i_ < fVECSIZE()/64; _i_++) _ret.ud[_i_] = ~VAL.ud[_i_]; _ret;}),
+	()
+)
+
+DEF_MACRO(fGETQBITS,
+	((MASK) & (REG.w[(BITNO)>>5] >> ((BITNO) & 0x1f))),
+	()
+)
+
+DEF_MACRO(fGETQBIT,
+	fGETQBITS(REG,1,1,BITNO),
+	()
+)
+
+DEF_MACRO(fGENMASKW,
+	(((fGETQBIT(QREG,(IDX*4+0)) ? 0xFF : 0x0) << 0)
+	|((fGETQBIT(QREG,(IDX*4+1)) ? 0xFF : 0x0) << 8)
+	|((fGETQBIT(QREG,(IDX*4+2)) ? 0xFF : 0x0) << 16)
+	|((fGETQBIT(QREG,(IDX*4+3)) ? 0xFF : 0x0) << 24)),
+	()
+)
+DEF_MACRO(fGET10BIT,
+	{
+		COE = (((((fGETUBYTE(3,VAL) >> (2 * POS)) & 3) << 8) | fGETUBYTE(POS,VAL)) << 6);
+		COE >>= 6;
+	},
+	()
+)
+
+DEF_MACRO(fVMAX,
+	(X>Y) ? X : Y,
+	()
+)
+
+
+DEF_MACRO(fGETNIBBLE,
+    ( fSXTN(4,8,(SRC >> (4*IDX)) & 0xF) ),
+    ()
+)
+
+DEF_MACRO(fGETCRUMB,
+    ( fSXTN(2,8,(SRC >> (2*IDX)) & 0x3) ),
+    ()
+)
+
+DEF_MACRO(fGETCRUMB_SYMMETRIC,
+    ( (fGETCRUMB(IDX,SRC)>=0 ? (2-fGETCRUMB(IDX,SRC)) : fGETCRUMB(IDX,SRC) ) ),
+    ()
+)
+
+#define ZERO_OFFSET_2B +
+
+DEF_MACRO(fGENMASKH,
+	(((fGETQBIT(QREG,(IDX*2+0)) ? 0xFF : 0x0) << 0)
+	|((fGETQBIT(QREG,(IDX*2+1)) ? 0xFF : 0x0) << 8)),
+	()
+)
+
+DEF_MACRO(fGETMASKW,
+	(VREG.w[IDX] & fGENMASKW((QREG),IDX)),
+	()
+)
+
+DEF_MACRO(fGETMASKH,
+	(VREG.h[IDX] & fGENMASKH((QREG),IDX)),
+	()
+)
+
+DEF_MACRO(fCONDMASK8,
+	(fGETQBIT(QREG,IDX) ? (YESVAL) : (NOVAL)),
+	()
+)
+
+DEF_MACRO(fCONDMASK16,
+	((fGENMASKH(QREG,IDX) & (YESVAL)) | (fGENMASKH(fNOTQ(QREG),IDX) & (NOVAL))),
+	()
+)
+
+DEF_MACRO(fCONDMASK32,
+	((fGENMASKW(QREG,IDX) & (YESVAL)) | (fGENMASKW(fNOTQ(QREG),IDX) & (NOVAL))),
+	()
+)
+
+
+DEF_MACRO(fSETQBITS,
+	do {
+		size4u_t __TMP = (VAL);
+		REG.w[(BITNO)>>5] &= ~((MASK) << ((BITNO) & 0x1f));
+		REG.w[(BITNO)>>5] |= (((__TMP) & (MASK)) << ((BITNO) & 0x1f));
+	} while (0),
+	()
+)
+
+DEF_MACRO(fSETQBIT,
+	fSETQBITS(REG,1,1,BITNO,VAL),
+	()
+)
+
+DEF_MACRO(fVBYTES,
+	(fVECSIZE()),
+	()
+)
+
+DEF_MACRO(fVHALVES,
+	(fVECSIZE()/2),
+	()
+)
+
+DEF_MACRO(fVWORDS,
+	(fVECSIZE()/4),
+	()
+)
+
+DEF_MACRO(fVDWORDS,
+	(fVECSIZE()/8),
+	()
+)
+
+DEF_MACRO(fVALIGN,
+    ( ADDR = ADDR & ~(LOG2_ALIGNMENT-1)),
+    ()
+)
+
+DEF_MACRO(fVLASTBYTE,
+    ( ADDR = ADDR | (LOG2_ALIGNMENT-1)),
+    ()
+)
+
+
+DEF_MACRO(fVELEM,
+    ((fVECSIZE()*8)/WIDTH),
+    ()
+)
+
+DEF_MACRO(fVECLOGSIZE,
+    (mmvec_current_veclogsize(thread)),
+    ()
+)
+
+DEF_MACRO(fVECSIZE,
+    (1<<fVECLOGSIZE()),
+    ()
+)
+
+DEF_MACRO(fSWAPB,
+    {
+		size1u_t tmp = A;
+		A = B;
+		B = tmp;
+	},
+    /* NOTHING */
+)
+
+DEF_MACRO(
+	fVZERO,
+	mmvec_zero_vector(),
+	()
+)
+
+DEF_MACRO(
+    fNEWVREG,
+    ((THREAD2STRUCT->VRegs_updated & (((VRegMask)1)<<VNUM)) ? THREAD2STRUCT->future_VRegs[VNUM] : mmvec_zero_vector()),
+    (A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY)
+)
+
+DEF_MACRO(
+	fV_AL_CHECK,
+	if ((EA) & (MASK)) {
+		warn("aligning misaligned vector. PC=%08x EA=%08x",thread->Regs[REG_PC],(EA));
+	},
+	()
+)
+DEF_MACRO(fSCATTER_INIT,
+    {
+    mem_vector_scatter_init(thread, insn,   REGION_START, LENGTH, ELEMENT_SIZE);
+	if (EXCEPTION_DETECTED) return;
+    },
+    (A_STORE,A_MEMLIKE,A_RESTRICT_SLOT0ONLY)
+)
+
+DEF_MACRO(fGATHER_INIT,
+    {
+    mem_vector_gather_init(thread, insn,   REGION_START, LENGTH, ELEMENT_SIZE);
+	if (EXCEPTION_DETECTED) return;
+    },
+    (A_LOAD,A_MEMLIKE,A_RESTRICT_SLOT1ONLY)
+)
+
+DEF_MACRO(fSCATTER_FINISH,
+    {
+	if (EXCEPTION_DETECTED) return;
+    mem_vector_scatter_finish(thread, insn, OP);
+    },
+    ()
+)
+
+DEF_MACRO(fGATHER_FINISH,
+    {
+	if (EXCEPTION_DETECTED) return;
+    mem_vector_gather_finish(thread, insn);
+    },
+    ()
+)
+
+
+DEF_MACRO(CHECK_VTCM_PAGE,
+     {
+        int slot = insn->slot;
+        paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        pa = pa & ~(ALIGNMENT-1);
+        FLAG = (pa < (thread->mem_access[slot].paddr+LENGTH));
+     },
+    ()
+)
+DEF_MACRO(COUNT_OUT_OF_BOUNDS,
+     {
+        if (!FLAG)
+        {
+               THREAD2STRUCT->vtcm_log.oob_access += SIZE;
+               warn("Scatter/Gather out of bounds of region");
+        }
+     },
+    ()
+)
+
+DEF_MACRO(fLOG_SCATTER_OP,
+    {
+        // Log the size and indicate that the extension ext.c file needs to increment right before memory write
+        THREAD2STRUCT->vtcm_log.op = 1;
+        THREAD2STRUCT->vtcm_log.op_size = SIZE;
+    },
+    ()
+)
+
+
+
+DEF_MACRO(fVLOG_VTCM_WORD_INCREMENT,
+    {
+        int slot = insn->slot;
+        int log_bank = 0;
+        int log_byte =0;
+        paddr_t pa = thread->mem_access[slot].paddr+(OFFSET & ~(ALIGNMENT-1));
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        for(int i0 = 0; i0 < 4; i0++)
+        {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high));
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,INC.ub[4*IDX+i0],4*IDX+i0);
+        }
+        { LOG_VTCM_BANK(pa, log_bank, IDX); }
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD_INCREMENT,
+    {
+        int slot = insn->slot;
+        int log_bank = 0;
+        int log_byte = 0;
+        paddr_t pa = thread->mem_access[slot].paddr+(OFFSET & ~(ALIGNMENT-1));
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        for(int i0 = 0; i0 < 2; i0++) {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high));
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,INC.ub[2*IDX+i0],2*IDX+i0);
+        }
+        { LOG_VTCM_BANK(pa, log_bank,IDX); }
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD_INCREMENT_DV,
+    {
+        int slot = insn->slot;
+        int log_bank = 0;
+        int log_byte = 0;
+        paddr_t pa = thread->mem_access[slot].paddr+(OFFSET & ~(ALIGNMENT-1));
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        for(int i0 = 0; i0 < 2; i0++) {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high));
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,INC.ub[2*IDX+i0],2*IDX+i0);
+        }
+        { LOG_VTCM_BANK(pa, log_bank,(2*IDX2+IDX_H));}
+    },
+    ()
+)
+
+
+
+DEF_MACRO(GATHER_FUNCTION,
+{
+        int slot = insn->slot;
+        int i0;
+        paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        int log_bank = 0;
+        int log_byte = 0;
+        for(i0 = 0; i0 < ELEMENT_SIZE; i0++)
+        {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high)) && QVAL;
+            log_bank |= (log_byte<<i0);
+            size1u_t B  = sim_mem_read1(thread->system_ptr, thread->threadId, thread->mem_access[slot].paddr+OFFSET+i0);
+            THREAD2STRUCT->tmp_VRegs[0].ub[ELEMENT_SIZE*IDX+i0] = B;
+            LOG_VTCM_BYTE(pa+i0,log_byte,B,ELEMENT_SIZE*IDX+i0);
+        }
+        LOG_VTCM_BANK(pa, log_bank,BANK_IDX);
+},
+()
+)
+
+
+
+DEF_MACRO(fVLOG_VTCM_GATHER_WORD,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 4, IDX, 1);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORD,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, IDX, 1);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORD_DV,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), 1);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_WORDQ,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 4, IDX, fGETQBIT(QsV,4*IDX+i0));
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORDQ,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, IDX, fGETQBIT(QsV,2*IDX+i0));
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORDQ_DV,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), fGETQBIT(QsV,2*IDX+i0));
+    },
+    ()
+)
+
+
+DEF_MACRO(DEBUG_LOG_ADDR,
+    {
+
+        if (thread->processor_ptr->arch_proc_options->mmvec_network_addr_log2)
+        {
+
+            int slot = insn->slot;
+            paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        }
+    },
+    ()
+)
+
+
+
+
+
+
+
+DEF_MACRO(SCATTER_OP_WRITE_TO_MEM,
+    {
+        for (int i = 0; i < mmvecx->vtcm_log.size; i+=sizeof(TYPE))
+        {
+            if ( mmvecx->vtcm_log.mask.ub[i] != 0) {
+                TYPE dst = 0;
+                TYPE inc = 0;
+                for(int j = 0; j < sizeof(TYPE); j++) {
+                    dst |= (sim_mem_read1(thread->system_ptr, thread->threadId, mmvecx->vtcm_log.pa[i+j]) << (8*j));
+                    inc |= mmvecx->vtcm_log.data.ub[j+i] << (8*j);
+
+                    mmvecx->vtcm_log.mask.ub[j+i] = 0;
+                    mmvecx->vtcm_log.data.ub[j+i] = 0;
+                    mmvecx->vtcm_log.offsets.ub[j+i] = 0;
+                }
+                dst += inc;
+                for(int j = 0; j < sizeof(TYPE); j++) {
+                    sim_mem_write1(thread->system_ptr,thread->threadId, mmvecx->vtcm_log.pa[i+j], (dst >> (8*j))& 0xFF );
+                }
+        }
+
+    }
+    },
+    ()
+)
+
+DEF_MACRO(SCATTER_FUNCTION,
+{
+        int slot = insn->slot;
+        int i0;
+        paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        int log_bank = 0;
+        int log_byte = 0;
+        for(i0 = 0; i0 < ELEMENT_SIZE; i0++) {
+            log_byte = ((OFFSET>=0)&&((pa+i0)<=pa_high)) && QVAL;
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,IN.ub[ELEMENT_SIZE*IDX+i0],ELEMENT_SIZE*IDX+i0);
+        }
+        LOG_VTCM_BANK(pa, log_bank,BANK_IDX);
+
+},
+()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, IDX, 1, IN);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_WORD,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 4, IDX, 1, IN);
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORDQ,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, IDX, fGETQBIT(QsV,2*IDX+i0), IN);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_WORDQ,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 4, IDX, fGETQBIT(QsV,4*IDX+i0), IN);
+    },
+    ()
+)
+
+
+
+
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD_DV,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), 1, IN);
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORDQ_DV,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), fGETQBIT(QsV,2*IDX+i0), IN);
+    },
+    ()
+)
+
+
+
+
+
+
+DEF_MACRO(fSTORERELEASE,
+    {
+        fV_AL_CHECK(EA,fVECSIZE()-1);
+
+        mem_store_release(thread, insn, fVECSIZE(), EA&~(fVECSIZE()-1), EA, TYPE, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+	(A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fVFETCH_AL,
+    {
+    fV_AL_CHECK(EA,fVECSIZE()-1);
+    mem_fetch_vector(thread, insn, EA&~(fVECSIZE()-1), insn->slot, fVECSIZE());
+    },
+    (A_LOAD,A_MEMLIKE)
+)
+
+
+DEF_MACRO(fLOADMMV_AL,
+    {
+    fV_AL_CHECK(EA,ALIGNMENT-1);
+	thread->last_pkt->double_access_vec = 0;
+    mem_load_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &DST.ub[0], LEN, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_LOAD,A_MEMLIKE)
+)
+
+DEF_MACRO(fLOADMMV,
+	fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST),
+	()
+)
+
+DEF_MACRO(fLOADMMVQ,
+	do {
+		int __i;
+		fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		fVFOREACH(8,__i) if (!fGETQBIT(QVAL,__i)) DST.b[__i] = 0;
+	} while (0),
+	()
+)
+
+DEF_MACRO(fLOADMMVNQ,
+	do {
+		int __i;
+		fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		fVFOREACH(8,__i) if (fGETQBIT(QVAL,__i)) DST.b[__i] = 0;
+	} while (0),
+	()
+)
+
+DEF_MACRO(fLOADMMVU_AL,
+    {
+    size4u_t size2 = (EA)&(ALIGNMENT-1);
+    size4u_t size1 = LEN-size2;
+	thread->last_pkt->double_access_vec = 1;
+    mem_load_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(), /* slot */ 1, size2, &DST.ub[size1], size2, fUSE_LOOKUP_ADDRESS());
+    mem_load_vector_oddva(thread, insn, EA, EA,/* slot */ 0, size1, &DST.ub[0], size1, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_LOAD,A_MEMLIKE)
+)
+
+DEF_MACRO(fLOADMMVU,
+	{
+		/* if address happens to be aligned, only do aligned load */
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+            thread->last_pkt->pkt_has_vmemu_access = 0;
+			thread->last_pkt->double_access = 0;
+
+			fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		} else {
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			thread->last_pkt->double_access = 1;
+
+			fLOADMMVU_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		}
+	},
+	()
+)
+
+DEF_MACRO(fSTOREMMV_AL,
+    {
+    fV_AL_CHECK(EA,ALIGNMENT-1);
+    mem_store_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &SRC.ub[0], 0, 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMV,
+	fSTOREMMV_AL(EA,fVECSIZE(),fVECSIZE(),SRC),
+	()
+)
+
+DEF_MACRO(fSTOREMMVQ_AL,
+    do {
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+	mem_store_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &SRC.ub[0], &maskvec.ub[0], 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    } while (0),
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVQ,
+	fSTOREMMVQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK),
+	()
+)
+
+DEF_MACRO(fSTOREMMVNQ_AL,
+    {
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+        fV_AL_CHECK(EA,ALIGNMENT-1);
+	mem_store_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &SRC.ub[0], &maskvec.ub[0], 1, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVNQ,
+	fSTOREMMVNQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK),
+	()
+)
+
+DEF_MACRO(fSTOREMMVU_AL,
+    {
+    size4u_t size1 = ALIGNMENT-((EA)&(ALIGNMENT-1));
+    size4u_t size2;
+    if (size1>LEN) size1 = LEN;
+    size2 = LEN-size1;
+    mem_store_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(), /* slot */ 1, size2, &SRC.ub[size1], 0, 0, fUSE_LOOKUP_ADDRESS());
+    mem_store_vector_oddva(thread, insn, EA, EA, /* slot */ 0, size1, &SRC.ub[0], 0, 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVU,
+	{
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+			thread->last_pkt->double_access = 0;
+			fSTOREMMV_AL(EA,fVECSIZE(),fVECSIZE(),SRC);
+		} else {
+			thread->last_pkt->double_access = 1;
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			fSTOREMMVU_AL(EA,fVECSIZE(),fVECSIZE(),SRC);
+		}
+	},
+	()
+)
+
+DEF_MACRO(fSTOREMMVQU_AL,
+    {
+	size4u_t size1 = ALIGNMENT-((EA)&(ALIGNMENT-1));
+	size4u_t size2;
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+	if (size1>LEN) size1 = LEN;
+	size2 = LEN-size1;
+	mem_store_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(),/* slot */ 1, size2, &SRC.ub[size1], &maskvec.ub[size1], 0, fUSE_LOOKUP_ADDRESS());
+	mem_store_vector_oddva(thread, insn, EA, /* slot */ 0, size1, &SRC.ub[0], &maskvec.ub[0], 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVQU,
+	{
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+			thread->last_pkt->double_access = 0;
+			fSTOREMMVQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		} else {
+			thread->last_pkt->double_access = 1;
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			fSTOREMMVQU_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		}
+	},
+	()
+)
+
+DEF_MACRO(fSTOREMMVNQU_AL,
+    {
+	size4u_t size1 = ALIGNMENT-((EA)&(ALIGNMENT-1));
+	size4u_t size2;
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+	if (size1>LEN) size1 = LEN;
+	size2 = LEN-size1;
+	mem_store_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(), /* slot */ 1, size2, &SRC.ub[size1], &maskvec.ub[size1], 1, fUSE_LOOKUP_ADDRESS());
+	mem_store_vector_oddva(thread, insn, EA, EA, /* slot */ 0, size1, &SRC.ub[0], &maskvec.ub[0], 1, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVNQU,
+	{
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+			thread->last_pkt->double_access = 0;
+			fSTOREMMVNQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		} else {
+			thread->last_pkt->double_access = 1;
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			fSTOREMMVNQU_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		}
+	},
+	()
+)
+
+
+
+
+DEF_MACRO(fVFOREACH,
+    for (VAR = 0; VAR < fVELEM(WIDTH); VAR++),
+    /* NOTHING */
+)
+
+DEF_MACRO(fVARRAY_ELEMENT_ACCESS,
+    ARRAY.v[(INDEX) / (fVECSIZE()/(sizeof(ARRAY.TYPE[0])))].TYPE[(INDEX) % (fVECSIZE()/(sizeof(ARRAY.TYPE[0])))],
+    ()
+)
+
+DEF_MACRO(fVNEWCANCEL,
+	do { THREAD2STRUCT->VRegs_select &= ~(1<<(REGNUM)); } while (0),
+	()
+)
+
+DEF_MACRO(fTMPVDATA,
+	mmvec_vtmp_data(thread),
+	(A_CVI)
+)
+
+DEF_MACRO(fVSATDW,
+    fVSATW( ( ( ((long long)U)<<32 ) | fZXTN(32,64,V) ) ),
+    /* attribs */
+)
+
+DEF_MACRO(fVASL_SATHI,
+    fVSATW(((U)<<1) | ((V)>>31)),
+    /* attribs */
+)
+
+DEF_MACRO(fVUADDSAT,
+	fVSATUN( WIDTH, fZXTN(WIDTH, 2*WIDTH, U)  + fZXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVSADDSAT,
+	fVSATN(  WIDTH, fSXTN(WIDTH, 2*WIDTH, U)  + fSXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVUSUBSAT,
+	fVSATUN( WIDTH, fZXTN(WIDTH, 2*WIDTH, U)  - fZXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVSSUBSAT,
+	fVSATN(  WIDTH, fSXTN(WIDTH, 2*WIDTH, U)  - fSXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGU,
+	((fZXTN(WIDTH, 2*WIDTH, U) + fZXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGURND,
+	((fZXTN(WIDTH, 2*WIDTH, U) + fZXTN(WIDTH, 2*WIDTH, V)+1)>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGU,
+	((fZXTN(WIDTH, 2*WIDTH, U) - fZXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGURNDSAT,
+	fVSATUN(WIDTH,((fZXTN(WIDTH, 2*WIDTH, U) - fZXTN(WIDTH, 2*WIDTH, V)+1)>>1)),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGS,
+	((fSXTN(WIDTH, 2*WIDTH, U) + fSXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGSRND,
+	((fSXTN(WIDTH, 2*WIDTH, U) + fSXTN(WIDTH, 2*WIDTH, V)+1)>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGS,
+	((fSXTN(WIDTH, 2*WIDTH, U) - fSXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGSRND,
+	((fSXTN(WIDTH, 2*WIDTH, U) - fSXTN(WIDTH, 2*WIDTH, V)+1)>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGSRNDSAT,
+	fVSATN(WIDTH,((fSXTN(WIDTH, 2*WIDTH, U) - fSXTN(WIDTH, 2*WIDTH, V)+1)>>1)),
+	/* attribs */
+)
+
+
+DEF_MACRO(fVNOROUND,
+	VAL,
+	/* NOTHING */
+)
+DEF_MACRO(fVNOSAT,
+	VAL,
+	/* NOTHING */
+)
+
+DEF_MACRO(fVROUND,
+	((VAL) + (((SHAMT)>0)?(1LL<<((SHAMT)-1)):0)),
+	/* NOTHING */
+)
+
+DEF_MACRO(fCARRY_FROM_ADD32,
+	(((fZXTN(32,64,A)+fZXTN(32,64,B)+C) >> 32) & 1),
+	/* NOTHING */
+)
+
+DEF_MACRO(fUARCH_NOTE_PUMP_4X,
+	,
+	()
+)
+
+DEF_MACRO(fUARCH_NOTE_PUMP_2X,
+	,
+	()
+)
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 08/20] Hexagon HVX (target/hexagon) semantics generator
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (6 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 07/20] Hexagon HVX (target/hexagon) import macro definitions Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 09/20] Hexagon HVX (target/hexagon) semantics generator - part 2 Taylor Simpson
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Add HVX support to the semantics generator

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_semantics.c | 33 +++++++++++++++++++++++++++++++++
 target/hexagon/hex_common.py   |  9 ++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/target/hexagon/gen_semantics.c b/target/hexagon/gen_semantics.c
index c5fccec..4a2bdd7 100644
--- a/target/hexagon/gen_semantics.c
+++ b/target/hexagon/gen_semantics.c
@@ -44,6 +44,11 @@ int main(int argc, char *argv[])
  *         Q6INSN(A2_add,"Rd32=add(Rs32,Rt32)",ATTRIBS(),
  *         "Add 32-bit registers",
  *         { RdV=RsV+RtV;})
+ *     HVX instructions have the following form
+ *         EXTINSN(V6_vinsertwr, "Vx32.w=vinsert(Rt32)",
+ *         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),
+ *         "Insert Word Scalar into Vector",
+ *         VxV.uw[0] = RtV;)
  */
 #define Q6INSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
     do { \
@@ -59,8 +64,23 @@ int main(int argc, char *argv[])
                          ")\n", \
                 #TAG, STRINGIZE(ATTRIBS)); \
     } while (0);
+#define EXTINSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
+    do { \
+        fprintf(outfile, "SEMANTICS( \\\n" \
+                         "    \"%s\", \\\n" \
+                         "    %s, \\\n" \
+                         "    \"\"\"%s\"\"\" \\\n" \
+                         ")\n", \
+                #TAG, STRINGIZE(BEH), STRINGIZE(SEM)); \
+        fprintf(outfile, "ATTRIBUTES( \\\n" \
+                         "    \"%s\", \\\n" \
+                         "    \"%s\" \\\n" \
+                         ")\n", \
+                #TAG, STRINGIZE(ATTRIBS)); \
+    } while (0);
 #include "imported/allidefs.def"
 #undef Q6INSN
+#undef EXTINSN
 
 /*
  * Process the macro definitions
@@ -83,6 +103,19 @@ int main(int argc, char *argv[])
 #include "imported/macros.def"
 #undef DEF_MACRO
 
+/*
+ * Process the macros for HVX
+ */
+#define DEF_MACRO(MNAME, BEH, ATTRS) \
+    fprintf(outfile, "MACROATTRIB( \\\n" \
+                     "    \"%s\", \\\n" \
+                     "    \"\"\"%s\"\"\", \\\n" \
+                     "    \"%s\" \\\n" \
+                     ")\n", \
+            #MNAME, STRINGIZE(BEH), STRINGIZE(ATTRS));
+#include "imported/allext_macros.def"
+#undef DEF_MACRO
+
     fclose(outfile);
     return 0;
 }
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index b3b5340..d07e48b 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -143,6 +143,9 @@ def compute_tag_immediates(tag):
 ##          P                predicate register
 ##          R                GPR register
 ##          M                modifier register
+##          Q                HVX predicate vector
+##          V                HVX vector register
+##          O                HVX new vector register
 ##      regid can be one of the following
 ##          d, e             destination register
 ##          dd               destination register pair
@@ -178,6 +181,9 @@ def is_readwrite(regid):
 def is_scalar_reg(regtype):
     return regtype in "RPC"
 
+def is_hvx_reg(regtype):
+    return regtype in "VQ"
+
 def is_old_val(regtype, regid, tag):
     return regtype+regid+'V' in semdict[tag]
 
@@ -187,7 +193,8 @@ def is_new_val(regtype, regid, tag):
 def need_slot(tag):
     if ('A_CONDEXEC' in attribdict[tag] or
         'A_STORE' in attribdict[tag] or
-        'A_LOAD' in attribdict[tag]):
+        'A_LOAD' in attribdict[tag] or
+        'A_CVI' in attribdict[tag]):
         return 1
     else:
         return 0
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 09/20] Hexagon HVX (target/hexagon) semantics generator - part 2
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (7 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 08/20] Hexagon HVX (target/hexagon) semantics generator Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 10/20] Hexagon HVX (target/hexagon) C preprocessor for decode tree Taylor Simpson
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_helper_funcs.py  | 111 ++++++++++++++--
 target/hexagon/gen_helper_protos.py |  16 ++-
 target/hexagon/gen_tcg_funcs.py     | 254 ++++++++++++++++++++++++++++++++++--
 3 files changed, 359 insertions(+), 22 deletions(-)

diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index 2b1c5d8..c152002 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -48,12 +48,26 @@ def gen_helper_arg_pair(f,regtype,regid,regno):
     if regno >= 0 : f.write(", ")
     f.write("int64_t %s%sV" % (regtype,regid))
 
+def gen_helper_arg_ext(f,regtype,regid,regno):
+    if regno > 0 : f.write(", ")
+    f.write("void *%s%sV_void" % (regtype,regid))
+
+def gen_helper_arg_ext_pair(f,regtype,regid,regno):
+    if regno > 0 : f.write(", ")
+    f.write("void *%s%sV_void" % (regtype,regid))
+
 def gen_helper_arg_opn(f,regtype,regid,i,tag):
     if (hex_common.is_pair(regid)):
-        gen_helper_arg_pair(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_arg_ext_pair(f,regtype,regid,i)
+        else:
+            gen_helper_arg_pair(f,regtype,regid,i)
     elif (hex_common.is_single(regid)):
         if hex_common.is_old_val(regtype, regid, tag):
-            gen_helper_arg(f,regtype,regid,i)
+            if (hex_common.is_hvx_reg(regtype)):
+                gen_helper_arg_ext(f,regtype,regid,i)
+            else:
+                gen_helper_arg(f,regtype,regid,i)
         elif hex_common.is_new_val(regtype, regid, tag):
             gen_helper_arg_new(f,regtype,regid,i)
         else:
@@ -72,25 +86,67 @@ def gen_helper_dest_decl_pair(f,regtype,regid,regno,subfield=""):
     f.write("    int64_t %s%sV%s = 0;\n" % \
         (regtype,regid,subfield))
 
+def gen_helper_dest_decl_ext(f,regtype,regid):
+    if (regtype == "Q"):
+        f.write("    /* %s%sV is *(MMQReg *)(%s%sV_void) */\n" % \
+            (regtype,regid,regtype,regid))
+    else:
+        f.write("    /* %s%sV is *(MMVector *)(%s%sV_void) */\n" % \
+            (regtype,regid,regtype,regid))
+
+def gen_helper_dest_decl_ext_pair(f,regtype,regid,regno):
+    f.write("    /* %s%sV is *(MMVectorPair *))%s%sV_void) */\n" % \
+        (regtype,regid,regtype, regid))
+
 def gen_helper_dest_decl_opn(f,regtype,regid,i):
     if (hex_common.is_pair(regid)):
-        gen_helper_dest_decl_pair(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_dest_decl_ext_pair(f,regtype,regid, i)
+        else:
+            gen_helper_dest_decl_pair(f,regtype,regid,i)
     elif (hex_common.is_single(regid)):
-        gen_helper_dest_decl(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_dest_decl_ext(f,regtype,regid)
+        else:
+            gen_helper_dest_decl(f,regtype,regid,i)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
+def gen_helper_src_var_ext(f,regtype,regid):
+    if (regtype == "Q"):
+       f.write("    /* %s%sV is *(MMQReg *)(%s%sV_void) */\n" % \
+           (regtype,regid,regtype,regid))
+    else:
+       f.write("    /* %s%sV is *(MMVector *)(%s%sV_void) */\n" % \
+           (regtype,regid,regtype,regid))
+
+def gen_helper_src_var_ext_pair(f,regtype,regid,regno):
+    f.write("    /* %s%sV%s is *(MMVectorPair *)(%s%sV%s_void) */\n" % \
+        (regtype,regid,regno,regtype,regid,regno))
+
 def gen_helper_return(f,regtype,regid,regno):
     f.write("    return %s%sV;\n" % (regtype,regid))
 
 def gen_helper_return_pair(f,regtype,regid,regno):
     f.write("    return %s%sV;\n" % (regtype,regid))
 
+def gen_helper_dst_write_ext(f,regtype,regid):
+    return
+
+def gen_helper_dst_write_ext_pair(f,regtype,regid):
+    return
+
 def gen_helper_return_opn(f, regtype, regid, i):
     if (hex_common.is_pair(regid)):
-        gen_helper_return_pair(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_dst_write_ext_pair(f,regtype,regid)
+        else:
+            gen_helper_return_pair(f,regtype,regid,i)
     elif (hex_common.is_single(regid)):
-        gen_helper_return(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_dst_write_ext(f,regtype,regid)
+        else:
+            gen_helper_return(f,regtype,regid,i)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
@@ -129,14 +185,20 @@ def gen_helper_function(f, tag, tagregs, tagimms):
                 % (tag, tag))
     else:
         ## The return type of the function is the type of the destination
-        ## register
+        ## register (if scalar)
         i=0
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_written(regid)):
                 if (hex_common.is_pair(regid)):
-                    gen_helper_return_type_pair(f,regtype,regid,i)
+                    if (hex_common.is_hvx_reg(regtype)):
+                        continue
+                    else:
+                        gen_helper_return_type_pair(f,regtype,regid,i)
                 elif (hex_common.is_single(regid)):
-                    gen_helper_return_type(f,regtype,regid,i)
+                    if (hex_common.is_hvx_reg(regtype)):
+                            continue
+                    else:
+                        gen_helper_return_type(f,regtype,regid,i)
                 else:
                     print("Bad register parse: ",regtype,regid,toss,numregs)
             i += 1
@@ -145,11 +207,31 @@ def gen_helper_function(f, tag, tagregs, tagimms):
             f.write("void")
         f.write(" HELPER(%s)(CPUHexagonState *env" % tag)
 
+        ## Arguments include the vector destination operands
         i = 1
+        for regtype,regid,toss,numregs in regs:
+            if (hex_common.is_written(regid)):
+                if (hex_common.is_pair(regid)):
+                    if (hex_common.is_hvx_reg(regtype)):
+                        gen_helper_arg_ext_pair(f,regtype,regid,i)
+                    else:
+                        continue
+                elif (hex_common.is_single(regid)):
+                    if (hex_common.is_hvx_reg(regtype)):
+                        gen_helper_arg_ext(f,regtype,regid,i)
+                    else:
+                        # This is the return value of the function
+                        continue
+                else:
+                    print("Bad register parse: ",regtype,regid,toss,numregs)
+                i += 1
 
         ## Arguments to the helper function are the source regs and immediates
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_read(regid)):
+                if (hex_common.is_hvx_reg(regtype) and
+                    hex_common.is_readwrite(regid)):
+                    continue
                 gen_helper_arg_opn(f,regtype,regid,i,tag)
                 i += 1
         for immlett,bits,immshift in imms:
@@ -173,6 +255,17 @@ def gen_helper_function(f, tag, tagregs, tagimms):
                 gen_helper_dest_decl_opn(f,regtype,regid,i)
             i += 1
 
+        for regtype,regid,toss,numregs in regs:
+            if (hex_common.is_read(regid)):
+                if (hex_common.is_pair(regid)):
+                    if (hex_common.is_hvx_reg(regtype)):
+                        gen_helper_src_var_ext_pair(f,regtype,regid,i)
+                elif (hex_common.is_single(regid)):
+                    if (hex_common.is_hvx_reg(regtype)):
+                        gen_helper_src_var_ext(f,regtype,regid)
+                else:
+                    print("Bad register parse: ",regtype,regid,toss,numregs)
+
         if 'A_FPOP' in hex_common.attribdict[tag]:
             f.write('    arch_fpop_start(env);\n');
 
diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
index ea41007..229ef8d 100755
--- a/target/hexagon/gen_helper_protos.py
+++ b/target/hexagon/gen_helper_protos.py
@@ -94,19 +94,33 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
             f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
 
         ## Generate the qemu DEF_HELPER type for each result
+        ## Iterate over this list twice
+        ## - Emit the scalar result
+        ## - Emit the vector result
         i=0
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_written(regid)):
-                gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
+                if (not hex_common.is_hvx_reg(regtype)):
+                    gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
 
         ## Put the env between the outputs and inputs
         f.write(', env' )
         i += 1
 
+        # Second pass
+        for regtype,regid,toss,numregs in regs:
+            if (hex_common.is_written(regid)):
+                if (hex_common.is_hvx_reg(regtype)):
+                    gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
+                    i += 1
+
         ## Generate the qemu type for each input operand (regs and immediates)
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_read(regid)):
+                if (hex_common.is_hvx_reg(regtype) and
+                    hex_common.is_readwrite(regid)):
+                    continue
                 gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
         for immlett,bits,immshift in imms:
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index 7ceb25b..bcf28bb 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -119,10 +119,86 @@ def genptr_decl(f, tag, regtype, regid, regno):
                 (regtype, regid, regtype, regid))
         else:
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "V"):
+        if (regid in {"dd"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" %\
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" %\
+                 (regtype, regid))
+            f.write("        offsetof(CPUHexagonState, %s%sV);\n" % \
+                 (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_local_new_ptr();\n" % \
+                    (regtype, regid))
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        elif (regid in {"uu", "vv", "xx"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" %\
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" % \
+                 (regtype, regid))
+            f.write("        offsetof(CPUHexagonState, %s%sV);\n" % \
+                 (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_local_new_ptr();\n" % \
+                    (regtype, regid))
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        elif (regid in {"s", "u", "v", "w"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" % \
+                              (regtype, regid))
+            f.write("        vreg_src_off(ctx, %s%sN);\n" % \
+                              (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_local_new_ptr();\n" % \
+                    (regtype, regid))
+        elif (regid in {"d", "x", "y"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" % \
+                (regtype, regid))
+            f.write("        offsetof(CPUHexagonState,\n")
+            f.write("                 future_VRegs[%s%sN]);\n" % \
+                (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_local_new_ptr();\n" % \
+                    (regtype, regid))
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    elif (regtype == "Q"):
+        if (regid in {"d", "e", "x"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" % \
+                (regtype, regid))
+            f.write("        offsetof(CPUHexagonState,\n")
+            f.write("                 future_QRegs[%s%sN]);\n" % \
+                (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_local_new_ptr();\n" % \
+                    (regtype, regid))
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        elif (regid in {"s", "t", "u", "v"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" %\
+                (regtype, regid))
+            f.write("        offsetof(CPUHexagonState, QRegs[%s%sN]);\n" % \
+                (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_local_new_ptr();\n" % \
+                    (regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
-def genptr_decl_new(f,regtype,regid,regno):
+def genptr_decl_new(f, tag, regtype, regid, regno):
     if (regtype == "N"):
         if (regid in {"s", "t"}):
             f.write("    TCGv %s%sN = hex_new_value[insn->regno[%d]];\n" % \
@@ -135,6 +211,25 @@ def genptr_decl_new(f,regtype,regid,regno):
                 (regtype, regid, regno))
         else:
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "O"):
+        if (regid == "s"):
+            f.write("    const intptr_t %s%sN_num = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            if (hex_common.skip_qemu_helper(tag)):
+                f.write("    const intptr_t %s%sN_off =\n" % \
+                    (regtype, regid))
+                f.write("        test_bit(%s%sN_num, ctx->vregs_updated)\n" % \
+                    (regtype, regid))
+                f.write("            ? offsetof(CPUHexagonState, ")
+                f.write("future_VRegs[%s%sN_num])\n" % \
+                    (regtype, regid))
+                f.write("            : offsetof(CPUHexagonState, ")
+                f.write("zero_vector);\n")
+            else:
+                f.write("    TCGv %s%sN = tcg_const_tl(%s%sN_num);\n" % \
+                    (regtype, regid, regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
@@ -145,7 +240,7 @@ def genptr_decl_opn(f, tag, regtype, regid, toss, numregs, i):
         if hex_common.is_old_val(regtype, regid, tag):
             genptr_decl(f,tag, regtype, regid, i)
         elif hex_common.is_new_val(regtype, regid, tag):
-            genptr_decl_new(f,regtype,regid,i)
+            genptr_decl_new(f, tag, regtype, regid, i)
         else:
             print("Bad register parse: ",regtype,regid,toss,numregs)
     else:
@@ -159,7 +254,7 @@ def genptr_decl_imm(f,immlett):
     f.write("    int %s = insn->immed[%d];\n" % \
         (hex_common.imm_name(immlett), i))
 
-def genptr_free(f,regtype,regid,regno):
+def genptr_free(f, tag, regtype, regid, regno):
     if (regtype == "R"):
         if (regid in {"dd", "ss", "tt", "xx", "yy"}):
             f.write("    tcg_temp_free_i64(%s%sV);\n" % (regtype, regid))
@@ -182,33 +277,55 @@ def genptr_free(f,regtype,regid,regno):
     elif (regtype == "M"):
         if (regid != "u"):
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "V"):
+        if (regid in {"dd", "uu", "vv", "xx", \
+                      "d", "s", "u", "v", "w", "x", "y"}):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_temp_free_ptr(%s%sV);\n" % \
+                    (regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    elif (regtype == "Q"):
+        if (regid in {"d", "e", "s", "t", "u", "v", "x"}):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_temp_free_ptr(%s%sV);\n" % \
+                    (regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
-def genptr_free_new(f,regtype,regid,regno):
+def genptr_free_new(f, tag, regtype, regid, regno):
     if (regtype == "N"):
         if (regid not in {"s", "t"}):
             print("Bad register parse: ", regtype, regid)
     elif (regtype == "P"):
         if (regid not in {"t", "u", "v"}):
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "O"):
+        if (regid == "s"):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_temp_free(%s%sN);\n" % \
+                    (regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
 def genptr_free_opn(f,regtype,regid,i,tag):
     if (hex_common.is_pair(regid)):
-        genptr_free(f,regtype,regid,i)
+        genptr_free(f, tag, regtype, regid, i)
     elif (hex_common.is_single(regid)):
         if hex_common.is_old_val(regtype, regid, tag):
-            genptr_free(f,regtype,regid,i)
+            genptr_free(f, tag, regtype, regid, i)
         elif hex_common.is_new_val(regtype, regid, tag):
-            genptr_free_new(f,regtype,regid,i)
+            genptr_free_new(f, tag, regtype, regid, i)
         else:
             print("Bad register parse: ",regtype,regid,toss,numregs)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
-def genptr_src_read(f,regtype,regid):
+def genptr_src_read(f, tag, regtype, regid):
     if (regtype == "R"):
         if (regid in {"ss", "tt", "xx", "yy"}):
             f.write("    tcg_gen_concat_i32_i64(%s%sV, hex_gpr[%s%sN],\n" % \
@@ -238,6 +355,47 @@ def genptr_src_read(f,regtype,regid):
     elif (regtype == "M"):
         if (regid != "u"):
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "V"):
+        if (regid in {"uu", "vv", "xx"}):
+            f.write("    tcg_gen_gvec_mov(MO_64, %s%sV_off,\n" % \
+                (regtype, regid))
+            f.write("        vreg_src_off(ctx, %s%sN),\n" % \
+                (regtype, regid))
+            f.write("        sizeof(MMVector), sizeof(MMVector));\n")
+            f.write("    tcg_gen_gvec_mov(MO_64,\n")
+            f.write("        %s%sV_off + sizeof(MMVector),\n" % \
+                (regtype, regid))
+            f.write("        vreg_src_off(ctx, %s%sN ^ 1),\n" % \
+                (regtype, regid))
+            f.write("        sizeof(MMVector), sizeof(MMVector));\n")
+        elif (regid in {"s", "u", "v", "w"}):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                                 (regtype, regid, regtype, regid))
+        elif (regid in {"x", "y"}):
+            f.write("    tcg_gen_gvec_mov(MO_64, %s%sV_off,\n" % \
+                             (regtype, regid))
+            f.write("        vreg_src_off(ctx, %s%sN),\n" % \
+                             (regtype, regid))
+            f.write("        sizeof(MMVector), sizeof(MMVector));\n")
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                                 (regtype, regid, regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    elif (regtype == "Q"):
+        if (regid in {"s", "t", "u", "v"}):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        elif (regid in {"x"}):
+            f.write("    tcg_gen_gvec_mov(MO_64, %s%sV_off,\n" % \
+                (regtype, regid))
+            f.write("        offsetof(CPUHexagonState, QRegs[%s%sN]),\n" % \
+                (regtype, regid))
+            f.write("        sizeof(MMQReg), sizeof(MMQReg));\n")
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
@@ -248,15 +406,18 @@ def genptr_src_read_new(f,regtype,regid):
     elif (regtype == "P"):
         if (regid not in {"t", "u", "v"}):
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "O"):
+        if (regid != "s"):
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
 def genptr_src_read_opn(f,regtype,regid,tag):
     if (hex_common.is_pair(regid)):
-        genptr_src_read(f,regtype,regid)
+        genptr_src_read(f, tag, regtype, regid)
     elif (hex_common.is_single(regid)):
         if hex_common.is_old_val(regtype, regid, tag):
-            genptr_src_read(f,regtype,regid)
+            genptr_src_read(f, tag, regtype, regid)
         elif hex_common.is_new_val(regtype, regid, tag):
             genptr_src_read_new(f,regtype,regid)
         else:
@@ -334,11 +495,69 @@ def genptr_dst_write(f, tag, regtype, regid):
     else:
         print("Bad register parse: ", regtype, regid)
 
+def genptr_dst_write_ext(f, tag, regtype, regid, newv="0"):
+    if (regtype == "V"):
+        if (regid in {"dd", "xx", "yy"}):
+            if ('A_CONDEXEC' in hex_common.attribdict[tag]):
+                is_predicated = "true"
+            else:
+                is_predicated = "false"
+            f.write("    gen_log_vreg_write_pair(%s%sV_off, %s%sN, %s, " % \
+                (regtype, regid, regtype, regid, newv))
+            f.write("insn->slot, %s, pkt->pkt_has_vhist);\n" % (is_predicated))
+            f.write("    ctx_log_vreg_write_pair(ctx, %s%sN, %s,\n" % \
+                (regtype, regid, newv))
+            f.write("        %s);\n" % (is_predicated))
+        elif (regid in {"d", "x", "y"}):
+            if ('A_CONDEXEC' in hex_common.attribdict[tag]):
+                is_predicated = "true"
+            else:
+                is_predicated = "false"
+            f.write("    gen_log_vreg_write(%s%sV_off, %s%sN, %s, " % \
+                (regtype, regid, regtype, regid, newv))
+            f.write("insn->slot, %s, pkt->pkt_has_vhist);\n" % (is_predicated))
+            f.write("    ctx_log_vreg_write(ctx, %s%sN, %s, %s);\n" % \
+                (regtype, regid, newv, is_predicated))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    elif (regtype == "Q"):
+        if (regid in {"d", "e", "x"}):
+            if ('A_CONDEXEC' in hex_common.attribdict[tag]):
+                is_predicated = "true"
+            else:
+                is_predicated = "false"
+            f.write("    gen_log_qreg_write(%s%sV_off, %s%sN, %s, " % \
+                (regtype, regid, regtype, regid, newv))
+            f.write("insn->slot, %s);\n" % (is_predicated))
+            f.write("    ctx_log_qreg_write(ctx, %s%sN, %s);\n" % \
+                (regtype, regid, is_predicated))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    else:
+        print("Bad register parse: ", regtype, regid)
+
 def genptr_dst_write_opn(f,regtype, regid, tag):
     if (hex_common.is_pair(regid)):
-        genptr_dst_write(f, tag, regtype, regid)
+        if (hex_common.is_hvx_reg(regtype)):
+            if ('A_CVI_TMP' in hex_common.attribdict[tag] or
+                'A_CVI_TMP_DST' in hex_common.attribdict[tag]):
+                genptr_dst_write_ext(f, tag, regtype, regid, "EXT_TMP")
+            else:
+                genptr_dst_write_ext(f, tag, regtype, regid)
+        else:
+            genptr_dst_write(f, tag, regtype, regid)
     elif (hex_common.is_single(regid)):
-        genptr_dst_write(f, tag, regtype, regid)
+        if (hex_common.is_hvx_reg(regtype)):
+            if 'A_CVI_NEW' in hex_common.attribdict[tag]:
+                genptr_dst_write_ext(f, tag, regtype, regid, "EXT_NEW")
+            elif 'A_CVI_TMP' in hex_common.attribdict[tag]:
+                genptr_dst_write_ext(f, tag, regtype, regid, "EXT_TMP")
+            elif 'A_CVI_TMP_DST' in hex_common.attribdict[tag]:
+                genptr_dst_write_ext(f, tag, regtype, regid, "EXT_TMP")
+            else:
+                genptr_dst_write_ext(f, tag, regtype, regid, "EXT_DFL")
+        else:
+            genptr_dst_write(f, tag, regtype, regid)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
@@ -409,13 +628,24 @@ def gen_tcg_func(f, tag, regs, imms):
         ## If there is a scalar result, it is the return type
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_written(regid)):
+                if (hex_common.is_hvx_reg(regtype)):
+                    continue
                 gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
         if (i > 0): f.write(", ")
         f.write("cpu_env")
         i=1
         for regtype,regid,toss,numregs in regs:
+            if (hex_common.is_written(regid)):
+                if (not hex_common.is_hvx_reg(regtype)):
+                    continue
+                gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
+                i += 1
+        for regtype,regid,toss,numregs in regs:
             if (hex_common.is_read(regid)):
+                if (hex_common.is_hvx_reg(regtype) and
+                    hex_common.is_readwrite(regid)):
+                    continue
                 gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
         for immlett,bits,immshift in imms:
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 10/20] Hexagon HVX (target/hexagon) C preprocessor for decode tree
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (8 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 09/20] Hexagon HVX (target/hexagon) semantics generator - part 2 Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-25 13:15   ` Richard Henderson
  2021-07-05 23:34 ` [PATCH 11/20] Hexagon HVX (target/hexagon) instruction utility functions Taylor Simpson
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_dectree_import.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/target/hexagon/gen_dectree_import.c b/target/hexagon/gen_dectree_import.c
index 5b7ecfc..2374030 100644
--- a/target/hexagon/gen_dectree_import.c
+++ b/target/hexagon/gen_dectree_import.c
@@ -40,6 +40,11 @@ const char * const opcode_names[] = {
  *         Q6INSN(A2_add,"Rd32=add(Rs32,Rt32)",ATTRIBS(),
  *         "Add 32-bit registers",
  *         { RdV=RsV+RtV;})
+ *     HVX instructions have the following form
+ *         EXTINSN(V6_vinsertwr, "Vx32.w=vinsert(Rt32)",
+ *         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX,A_CVI_LATE),
+ *         "Insert Word Scalar into Vector",
+ *         VxV.uw[0] = RtV;)
  */
 const char * const opcode_syntax[XX_LAST_OPCODE] = {
 #define Q6INSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
@@ -105,6 +110,14 @@ static const char *get_opcode_enc(int opcode)
 
 static const char *get_opcode_enc_class(int opcode)
 {
+    const char *tmp = opcode_encodings[opcode].encoding;
+    if (tmp == NULL) {
+        const char *test = "V6_";        /* HVX */
+        char *name = (char *)opcode_names[opcode];
+        if (strncmp(name, test, strlen(test)) == 0) {
+            return "EXT_mmvec";
+        }
+    }
     return opcode_enc_class_names[opcode_encodings[opcode].enc_class];
 }
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 11/20] Hexagon HVX (target/hexagon) instruction utility functions
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (9 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 10/20] Hexagon HVX (target/hexagon) C preprocessor for decode tree Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-25 13:21   ` Richard Henderson
  2021-07-05 23:34 ` [PATCH 12/20] Hexagon HVX (target/hexagon) helper functions Taylor Simpson
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Functions to support scatter/gather
Add new file to target/hexagon/meson.build

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/arch.h                   |   1 +
 target/hexagon/mmvec/system_ext_mmvec.h |  35 ++++++++++
 target/hexagon/arch.c                   |   9 +++
 target/hexagon/mmvec/system_ext_mmvec.c | 119 ++++++++++++++++++++++++++++++++
 target/hexagon/meson.build              |   1 +
 5 files changed, 165 insertions(+)
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.c

diff --git a/target/hexagon/arch.h b/target/hexagon/arch.h
index 7091806..133cb6d 100644
--- a/target/hexagon/arch.h
+++ b/target/hexagon/arch.h
@@ -24,6 +24,7 @@ extern const uint8_t rLPS_table_64x4[64][4];
 extern const uint8_t AC_next_state_MPS_64[64];
 extern const uint8_t AC_next_state_LPS_64[64];
 
+uint32_t count_leading_ones_2(uint16_t src);
 uint64_t interleave(uint32_t odd, uint32_t even);
 uint64_t deinterleave(uint64_t src);
 int32_t conv_round(int32_t a, int n);
diff --git a/target/hexagon/mmvec/system_ext_mmvec.h b/target/hexagon/mmvec/system_ext_mmvec.h
new file mode 100644
index 0000000..d103191
--- /dev/null
+++ b/target/hexagon/mmvec/system_ext_mmvec.h
@@ -0,0 +1,35 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_SYSTEM_EXT_MMVEC_H
+#define HEXAGON_SYSTEM_EXT_MMVEC_H
+
+void mem_load_vector(CPUHexagonState *env, target_ulong vaddr,
+                     int size, uint8_t *data);
+void mem_gather_store(CPUHexagonState *env, target_ulong vaddr,
+                      int slot, uint8_t *data);
+void mem_store_vector(CPUHexagonState *env, target_ulong vaddr,
+                      int slot, int size,
+                      uint8_t *data, uint8_t *mask, bool invert);
+void mem_vector_scatter_init(CPUHexagonState *env, int slot,
+                             target_ulong base_vaddr, int length,
+                             int element_size);
+void mem_vector_gather_init(CPUHexagonState *env, int slot,
+                            target_ulong base_vaddr, int length,
+                            int element_size);
+
+#endif
diff --git a/target/hexagon/arch.c b/target/hexagon/arch.c
index 68a55b3..b2ff905 100644
--- a/target/hexagon/arch.c
+++ b/target/hexagon/arch.c
@@ -118,6 +118,15 @@ const uint8_t AC_next_state_LPS_64[64] = {
     37, 38, 38, 63
 };
 
+uint32_t count_leading_ones_2(uint16_t src)
+{
+    int ret;
+    for (ret = 0; src & 0x8000; src <<= 1) {
+        ret++;
+    }
+    return ret;
+}
+
 #define BITS_MASK_8 0x5555555555555555ULL
 #define PAIR_MASK_8 0x3333333333333333ULL
 #define NYBL_MASK_8 0x0f0f0f0f0f0f0f0fULL
diff --git a/target/hexagon/mmvec/system_ext_mmvec.c b/target/hexagon/mmvec/system_ext_mmvec.c
new file mode 100644
index 0000000..fbd7505
--- /dev/null
+++ b/target/hexagon/mmvec/system_ext_mmvec.c
@@ -0,0 +1,119 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu.h"
+#include "cpu.h"
+#include "mmvec/system_ext_mmvec.h"
+
+void mem_gather_store(CPUHexagonState *env, target_ulong vaddr,
+                      int slot, uint8_t *data)
+{
+    size_t size = sizeof(MMVector);
+
+    /*
+     * If it's a gather store update store data from temporary register
+     * and clear flag
+     */
+    memcpy(data, &env->tmp_VRegs[0].ub[0], size);
+    env->VRegs_updated_tmp = 0;
+    env->gather_issued = false;
+
+    env->vstore_pending[slot] = 1;
+    env->vstore[slot].va   = vaddr;
+    env->vstore[slot].size = size;
+    memcpy(&env->vstore[slot].data.ub[0], data, size);
+
+    /* On a gather store, overwrite the store mask to emulate dropped gathers */
+    memcpy(&env->vstore[slot].mask.ub[0], &env->vtcm_log.mask.ub[0], size);
+}
+
+void mem_store_vector(CPUHexagonState *env, target_ulong vaddr, int slot,
+                      int size, uint8_t *data, uint8_t *mask, bool invert)
+{
+    if (!size) {
+        return;
+    }
+
+    if (env->is_gather_store_insn) {
+        mem_gather_store(env, vaddr, slot, data);
+        return;
+    }
+
+    env->vstore_pending[slot] = 1;
+    env->vstore[slot].va   = vaddr;
+    env->vstore[slot].size = size;
+    memcpy(&env->vstore[slot].data.ub[0], data, size);
+    if (!mask) {
+        memset(&env->vstore[slot].mask.ub[0], invert ? 0 : -1, size);
+    } else if (invert) {
+        for (int i = 0; i < size; i++) {
+            env->vstore[slot].mask.ub[i] = !mask[i];
+        }
+    } else {
+        memcpy(&env->vstore[slot].mask.ub[0], mask, size);
+    }
+}
+
+void mem_load_vector(CPUHexagonState *env, target_ulong vaddr,
+                     int size, uint8_t *data)
+{
+    for (int i = 0; i < size; i++) {
+        get_user_u8(data[i], vaddr);
+        vaddr++;
+    }
+}
+
+void mem_vector_scatter_init(CPUHexagonState *env, int slot,
+                             target_ulong base_vaddr,
+                             int length, int element_size)
+{
+    int i;
+
+    for (i = 0; i < sizeof(MMVector); i++) {
+        env->vtcm_log.data.ub[i] = 0;
+        env->vtcm_log.mask.ub[i] = 0;
+    }
+
+    env->vtcm_pending = true;
+    env->vtcm_log.op = false;
+    env->vtcm_log.op_size = 0;
+    env->vtcm_log.size = sizeof(MMVector);
+}
+
+void mem_vector_gather_init(CPUHexagonState *env, int slot,
+                            target_ulong base_vaddr,
+                            int length, int element_size)
+{
+    int i;
+
+    for (i = 0; i < sizeof(MMVector); i++) {
+        env->vtcm_log.data.ub[i] = 0;
+        env->vtcm_log.mask.ub[i] = 0;
+        env->vtcm_log.va[i] = 0;
+        env->tmp_VRegs[0].ub[i] = 0;
+    }
+    env->vtcm_log.op = false;
+    env->vtcm_log.op_size = 0;
+
+    /*
+     * Temp reg gets updated
+     * This allows store .new to grab the correct result
+     */
+    env->VRegs_updated_tmp = 1;
+    env->gather_issued = true;
+}
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index 6fd9360..ed292b4 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -173,6 +173,7 @@ hexagon_ss.add(files(
     'printinsn.c',
     'arch.c',
     'fma_emu.c',
+    'mmvec/system_ext_mmvec.c',
 ))
 
 target_arch += {'hexagon': hexagon_ss}
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 12/20] Hexagon HVX (target/hexagon) helper functions
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (10 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 11/20] Hexagon HVX (target/hexagon) instruction utility functions Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-25 13:22   ` Richard Henderson
  2021-07-05 23:34 ` [PATCH 13/20] Hexagon HVX (target/hexagon) TCG generation Taylor Simpson
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Commit vector stores (masked and scatter/gather)
Log vector register writes
Add the execution counters to the debug log

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper.h    |  1 +
 target/hexagon/op_helper.c | 51 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index ca201fb..e41a5a1 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -23,6 +23,7 @@ DEF_HELPER_1(debug_start_packet, void, env)
 DEF_HELPER_FLAGS_3(debug_check_store_width, TCG_CALL_NO_WG, void, env, int, int)
 DEF_HELPER_FLAGS_3(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int)
 DEF_HELPER_2(commit_store, void, env, int)
+DEF_HELPER_1(commit_hvx_stores, void, env)
 DEF_HELPER_FLAGS_4(fcircadd, TCG_CALL_NO_RWG_SE, s32, s32, s32, s32, s32)
 DEF_HELPER_FLAGS_1(fbrev, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 4595559..4e196f9 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -25,6 +25,8 @@
 #include "arch.h"
 #include "hex_arch_types.h"
 #include "fma_emu.h"
+#include "mmvec/mmvec.h"
+#include "mmvec/macros.h"
 
 #define SF_BIAS        127
 #define SF_MANTBITS    23
@@ -162,6 +164,50 @@ void HELPER(commit_store)(CPUHexagonState *env, int slot_num)
     }
 }
 
+void HELPER(commit_hvx_stores)(CPUHexagonState *env)
+{
+    int i;
+
+    /* Normal (possibly masked) vector store */
+    for (i = 0; i < VSTORES_MAX; i++) {
+        if (env->vstore_pending[i]) {
+            env->vstore_pending[i] = 0;
+            target_ulong va = env->vstore[i].va;
+            int size = env->vstore[i].size;
+            for (int j = 0; j < size; j++) {
+                if (env->vstore[i].mask.ub[j]) {
+                    put_user_u8(env->vstore[i].data.ub[j], va + j);
+                }
+            }
+        }
+    }
+
+    /* Scatter store */
+    if (env->vtcm_pending) {
+        env->vtcm_pending = false;
+        if (env->vtcm_log.op) {
+            /* Need to perform the scatter read/modify/write at commit time */
+            if (env->vtcm_log.op_size == 2) {
+                SCATTER_OP_WRITE_TO_MEM(uint16_t);
+            } else if (env->vtcm_log.op_size == 4) {
+                /* Word Scatter += */
+                SCATTER_OP_WRITE_TO_MEM(uint32_t);
+            } else {
+                g_assert_not_reached();
+            }
+        } else {
+            for (i = 0; i < env->vtcm_log.size; i++) {
+                if (env->vtcm_log.mask.ub[i] != 0) {
+                    put_user_u8(env->vtcm_log.data.ub[i], env->vtcm_log.va[i]);
+                    env->vtcm_log.mask.ub[i] = 0;
+                    env->vtcm_log.data.ub[i] = 0;
+                }
+
+            }
+        }
+    }
+}
+
 static void print_store(CPUHexagonState *env, int slot)
 {
     if (!(env->slot_cancelled & (1 << slot))) {
@@ -240,9 +286,10 @@ void HELPER(debug_commit_end)(CPUHexagonState *env, int has_st0, int has_st1)
     HEX_DEBUG_LOG("Next PC = " TARGET_FMT_lx "\n", env->next_PC);
     HEX_DEBUG_LOG("Exec counters: pkt = " TARGET_FMT_lx
                   ", insn = " TARGET_FMT_lx
-                  "\n",
+                  ", hvx = " TARGET_FMT_lx "\n",
                   env->gpr[HEX_REG_QEMU_PKT_CNT],
-                  env->gpr[HEX_REG_QEMU_INSN_CNT]);
+                  env->gpr[HEX_REG_QEMU_INSN_CNT],
+                  env->gpr[HEX_REG_QEMU_HVX_CNT]);
 
 }
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 13/20] Hexagon HVX (target/hexagon) TCG generation
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (11 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 12/20] Hexagon HVX (target/hexagon) helper functions Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 14/20] Hexagon HVX (target/hexagon) import semantics Taylor Simpson
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/translate.h |  54 ++++++++++++
 target/hexagon/genptr.c    |  15 ++++
 target/hexagon/translate.c | 199 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 268 insertions(+)

diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 703fd13..fe3c7c5 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -29,6 +29,7 @@ typedef struct DisasContext {
     uint32_t mem_idx;
     uint32_t num_packets;
     uint32_t num_insns;
+    uint32_t num_hvx_insns;
     int reg_log[REG_WRITES_MAX];
     int reg_log_idx;
     DECLARE_BITMAP(regs_written, TOTAL_PER_THREAD_REGS);
@@ -37,6 +38,16 @@ typedef struct DisasContext {
     DECLARE_BITMAP(pregs_written, NUM_PREGS);
     uint8_t store_width[STORES_MAX];
     bool s1_store_processed;
+    int vreg_log[NUM_VREGS];
+    bool vreg_is_predicated[NUM_VREGS];
+    int vreg_log_idx;
+    DECLARE_BITMAP(vregs_updated_tmp, NUM_VREGS);
+    DECLARE_BITMAP(vregs_updated, NUM_VREGS);
+    DECLARE_BITMAP(vregs_select, NUM_VREGS);
+    int qreg_log[NUM_QREGS];
+    bool qreg_is_predicated[NUM_QREGS];
+    int qreg_log_idx;
+    bool is_gather_store_insn;
 } DisasContext;
 
 static inline void ctx_log_reg_write(DisasContext *ctx, int rnum)
@@ -67,6 +78,41 @@ static inline bool is_preloaded(DisasContext *ctx, int num)
     return test_bit(num, ctx->regs_written);
 }
 
+static inline void ctx_log_vreg_write(DisasContext *ctx,
+                                      int rnum, VRegWriteType type,
+                                      bool is_predicated)
+{
+    if (type != EXT_TMP) {
+        ctx->vreg_log[ctx->vreg_log_idx] = rnum;
+        ctx->vreg_is_predicated[ctx->vreg_log_idx] = is_predicated;
+        ctx->vreg_log_idx++;
+
+        set_bit(rnum, ctx->vregs_updated);
+    }
+    if (type == EXT_NEW) {
+        set_bit(rnum, ctx->vregs_select);
+    }
+    if (type == EXT_TMP) {
+        set_bit(rnum, ctx->vregs_updated_tmp);
+    }
+}
+
+static inline void ctx_log_vreg_write_pair(DisasContext *ctx,
+                                           int rnum, VRegWriteType type,
+                                           bool is_predicated)
+{
+    ctx_log_vreg_write(ctx, rnum ^ 0, type, is_predicated);
+    ctx_log_vreg_write(ctx, rnum ^ 1, type, is_predicated);
+}
+
+static inline void ctx_log_qreg_write(DisasContext *ctx,
+                                      int rnum, bool is_predicated)
+{
+    ctx->qreg_log[ctx->qreg_log_idx] = rnum;
+    ctx->qreg_is_predicated[ctx->qreg_log_idx] = is_predicated;
+    ctx->qreg_log_idx++;
+}
+
 extern TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_pred[NUM_PREGS];
 extern TCGv hex_next_PC;
@@ -85,6 +131,14 @@ extern TCGv hex_dczero_addr;
 extern TCGv hex_llsc_addr;
 extern TCGv hex_llsc_val;
 extern TCGv_i64 hex_llsc_val_i64;
+extern TCGv hex_is_gather_store_insn;
+extern TCGv hex_VRegs_updated_tmp;
+extern TCGv hex_VRegs_updated;
+extern TCGv hex_VRegs_select;
+extern TCGv hex_QRegs_updated;
+extern TCGv hex_vstore_addr[VSTORES_MAX];
+extern TCGv hex_vstore_size[VSTORES_MAX];
+extern TCGv hex_vstore_pending[VSTORES_MAX];
 
 void process_store(DisasContext *ctx, Packet *pkt, int slot_num);
 #endif
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 7333299..da8527d 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -167,6 +167,9 @@ static inline void gen_read_ctrl_reg(DisasContext *ctx, const int reg_num,
     } else if (reg_num == HEX_REG_QEMU_INSN_CNT) {
         tcg_gen_addi_tl(dest, hex_gpr[HEX_REG_QEMU_INSN_CNT],
                         ctx->num_insns);
+    } else if (reg_num == HEX_REG_QEMU_HVX_CNT) {
+        tcg_gen_addi_tl(dest, hex_gpr[HEX_REG_QEMU_HVX_CNT],
+                        ctx->num_hvx_insns);
     } else {
         tcg_gen_mov_tl(dest, hex_gpr[reg_num]);
     }
@@ -194,6 +197,12 @@ static inline void gen_read_ctrl_reg_pair(DisasContext *ctx, const int reg_num,
         tcg_gen_concat_i32_i64(dest, pkt_cnt, insn_cnt);
         tcg_temp_free(pkt_cnt);
         tcg_temp_free(insn_cnt);
+    } else if (reg_num == HEX_REG_QEMU_HVX_CNT) {
+        TCGv hvx_cnt = tcg_temp_new();
+        tcg_gen_addi_tl(hvx_cnt, hex_gpr[HEX_REG_QEMU_HVX_CNT],
+                        ctx->num_hvx_insns);
+        tcg_gen_concat_i32_i64(dest, hvx_cnt, hex_gpr[reg_num + 1]);
+        tcg_temp_free(hvx_cnt);
     } else {
         tcg_gen_concat_i32_i64(dest,
             hex_gpr[reg_num],
@@ -229,6 +238,9 @@ static inline void gen_write_ctrl_reg(DisasContext *ctx, int reg_num,
         if (reg_num == HEX_REG_QEMU_INSN_CNT) {
             ctx->num_insns = 0;
         }
+        if (reg_num == HEX_REG_QEMU_HVX_CNT) {
+            ctx->num_hvx_insns = 0;
+        }
     }
 }
 
@@ -250,6 +262,9 @@ static inline void gen_write_ctrl_reg_pair(DisasContext *ctx, int reg_num,
             ctx->num_packets = 0;
             ctx->num_insns = 0;
         }
+        if (reg_num == HEX_REG_QEMU_HVX_CNT) {
+            ctx->num_hvx_insns = 0;
+        }
     }
 }
 
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index b23d36a..06d3004 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -19,6 +19,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
 #include "exec/cpu_ldst.h"
 #include "exec/log.h"
 #include "internal.h"
@@ -47,6 +48,14 @@ TCGv hex_dczero_addr;
 TCGv hex_llsc_addr;
 TCGv hex_llsc_val;
 TCGv_i64 hex_llsc_val_i64;
+TCGv hex_is_gather_store_insn;
+TCGv hex_VRegs_updated_tmp;
+TCGv hex_VRegs_updated;
+TCGv hex_VRegs_select;
+TCGv hex_QRegs_updated;
+TCGv hex_vstore_addr[VSTORES_MAX];
+TCGv hex_vstore_size[VSTORES_MAX];
+TCGv hex_vstore_pending[VSTORES_MAX];
 
 static const char * const hexagon_prednames[] = {
   "p0", "p1", "p2", "p3"
@@ -65,6 +74,8 @@ static void gen_exec_counters(DisasContext *ctx)
                     hex_gpr[HEX_REG_QEMU_PKT_CNT], ctx->num_packets);
     tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_INSN_CNT],
                     hex_gpr[HEX_REG_QEMU_INSN_CNT], ctx->num_insns);
+    tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_HVX_CNT],
+                    hex_gpr[HEX_REG_QEMU_HVX_CNT], ctx->num_hvx_insns);
 }
 
 static void gen_end_tb(DisasContext *ctx)
@@ -172,6 +183,11 @@ static void gen_start_packet(DisasContext *ctx, Packet *pkt)
     bitmap_zero(ctx->regs_written, TOTAL_PER_THREAD_REGS);
     ctx->preg_log_idx = 0;
     bitmap_zero(ctx->pregs_written, NUM_PREGS);
+    ctx->vreg_log_idx = 0;
+    bitmap_zero(ctx->vregs_updated_tmp, NUM_VREGS);
+    bitmap_zero(ctx->vregs_updated, NUM_VREGS);
+    bitmap_zero(ctx->vregs_select, NUM_VREGS);
+    ctx->qreg_log_idx = 0;
     for (i = 0; i < STORES_MAX; i++) {
         ctx->store_width[i] = 0;
     }
@@ -198,6 +214,32 @@ static void gen_start_packet(DisasContext *ctx, Packet *pkt)
     if (need_pred_written(pkt)) {
         tcg_gen_movi_tl(hex_pred_written, 0);
     }
+
+    if (pkt->pkt_has_hvx) {
+        if (pkt->pkt_has_vhist) {
+            tcg_gen_movi_tl(hex_VRegs_updated_tmp, 0);
+            tcg_gen_movi_tl(hex_VRegs_select, 0);
+        }
+        tcg_gen_movi_tl(hex_VRegs_updated, 0);
+        tcg_gen_movi_tl(hex_QRegs_updated, 0);
+        ctx->is_gather_store_insn = false;
+        tcg_gen_movi_tl(hex_is_gather_store_insn, 0);
+    }
+}
+
+static bool is_gather_store_insn(Insn *insn, Packet *pkt)
+{
+    if (GET_ATTRIB(insn->opcode, A_CVI_NEW) &&
+        insn->new_value_producer_slot == 1) {
+        /* Look for gather instruction */
+        for (int i = 0; i < pkt->num_insns; i++) {
+            Insn *in = &pkt->insn[i];
+            if (GET_ATTRIB(in->opcode, A_CVI_GATHER) && in->slot == 1) {
+                return true;
+            }
+        }
+    }
+    return false;
 }
 
 /*
@@ -249,6 +291,10 @@ static void gen_insn(CPUHexagonState *env, DisasContext *ctx,
                      Insn *insn, Packet *pkt)
 {
     if (insn->generate) {
+        if (is_gather_store_insn(insn, pkt)) {
+            ctx->is_gather_store_insn = true;
+            tcg_gen_movi_tl(hex_is_gather_store_insn, 1);
+        }
         mark_implicit_reg_writes(ctx, insn);
         insn->generate(env, ctx, insn, pkt);
         mark_implicit_pred_writes(ctx, insn);
@@ -451,10 +497,125 @@ static void process_dczeroa(DisasContext *ctx, Packet *pkt)
     }
 }
 
+static bool pkt_has_hvx_store(Packet *pkt)
+{
+    int i;
+    for (i = 0; i < pkt->num_insns; i++) {
+        int opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(opcode, A_CVI) && GET_ATTRIB(opcode, A_STORE)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static void gen_commit_hvx(DisasContext *ctx, Packet *pkt)
+{
+    int i;
+
+    /*
+     * vhist instructions need special handling
+     * They potentially write the entire vector register file
+     */
+    if (pkt->pkt_has_vhist) {
+        TCGv cmp = tcg_temp_local_new();
+        size_t size = sizeof(MMVector);
+        for (i = 0; i < NUM_VREGS; i++) {
+            intptr_t dstoff = offsetof(CPUHexagonState, VRegs[i]);
+            intptr_t srcoff = offsetof(CPUHexagonState, future_VRegs[i]);
+            TCGLabel *label_skip = gen_new_label();
+
+            tcg_gen_andi_tl(cmp, hex_VRegs_updated, 1 << i);
+            tcg_gen_brcondi_tl(TCG_COND_EQ, cmp, 0, label_skip);
+            {
+                tcg_gen_gvec_mov(MO_64, dstoff, srcoff, size, size);
+            }
+            gen_set_label(label_skip);
+        }
+        tcg_temp_free(cmp);
+        return;
+    }
+
+    /*
+     *    for (i = 0; i < ctx->vreg_log_idx; i++) {
+     *        int rnum = ctx->vreg_log[i];
+     *        if (ctx->vreg_is_predicated[i]) {
+     *            if (env->VRegs_updated & (1 << rnum)) {
+     *                env->VRegs[rnum] = env->future_VRegs[rnum];
+     *            }
+     *        } else {
+     *            env->VRegs[rnum] = env->future_VRegs[rnum];
+     *        }
+     *    }
+     */
+    for (i = 0; i < ctx->vreg_log_idx; i++) {
+        int rnum = ctx->vreg_log[i];
+        bool is_predicated = ctx->vreg_is_predicated[i];
+        intptr_t dstoff = offsetof(CPUHexagonState, VRegs[rnum]);
+        intptr_t srcoff = offsetof(CPUHexagonState, future_VRegs[rnum]);
+        size_t size = sizeof(MMVector);
+
+        if (is_predicated) {
+            TCGv cmp = tcg_temp_local_new();
+            TCGLabel *label_skip = gen_new_label();
+
+            tcg_gen_andi_tl(cmp, hex_VRegs_updated, 1 << rnum);
+            tcg_gen_brcondi_tl(TCG_COND_EQ, cmp, 0, label_skip);
+            {
+                tcg_gen_gvec_mov(MO_64, dstoff, srcoff, size, size);
+            }
+            gen_set_label(label_skip);
+            tcg_temp_free(cmp);
+        } else {
+            tcg_gen_gvec_mov(MO_64, dstoff, srcoff, size, size);
+        }
+    }
+
+    /*
+     *    for (i = 0; i < ctx->qreg_log_idx; i++) {
+     *        int rnum = ctx->qreg_log[i];
+     *        if (ctx->qreg_is_predicated[i]) {
+     *            if (env->QRegs_updated) & (1 << rnum)) {
+     *                env->QRegs[rnum] = env->future_QRegs[rnum];
+     *            }
+     *        } else {
+     *            env->QRegs[rnum] = env->future_QRegs[rnum];
+     *        }
+     *    }
+     */
+    for (i = 0; i < ctx->qreg_log_idx; i++) {
+        int rnum = ctx->qreg_log[i];
+        bool is_predicated = ctx->qreg_is_predicated[i];
+        intptr_t dstoff = offsetof(CPUHexagonState, QRegs[rnum]);
+        intptr_t srcoff = offsetof(CPUHexagonState, future_QRegs[rnum]);
+        size_t size = sizeof(MMQReg);
+
+        if (is_predicated) {
+            TCGv cmp = tcg_temp_local_new();
+            TCGLabel *label_skip = gen_new_label();
+
+            tcg_gen_andi_tl(cmp, hex_QRegs_updated, 1 << rnum);
+            tcg_gen_brcondi_tl(TCG_COND_EQ, cmp, 0, label_skip);
+            {
+                tcg_gen_gvec_mov(MO_64, dstoff, srcoff, size, size);
+            }
+            gen_set_label(label_skip);
+            tcg_temp_free(cmp);
+        } else {
+            tcg_gen_gvec_mov(MO_64, dstoff, srcoff, size, size);
+        }
+    }
+
+    if (pkt_has_hvx_store(pkt)) {
+        gen_helper_commit_hvx_stores(cpu_env);
+    }
+}
+
 static void update_exec_counters(DisasContext *ctx, Packet *pkt)
 {
     int num_insns = pkt->num_insns;
     int num_real_insns = 0;
+    int num_hvx_insns = 0;
 
     for (int i = 0; i < num_insns; i++) {
         if (!pkt->insn[i].is_endloop &&
@@ -462,10 +623,14 @@ static void update_exec_counters(DisasContext *ctx, Packet *pkt)
             !GET_ATTRIB(pkt->insn[i].opcode, A_IT_NOP)) {
             num_real_insns++;
         }
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_CVI)) {
+            num_hvx_insns++;
+        }
     }
 
     ctx->num_packets++;
     ctx->num_insns += num_real_insns;
+    ctx->num_hvx_insns += num_hvx_insns;
 }
 
 static void gen_commit_packet(DisasContext *ctx, Packet *pkt)
@@ -474,6 +639,9 @@ static void gen_commit_packet(DisasContext *ctx, Packet *pkt)
     gen_pred_writes(ctx, pkt);
     process_store_log(ctx, pkt);
     process_dczeroa(ctx, pkt);
+    if (pkt->pkt_has_hvx) {
+        gen_commit_hvx(ctx, pkt);
+    }
     update_exec_counters(ctx, pkt);
     if (HEX_DEBUG) {
         TCGv has_st0 =
@@ -527,6 +695,7 @@ static void hexagon_tr_init_disas_context(DisasContextBase *dcbase,
     ctx->mem_idx = MMU_USER_IDX;
     ctx->num_packets = 0;
     ctx->num_insns = 0;
+    ctx->num_hvx_insns = 0;
 }
 
 static void hexagon_tr_tb_start(DisasContextBase *db, CPUState *cpu)
@@ -652,6 +821,9 @@ static char store_addr_names[STORES_MAX][NAME_LEN];
 static char store_width_names[STORES_MAX][NAME_LEN];
 static char store_val32_names[STORES_MAX][NAME_LEN];
 static char store_val64_names[STORES_MAX][NAME_LEN];
+static char vstore_addr_names[VSTORES_MAX][NAME_LEN];
+static char vstore_size_names[VSTORES_MAX][NAME_LEN];
+static char vstore_pending_names[VSTORES_MAX][NAME_LEN];
 
 void hexagon_translate_init(void)
 {
@@ -714,6 +886,17 @@ void hexagon_translate_init(void)
         offsetof(CPUHexagonState, llsc_val), "llsc_val");
     hex_llsc_val_i64 = tcg_global_mem_new_i64(cpu_env,
         offsetof(CPUHexagonState, llsc_val_i64), "llsc_val_i64");
+    hex_is_gather_store_insn = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, is_gather_store_insn),
+        "is_gather_store_insn");
+    hex_VRegs_updated_tmp = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, VRegs_updated_tmp), "VRegs_updated_tmp");
+    hex_VRegs_updated = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, VRegs_updated), "VRegs_updated");
+    hex_VRegs_select = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, VRegs_select), "VRegs_select");
+    hex_QRegs_updated = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, QRegs_updated), "QRegs_updated");
     for (i = 0; i < STORES_MAX; i++) {
         snprintf(store_addr_names[i], NAME_LEN, "store_addr_%d", i);
         hex_store_addr[i] = tcg_global_mem_new(cpu_env,
@@ -735,4 +918,20 @@ void hexagon_translate_init(void)
             offsetof(CPUHexagonState, mem_log_stores[i].data64),
             store_val64_names[i]);
     }
+    for (int i = 0; i < VSTORES_MAX; i++) {
+        snprintf(vstore_addr_names[i], NAME_LEN, "vstore_addr_%d", i);
+        hex_vstore_addr[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, vstore[i].va),
+            vstore_addr_names[i]);
+
+        snprintf(vstore_size_names[i], NAME_LEN, "vstore_size_%d", i);
+        hex_vstore_size[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, vstore[i].size),
+            vstore_size_names[i]);
+
+        snprintf(vstore_pending_names[i], NAME_LEN, "vstore_pending_%d", i);
+        hex_vstore_pending[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, vstore_pending[i]),
+            vstore_pending_names[i]);
+    }
 }
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 14/20] Hexagon HVX (target/hexagon) import semantics
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (12 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 13/20] Hexagon HVX (target/hexagon) TCG generation Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 15/20] Hexagon HVX (target/hexagon) instruction decoding Taylor Simpson
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Imported from the Hexagon architecture library
    imported/allext.idef           Top level file for all extensions
    imported/mmvec/ext.idef        HVX instruction definitions

Support functions added to target/hexagon/op_helper.c
Support functions added to target/hexagon/genptr.c

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/genptr.c                |   96 ++
 target/hexagon/op_helper.c             |   23 +
 target/hexagon/imported/allext.idef    |   25 +
 target/hexagon/imported/allidefs.def   |    1 +
 target/hexagon/imported/mmvec/ext.idef | 2606 ++++++++++++++++++++++++++++++++
 5 files changed, 2751 insertions(+)
 create mode 100644 target/hexagon/imported/allext.idef
 create mode 100644 target/hexagon/imported/mmvec/ext.idef

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index da8527d..45c5a71 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -19,6 +19,7 @@
 #include "cpu.h"
 #include "internal.h"
 #include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
 #include "insn.h"
 #include "opcodes.h"
 #include "translate.h"
@@ -474,5 +475,100 @@ static TCGv gen_8bitsof(TCGv result, TCGv value)
     return result;
 }
 
+static intptr_t vreg_src_off(DisasContext *ctx, int num)
+{
+    intptr_t offset = offsetof(CPUHexagonState, VRegs[num]);
+
+    if (test_bit(num, ctx->vregs_select)) {
+        offset = offsetof(CPUHexagonState, future_VRegs[num]);
+    }
+    if (test_bit(num, ctx->vregs_updated_tmp)) {
+        offset = offsetof(CPUHexagonState, tmp_VRegs[num]);
+    }
+    return offset;
+}
+
+static void gen_log_vreg_write(intptr_t srcoff, int num,
+                               VRegWriteType type, int slot_num,
+                               bool is_predicated, bool has_vhist)
+{
+    TCGLabel *label_end = NULL;
+    intptr_t dstoff;
+
+    if (is_predicated) {
+        TCGv cancelled = tcg_temp_local_new();
+        label_end = gen_new_label();
+
+        /* Don't do anything if the slot was cancelled */
+        tcg_gen_extract_tl(cancelled, hex_slot_cancelled, slot_num, 1);
+        tcg_gen_brcondi_tl(TCG_COND_NE, cancelled, 0, label_end);
+        tcg_temp_free(cancelled);
+    }
+
+    if (type != EXT_TMP) {
+        tcg_gen_ori_tl(hex_VRegs_updated, hex_VRegs_updated, 1 << num);
+    }
+    if (has_vhist && type == EXT_NEW) {
+        tcg_gen_ori_tl(hex_VRegs_select, hex_VRegs_select, 1 << num);
+    }
+    if (has_vhist && type == EXT_TMP) {
+        tcg_gen_ori_tl(hex_VRegs_updated_tmp, hex_VRegs_updated_tmp, 1 << num);
+    }
+
+    dstoff = offsetof(CPUHexagonState, future_VRegs[num]);
+    if (dstoff != srcoff) {
+        tcg_gen_gvec_mov(MO_64, dstoff, srcoff,
+                         sizeof(MMVector), sizeof(MMVector));
+    }
+
+    if (type == EXT_TMP) {
+        dstoff = offsetof(CPUHexagonState, tmp_VRegs[num]);
+        tcg_gen_gvec_mov(MO_64, dstoff, srcoff,
+                         sizeof(MMVector), sizeof(MMVector));
+    }
+
+    if (is_predicated) {
+        gen_set_label(label_end);
+    }
+}
+
+static void gen_log_vreg_write_pair(intptr_t srcoff, int num,
+                                    VRegWriteType type, int slot_num,
+                                    bool is_predicated, bool has_vhist)
+{
+    gen_log_vreg_write(srcoff, num ^ 0, type, slot_num,
+                       is_predicated, has_vhist);
+    srcoff += sizeof(MMVector);
+    gen_log_vreg_write(srcoff, num ^ 1, type, slot_num,
+                       is_predicated, has_vhist);
+}
+
+static void gen_log_qreg_write(intptr_t srcoff, int num, int vnew,
+                               int slot_num, bool is_predicated)
+{
+    TCGLabel *label_end = NULL;
+    intptr_t dstoff;
+
+    if (is_predicated) {
+        TCGv cancelled = tcg_temp_local_new();
+        label_end = gen_new_label();
+
+        /* Don't do anything if the slot was cancelled */
+        tcg_gen_extract_tl(cancelled, hex_slot_cancelled, slot_num, 1);
+        tcg_gen_brcondi_tl(TCG_COND_NE, cancelled, 0, label_end);
+        tcg_temp_free(cancelled);
+    }
+
+    dstoff = offsetof(CPUHexagonState, future_QRegs[num]);
+    if (dstoff != srcoff) {
+        tcg_gen_gvec_mov(MO_64, dstoff, srcoff, sizeof(MMQReg), sizeof(MMQReg));
+    }
+
+    if (is_predicated) {
+        tcg_gen_ori_tl(hex_QRegs_updated, hex_QRegs_updated, 1 << num);
+        gen_set_label(label_end);
+    }
+}
+
 #include "tcg_funcs_generated.c.inc"
 #include "tcg_func_table_generated.c.inc"
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 4e196f9..0fcd70a 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -1214,6 +1214,29 @@ float64 HELPER(dfmpyhh)(CPUHexagonState *env, float64 RxxV,
     return RxxV;
 }
 
+/* Log a write to HVX vector */
+static void log_vreg_write(CPUHexagonState *env, int num, void *var,
+                           VRegWriteType type)
+{
+    VRegMask regnum_mask = ((VRegMask)1) << num;
+
+    HEX_DEBUG_LOG("log_vreg_write[%d]\n", num);
+
+    env->VRegs_updated      |= (type != EXT_TMP) ? regnum_mask : 0;
+    env->VRegs_select       |= (type == EXT_NEW) ? regnum_mask : 0;
+    env->VRegs_updated_tmp  |= (type == EXT_TMP) ? regnum_mask : 0;
+    env->future_VRegs[num] = *(MMVector *)var;
+    if (type == EXT_TMP) {
+        env->tmp_VRegs[num] = env->future_VRegs[num];
+    }
+}
+
+static void log_mmvector_write(CPUHexagonState *env, int num,
+                               MMVector var, VRegWriteType type)
+{
+    log_vreg_write(env, num, &var, type);
+}
+
 static void cancel_slot(CPUHexagonState *env, uint32_t slot)
 {
     HEX_DEBUG_LOG("Slot %d cancelled\n", slot);
diff --git a/target/hexagon/imported/allext.idef b/target/hexagon/imported/allext.idef
new file mode 100644
index 0000000..9d4b23e
--- /dev/null
+++ b/target/hexagon/imported/allext.idef
@@ -0,0 +1,25 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Top level file for all instruction set extensions
+ */
+#define EXTNAME mmvec
+#define EXTSTR "mmvec"
+#include "mmvec/ext.idef"
+#undef EXTNAME
+#undef EXTSTR
diff --git a/target/hexagon/imported/allidefs.def b/target/hexagon/imported/allidefs.def
index 2aace29..ee253b8 100644
--- a/target/hexagon/imported/allidefs.def
+++ b/target/hexagon/imported/allidefs.def
@@ -28,3 +28,4 @@
 #include "shift.idef"
 #include "system.idef"
 #include "subinsns.idef"
+#include "allext.idef"
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
new file mode 100644
index 0000000..8ca5a60
--- /dev/null
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -0,0 +1,2606 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/******************************************************************************
+ *
+ *     HOYA: MULTI MEDIA INSTRUCITONS
+ *
+ ******************************************************************************/
+
+#ifndef EXTINSN
+#define EXTINSN Q6INSN
+#define __SELF_DEF_EXTINSN 1
+#endif
+
+#ifndef NO_MMVEC
+
+#define DO_FOR_EACH_CODE(WIDTH, CODE) \
+{ \
+    fHIDE(int i;) \
+    fVFOREACH(WIDTH, i) {\
+        CODE ;\
+    } \
+}
+
+
+
+
+#define ITERATOR_INSN_ANY_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+
+#define ITERATOR_INSN2_ANY_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_ANY_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA_DV),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+#define ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+#define ITERATOR_INSN_SHIFT_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+
+#define ITERATOR_INSN_SHIFT_SLOT_VV_LATE(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_SHIFT_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_SHIFT_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DEP(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DEP(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DEP(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_DEP(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_MPY_SLOT(WIDTH,TAG, SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_MPY_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN2_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX2,DESCR,CODE)
+
+
+#define ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+
+
+#define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC2(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV,A_CVI_VX_VSRC0_IS_DST), DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_SLOT2_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV,A_RESTRICT_SLOT2ONLY), DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_VHISTLIKE(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_4SLOT),  \
+DESCR, fHIDE(mmvector_t input;) input = fTMPVDATA(); DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+
+
+
+/******************************************************************************************
+*
+* MMVECTOR MEMORY OPERATIONS - NO NAPALI V1
+*
+*******************************************************************************************/
+
+
+
+#define ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+
+#define ITERATOR_INSN_SHIFT_SLOT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_SHIFT_SLOT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_SHIFT_SLOT_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+#define ITERATOR_INSN_ANY_SLOT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_ANY_SLOT_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+#define ITERATOR_INSN_MPY_SLOT_NOV1(WIDTH,TAG, SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_PERMUTE_SLOT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOTT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DEPT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DEPT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DEP_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_DEPT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define NARROWING_SHIFT_NOV1(ITERSIZE,TAG,DSTM,DSTTYPE,SRCTYPE,SYNOPTS,SATFUNC,RNDFUNC,SHAMTMASK) \
+ITERATOR_INSN_SHIFT_SLOT_NOV1(ITERSIZE,TAG, \
+"Vd32." #DSTTYPE "=vasr(Vu32." #SRCTYPE ",Vv32." #SRCTYPE ",Rt8)" #SYNOPTS, \
+"Vector shift right and shuffle", \
+    fHIDE(int )shamt = RtV & SHAMTMASK; \
+    DSTM(0,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VvV.SRCTYPE[i],shamt) >> shamt)); \
+    DSTM(1,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VuV.SRCTYPE[i],shamt) >> shamt)))
+
+#define MMVEC_AVGS_NOV1(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",          "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",          "Vector Average "DESCR,                                      VdV.DEST[i]  = fVAVGS(       WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",      "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",      "Vector Average % Round"DESCR,                               VdV.DEST[i]  = fVAVGSRND(    WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vnavg##TYPE,                       "Vd32=vnavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vnavg(Vu32."#SRC",Vv32."#SRC")",         "Vector Negative Average "DESCR,                             VdV.DEST[i]  = fVNAVGS(      WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+  #define MMVEC_AVGU_NOV1(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",        "Vector Average "DESCR,                                      VdV.DEST[i] = fVAVGU(   WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",     "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",    "Vector Average % Round"DESCR,                               VdV.DEST[i] = fVAVGURND(WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+
+
+/******************************************************************************************
+*
+* MMVECTOR MEMORY OPERATIONS
+*
+*******************************************************************************************/
+
+#define MMVEC_EACH_EA(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,BEH) \
+EXTINSN(V6_##TAG##_pi,      SYNTAXA "(Rx32++#s3)" NT SYNTAXB,ATTRIB,DESCR,{ fEA_REG(RxV); BEH; fPM_I(RxV,VEC_SCALE(siV)); }) \
+EXTINSN(V6_##TAG##_ai,      SYNTAXA "(Rt32+#s4)" NT SYNTAXB,ATTRIB,DESCR,{ fEA_RI(RtV,VEC_SCALE(siV)); BEH;}) \
+EXTINSN(V6_##TAG##_ppu,      SYNTAXA "(Rx32++Mu2)" NT SYNTAXB,ATTRIB,DESCR,{ fEA_REG(RxV); BEH; fPM_M(RxV,MuV); }) \
+
+
+#define MMVEC_COND_EACH_EA_TRUE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+EXTINSN(V6_##TAG##_pred_pi,      "if (" #SYNTAXP "4) " SYNTAXA "(Rx32++#s3)" NT SYNTAXB, ATTRIB,DESCR, { if (fLSBOLD(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_I(RxV,siV*fVECSIZE()); } else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_pred_ai,      "if (" #SYNTAXP "4) " SYNTAXA "(Rt32+#s4)" NT SYNTAXB, ATTRIB,DESCR,  { if (fLSBOLD(SYNTAXP##V)) { fEA_RI(RtV,siV*fVECSIZE()); BEH;} else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_pred_ppu,     "if (" #SYNTAXP "4) " SYNTAXA "(Rx32++Mu2)" NT SYNTAXB,ATTRIB,DESCR,  { if (fLSBOLD(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_M(RxV,MuV); } else {CANCEL;}}) \
+
+#define MMVEC_COND_EACH_EA_FALSE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+EXTINSN(V6_##TAG##_npred_pi,     "if (!" #SYNTAXP "4) " SYNTAXA "(Rx32++#s3)" NT SYNTAXB,ATTRIB,DESCR,{ if (fLSBOLDNOT(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_I(RxV,siV*fVECSIZE()); } else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_npred_ai,     "if (!" #SYNTAXP "4) " SYNTAXA "(Rt32+#s4)" NT SYNTAXB,ATTRIB,DESCR, { if (fLSBOLDNOT(SYNTAXP##V)) { fEA_RI(RtV,siV*fVECSIZE()); BEH;} else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_npred_ppu,    "if (!" #SYNTAXP "4) " SYNTAXA "(Rx32++Mu2)" NT SYNTAXB,ATTRIB,DESCR,{ if (fLSBOLDNOT(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_M(RxV,MuV); } else {CANCEL;}})
+
+#define MMVEC_COND_EACH_EA(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+MMVEC_COND_EACH_EA_TRUE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+MMVEC_COND_EACH_EA_FALSE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH)
+
+
+#define VEC_SCALE(X) X*fVECSIZE()
+
+
+#define MMVEC_LD(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG,DESCR,ATTRIB,NT,"Vd32=vmem","",fLOADMMV(EA,VdV))
+#define MMVEC_LDC(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG##_cur,DESCR,ATTRIB,NT,"Vd32.cur=vmem","",fLOADMMV(EA,VdV))
+#define MMVEC_LDT(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG##_tmp,DESCR,ATTRIB,NT,"Vd32.tmp=vmem","",fLOADMMV(EA,VdV))
+#define MMVEC_LDU(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG,DESCR,ATTRIB,NT,"Vd32=vmemu","",fLOADMMVU(EA,VdV))
+
+
+#define MMVEC_STQ(TAG,DESCR,ATTRIB,NT) \
+MMVEC_EACH_EA(TAG##_qpred,DESCR,ATTRIB,NT,"if (Qv4) vmem","=Vs32",fSTOREMMVQ(EA,VsV,QvV)) \
+MMVEC_EACH_EA(TAG##_nqpred,DESCR,ATTRIB,NT,"if (!Qv4) vmem","=Vs32",fSTOREMMVNQ(EA,VsV,QvV))
+
+/****************************************************************
+* MAPPING FOR VMEMs
+****************************************************************/
+
+#define ATTR_VMEM A_EXTENSION,A_CVI,A_CVI_VM
+#define ATTR_VMEMU A_EXTENSION,A_CVI,A_CVI_VM,A_CVI_VP
+
+
+MMVEC_LD(vL32b,  "Aligned Vector Load",        ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_VA),)
+MMVEC_LDC(vL32b,  "Aligned Vector Load Cur",	ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_NEW,A_CVI_VA),)
+MMVEC_LDT(vL32b,  "Aligned Vector Load Tmp",	ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_TMP),)
+
+MMVEC_COND_EACH_EA(vL32b,"Conditional Aligned Vector Load",ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_VA),,"Vd32=vmem",,Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_cur,"Conditional Aligned Vector Load Cur",ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_VA,A_CVI_NEW),,"Vd32.cur=vmem",,Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_tmp,"Conditional Aligned Vector Load Tmp",ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_TMP),,"Vd32.tmp=vmem",,Pv,fLOADMMV(EA,VdV);)
+
+MMVEC_EACH_EA(vS32b,"Aligned Vector Store",ATTRIBS(ATTR_VMEM,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),,"vmem","=Vs32",fSTOREMMV(EA,VsV))
+MMVEC_COND_EACH_EA(vS32b,"Aligned Vector Store",ATTRIBS(ATTR_VMEM,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),,"vmem","=Vs32",Pv,fSTOREMMV(EA,VsV))
+
+
+MMVEC_STQ(vS32b,  "Aligned Vector Store",      ATTRIBS(ATTR_VMEM,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),)
+
+MMVEC_LDU(vL32Ub, "Unaligned Vector Load",     ATTRIBS(ATTR_VMEMU,A_LOAD,A_RESTRICT_NOSLOT1),)
+
+MMVEC_EACH_EA(vS32Ub,"Unaligned Vector Store",ATTRIBS(ATTR_VMEMU,A_STORE,A_RESTRICT_NOSLOT1),,"vmemu","=Vs32",fSTOREMMVU(EA,VsV))
+
+MMVEC_COND_EACH_EA(vS32Ub,"Unaligned Vector Store",ATTRIBS(ATTR_VMEMU,A_STORE,A_RESTRICT_NOSLOT1),,"vmemu","=Vs32",Pv,fSTOREMMVU(EA,VsV))
+
+MMVEC_EACH_EA(vS32b_new,"Aligned Vector Store New",ATTRIBS(ATTR_VMEM,A_STORE,A_CVI_NEW,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),,"vmem","=Os8.new",fSTOREMMV(EA,fNEWVREG(OsN)))
+
+// V65 store relase, zero byte store
+MMVEC_EACH_EA(vS32b_srls,"Aligned Vector Scatter Release",ATTRIBS(ATTR_VMEM,A_STORE,A_CVI_SCATTER_RELEASE,A_CVI_NEW,A_RESTRICT_SLOT0ONLY),,"vmem",":scatter_release",fSTORERELEASE(EA,0))
+
+
+
+MMVEC_COND_EACH_EA(vS32b_new,"Aligned Vector Store New",ATTRIBS(ATTR_VMEM,A_STORE,A_CVI_NEW,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),,"vmem","=Os8.new",Pv,fSTOREMMV(EA,fNEWVREG(OsN)))
+
+
+/******************************************************************************************
+*
+* MMVECTOR MEMORY OPERATIONS - NON TEMPORAL
+*
+*******************************************************************************************/
+
+#define ATTR_VMEM_NT A_EXTENSION,A_CVI,A_CVI_VM
+
+MMVEC_EACH_EA(vS32b_nt,"Aligned Vector Store - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),":nt","vmem","=Vs32",fSTOREMMV(EA,VsV))
+MMVEC_COND_EACH_EA(vS32b_nt,"Aligned Vector Store - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),":nt","vmem","=Vs32",Pv,fSTOREMMV(EA,VsV))
+
+MMVEC_EACH_EA(vS32b_nt_new,"Aligned Vector Store New - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_CVI_NEW,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),":nt","vmem","=Os8.new",fSTOREMMV(EA,fNEWVREG(OsN)))
+MMVEC_COND_EACH_EA(vS32b_nt_new,"Aligned Vector Store New - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_CVI_NEW,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),":nt","vmem","=Os8.new",Pv,fSTOREMMV(EA,fNEWVREG(OsN)))
+
+
+MMVEC_STQ(vS32b_nt,  "Aligned Vector Store - Non temporal",      ATTRIBS(ATTR_VMEM_NT,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),":nt")
+
+MMVEC_LD(vL32b_nt,  "Aligned Vector Load - Non temporal",       ATTRIBS(ATTR_VMEM_NT,A_LOAD,A_CVI_VA),":nt")
+MMVEC_LDC(vL32b_nt,  "Aligned Vector Load Cur - Non temporal",	ATTRIBS(ATTR_VMEM_NT,A_LOAD,A_CVI_NEW,A_CVI_VA),":nt")
+MMVEC_LDT(vL32b_nt,  "Aligned Vector Load Tmp - Non temporal",	ATTRIBS(ATTR_VMEM_NT,A_LOAD,A_CVI_TMP),":nt")
+
+MMVEC_COND_EACH_EA(vL32b_nt,"Conditional Aligned Vector Load",ATTRIBS(ATTR_VMEM_NT,A_CVI_VA),,"Vd32=vmem",":nt",Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_nt_cur,"Conditional Aligned Vector Load Cur",ATTRIBS(ATTR_VMEM_NT,A_CVI_VA,A_CVI_NEW),,"Vd32.cur=vmem",":nt",Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_nt_tmp,"Conditional Aligned Vector Load Tmp",ATTRIBS(ATTR_VMEM_NT,A_CVI_TMP),,"Vd32.tmp=vmem",":nt",Pv,fLOADMMV(EA,VdV);)
+
+
+#undef VEC_SCALE
+
+
+/***************************************************
+ * Vector Alignment
+ ************************************************/
+
+#define VALIGNB(SHIFT)  \
+    fHIDE(int i;) \
+    for(i = 0; i < fVBYTES(); i++) {\
+        VdV.ub[i] = (i+SHIFT>=fVBYTES()) ? VuV.ub[i+SHIFT-fVBYTES()] : VvV.ub[i+SHIFT];\
+	}
+
+EXTINSN(V6_valignb,  "Vd32=valign(Vu32,Vv32,Rt8)",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),"Align Two vectors by Rt8 as control",
+{
+	unsigned shift = RtV & (fVBYTES()-1);
+	VALIGNB(shift)
+})
+EXTINSN(V6_vlalignb, "Vd32=vlalign(Vu32,Vv32,Rt8)", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),"Align Two vectors by Rt8 as control",
+{
+	unsigned shift = fVBYTES() - (RtV & (fVBYTES()-1));
+	VALIGNB(shift)
+})
+EXTINSN(V6_valignbi, "Vd32=valign(Vu32,Vv32,#u3)", 	ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),"Align Two vectors by #u3 as control",
+{
+	VALIGNB(uiV)
+})
+EXTINSN(V6_vlalignbi,"Vd32=vlalign(Vu32,Vv32,#u3)", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),"Align Two vectors by #u3 as control",
+{
+	unsigned shift = fVBYTES() - uiV;
+	VALIGNB(shift)
+})
+
+EXTINSN(V6_vror, "Vd32=vror(Vu32,Rt32)", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+"Align Two vectors by Rt32 as control",
+{
+	fHIDE(int k;)
+	for (k=0;k<fVBYTES();k++) {
+		VdV.ub[k] = VuV.ub[(k+RtV)&(fVBYTES()-1)];
+	}
+	})
+
+
+
+
+
+
+
+/**************************************************************
+* Unpack elements with zero/sign extend and cross lane permute
+***************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(8,vunpackub,  "Vdd32=vunpackub(Vu32)", "Vdd32.uh=vunpack(Vu32.ub)", "Unpack byte with zero-extend",     fVARRAY_ELEMENT_ACCESS(VddV, uh, i)  = fZE8_16( VuV.ub[i]))
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(8,vunpackb,   "Vdd32=vunpackb(Vu32)",  "Vdd32.h=vunpack(Vu32.b)",   "Unpack bytes with sign-extend",    fVARRAY_ELEMENT_ACCESS(VddV, h,  i)  = fSE8_16( VuV.b[i] ))
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(16,vunpackuh, "Vdd32=vunpackuh(Vu32)", "Vdd32.uw=vunpack(Vu32.uh)", "Unpack halves with zero-extend",   fVARRAY_ELEMENT_ACCESS(VddV, uw, i)  = fZE16_32(VuV.uh[i]))
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(16,vunpackh,  "Vdd32=vunpackh(Vu32)",  "Vdd32.w=vunpack(Vu32.h)",   "Unpack halves with sign-extend",   fVARRAY_ELEMENT_ACCESS(VddV, w,  i)  = fSE16_32(VuV.h[i] ))
+
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(8, vunpackob, "Vxx32|=vunpackob(Vu32)", "Vxx32.h|=vunpacko(Vu32.b)", "Unpack byte to odd bytes ",       fVARRAY_ELEMENT_ACCESS(VxxV, uh, i) |= fZE8_16( VuV.ub[i])<<8)
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(16,vunpackoh, "Vxx32|=vunpackoh(Vu32)", "Vxx32.w|=vunpacko(Vu32.h)", "Unpack halves to odd halves",     fVARRAY_ELEMENT_ACCESS(VxxV, uw, i) |= fZE16_32(VuV.uh[i])<<16)
+
+
+/**************************************************************
+* Pack elements and cross lane permute
+***************************************************************/
+
+ ITERATOR_INSN2_PERMUTE_SLOT(16, vpackeb,  "Vd32=vpackeb(Vu32,Vv32)", "Vd32.b=vpacke(Vu32.h,Vv32.h)",
+ "Pack  bytes",
+    VdV.ub[i]               = fGETUBYTE(0, VvV.uh[i]);
+    VdV.ub[i+fVELEM(16)]    = fGETUBYTE(0, VuV.uh[i]))
+
+ ITERATOR_INSN2_PERMUTE_SLOT(32, vpackeh,  "Vd32=vpackeh(Vu32,Vv32)", "Vd32.h=vpacke(Vu32.w,Vv32.w)",
+ "Pack  halfwords",
+    VdV.uh[i]               = fGETUHALF(0, VvV.uw[i]);
+    VdV.uh[i+fVELEM(32)]    = fGETUHALF(0, VuV.uw[i]))
+
+  ITERATOR_INSN2_PERMUTE_SLOT(16, vpackob,  "Vd32=vpackob(Vu32,Vv32)", "Vd32.b=vpacko(Vu32.h,Vv32.h)",
+ "Pack  bytes",
+    VdV.ub[i]               = fGETUBYTE(1, VvV.uh[i]);
+    VdV.ub[i+fVELEM(16)]    = fGETUBYTE(1, VuV.uh[i]))
+
+ ITERATOR_INSN2_PERMUTE_SLOT(32, vpackoh,  "Vd32=vpackoh(Vu32,Vv32)", "Vd32.h=vpacko(Vu32.w,Vv32.w)",
+ "Pack  halfwords",
+    VdV.uh[i]               = fGETUHALF(1, VvV.uw[i]);
+    VdV.uh[i+fVELEM(32)]    = fGETUHALF(1, VuV.uw[i]))
+
+
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vpackhub_sat,  "Vd32=vpackhub(Vu32,Vv32):sat", "Vd32.ub=vpack(Vu32.h,Vv32.h):sat",
+ "Pack ubytes with saturation",
+    VdV.ub[i]               = fVSATUB(VvV.h[i]);
+    VdV.ub[i+fVELEM(16)]    = fVSATUB(VuV.h[i]))
+
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vpackhb_sat,  "Vd32=vpackhb(Vu32,Vv32):sat", "Vd32.b=vpack(Vu32.h,Vv32.h):sat",
+ "Pack bytes with saturation",
+    VdV.b[i]               = fVSATB(VvV.h[i]);
+    VdV.b[i+fVELEM(16)]    = fVSATB(VuV.h[i]))
+
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vpackwuh_sat,  "Vd32=vpackwuh(Vu32,Vv32):sat", "Vd32.uh=vpack(Vu32.w,Vv32.w):sat",
+ "Pack ubytes with saturation",
+    VdV.uh[i]               = fVSATUH(VvV.w[i]);
+    VdV.uh[i+fVELEM(32)]    = fVSATUH(VuV.w[i]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vpackwh_sat,  "Vd32=vpackwh(Vu32,Vv32):sat", "Vd32.h=vpack(Vu32.w,Vv32.w):sat",
+ "Pack bytes with saturation",
+    VdV.h[i]               = fVSATH(VvV.w[i]);
+    VdV.h[i+fVELEM(32)]    = fVSATH(VuV.w[i]))
+
+
+
+
+
+/**************************************************************
+* Zero/Sign Extend with in-lane permute
+***************************************************************/
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(16,vzb,"Vdd32=vzxtb(Vu32)","Vdd32.uh=vzxt(Vu32.ub)",
+"Vector Zero Extend Bytes",
+    VddV.v[0].uh[i] = fZE8_16(fGETUBYTE(0, VuV.uh[i]));
+    VddV.v[1].uh[i] = fZE8_16(fGETUBYTE(1, VuV.uh[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(16,vsb,"Vdd32=vsxtb(Vu32)","Vdd32.h=vsxt(Vu32.b)",
+"Vector Sign Extend Bytes",
+    VddV.v[0].h[i] = fSE8_16(fGETBYTE(0, VuV.h[i]));
+    VddV.v[1].h[i] = fSE8_16(fGETBYTE(1, VuV.h[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(32,vzh,"Vdd32=vzxth(Vu32)","Vdd32.uw=vzxt(Vu32.uh)",
+"Vector Zero Extend halfwords",
+    VddV.v[0].uw[i] = fZE16_32(fGETUHALF(0, VuV.uw[i]));
+    VddV.v[1].uw[i] = fZE16_32(fGETUHALF(1, VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(32,vsh,"Vdd32=vsxth(Vu32)","Vdd32.w=vsxt(Vu32.h)",
+"Vector Sign Extend halfwords",
+    VddV.v[0].w[i] = fSE16_32(fGETHALF(0, VuV.w[i]));
+    VddV.v[1].w[i] = fSE16_32(fGETHALF(1, VuV.w[i])))
+
+
+/**********************************************************************
+*
+*
+*
+*               MMVECTOR REDUCTION
+*
+*
+*
+**********************************************************************/
+
+/********************************************
+*  2-WAY REDUCTION - UNSIGNED BYTE BY BYTE
+********************************************/
+
+
+ITERATOR_INSN2_MPY_SLOT(16,vdmpybus,"Vd32=vdmpybus(Vu32,Rt32)","Vd32.h=vdmpy(Vu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by bytes",
+    VdV.h[i]   = fMPY8US( fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i) % 4, RtV));
+    VdV.h[i]  += fMPY8US( fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT(16,vdmpybus_acc,"Vx32+=vdmpybus(Vu32,Rt32)","Vx32.h+=vdmpy(Vu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by  bytes, and accumulate",
+    VxV.h[i] += fMPY8US( fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i) % 4, RtV));
+    VxV.h[i] += fMPY8US( fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vdmpybus_dv,"Vdd32=vdmpybus(Vuu32,Rt32)","Vdd32.h=vdmpy(Vuu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by  bytes, and accumulate Sliding Window Reduction",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VddV.v[0].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VddV.v[1].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[1].uh[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vdmpybus_dv_acc,"Vxx32+=vdmpybus(Vuu32,Rt32)","Vxx32.h+=vdmpy(Vuu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by  bytes, and accumulate Sliding Window Reduction",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[1].uh[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+/********************************************
+*  2-WAY REDUCTION - HALF BY BYTE
+********************************************/
+ITERATOR_INSN2_MPY_SLOT(32,vdmpyhb,"Vd32=vdmpyhb(Vu32,Rt32)","Vd32.w=vdmpy(Vu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VdV.w[i]  = fMPY16SS(fGETHALF(0, VuV.w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VdV.w[i] += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT(32,vdmpyhb_acc,"Vx32+=vdmpyhb(Vu32,Rt32)","Vx32.w+=vdmpy(Vu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VxV.w[i] += fMPY16SS(fGETHALF(0, VuV.w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VxV.w[i] += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhb_dv,"Vdd32=vdmpyhb(Vuu32,Rt32)","Vdd32.w=vdmpy(Vuu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VddV.v[0].w[i]  = fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VddV.v[0].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VddV.v[1].w[i]  = fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VddV.v[1].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhb_dv_acc,"Vxx32+=vdmpyhb(Vuu32,Rt32)","Vxx32.w+=vdmpy(Vuu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+
+
+/********************************************
+*  2-WAY REDUCTION - HALF BY HALF
+********************************************/
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhvsat,"Vd32=vdmpyh(Vu32,Vv32):sat","Vd32.w=vdmpy(Vu32.h,Vv32.h):sat",
+"Vector halfword multiply, accumulate pairs, sat to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
+    accum   += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhvsat_acc,"Vx32+=vdmpyh(Vu32,Vv32):sat","Vx32.w+=vdmpy(Vu32.h,Vv32.h):sat",
+"Vector halfword multiply, accumulate pairs, sat to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
+    accum   += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
+    VxV.w[i] = fVSATW(VxV.w[i]+accum))
+
+
+/* VDMPYH */
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsat,"Vd32=vdmpyh(Vu32,Rt32):sat","Vd32.w=vdmpy(Vu32.h,Rt32.h):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(0, VuV.w[i]),fGETHALF(0, RtV));
+    accum   += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETHALF(1, RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsat_acc,"Vx32+=vdmpyh(Vu32,Rt32):sat","Vx32.w+=vdmpy(Vu32.h,Rt32.h):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t) accum = VxV.w[i];
+    accum   += fMPY16SS(fGETHALF(0, VuV.w[i]),fGETHALF(0, RtV));
+    accum   += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETHALF(1, RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhisat,"Vd32=vdmpyh(Vuu32,Rt32):sat","Vd32.w=vdmpy(Vuu32.h,Rt32.h):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Halfword with saturation",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]),fGETHALF(0,RtV));
+    accum   += fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]),fGETHALF(1,RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhisat_acc,"Vx32+=vdmpyh(Vuu32,Rt32):sat","Vx32.w+=vdmpy(Vuu32.h,Rt32.h):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Halfword with accumulation and saturation",
+    fHIDE(size8s_t) accum = VxV.w[i];
+    accum   += fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]),fGETHALF(0,RtV));
+    accum   += fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]),fGETHALF(1,RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+
+
+
+
+/* VDMPYHSU */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsusat,"Vd32=vdmpyhsu(Vu32,Rt32):sat","Vd32.w=vdmpy(Vu32.h,Rt32.uh):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SU(fGETHALF(0, VuV.w[i]),fGETUHALF(0, RtV));
+    accum   += fMPY16SU(fGETHALF(1, VuV.w[i]),fGETUHALF(1, RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsusat_acc,"Vx32+=vdmpyhsu(Vu32,Rt32):sat","Vx32.w+=vdmpy(Vu32.h,Rt32.uh):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t) accum=VxV.w[i];
+    accum   += fMPY16SU(fGETHALF(0, VuV.w[i]),fGETUHALF(0, RtV));
+    accum   += fMPY16SU(fGETHALF(1, VuV.w[i]),fGETUHALF(1, RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsuisat,"Vd32=vdmpyhsu(Vuu32,Rt32,#1):sat","Vd32.w=vdmpy(Vuu32.h,Rt32.uh,#1):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Halfword with saturation",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SU(fGETHALF(1,VuuV.v[0].w[i]),fGETUHALF(0,RtV));
+    accum   += fMPY16SU(fGETHALF(0,VuuV.v[1].w[i]),fGETUHALF(1,RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsuisat_acc,"Vx32+=vdmpyhsu(Vuu32,Rt32,#1):sat","Vx32.w+=vdmpy(Vuu32.h,Rt32.uh,#1):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Halfword with accumulation and saturation",
+    fHIDE(size8s_t) accum=VxV.w[i];
+    accum   += fMPY16SU(fGETHALF(1, VuuV.v[0].w[i]),fGETUHALF(0,RtV));
+    accum   += fMPY16SU(fGETHALF(0, VuuV.v[1].w[i]),fGETUHALF(1,RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+/********************************************
+*  3-WAY REDUCTION - UNSIGNED BYTE BY  BYTE
+********************************************/
+
+ ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpyb, "Vdd32=vtmpyb(Vuu32,Rt32)", "Vdd32.h=vtmpy(Vuu32.b,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VddV.v[0].h[i]  = fMPY8SS(fGETBYTE(0,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[0].h[i] += fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[0].h[i] += fGETBYTE(0,VuuV.v[1].h[i]);
+
+    VddV.v[1].h[i]  = fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[1].h[i] += fMPY8SS(fGETBYTE(0,VuuV.v[1].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[1].h[i] += fGETBYTE(1,VuuV.v[1].h[i]))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpyb_acc, "Vxx32+=vtmpyb(Vuu32,Rt32)", "Vxx32.h+=vtmpy(Vuu32.b,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VxxV.v[0].h[i] += fMPY8SS(fGETBYTE(0,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[0].h[i] += fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[0].h[i] += fGETBYTE(0,VuuV.v[1].h[i]);
+
+    VxxV.v[1].h[i] += fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[1].h[i] += fMPY8SS(fGETBYTE(0,VuuV.v[1].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[1].h[i] += fGETBYTE(1,VuuV.v[1].h[i]))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpybus, "Vdd32=vtmpybus(Vuu32,Rt32)", "Vdd32.h=vtmpy(Vuu32.ub,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[0].h[i] += fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[0].h[i] += fGETUBYTE(0,VuuV.v[1].uh[i]);
+
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[1].h[i] += fMPY8US(fGETUBYTE(0,VuuV.v[1].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[1].h[i] += fGETUBYTE(1,VuuV.v[1].uh[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpybus_acc, "Vxx32+=vtmpybus(Vuu32,Rt32)", "Vxx32.h+=vtmpy(Vuu32.ub,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[0].h[i] += fGETUBYTE(0,VuuV.v[1].uh[i]);
+
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(0,VuuV.v[1].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[1].h[i] += fGETUBYTE(1,VuuV.v[1].uh[i]))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vtmpyhb, "Vdd32=vtmpyhb(Vuu32,Rt32)", "Vdd32.w=vtmpy(Vuu32.h,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VddV.v[0].w[i] = fMPY16SS(fGETHALF(0,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VddV.v[0].w[i]+= fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VddV.v[0].w[i]+= fGETHALF(0,VuuV.v[1].w[i]);
+
+    VddV.v[1].w[i] = fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VddV.v[1].w[i]+= fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VddV.v[1].w[i]+= fGETHALF(1,VuuV.v[1].w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vtmpyhb_acc, "Vxx32+=vtmpyhb(Vuu32,Rt32)", "Vxx32.w+=vtmpy(Vuu32.h,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VxxV.v[0].w[i]+= fMPY16SS(fGETHALF(0,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VxxV.v[0].w[i]+= fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VxxV.v[0].w[i]+= fGETHALF(0,VuuV.v[1].w[i]);
+
+    VxxV.v[1].w[i]+= fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VxxV.v[1].w[i]+= fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VxxV.v[1].w[i]+= fGETHALF(1,VuuV.v[1].w[i]))
+
+
+/********************************************
+*  4-WAY REDUCTION - UNSIGNED BYTE BY UNSIGNED BYTE
+********************************************/
+
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpyub,"Vd32=vrmpyub(Vu32,Rt32)","Vd32.uw=vrmpy(Vu32.ub,Rt32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.uw[i]  = fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,RtV));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,RtV));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,RtV));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,RtV)))
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpyub_acc,"Vx32+=vrmpyub(Vu32,Rt32)","Vx32.uw+=vrmpy(Vu32.ub,Rt32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients Accumulate",
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,RtV));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,RtV));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,RtV));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpyubv,"Vd32=vrmpyub(Vu32,Vv32)","Vd32.uw=vrmpy(Vu32.ub,Vv32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.uw[i]  = fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,VvV.uw[i]));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,VvV.uw[i]));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,VvV.uw[i]));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpyubv_acc,"Vx32+=vrmpyub(Vu32,Vv32)","Vx32.uw+=vrmpy(Vu32.ub,Vv32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients Accumulate",
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,VvV.uw[i]));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,VvV.uw[i]));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,VvV.uw[i]));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybv,"Vd32=vrmpyb(Vu32,Vv32)","Vd32.w=vrmpy(Vu32.b,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.w[i]  = fMPY8SS(fGETBYTE(0, VuV.w[i]), fGETBYTE(0, VvV.w[i]));
+    VdV.w[i] += fMPY8SS(fGETBYTE(1, VuV.w[i]), fGETBYTE(1, VvV.w[i]));
+    VdV.w[i] += fMPY8SS(fGETBYTE(2, VuV.w[i]), fGETBYTE(2, VvV.w[i]));
+    VdV.w[i] += fMPY8SS(fGETBYTE(3, VuV.w[i]), fGETBYTE(3, VvV.w[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybv_acc,"Vx32+=vrmpyb(Vu32,Vv32)","Vx32.w+=vrmpy(Vu32.b,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VxV.w[i] += fMPY8SS(fGETBYTE(0, VuV.w[i]), fGETBYTE(0, VvV.w[i]));
+    VxV.w[i] += fMPY8SS(fGETBYTE(1, VuV.w[i]), fGETBYTE(1, VvV.w[i]));
+    VxV.w[i] += fMPY8SS(fGETBYTE(2, VuV.w[i]), fGETBYTE(2, VvV.w[i]));
+    VxV.w[i] += fMPY8SS(fGETBYTE(3, VuV.w[i]), fGETBYTE(3, VvV.w[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpyubi,"Vdd32=vrmpyub(Vuu32,Rt32,#u1)","Vdd32.uw=vrmpy(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction to Word",
+    VddV.v[0].uw[i]  = fMPY8UU(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[0].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV));
+    VddV.v[0].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[0].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+
+    VddV.v[1].uw[i]  = fMPY8UU(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[1].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+    VddV.v[1].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[1].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpyubi_acc,"Vxx32+=vrmpyub(Vuu32,Rt32,#u1)","Vxx32.uw+=vrmpy(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction with accumulate and saturation to Word",
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV));
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV)))
+
+
+
+
+/********************************************
+*  4-WAY REDUCTION - UNSIGNED BYTE BY  BYTE
+********************************************/
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybus,"Vd32=vrmpybus(Vu32,Rt32)","Vd32.w=vrmpy(Vu32.ub,Rt32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.w[i]  = fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,RtV));
+    VdV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,RtV));
+    VdV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,RtV));
+    VdV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybus_acc,"Vx32+=vrmpybus(Vu32,Rt32)","Vx32.w+=vrmpy(Vu32.ub,Rt32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VxV.w[i] += fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,RtV));
+    VxV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,RtV));
+    VxV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,RtV));
+    VxV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybusi,"Vdd32=vrmpybus(Vuu32,Rt32,#u1)","Vdd32.w=vrmpy(Vuu32.ub,Rt32.b,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction to Word",
+    VddV.v[0].w[i]  = fMPY8US(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[0].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV));
+    VddV.v[0].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[0].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+
+    VddV.v[1].w[i]  = fMPY8US(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[1].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+    VddV.v[1].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[1].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybusi_acc,"Vxx32+=vrmpybus(Vuu32,Rt32,#u1)","Vxx32.w+=vrmpy(Vuu32.ub,Rt32.b,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction with accumulate and saturation to Word",
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV));
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV)))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybusv,"Vd32=vrmpybus(Vu32,Vv32)","Vd32.w=vrmpy(Vu32.ub,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.w[i]  = fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,VvV.w[i]));
+    VdV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,VvV.w[i]));
+    VdV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,VvV.w[i]));
+    VdV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,VvV.w[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybusv_acc,"Vx32+=vrmpybus(Vu32,Vv32)","Vx32.w+=vrmpy(Vu32.ub,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VxV.w[i] += fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,VvV.w[i]));
+    VxV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,VvV.w[i]));
+    VxV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,VvV.w[i]));
+    VxV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,VvV.w[i])))
+
+
+
+
+
+
+
+
+
+
+
+/********************************************
+*  2-WAY REDUCTION - SAD
+********************************************/
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdsaduh,"Vdd32=vdsaduh(Vuu32,Rt32)","Vdd32.uw=vdsad(Vuu32.uh,Rt32.uh)",
+"Dual Vector Halfword by Byte 4-Way Reduction to Word",
+    VddV.v[0].uw[i]  = fABS(fGETUHALF(0, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VddV.v[0].uw[i] += fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(1,RtV));
+    VddV.v[1].uw[i]  = fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VddV.v[1].uw[i] += fABS(fGETUHALF(0, VuuV.v[1].uw[i]) - fGETUHALF(1,RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdsaduh_acc,"Vxx32+=vdsaduh(Vuu32,Rt32)","Vxx32.uw+=vdsad(Vuu32.uh,Rt32.uh)",
+"Dual Vector Halfword by Byte 4-Way Reduction to Word",
+    VxxV.v[0].uw[i] += fABS(fGETUHALF(0, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VxxV.v[0].uw[i] += fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(1,RtV));
+    VxxV.v[1].uw[i] += fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VxxV.v[1].uw[i] += fABS(fGETUHALF(0, VuuV.v[1].uw[i]) - fGETUHALF(1,RtV)))
+
+
+
+
+/********************************************
+*  4-WAY REDUCTION - SAD
+********************************************/
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrsadubi,"Vdd32=vrsadub(Vuu32,Rt32,#u1)","Vdd32.uw=vrsad(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Halfword by Byte 4-Way Reduction to Word",
+    VddV.v[0].uw[i]  = fABS(fZE8_16(fGETUBYTE(0, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VddV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV)));
+    VddV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VddV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+
+    VddV.v[1].uw[i]  = fABS(fZE8_16(fGETUBYTE(0, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VddV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+    VddV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VddV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV))))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrsadubi_acc,"Vxx32+=vrsadub(Vuu32,Rt32,#u1)","Vxx32.uw+=vrsad(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Halfword by Byte 4-Way Reduction to Word",
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(0, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV)));
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(0, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV))))
+
+
+
+
+
+
+
+
+
+
+/*********************************************************************
+ * MMVECTOR SHIFTING
+ * ******************************************************************/
+// Macro to shift arithmetically left/right and by either RT or Vv
+
+#define V_SHIFT(TYPE, DESC, SIZE, LOGSIZE, CASTTYPE)   \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasr##TYPE,   "Vd32=vasr" #TYPE "(Vu32,Rt32)","Vd32."#TYPE"=vasr(Vu32."#TYPE",Rt32)",         "Vector arithmetic shift right " DESC,    VdV.TYPE[i]     = (VuV.TYPE[i]    >> (RtV & (SIZE-1)))) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasl##TYPE,   "Vd32=vasl" #TYPE "(Vu32,Rt32)","Vd32."#TYPE"=vasl(Vu32."#TYPE",Rt32)",         "Vector arithmetic shift left  " DESC,    VdV.TYPE[i]     = (VuV.TYPE[i]    << (RtV & (SIZE-1)))) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vlsr##TYPE,   "Vd32=vlsr" #TYPE "(Vu32,Rt32)","Vd32.u"#TYPE"=vlsr(Vu32.u"#TYPE",Rt32)",       "Vector logical shift right "    DESC,    VdV.u##TYPE[i]  = (VuV.u##TYPE[i] >> (RtV & (SIZE-1)))) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasr##TYPE##v,"Vd32=vasr" #TYPE "(Vu32,Vv32)","Vd32."#TYPE"=vasr(Vu32."#TYPE",Vv32."#TYPE")", "Vector arithmetic shift right " DESC,    VdV.TYPE[i]     = fBIDIR_ASHIFTR(VuV.TYPE[i], fSXTN((LOGSIZE+1),SIZE,VvV.TYPE[i]),CASTTYPE)) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasl##TYPE##v,"Vd32=vasl" #TYPE "(Vu32,Vv32)","Vd32."#TYPE"=vasl(Vu32."#TYPE",Vv32."#TYPE")", "Vector arithmetic shift left  " DESC,    VdV.TYPE[i]     = fBIDIR_ASHIFTL(VuV.TYPE[i],  fSXTN((LOGSIZE+1),SIZE,VvV.TYPE[i]),CASTTYPE)) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vlsr##TYPE##v,"Vd32=vlsr" #TYPE "(Vu32,Vv32)","Vd32."#TYPE"=vlsr(Vu32."#TYPE",Vv32."#TYPE")", "Vector logical shift right "    DESC,    VdV.u##TYPE[i]  = fBIDIR_LSHIFTR(VuV.u##TYPE[i], fSXTN((LOGSIZE+1),SIZE,VvV.TYPE[i]),CASTTYPE)) \
+
+V_SHIFT(w, "word",   32,5,4_4)
+V_SHIFT(h, "halfword", 16,4,2_2)
+
+ITERATOR_INSN_SHIFT_SLOT(8,vlsrb,"Vd32.ub=vlsr(Vu32.ub,Rt32)","vec log shift right bytes", VdV.b[i] = VuV.ub[i] >> (RtV & 0x7))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vrotr,"Vd32=vrotr(Vu32,Vv32)","Vd32.uw=vrotr(Vu32.uw,Vv32.uw)","Vector word rotate right", VdV.uw[i] = ((VuV.uw[i] >> (VvV.uw[i] & 0x1f)) | (VuV.uw[i] << (32 - (VvV.uw[i] & 0x1f)))))
+
+/*********************************************************************
+ * MMVECTOR SHIFT AND PERMUTE
+ * ******************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(32,vasr_into,"Vxx32=vasrinto(Vu32,Vv32)","Vxx32.w=vasrinto(Vu32.w,Vv32.w)","ASR vector 1 elements and overlay dropping bits to MSB of vector 2 elements",
+    fHIDE(int64_t ) shift = (fSE32_64(VuV.w[i]) << 32);
+    fHIDE(int64_t ) mask  = (((fSE32_64(VxxV.v[0].w[i])) << 32) | fZE32_64(VxxV.v[0].w[i]));
+    fHIDE(int64_t) lomask = (((fSE32_64(1)) << 32) - 1);
+    fHIDE(int ) count = -(0x40 & VvV.w[i]) + (VvV.w[i] & 0x3f);
+    fHIDE(int64_t ) result = (count == -0x40) ? 0 : (((count < 0) ? ((shift << -(count)) | (mask & (lomask << -(count)))) : ((shift >> count) | (mask & (lomask >> count)))));
+    VxxV.v[1].w[i] = ((result >> 32) & 0xffffffff);
+    VxxV.v[0].w[i] = (result & 0xffffffff))
+
+#define NEW_NARROWING_SHIFT 1
+
+#if NEW_NARROWING_SHIFT
+#define NARROWING_SHIFT(ITERSIZE,TAG,DSTM,DSTTYPE,SRCTYPE,SYNOPTS,SATFUNC,RNDFUNC,SHAMTMASK) \
+ITERATOR_INSN_SHIFT_SLOT(ITERSIZE,TAG, \
+"Vd32." #DSTTYPE "=vasr(Vu32." #SRCTYPE ",Vv32." #SRCTYPE ",Rt8)" #SYNOPTS, \
+"Vector shift right and shuffle", \
+    fHIDE(int )shamt = RtV & SHAMTMASK; \
+    DSTM(0,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VvV.SRCTYPE[i],shamt) >> shamt)); \
+    DSTM(1,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VuV.SRCTYPE[i],shamt) >> shamt)))
+
+
+
+
+
+/* WORD TO HALF*/
+
+NARROWING_SHIFT(32,vasrwh,fSETHALF,h,w,,fECHO,fVNOROUND,0xF)
+NARROWING_SHIFT(32,vasrwhsat,fSETHALF,h,w,:sat,fVSATH,fVNOROUND,0xF)
+NARROWING_SHIFT(32,vasrwhrndsat,fSETHALF,h,w,:rnd:sat,fVSATH,fVROUND,0xF)
+NARROWING_SHIFT(32,vasrwuhrndsat,fSETHALF,uh,w,:rnd:sat,fVSATUH,fVROUND,0xF)
+NARROWING_SHIFT(32,vasrwuhsat,fSETHALF,uh,w,:sat,fVSATUH,fVNOROUND,0xF)
+NARROWING_SHIFT(32,vasruwuhrndsat,fSETHALF,uh,uw,:rnd:sat,fVSATUH,fVROUND,0xF)
+
+NARROWING_SHIFT_NOV1(32,vasruwuhsat,fSETHALF,uh,uw,:sat,fVSATUH,fVNOROUND,0xF)
+NARROWING_SHIFT(16,vasrhubsat,fSETBYTE,ub,h,:sat,fVSATUB,fVNOROUND,0x7)
+NARROWING_SHIFT(16,vasrhubrndsat,fSETBYTE,ub,h,:rnd:sat,fVSATUB,fVROUND,0x7)
+NARROWING_SHIFT(16,vasrhbsat,fSETBYTE,b,h,:sat,fVSATB,fVNOROUND,0x7)
+NARROWING_SHIFT(16,vasrhbrndsat,fSETBYTE,b,h,:rnd:sat,fVSATB,fVROUND,0x7)
+
+NARROWING_SHIFT_NOV1(16,vasruhubsat,fSETBYTE,ub,uh,:sat,fVSATUB,fVNOROUND,0x7)
+NARROWING_SHIFT_NOV1(16,vasruhubrndsat,fSETBYTE,ub,uh,:rnd:sat,fVSATUB,fVROUND,0x7)
+
+#else
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwh,"Vd32=vasrwh(Vu32,Vv32,Rt8)","Vd32.h=vasr(Vu32.w,Vv32.w,Rt8)",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fSETHALF(0,VdV.w[i], (VvV.w[i] >> (RtV & 0xF)));
+    fSETHALF(1,VdV.w[i], (VuV.w[i] >> (RtV & 0xF))))
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwhsat,"Vd32=vasrwh(Vu32,Vv32,Rt8):sat","Vd32.h=vasr(Vu32.w,Vv32.w,Rt8):sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fSETHALF(0,VdV.w[i], fVSATH(VvV.w[i] >> (RtV & 0xF)));
+    fSETHALF(1,VdV.w[i], fVSATH(VuV.w[i] >> (RtV & 0xF))))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwhrndsat,"Vd32=vasrwh(Vu32,Vv32,Rt8):rnd:sat","Vd32.h=vasr(Vu32.w,Vv32.w,Rt8):rnd:sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fHIDE(int ) shamt = RtV & 0xF;
+    fSETHALF(0,VdV.w[i], fVSATH(  (VvV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt));
+    fSETHALF(1,VdV.w[i], fVSATH(  (VuV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwuhrndsat,"Vd32=vasrwuh(Vu32,Vv32,Rt8):rnd:sat","Vd32.uh=vasr(Vu32.w,Vv32.w,Rt8):rnd:sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fHIDE(int ) shamt = RtV & 0xF;
+    fSETHALF(0,VdV.w[i], fVSATUH(  (VvV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt));
+    fSETHALF(1,VdV.w[i], fVSATUH(  (VuV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwuhsat,"Vd32=vasrwuh(Vu32,Vv32,Rt8):sat","Vd32.uh=vasr(Vu32.w,Vv32.w,Rt8):sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATUH(VvV.w[i] >> (RtV & 0xF)));
+    fSETHALF(1, VdV.uw[i], fVSATUH(VuV.w[i] >> (RtV & 0xF))))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasruwuhrndsat,"Vd32=vasruwuh(Vu32,Vv32,Rt8):rnd:sat","Vd32.uh=vasr(Vu32.uw,Vv32.uw,Rt8):rnd:sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fHIDE(int ) shamt = RtV & 0xF;
+    fSETHALF(0,VdV.w[i], fVSATUH(  (VvV.uw[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt));
+    fSETHALF(1,VdV.w[i], fVSATUH(  (VuV.uw[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt)))
+#endif
+
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vroundwh,"Vd32=vroundwh(Vu32,Vv32):sat","Vd32.h=vround(Vu32.w,Vv32.w):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATH((VvV.w[i] + fCONSTLL(0x8000)) >> 16));
+    fSETHALF(1, VdV.uw[i], fVSATH((VuV.w[i] + fCONSTLL(0x8000)) >> 16)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vroundwuh,"Vd32=vroundwuh(Vu32,Vv32):sat","Vd32.uh=vround(Vu32.w,Vv32.w):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATUH((VvV.w[i] + fCONSTLL(0x8000)) >> 16));
+    fSETHALF(1, VdV.uw[i], fVSATUH((VuV.w[i] + fCONSTLL(0x8000)) >> 16)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vrounduwuh,"Vd32=vrounduwuh(Vu32,Vv32):sat","Vd32.uh=vround(Vu32.uw,Vv32.uw):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATUH((VvV.uw[i] + fCONSTLL(0x8000)) >> 16));
+    fSETHALF(1, VdV.uw[i], fVSATUH((VuV.uw[i] + fCONSTLL(0x8000)) >> 16)))
+
+
+
+
+
+/* HALF TO BYTE*/
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vroundhb,"Vd32=vroundhb(Vu32,Vv32):sat","Vd32.b=vround(Vu32.h,Vv32.h):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETBYTE(0, VdV.uh[i], fVSATB((VvV.h[i] + 0x80) >> 8));
+    fSETBYTE(1, VdV.uh[i], fVSATB((VuV.h[i] + 0x80) >> 8)))
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vroundhub,"Vd32=vroundhub(Vu32,Vv32):sat","Vd32.ub=vround(Vu32.h,Vv32.h):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETBYTE(0, VdV.uh[i], fVSATUB((VvV.h[i] + 0x80) >> 8));
+    fSETBYTE(1, VdV.uh[i], fVSATUB((VuV.h[i] + 0x80) >> 8)))
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vrounduhub,"Vd32=vrounduhub(Vu32,Vv32):sat","Vd32.ub=vround(Vu32.uh,Vv32.uh):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETBYTE(0, VdV.uh[i], fVSATUB((VvV.uh[i] + 0x80) >> 8));
+    fSETBYTE(1, VdV.uh[i], fVSATUB((VuV.uh[i] + 0x80) >> 8)))
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vaslw_acc,"Vx32+=vaslw(Vu32,Rt32)","Vx32.w+=vasl(Vu32.w,Rt32)",
+"Vector shift add word",
+    VxV.w[i]  +=  (VuV.w[i] << (RtV & (32-1))))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrw_acc,"Vx32+=vasrw(Vu32,Rt32)","Vx32.w+=vasr(Vu32.w,Rt32)",
+"Vector shift add word",
+    VxV.w[i]  +=  (VuV.w[i] >> (RtV & (32-1))))
+
+ITERATOR_INSN2_SHIFT_SLOT_NOV1(16,vaslh_acc,"Vx32+=vaslh(Vu32,Rt32)","Vx32.h+=vasl(Vu32.h,Rt32)",
+"Vector shift add halfword",
+    VxV.h[i]  +=  (VuV.h[i] << (RtV & (16-1))))
+
+ITERATOR_INSN2_SHIFT_SLOT_NOV1(16,vasrh_acc,"Vx32+=vasrh(Vu32,Rt32)","Vx32.h+=vasr(Vu32.h,Rt32)",
+"Vector shift add halfword",
+    VxV.h[i]  +=  (VuV.h[i] >> (RtV & (16-1))))
+
+/**************************************************************************
+*
+* MMVECTOR ELEMENT-WISE ARITHMETIC
+*
+**************************************************************************/
+
+/**************************************************************************
+* MACROS GO IN MACROS.DEF NOT HERE!!!
+**************************************************************************/
+
+
+#define MMVEC_ABSDIFF(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_MPY_SLOT(WIDTH, vabsdiff##TYPE,                   "Vd32=vabsdiff"TYPE2"(Vu32,Vv32)" ,"Vd32."#DEST"=vabsdiff(Vu32."#SRC",Vv32."#SRC")" ,     "Vector Absolute of Difference "DESCR,   VdV.DEST[i] = (VuV.SRC[i] > VvV.SRC[i]) ? (VuV.SRC[i] - VvV.SRC[i]) : (VvV.SRC[i] - VuV.SRC[i]))
+
+#define MMVEC_ADDU_SAT(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vadd##TYPE##sat,                  "Vd32=vadd"TYPE2"(Vu32,Vv32):sat" ,    "Vd32."#DEST"=vadd(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVUADDSAT(WIDTH,  VuV.SRC[i], VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vadd##TYPE##sat_dv,    "Vdd32=vadd"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vadd(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVUADDSAT(WIDTH, VuuV.v[0].SRC[i],VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVUADDSAT(WIDTH, VuuV.v[1].SRC[i],VvvV.v[1].SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vsub##TYPE##sat,                  "Vd32=vsub"TYPE2"(Vu32,Vv32):sat",     "Vd32."#DEST"=vsub(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVUSUBSAT(WIDTH,  VuV.SRC[i], VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vsub##TYPE##sat_dv,    "Vdd32=vsub"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vsub(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVUSUBSAT(WIDTH, VuuV.v[0].SRC[i],VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVUSUBSAT(WIDTH, VuuV.v[1].SRC[i],VvvV.v[1].SRC[i]))\
+
+#define MMVEC_ADDS_SAT(TYPE,TYPE2,DESCR, WIDTH,DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vadd##TYPE##sat,                  "Vd32=vadd"TYPE2"(Vu32,Vv32):sat" ,    "Vd32."#DEST"=vadd(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVSADDSAT(WIDTH,  VuV.SRC[i],  VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vadd##TYPE##sat_dv,    "Vdd32=vadd"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vadd(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVSADDSAT(WIDTH, VuuV.v[0].SRC[i], VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVSADDSAT(WIDTH, VuuV.v[1].SRC[i], VvvV.v[1].SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vsub##TYPE##sat,                  "Vd32=vsub"TYPE2"(Vu32,Vv32):sat",     "Vd32."#DEST"=vsub(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVSSUBSAT(WIDTH,  VuV.SRC[i],  VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vsub##TYPE##sat_dv,    "Vdd32=vsub"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vsub(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVSSUBSAT(WIDTH, VuuV.v[0].SRC[i], VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVSSUBSAT(WIDTH, VuuV.v[1].SRC[i], VvvV.v[1].SRC[i]))\
+
+#define MMVEC_AVGU(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",        "Vector Average "DESCR,                                      VdV.DEST[i] = fVAVGU(   WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",     "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",    "Vector Average % Round"DESCR,                               VdV.DEST[i] = fVAVGURND(WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+
+
+#define MMVEC_AVGS(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",          "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",          "Vector Average "DESCR,                                      VdV.DEST[i]  = fVAVGS(       WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",      "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",      "Vector Average % Round"DESCR,                               VdV.DEST[i]  = fVAVGSRND(    WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vnavg##TYPE,                       "Vd32=vnavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vnavg(Vu32."#SRC",Vv32."#SRC")",         "Vector Negative Average "DESCR,                             VdV.DEST[i]  = fVNAVGS(      WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+
+
+
+
+
+
+#define MMVEC_ADDWRAP(TYPE,TYPE2, DESCR, WIDTH , DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vadd##TYPE,                  "Vd32=vadd"TYPE2"(Vu32,Vv32)" ,     "Vd32."#DEST"=vadd(Vu32."#SRC",Vv32."#SRC")",    "Vector Add "DESCR,          VdV.DEST[i] =  VuV.SRC[i] +  VvV.SRC[i])\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vsub##TYPE,                  "Vd32=vsub"TYPE2"(Vu32,Vv32)" ,     "Vd32."#DEST"=vsub(Vu32."#SRC",Vv32."#SRC")",    "Vector Sub "DESCR,          VdV.DEST[i] =  VuV.SRC[i] -  VvV.SRC[i])\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vadd##TYPE##_dv,  "Vdd32=vadd"TYPE2"(Vuu32,Vvv32)" ,  "Vdd32."#DEST"=vadd(Vuu32."#SRC",Vvv32."#SRC")", "Double Vector Add "DESCR,   VddV.v[0].DEST[i] = VuuV.v[0].SRC[i] + VvvV.v[0].SRC[i]; VddV.v[1].DEST[i] = VuuV.v[1].SRC[i] + VvvV.v[1].SRC[i])\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vsub##TYPE##_dv,  "Vdd32=vsub"TYPE2"(Vuu32,Vvv32)" ,  "Vdd32."#DEST"=vsub(Vuu32."#SRC",Vvv32."#SRC")", "Double Vector Sub "DESCR,   VddV.v[0].DEST[i] = VuuV.v[0].SRC[i] - VvvV.v[0].SRC[i]; VddV.v[1].DEST[i] = VuuV.v[1].SRC[i] - VvvV.v[1].SRC[i]) \
+
+
+
+
+
+/* Wrapping Adds */
+MMVEC_ADDWRAP(b,    "b",    "Byte",         8,   b, b)
+MMVEC_ADDWRAP(h,    "h",    "Halfword",     16,  h, h)
+MMVEC_ADDWRAP(w,    "w",    "Word",         32,   w,    w)
+
+/* Saturating Adds */
+MMVEC_ADDU_SAT(ub, "ub",    "Unsigned Byte",        8,   ub,    ub)
+MMVEC_ADDU_SAT(uh, "uh",    "Unsigned Halfword",    16,  uh,    uh)
+MMVEC_ADDU_SAT(uw, "uw",    "Unsigned word",    32,  uw,    uw)
+MMVEC_ADDS_SAT(b,  "b",     "byte",             8,  b,     b)
+MMVEC_ADDS_SAT(h,  "h",     "Halfword",             16,  h,     h)
+MMVEC_ADDS_SAT(w,  "w",     "Word",                 32,  w,     w)
+
+
+/* Averaging Instructions */
+MMVEC_AVGU(ub,"ub",     "Unsigned Byte",     8,   ub,   ub)
+MMVEC_AVGU(uh,"uh",     "Unsigned Halfword", 16,  uh,   uh)
+MMVEC_AVGU_NOV1(uw,"uw",     "Unsigned Word",     32,  uw,   uw)
+MMVEC_AVGS_NOV1(b,   "b",    "Byte",               8,   b,   b)
+MMVEC_AVGS(h,   "h",    "Halfword",          16,   h,   h)
+MMVEC_AVGS(w,   "w",    "Word",              32,   w,   w)
+
+
+/* Absolute Difference */
+MMVEC_ABSDIFF(ub,"ub",  "Unsigned Byte",        8,   ub,    ub)
+MMVEC_ABSDIFF(uh,"uh",  "Unsigned Halfword",    16,  uh,    uh)
+MMVEC_ABSDIFF(h,"h",        "Halfword",             16,  uh,    h)
+MMVEC_ABSDIFF(w,"w",        "Word",                 32,  uw,    w)
+
+ITERATOR_INSN2_ANY_SLOT(8,vnavgub, "Vd32=vnavgub(Vu32,Vv32)", "Vd32.b=vnavg(Vu32.ub,Vv32.ub)",
+"Vector Negative Average Unsigned Byte", VdV.b[i]   = fVNAVGU(8, VuV.ub[i], VvV.ub[i]))
+
+ITERATOR_INSN_ANY_SLOT(32,vaddcarrysat,"Vd32.w=vadd(Vu32.w,Vv32.w,Qs4):carry:sat","add w/carry and saturate",
+VdV.w[i] = fVSATW(VuV.w[i]+VvV.w[i]+fGETQBIT(QsV,i*4)))
+
+ITERATOR_INSN_ANY_SLOT(32,vaddcarry,"Vd32.w=vadd(Vu32.w,Vv32.w,Qx4):carry","add w/carry",
+VdV.w[i] = VuV.w[i]+VvV.w[i]+fGETQBIT(QxV,i*4);
+fSETQBITS(QxV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],VvV.w[i],fGETQBIT(QxV,i*4))))
+
+ITERATOR_INSN_ANY_SLOT(32,vsubcarry,"Vd32.w=vsub(Vu32.w,Vv32.w,Qx4):carry","add w/carry",
+VdV.w[i] = VuV.w[i]+~VvV.w[i]+fGETQBIT(QxV,i*4);
+fSETQBITS(QxV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],~VvV.w[i],fGETQBIT(QxV,i*4))))
+
+ITERATOR_INSN_ANY_SLOT(32,vaddcarryo,"Vd32.w,Qe4=vadd(Vu32.w,Vv32.w):carry","add w/carry out-only",
+VdV.w[i] = VuV.w[i]+VvV.w[i];
+fSETQBITS(QeV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],VvV.w[i],0)))
+
+ITERATOR_INSN_ANY_SLOT(32,vsubcarryo,"Vd32.w,Qe4=vsub(Vu32.w,Vv32.w):carry","subtract w/carry out-only",
+VdV.w[i] = VuV.w[i]+~VvV.w[i]+1;
+fSETQBITS(QeV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],~VvV.w[i],1)))
+
+
+ITERATOR_INSN_ANY_SLOT(32,vsatdw,"Vd32.w=vsatdw(Vu32.w,Vv32.w)","Saturate from 64-bits (higher 32-bits come from first vector) to 32-bits",VdV.w[i] = fVSATDW(VuV.w[i],VvV.w[i]))
+
+
+#define MMVEC_ADDSAT_MIX(TAGEND,SATF,WIDTH,DEST,SRC1,SRC2)\
+ITERATOR_INSN_ANY_SLOT(WIDTH, vadd##TAGEND,"Vd32."#DEST"=vadd(Vu32."#SRC1",Vv32."#SRC2"):sat",    "Vector Add mixed", VdV.DEST[i] =  SATF(VuV.SRC1[i] +  VvV.SRC2[i]))\
+ITERATOR_INSN_ANY_SLOT(WIDTH, vsub##TAGEND,"Vd32."#DEST"=vsub(Vu32."#SRC1",Vv32."#SRC2"):sat",    "Vector Sub mixed", VdV.DEST[i] =  SATF(VuV.SRC1[i] -  VvV.SRC2[i]))\
+
+MMVEC_ADDSAT_MIX(ububb_sat,fVSATUB,8,ub,ub,b)
+
+/****************************
+*   WIDENING
+****************************/
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vaddubh,"Vdd32=vaddub(Vu32,Vv32)","Vdd32.h=vadd(Vu32.ub,Vv32.ub)",
+"Vector addition with widen into two vectors",
+    VddV.v[0].h[i] = fZE8_16(fGETUBYTE(0, VuV.uh[i])) + fZE8_16(fGETUBYTE(0, VvV.uh[i]));
+    VddV.v[1].h[i] = fZE8_16(fGETUBYTE(1, VuV.uh[i])) + fZE8_16(fGETUBYTE(1, VvV.uh[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vsububh,"Vdd32=vsubub(Vu32,Vv32)","Vdd32.h=vsub(Vu32.ub,Vv32.ub)",
+"Vector subtraction with widen into two vectors",
+    VddV.v[0].h[i] = fZE8_16(fGETUBYTE(0, VuV.uh[i])) - fZE8_16(fGETUBYTE(0, VvV.uh[i]));
+    VddV.v[1].h[i] = fZE8_16(fGETUBYTE(1, VuV.uh[i])) - fZE8_16(fGETUBYTE(1, VvV.uh[i])))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vaddhw,"Vdd32=vaddh(Vu32,Vv32)","Vdd32.w=vadd(Vu32.h,Vv32.h)",
+"Vector addition with widen into two vectors",
+    VddV.v[0].w[i] = fGETHALF(0, VuV.w[i]) + fGETHALF(0, VvV.w[i]);
+    VddV.v[1].w[i] = fGETHALF(1, VuV.w[i]) + fGETHALF(1, VvV.w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vsubhw,"Vdd32=vsubh(Vu32,Vv32)","Vdd32.w=vsub(Vu32.h,Vv32.h)",
+"Vector subtraction with widen into two vectors",
+    VddV.v[0].w[i] = fGETHALF(0, VuV.w[i]) - fGETHALF(0, VvV.w[i]);
+    VddV.v[1].w[i] = fGETHALF(1, VuV.w[i]) - fGETHALF(1, VvV.w[i]))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vadduhw,"Vdd32=vadduh(Vu32,Vv32)","Vdd32.w=vadd(Vu32.uh,Vv32.uh)",
+"Vector addition with widen into two vectors",
+    VddV.v[0].w[i] = fZE16_32(fGETUHALF(0, VuV.uw[i])) + fZE16_32(fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].w[i] = fZE16_32(fGETUHALF(1, VuV.uw[i])) + fZE16_32(fGETUHALF(1, VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vsubuhw,"Vdd32=vsubuh(Vu32,Vv32)","Vdd32.w=vsub(Vu32.uh,Vv32.uh)",
+"Vector subtraction with widen into two vectors",
+    VddV.v[0].w[i] = fZE16_32(fGETUHALF(0, VuV.uw[i])) - fZE16_32(fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].w[i] = fZE16_32(fGETUHALF(1, VuV.uw[i])) - fZE16_32(fGETUHALF(1, VvV.uw[i])))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vaddhw_acc,"Vxx32+=vaddh(Vu32,Vv32)","Vxx32.w+=vadd(Vu32.h,Vv32.h)",
+"Vector addition with widen into two vectors",
+    VxxV.v[0].w[i] += fGETHALF(0, VuV.w[i]) + fGETHALF(0, VvV.w[i]);
+    VxxV.v[1].w[i] += fGETHALF(1, VuV.w[i]) + fGETHALF(1, VvV.w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vadduhw_acc,"Vxx32+=vadduh(Vu32,Vv32)","Vxx32.w+=vadd(Vu32.uh,Vv32.uh)",
+"Vector addition with widen into two vectors",
+    VxxV.v[0].w[i] += fGETUHALF(0, VuV.w[i]) + fGETUHALF(0, VvV.w[i]);
+    VxxV.v[1].w[i] += fGETUHALF(1, VuV.w[i]) + fGETUHALF(1, VvV.w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vaddubh_acc,"Vxx32+=vaddub(Vu32,Vv32)","Vxx32.h+=vadd(Vu32.ub,Vv32.ub)",
+"Vector addition with widen into two vectors",
+    VxxV.v[0].h[i] += fGETUBYTE(0, VuV.h[i]) + fGETUBYTE(0, VvV.h[i]);
+    VxxV.v[1].h[i] += fGETUBYTE(1, VuV.h[i]) + fGETUBYTE(1, VvV.h[i]))
+
+
+/****************************
+*   Conditional
+****************************/
+
+#define CONDADDSUB(WIDTH,TAGEND,LHSYN,RHSYN,DESCR,LHBEH,RHBEH) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vadd##TAGEND##q,"if (Qv4."#TAGEND") "LHSYN"+="RHSYN,"if (Qv4) "LHSYN"+="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH+RHBEH,LHBEH)) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vsub##TAGEND##q,"if (Qv4."#TAGEND") "LHSYN"-="RHSYN,"if (Qv4) "LHSYN"-="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH-RHBEH,LHBEH)) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vadd##TAGEND##nq,"if (!Qv4."#TAGEND") "LHSYN"+="RHSYN,"if (!Qv4) "LHSYN"+="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH,LHBEH+RHBEH)) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vsub##TAGEND##nq,"if (!Qv4."#TAGEND") "LHSYN"-="RHSYN,"if (!Qv4) "LHSYN"-="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH,LHBEH-RHBEH)) \
+
+CONDADDSUB(8,b,"Vx32.b","Vu32.b","Conditional add/sub Byte",VxV.ub[i],VuV.ub[i])
+CONDADDSUB(16,h,"Vx32.h","Vu32.h","Conditional add/sub Half",VxV.h[i],VuV.h[i])
+CONDADDSUB(32,w,"Vx32.w","Vu32.w","Conditional add/sub Word",VxV.w[i],VuV.w[i])
+
+/*****************************************************
+ ABSOLUTE VALUES
+*****************************************************/
+// V65
+ITERATOR_INSN2_ANY_SLOT_NOV1(8,vabsb,        "Vd32=vabsb(Vu32)",     "Vd32.b=vabs(Vu32.b)",     "Vector absolute value of bytes",    VdV.b[i]  =  fABS(VuV.b[i]))
+ITERATOR_INSN2_ANY_SLOT_NOV1(8,vabsb_sat,    "Vd32=vabsb(Vu32):sat", "Vd32.b=vabs(Vu32.b):sat", "Vector absolute value of bytes",    VdV.b[i]  =  fVSATB(fABS(fSE8_16(VuV.b[i]))))
+
+
+ITERATOR_INSN2_ANY_SLOT(16,vabsh,        "Vd32=vabsh(Vu32)",     "Vd32.h=vabs(Vu32.h)",     "Vector absolute value of halfwords",    VdV.h[i]  =  fABS(VuV.h[i]))
+ITERATOR_INSN2_ANY_SLOT(16,vabsh_sat,    "Vd32=vabsh(Vu32):sat", "Vd32.h=vabs(Vu32.h):sat", "Vector absolute value of halfwords",    VdV.h[i]  =  fVSATH(fABS(fSE16_32(VuV.h[i]))))
+ITERATOR_INSN2_ANY_SLOT(32,vabsw,        "Vd32=vabsw(Vu32)",     "Vd32.w=vabs(Vu32.w)",     "Vector absolute value of words",        VdV.w[i]  =  fABS(VuV.w[i]))
+ITERATOR_INSN2_ANY_SLOT(32,vabsw_sat,    "Vd32=vabsw(Vu32):sat", "Vd32.w=vabs(Vu32.w):sat", "Vector absolute value of words",        VdV.w[i]  =  fVSATW(fABS(fSE32_64(VuV.w[i]))))
+
+
+/**************************************************************************
+ * MMVECTOR MULTIPLICATIONS
+ * ************************************************************************/
+
+
+/* Byte by Byte */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybv,"Vdd32=vmpyb(Vu32,Vv32)","Vdd32.h=vmpy(Vu32.b,Vv32.b)",
+"Vector absolute value of words",
+    VddV.v[0].h[i] =  fMPY8SS(fGETBYTE(0, VuV.h[i]), fGETBYTE(0, VvV.h[i]));
+    VddV.v[1].h[i] =  fMPY8SS(fGETBYTE(1, VuV.h[i]), fGETBYTE(1, VvV.h[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybv_acc,"Vxx32+=vmpyb(Vu32,Vv32)","Vxx32.h+=vmpy(Vu32.b,Vv32.b)",
+"Vector absolute value of words",
+    VxxV.v[0].h[i] +=  fMPY8SS(fGETBYTE(0, VuV.h[i]), fGETBYTE(0, VvV.h[i]));
+    VxxV.v[1].h[i] +=  fMPY8SS(fGETBYTE(1, VuV.h[i]), fGETBYTE(1, VvV.h[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyubv,"Vdd32=vmpyub(Vu32,Vv32)","Vdd32.uh=vmpy(Vu32.ub,Vv32.ub)",
+"Vector absolute value of words",
+    VddV.v[0].uh[i] =  fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE(0, VvV.uh[i]) );
+    VddV.v[1].uh[i] =  fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE(1, VvV.uh[i]) ))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyubv_acc,"Vxx32+=vmpyub(Vu32,Vv32)","Vxx32.uh+=vmpy(Vu32.ub,Vv32.ub)",
+"Vector absolute value of words",
+    VxxV.v[0].uh[i] +=  fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE(0, VvV.uh[i]) );
+    VxxV.v[1].uh[i] +=  fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE(1, VvV.uh[i]) ))
+
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybusv,"Vdd32=vmpybus(Vu32,Vv32)","Vdd32.h=vmpy(Vu32.ub,Vv32.b)",
+"Vector absolute value of words",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE(0, VvV.h[i]));
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE(1, VvV.h[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybusv_acc,"Vxx32+=vmpybus(Vu32,Vv32)","Vxx32.h+=vmpy(Vu32.ub,Vv32.b)",
+"Vector absolute value of words",
+    VxxV.v[0].h[i]  += fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE(0, VvV.h[i]));
+    VxxV.v[1].h[i]  += fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE(1, VvV.h[i])))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabusv,"Vdd32=vmpabus(Vuu32,Vvv32)","Vdd32.h=vmpa(Vuu32.ub,Vvv32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].h[i] = fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETBYTE(0, VvvV.v[0].uh[i])) + fMPY8US(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETBYTE(0, VvvV.v[1].uh[i]));
+    VddV.v[1].h[i] = fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETBYTE(1, VvvV.v[0].uh[i])) + fMPY8US(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETBYTE(1, VvvV.v[1].uh[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabuuv,"Vdd32=vmpabuu(Vuu32,Vvv32)","Vdd32.h=vmpa(Vuu32.ub,Vvv32.ub)",
+"Vertical Byte Multiply",
+    VddV.v[0].h[i] = fMPY8UU(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETUBYTE(0, VvvV.v[0].uh[i])) + fMPY8UU(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETUBYTE(0, VvvV.v[1].uh[i]));
+    VddV.v[1].h[i] = fMPY8UU(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETUBYTE(1, VvvV.v[0].uh[i])) + fMPY8UU(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETUBYTE(1, VvvV.v[1].uh[i])))
+
+
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhv,"Vdd32=vmpyh(Vu32,Vv32)","Vdd32.w=vmpy(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VddV.v[0].w[i] = fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, VvV.w[i]));
+    VddV.v[1].w[i] = fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, VvV.w[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhv_acc,"Vxx32+=vmpyh(Vu32,Vv32)","Vxx32.w+=vmpy(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, VvV.w[i]));
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, VvV.w[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuhv,"Vdd32=vmpyuh(Vu32,Vv32)","Vdd32.uw=vmpy(Vu32.uh,Vv32.uh)",
+"Vector by Vector Unsigned Halfword Multiply",
+    VddV.v[0].uw[i] = fMPY16UU(fGETUHALF(0, VuV.uw[i]), fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].uw[i] = fMPY16UU(fGETUHALF(1, VuV.uw[i]), fGETUHALF(1, VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuhv_acc,"Vxx32+=vmpyuh(Vu32,Vv32)","Vxx32.uw+=vmpy(Vu32.uh,Vv32.uh)",
+"Vector by Vector Unsigned Halfword Multiply",
+    VxxV.v[0].uw[i] += fMPY16UU(fGETUHALF(0, VuV.uw[i]), fGETUHALF(0, VvV.uw[i]));
+    VxxV.v[1].uw[i] += fMPY16UU(fGETUHALF(1, VuV.uw[i]), fGETUHALF(1, VvV.uw[i])))
+
+
+
+/* Vector by Vector */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyhvsrs,"Vd32=vmpyh(Vu32,Vv32):<<1:rnd:sat","Vd32.h=vmpy(Vu32.h,Vv32.h):<<1:rnd:sat",
+"Vector halfword multiply with round, shift, and sat16",
+    VdV.h[i] = fVSATH(fGETHALF(1,fVSAT(fROUND((fMPY16SS(VuV.h[i],VvV.h[i]    )<<1))))))
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhus, "Vdd32=vmpyhus(Vu32,Vv32)","Vdd32.w=vmpy(Vu32.h,Vv32.uh)",
+"Vector by Vector Halfword Multiply",
+    VddV.v[0].w[i] = fMPY16SU(fGETHALF(0, VuV.w[i]), fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].w[i] = fMPY16SU(fGETHALF(1, VuV.w[i]), fGETUHALF(1, VvV.uw[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhus_acc, "Vxx32+=vmpyhus(Vu32,Vv32)","Vxx32.w+=vmpy(Vu32.h,Vv32.uh)",
+"Vector by Vector Halfword Multiply",
+    VxxV.v[0].w[i] += fMPY16SU(fGETHALF(0, VuV.w[i]), fGETUHALF(0, VvV.uw[i]));
+    VxxV.v[1].w[i] += fMPY16SU(fGETHALF(1, VuV.w[i]), fGETUHALF(1, VvV.uw[i])))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyih,"Vd32=vmpyih(Vu32,Vv32)","Vd32.h=vmpyi(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VdV.h[i] = fMPY16SS(VuV.h[i], VvV.h[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyih_acc,"Vx32+=vmpyih(Vu32,Vv32)","Vx32.h+=vmpyi(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VxV.h[i] += fMPY16SS(VuV.h[i], VvV.h[i]))
+
+
+
+/* 32x32 high half / frac */
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyewuh,"Vd32=vmpyewuh(Vu32,Vv32)","Vd32.w=vmpye(Vu32.w,Vv32.uh)",
+"Vector by Vector Halfword Multiply",
+VdV.w[i] = fMPY3216SU(VuV.w[i], fGETUHALF(0, VvV.w[i])) >> 16)
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh,"Vd32=vmpyowh(Vu32,Vv32):<<1:sat","Vd32.w=vmpyo(Vu32.w,Vv32.h):<<1:sat",
+"Vector by Vector Halfword Multiply",
+VdV.w[i] = fVSATW((((fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i])) >> 14) + 0) >> 1)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_rnd,"Vd32=vmpyowh(Vu32,Vv32):<<1:rnd:sat","Vd32.w=vmpyo(Vu32.w,Vv32.h):<<1:rnd:sat",
+"Vector by Vector Halfword Multiply",
+VdV.w[i] = fVSATW((((fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i])) >> 14) + 1) >> 1)))
+
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(32,vmpyewuh_64,"Vdd32=vmpye(Vu32.w,Vv32.uh)",
+"Word times Halfword Multiply, 64-bit result",
+	fHIDE(size8s_t prod;)
+	prod = fMPY32SU(VuV.w[i],fGETUHALF(0,VvV.w[i]));
+	VddV.v[1].w[i] = prod >> 16;
+	VddV.v[0].w[i] = prod << 16)
+
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_64_acc,"Vxx32+=vmpyo(Vu32.w,Vv32.h)",
+"Word times Halfword Multiply, 64-bit result",
+	fHIDE(size8s_t prod;)
+	prod = fMPY32SS(VuV.w[i],fGETHALF(1,VvV.w[i]))  + fSE32_64(VxxV.v[1].w[i]);
+	VxxV.v[1].w[i] = prod >> 16;
+	fSETHALF(0, VxxV.v[0].w[i], VxxV.v[0].w[i] >> 16);
+	fSETHALF(1, VxxV.v[0].w[i], prod & 0x0000ffff))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_sacc,"Vx32+=vmpyowh(Vu32,Vv32):<<1:sat:shift","Vx32.w+=vmpyo(Vu32.w,Vv32.h):<<1:sat:shift",
+"Vector by Vector Halfword Multiply",
+IV1DEAD() VxV.w[i] = fVSATW(((((VxV.w[i] + fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i]))) >> 14) + 0) >> 1)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_rnd_sacc,"Vx32+=vmpyowh(Vu32,Vv32):<<1:rnd:sat:shift","Vx32.w+=vmpyo(Vu32.w,Vv32.h):<<1:rnd:sat:shift",
+"Vector by Vector Halfword Multiply",
+IV1DEAD() VxV.w[i] = fVSATW(((((VxV.w[i] + fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i]))) >> 14) + 1) >> 1)))
+
+/* For 32x32 integer / low half */
+
+ITERATOR_INSN_MPY_SLOT(32,vmpyieoh,"Vd32.w=vmpyieo(Vu32.h,Vv32.h)","Odd/Even multiply for 32x32 low half",
+	VdV.w[i] = (fGETHALF(0,VuV.w[i])*fGETHALF(1,VvV.w[i])) << 16)
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiewuh,"Vd32=vmpyiewuh(Vu32,Vv32)","Vd32.w=vmpyie(Vu32.w,Vv32.uh)",
+"Vector by Vector Word by Halfword Multiply",
+IV1DEAD()    VdV.w[i] = fMPY3216SU(VuV.w[i], fGETUHALF(0, VvV.w[i])) )
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiowh,"Vd32=vmpyiowh(Vu32,Vv32)","Vd32.w=vmpyio(Vu32.w,Vv32.h)",
+"Vector by Vector Word by Halfword Multiply",
+IV1DEAD()    VdV.w[i] = fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i])) )
+
+/* Add back these... */
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiewh_acc,"Vx32+=vmpyiewh(Vu32,Vv32)","Vx32.w+=vmpyie(Vu32.w,Vv32.h)",
+"Vector by Vector Word by Halfword Multiply",
+VxV.w[i] = VxV.w[i] + fMPY3216SS(VuV.w[i], fGETHALF(0, VvV.w[i])) )
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiewuh_acc,"Vx32+=vmpyiewuh(Vu32,Vv32)","Vx32.w+=vmpyie(Vu32.w,Vv32.uh)",
+"Vector by Vector Word by Halfword Multiply",
+VxV.w[i] = VxV.w[i] + fMPY3216SU(VuV.w[i], fGETUHALF(0, VvV.w[i])) )
+
+
+
+
+
+
+
+/* Vector by Scalar */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyub,"Vdd32=vmpyub(Vu32,Rt32)","Vdd32.uh=vmpy(Vu32.ub,Rt32.ub)",
+"Vector absolute value of words",
+    VddV.v[0].uh[i]  = fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE((2*i+0)%4, RtV));
+    VddV.v[1].uh[i]  = fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyub_acc,"Vxx32+=vmpyub(Vu32,Rt32)","Vxx32.uh+=vmpy(Vu32.ub,Rt32.ub)",
+"Vector absolute value of words",
+    VxxV.v[0].uh[i] += fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE((2*i+0)%4, RtV));
+    VxxV.v[1].uh[i] += fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE((2*i+1)%4, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybus,"Vdd32=vmpybus(Vu32,Rt32)","Vdd32.h=vmpy(Vu32.ub,Rt32.b)",
+"Vector absolute value of words",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i+0)%4, RtV));
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybus_acc,"Vxx32+=vmpybus(Vu32,Rt32)","Vxx32.h+=vmpy(Vu32.ub,Rt32.b)",
+"Vector absolute value of words",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i+0)%4, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabus,"Vdd32=vmpabus(Vuu32,Rt32)","Vdd32.h=vmpa(Vuu32.ub,Rt32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].h[i] = fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETBYTE(0, RtV)) + fMPY16SS(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETBYTE(1, RtV));
+    VddV.v[1].h[i] = fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETBYTE(2, RtV)) + fMPY16SS(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETBYTE(3, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabus_acc,"Vxx32+=vmpabus(Vuu32,Rt32)","Vxx32.h+=vmpa(Vuu32.ub,Rt32.b)",
+"Vertical Byte Multiply",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETBYTE(0, RtV)) + fMPY16SS(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETBYTE(1, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETBYTE(2, RtV)) + fMPY16SS(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETBYTE(3, RtV)))
+
+// V65
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(16,vmpabuu,"Vdd32=vmpabuu(Vuu32,Rt32)","Vdd32.h=vmpa(Vuu32.ub,Rt32.ub)",
+"Vertical Byte Multiply",
+    VddV.v[0].uh[i] = fMPY8UU(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETUBYTE(0, RtV)) + fMPY8UU(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETUBYTE(1, RtV));
+    VddV.v[1].uh[i] = fMPY8UU(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETUBYTE(2, RtV)) + fMPY8UU(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETUBYTE(3, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(16,vmpabuu_acc,"Vxx32+=vmpabuu(Vuu32,Rt32)","Vxx32.h+=vmpa(Vuu32.ub,Rt32.ub)",
+"Vertical Byte Multiply",
+    VxxV.v[0].uh[i] += fMPY8UU(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETUBYTE(0, RtV)) + fMPY8UU(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETUBYTE(1, RtV));
+    VxxV.v[1].uh[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETUBYTE(2, RtV)) + fMPY8UU(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETUBYTE(3, RtV)))
+
+
+
+
+/* Half by Byte */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpahb,"Vdd32=vmpahb(Vuu32,Rt32)","Vdd32.w=vmpa(Vuu32.h,Rt32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].w[i] = fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VddV.v[1].w[i] = fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16SS(fGETHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpahb_acc,"Vxx32+=vmpahb(Vuu32,Rt32)","Vxx32.w+=vmpa(Vuu32.h,Rt32.b)",
+"Vertical Byte Multiply",
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16SS(fGETHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+/* Half by Byte */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpauhb,"Vdd32=vmpauhb(Vuu32,Rt32)","Vdd32.w=vmpa(Vuu32.uh,Rt32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].w[i] = fMPY16US(fGETUHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16US(fGETUHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VddV.v[1].w[i] = fMPY16US(fGETUHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16US(fGETUHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpauhb_acc,"Vxx32+=vmpauhb(Vuu32,Rt32)","Vxx32.w+=vmpa(Vuu32.uh,Rt32.b)",
+"Vertical Byte Multiply",
+    VxxV.v[0].w[i] += fMPY16US(fGETUHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16US(fGETUHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VxxV.v[1].w[i] += fMPY16US(fGETUHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16US(fGETUHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+
+
+
+
+
+
+/* Half by Half */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyh,"Vdd32=vmpyh(Vu32,Rt32)","Vdd32.w=vmpy(Vu32.h,Rt32.h)",
+"Vector absolute value of words",
+    VddV.v[0].w[i] =  fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, RtV));
+    VddV.v[1].w[i] =  fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(32,vmpyh_acc,"Vxx32+=vmpyh(Vu32,Rt32)","Vxx32.w+=vmpy(Vu32.h,Rt32.h)",
+"Vector even halfwords with scalar lower halfword multiply with shift and sat32",
+    VxxV.v[0].w[i] =  fCAST8s(VxxV.v[0].w[i]) + fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, RtV));
+    VxxV.v[1].w[i] =  fCAST8s(VxxV.v[1].w[i]) + fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhsat_acc,"Vxx32+=vmpyh(Vu32,Rt32):sat","Vxx32.w+=vmpy(Vu32.h,Rt32.h):sat",
+"Vector even halfwords with scalar lower halfword multiply with shift and sat32",
+    VxxV.v[0].w[i] =  fVSATW(fCAST8s(VxxV.v[0].w[i]) + fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, RtV)));
+    VxxV.v[1].w[i] =  fVSATW(fCAST8s(VxxV.v[1].w[i]) + fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, RtV))))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhss,"Vd32=vmpyh(Vu32,Rt32):<<1:sat","Vd32.h=vmpy(Vu32.h,Rt32.h):<<1:sat",
+"Vector halfword by halfword multiply, shift by 1, and take upper 16 msb",
+          fSETHALF(0,VdV.w[i],fVSATH(fGETHALF(1,fVSAT((fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0,RtV))<<1)))));
+          fSETHALF(1,VdV.w[i],fVSATH(fGETHALF(1,fVSAT((fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1,RtV))<<1)))));
+)
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhsrs,"Vd32=vmpyh(Vu32,Rt32):<<1:rnd:sat","Vd32.h=vmpy(Vu32.h,Rt32.h):<<1:rnd:sat",
+"Vector halfword with scalar halfword multiply with round, shift, and sat16",
+       fSETHALF(0,VdV.w[i],fVSATH(fGETHALF(1,fVSAT(fROUND((fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0,RtV))<<1))))));
+       fSETHALF(1,VdV.w[i],fVSATH(fGETHALF(1,fVSAT(fROUND((fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1,RtV))<<1))))));
+)
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuh,"Vdd32=vmpyuh(Vu32,Rt32)","Vdd32.uw=vmpy(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VddV.v[0].uw[i] = fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV));
+    VddV.v[1].uw[i] = fMPY16UU(fGETUHALF(1, VuV.uw[i]),fGETUHALF(1,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuh_acc,"Vxx32+=vmpyuh(Vu32,Rt32)","Vxx32.uw+=vmpy(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VxxV.v[0].uw[i] += fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV));
+    VxxV.v[1].uw[i] += fMPY16UU(fGETUHALF(1, VuV.uw[i]),fGETUHALF(1,RtV)))
+
+
+
+
+/********************************************
+*  HALF BY BYTE
+********************************************/
+ITERATOR_INSN2_MPY_SLOT(16,vmpyihb,"Vd32=vmpyihb(Vu32,Rt32)","Vd32.h=vmpyi(Vu32.h,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VdV.h[i]  = fMPY16SS(VuV.h[i], fGETBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(16,vmpyihb_acc,"Vx32+=vmpyihb(Vu32,Rt32)","Vx32.h+=vmpyi(Vu32.h,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VxV.h[i] += fMPY16SS(VuV.h[i], fGETBYTE(i % 4, RtV) ))
+
+
+/********************************************
+*  WORD BY BYTE
+********************************************/
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwb,"Vd32=vmpyiwb(Vu32,Rt32)","Vd32.w=vmpyi(Vu32.w,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VdV.w[i]  = fMPY32SS(VuV.w[i], fGETBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwb_acc,"Vx32+=vmpyiwb(Vu32,Rt32)","Vx32.w+=vmpyi(Vu32.w,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VxV.w[i] += fMPY32SS(VuV.w[i], fGETBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwub,"Vd32=vmpyiwub(Vu32,Rt32)","Vd32.w=vmpyi(Vu32.w,Rt32.ub)",
+"Vector word by byte multiply, keep lower result",
+VdV.w[i]  = fMPY32SS(VuV.w[i], fGETUBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwub_acc,"Vx32+=vmpyiwub(Vu32,Rt32)","Vx32.w+=vmpyi(Vu32.w,Rt32.ub)",
+"Vector word by byte multiply, keep lower result",
+VxV.w[i] += fMPY32SS(VuV.w[i], fGETUBYTE(i % 4, RtV) ))
+
+
+/********************************************
+*  WORD BY HALF
+********************************************/
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiwh,"Vd32=vmpyiwh(Vu32,Rt32)","Vd32.w=vmpyi(Vu32.w,Rt32.h)",
+"Vector word by byte multiply, keep lower result",
+VdV.w[i]  = fMPY32SS(VuV.w[i], fGETHALF(i % 2, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiwh_acc,"Vx32+=vmpyiwh(Vu32,Rt32)","Vx32.w+=vmpyi(Vu32.w,Rt32.h)",
+"Vector word by byte multiply, keep lower result",
+VxV.w[i] += fMPY32SS(VuV.w[i], fGETHALF(i % 2, RtV)))
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+/**************************************************************************
+ * MMVECTOR LOGICAL OPERATIONS
+ * ************************************************************************/
+ITERATOR_INSN_ANY_SLOT(16,vand,"Vd32=vand(Vu32,Vv32)", "Vector Logical And", VdV.uh[i] = VuV.uh[i] & VvV.h[i])
+ITERATOR_INSN_ANY_SLOT(16,vor, "Vd32=vor(Vu32,Vv32)",  "Vector Logical Or", VdV.uh[i] = VuV.uh[i] | VvV.h[i])
+ITERATOR_INSN_ANY_SLOT(16,vxor,"Vd32=vxor(Vu32,Vv32)", "Vector Logical XOR",    VdV.uh[i] = VuV.uh[i] ^ VvV.h[i])
+ITERATOR_INSN_ANY_SLOT(16,vnot,"Vd32=vnot(Vu32)",     "Vector Logical NOT", VdV.uh[i] = ~VuV.uh[i])
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandqrt,
+"Vd32.ub=vand(Qu4.ub,Rt32.ub)", "Vd32=vand(Qu4,Rt32)", "Insert Predicate into Vector",
+    VdV.ub[i] = fGETQBIT(QuV,i) ? fGETUBYTE(i % 4, RtV) : 0)
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandqrt_acc,
+"Vx32.ub|=vand(Qu4.ub,Rt32.ub)", "Vx32|=vand(Qu4,Rt32)",  "Insert Predicate into Vector",
+    VxV.ub[i] |= (fGETQBIT(QuV,i)) ? fGETUBYTE(i % 4, RtV) : 0)
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandnqrt,
+"Vd32.ub=vand(!Qu4.ub,Rt32.ub)", "Vd32=vand(!Qu4,Rt32)", "Insert Predicate into Vector",
+    VdV.ub[i] = !fGETQBIT(QuV,i) ? fGETUBYTE(i % 4, RtV) : 0)
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandnqrt_acc,
+"Vx32.ub|=vand(!Qu4.ub,Rt32.ub)", "Vx32|=vand(!Qu4,Rt32)",  "Insert Predicate into Vector",
+    VxV.ub[i] |= !(fGETQBIT(QuV,i)) ? fGETUBYTE(i % 4, RtV) : 0)
+
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandvrt,
+"Qd4.ub=vand(Vu32.ub,Rt32.ub)", "Qd4=vand(Vu32,Rt32)", "Insert into Predicate",
+    fSETQBIT(QdV,i,((VuV.ub[i] & fGETUBYTE(i % 4, RtV)) != 0) ? 1 : 0))
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandvrt_acc,
+"Qx4.ub|=vand(Vu32.ub,Rt32.ub)", "Qx4|=vand(Vu32,Rt32)", "Insert into Predicate ",
+    fSETQBIT(QxV,i,fGETQBIT(QxV,i)|(((VuV.ub[i] & fGETUBYTE(i % 4, RtV)) != 0) ? 1 : 0)))
+
+ITERATOR_INSN_ANY_SLOT(8,vandvqv,"Vd32=vand(Qv4,Vu32)","Mask off bytes",
+VdV.b[i] = fGETQBIT(QvV,i) ? VuV.b[i] : 0)
+ITERATOR_INSN_ANY_SLOT(8,vandvnqv,"Vd32=vand(!Qv4,Vu32)","Mask off bytes",
+VdV.b[i] = !fGETQBIT(QvV,i) ? VuV.b[i] : 0)
+
+
+ /***************************************************
+ * Compare Vector with Vector
+ ***************************************************/
+#define VCMP(DEST, ASRC, ASRCOP, CMP, N, SRC, MASK, WIDTH)        \
+{ \
+       for(fHIDE(int) i = 0; i < fVBYTES(); i += WIDTH) { \
+		fSETQBITS(DEST,WIDTH,MASK,i,ASRC ASRCOP ((VuV.SRC[i/WIDTH] CMP VvV.SRC[i/WIDTH]) ? MASK : 0)); \
+    } \
+       }
+
+
+#define MMVEC_CMPGT(TYPE,TYPE2,TYPE3,DESCR,N,MASK,WIDTH,SRC) \
+EXTINSN(V6_vgt##TYPE,       "Qd4=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" greater than", \
+	VCMP(QdV, , , >, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_vgt##TYPE##_and, "Qx4&=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" greater than with predicate-and", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), &, >, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_vgt##TYPE##_or,  "Qx4|=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" greater than with predicate-or", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), |, >, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_vgt##TYPE##_xor, "Qx4^=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" greater than with predicate-xor", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), ^, >, N, SRC, MASK, WIDTH))
+
+#define MMVEC_CMP(TYPE,TYPE2,TYPE3,DESCR,N,MASK, WIDTH, SRC)\
+MMVEC_CMPGT(TYPE,TYPE2,TYPE3,DESCR,N,MASK,WIDTH,SRC) \
+EXTINSN(V6_veq##TYPE,       "Qd4=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" equal to", \
+	VCMP(QdV, , , ==, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_veq##TYPE##_and, "Qx4&=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" equalto with predicate-and", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), &, ==, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_veq##TYPE##_or,  "Qx4|=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" equalto with predicate-or", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), |, ==, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_veq##TYPE##_xor, "Qx4^=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" equalto with predicate-xor", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), ^, ==, N, SRC, MASK, WIDTH))
+
+
+MMVEC_CMP(w,"w","","Vector Word Compare ", fVELEM(32), 0xF, 4, w)
+MMVEC_CMP(h,"h","","Vector Half Compare ", fVELEM(16), 0x3, 2, h)
+MMVEC_CMP(b,"b","","Vector Half Compare ", fVELEM(8),  0x1, 1, b)
+MMVEC_CMPGT(uw,"uw","","Vector Unsigned Half Compare ", fVELEM(32), 0xF, 4,uw)
+MMVEC_CMPGT(uh,"uh","","Vector Unsigned Half Compare ", fVELEM(16), 0x3, 2,uh)
+MMVEC_CMPGT(ub,"ub","","Vector Unsigned Byte Compare ", fVELEM(8),  0x1, 1,ub)
+
+/***************************************************
+* Predicate Operations
+***************************************************/
+
+EXTINSN(V6_pred_scalar2, "Qd4=vsetq(Rt32)",         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),   "Set Vector Predicate ",
+{
+    fHIDE(int i;)
+    for(i = 0; i < fVBYTES(); i++) fSETQBIT(QdV,i,(i < (RtV & (fVBYTES()-1))) ? 1 : 0);
+})
+
+EXTINSN(V6_pred_scalar2v2, "Qd4=vsetq2(Rt32)",         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),   "Set Vector Predicate ",
+{
+    fHIDE(int i;)
+    for(i = 0; i < fVBYTES(); i++) fSETQBIT(QdV,i,(i <= ((RtV-1) & (fVBYTES()-1))) ? 1 : 0);
+})
+
+
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, shuffeqw, "Qd4.h=vshuffe(Qs4.w,Qt4.w)","Shrink Predicate", fSETQBIT(QdV,i, (i & 2) ? fGETQBIT(QsV,i-2) : fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, shuffeqh, "Qd4.b=vshuffe(Qs4.h,Qt4.h)","Shrink Predicate", fSETQBIT(QdV,i, (i & 1) ? fGETQBIT(QsV,i-1) : fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_or, "Qd4=or(Qs4,Qt4)","Vector Predicate Or", fSETQBIT(QdV,i,fGETQBIT(QsV,i) || fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_and, "Qd4=and(Qs4,Qt4)","Vector Predicate And", fSETQBIT(QdV,i,fGETQBIT(QsV,i) && fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_xor, "Qd4=xor(Qs4,Qt4)","Vector Predicate Xor", fSETQBIT(QdV,i,fGETQBIT(QsV,i) ^ fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_or_n, "Qd4=or(Qs4,!Qt4)","Vector Predicate Or with not", fSETQBIT(QdV,i,fGETQBIT(QsV,i) || !fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_and_n, "Qd4=and(Qs4,!Qt4)","Vector Predicate And  with not", fSETQBIT(QdV,i,fGETQBIT(QsV,i) && !fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT(8, pred_not, "Qd4=not(Qs4)","Vector Predicate Not", fSETQBIT(QdV,i,!fGETQBIT(QsV,i) ) )
+
+
+
+EXTINSN(V6_vcmov,  "if (Ps4) Vd32=Vu32",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA),   "Conditional Mov",
+{
+if (fLSBOLD(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VdV.ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+EXTINSN(V6_vncmov,  "if (!Ps4) Vd32=Vu32",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA),   "Conditional Mov",
+{
+if (fLSBOLDNOT(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VdV.ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+EXTINSN(V6_vccombine,  "if (Ps4) Vdd32=vcombine(Vu32,Vv32)",	ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA_DV),   "Conditional Combine",
+{
+if (fLSBOLD(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VddV.v[0].ub[i] = VvV.ub[i];
+		VddV.v[1].ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+EXTINSN(V6_vnccombine,  "if (!Ps4) Vdd32=vcombine(Vu32,Vv32)",	ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA_DV),   "Conditional Combine",
+{
+if (fLSBOLDNOT(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VddV.v[0].ub[i] = VvV.ub[i];
+		VddV.v[1].ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+
+
+ITERATOR_INSN_ANY_SLOT(8,vmux,"Vd32=vmux(Qt4,Vu32,Vv32)",
+"Vector Select Element 8-bit",
+    VdV.ub[i] = fGETQBIT(QtV,i) ? VuV.ub[i] : VvV.ub[i])
+
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8,vswap,"Vdd32=vswap(Qt4,Vu32,Vv32)",
+"Vector Swap Element 8-bit",
+    VddV.v[0].ub[i] =  fGETQBIT(QtV,i) ? VuV.ub[i] : VvV.ub[i];
+	VddV.v[1].ub[i] = !fGETQBIT(QtV,i) ? VuV.ub[i] : VvV.ub[i])
+
+
+/***************************************************************************
+*
+*   MMVECTOR SORTING
+*
+****************************************************************************/
+
+#define MMVEC_SORT(TYPE,TYPE2,DESCR,ELEMENTSIZE,SRC)\
+ITERATOR_INSN2_ANY_SLOT(ELEMENTSIZE,vmax##TYPE, "Vd32=vmax" TYPE2 "(Vu32,Vv32)", "Vd32."#SRC"=vmax(Vu32."#SRC",Vv32."#SRC")", "Vector " DESCR " max", VdV.SRC[i] = (VuV.SRC[i] > VvV.SRC[i]) ? VuV.SRC[i] :  VvV.SRC[i])  \
+ITERATOR_INSN2_ANY_SLOT(ELEMENTSIZE,vmin##TYPE, "Vd32=vmin" TYPE2 "(Vu32,Vv32)", "Vd32."#SRC"=vmin(Vu32."#SRC",Vv32."#SRC")", "Vector " DESCR " min", VdV.SRC[i] = (VuV.SRC[i] < VvV.SRC[i]) ? VuV.SRC[i] :  VvV.SRC[i])
+
+MMVEC_SORT(b,"b", "signed byte",    8,  b)
+MMVEC_SORT(ub,"ub", "unsigned byte",    8,  ub)
+MMVEC_SORT(uh,"uh", "unsigned halfword",16, uh)
+MMVEC_SORT(h,   "h",    "halfword",         16, h)
+MMVEC_SORT(w,   "w",    "word",             32, w)
+
+
+
+
+
+
+
+
+
+/*************************************************************
+* SHUFFLES
+****************************************************************/
+
+ITERATOR_INSN2_ANY_SLOT(16,vsathub,"Vd32=vsathub(Vu32,Vv32)","Vd32.ub=vsat(Vu32.h,Vv32.h)",
+"Saturate and pack 32 halfwords to 32 unsigned bytes, and interleave them",
+    fSETBYTE(0, VdV.uh[i], fVSATUB(VvV.h[i]));
+    fSETBYTE(1, VdV.uh[i], fVSATUB(VuV.h[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vsatwh,"Vd32=vsatwh(Vu32,Vv32)","Vd32.h=vsat(Vu32.w,Vv32.w)",
+"Saturate and pack 16 words to 16 halfwords, and interleave them",
+    fSETHALF(0, VdV.w[i], fVSATH(VvV.w[i]));
+    fSETHALF(1, VdV.w[i], fVSATH(VuV.w[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vsatuwuh,"Vd32=vsatuwuh(Vu32,Vv32)","Vd32.uh=vsat(Vu32.uw,Vv32.uw)",
+"Saturate and pack 16 words to 16 halfwords, and interleave them",
+    fSETHALF(0, VdV.w[i], fVSATUH(VvV.uw[i]));
+    fSETHALF(1, VdV.w[i], fVSATUH(VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT(16,vshuffeb,"Vd32=vshuffeb(Vu32,Vv32)","Vd32.b=vshuffe(Vu32.b,Vv32.b)",
+"Shuffle half words with in a lane",
+    fSETBYTE(0, VdV.uh[i], fGETUBYTE(0, VvV.uh[i]));
+    fSETBYTE(1, VdV.uh[i], fGETUBYTE(0, VuV.uh[i])))
+
+ITERATOR_INSN2_ANY_SLOT(16,vshuffob,"Vd32=vshuffob(Vu32,Vv32)","Vd32.b=vshuffo(Vu32.b,Vv32.b)",
+"Shuffle half words with in a lane",
+    fSETBYTE(0, VdV.uh[i], fGETUBYTE(1, VvV.uh[i]));
+    fSETBYTE(1, VdV.uh[i], fGETUBYTE(1, VuV.uh[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vshufeh,"Vd32=vshuffeh(Vu32,Vv32)","Vd32.h=vshuffe(Vu32.h,Vv32.h)",
+"Shuffle half words with in a lane",
+    fSETHALF(0, VdV.uw[i], fGETUHALF(0, VvV.uw[i]));
+    fSETHALF(1, VdV.uw[i], fGETUHALF(0, VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vshufoh,"Vd32=vshuffoh(Vu32,Vv32)","Vd32.h=vshuffo(Vu32.h,Vv32.h)",
+"Shuffle half words with in a lane",
+    fSETHALF(0, VdV.uw[i], fGETUHALF(1, VvV.uw[i]));
+    fSETHALF(1, VdV.uw[i], fGETUHALF(1, VuV.uw[i])))
+
+
+
+
+/**************************************************************************
+* Double Vector Shuffles
+**************************************************************************/
+
+EXTINSN(V6_vshuff, "vshuff(Vy32,Vx32,Rt32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),
+"2x2->2x2 transpose, for multiple data sizes, inplace",
+{
+	fHIDE(int offset;)
+	for (offset=1; offset<fVBYTES(); offset<<=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VyV.ub[k], VxV.ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+EXTINSN(V6_vshuffvdd, "Vdd32=vshuff(Vu32,Vv32,Rt8)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),
+"2x2->2x2 transpose for multiple data sizes",
+{
+	fHIDE(int offset;)
+	VddV.v[0] = VvV;
+	VddV.v[1] = VuV;
+	for (offset=1; offset<fVBYTES(); offset<<=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VddV.v[1].ub[k], VddV.v[0].ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+EXTINSN(V6_vdeal, "vdeal(Vy32,Vx32,Rt32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),
+" vector - vector deal - or deinterleave, for multiple data sizes, inplace",
+{
+	fHIDE(int offset;)
+	for (offset=fVBYTES()>>1; offset>0; offset>>=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VyV.ub[k], VxV.ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+EXTINSN(V6_vdealvdd, "Vdd32=vdeal(Vu32,Vv32,Rt8)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),
+" vector - vector deal - or deinterleave, for multiple data sizes",
+{
+	fHIDE(int offset;)
+	VddV.v[0] = VvV;
+	VddV.v[1] = VuV;
+	for (offset=fVBYTES()>>1; offset>0; offset>>=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VddV.v[1].ub[k], VddV.v[0].ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+/**************************************************************************/
+
+
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(32,vshufoeh,"Vdd32=vshuffoeh(Vu32,Vv32)","Vdd32.h=vshuffoe(Vu32.h,Vv32.h)",
+"Vector Shuffle half words",
+    fSETHALF(0, VddV.v[0].uw[i], fGETUHALF(0, VvV.uw[i]));
+    fSETHALF(1, VddV.v[0].uw[i], fGETUHALF(0, VuV.uw[i]));
+    fSETHALF(0, VddV.v[1].uw[i], fGETUHALF(1, VvV.uw[i]));
+    fSETHALF(1, VddV.v[1].uw[i], fGETUHALF(1, VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(16,vshufoeb,"Vdd32=vshuffoeb(Vu32,Vv32)","Vdd32.b=vshuffoe(Vu32.b,Vv32.b)",
+"Vector Shuffle bytes",
+    fSETBYTE(0, VddV.v[0].uh[i], fGETUBYTE(0, VvV.uh[i]));
+    fSETBYTE(1, VddV.v[0].uh[i], fGETUBYTE(0, VuV.uh[i]));
+    fSETBYTE(0, VddV.v[1].uh[i], fGETUBYTE(1, VvV.uh[i]));
+    fSETBYTE(1, VddV.v[1].uh[i], fGETUBYTE(1, VuV.uh[i])))
+
+
+/***************************************************************
+* Deal
+***************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vdealh, "Vd32=vdealh(Vu32)", "Vd32.h=vdeal(Vu32.h)",
+"Deal Halfwords",
+    VdV.uh[i  ] = fGETUHALF(0, VuV.uw[i]);
+    VdV.uh[i+fVELEM(32)] = fGETUHALF(1, VuV.uw[i]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vdealb, "Vd32=vdealb(Vu32)", "Vd32.b=vdeal(Vu32.b)",
+"Deal Halfwords",
+    VdV.ub[i   ] = fGETUBYTE(0, VuV.uh[i]);
+    VdV.ub[i+fVELEM(16)] = fGETUBYTE(1, VuV.uh[i]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vdealb4w,  "Vd32=vdealb4w(Vu32,Vv32)", "Vd32.b=vdeale(Vu32.b,Vv32.b)",
+"Deal Two Vectors Bytes",
+    VdV.ub[0+i ] = fGETUBYTE(0, VvV.uw[i]);
+    VdV.ub[fVELEM(32)+i ] = fGETUBYTE(2, VvV.uw[i]);
+    VdV.ub[2*fVELEM(32)+i] = fGETUBYTE(0, VuV.uw[i]);
+    VdV.ub[3*fVELEM(32)+i] = fGETUBYTE(2, VuV.uw[i]))
+
+/***************************************************************
+* shuffle
+***************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vshuffh, "Vd32=vshuffh(Vu32)", "Vd32.h=vshuff(Vu32.h)",
+"Deal Halfwords",
+    fSETHALF(0, VdV.uw[i], VuV.uh[i]);
+    fSETHALF(1, VdV.uw[i], VuV.uh[i+fVELEM(32)]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vshuffb, "Vd32=vshuffb(Vu32)", "Vd32.b=vshuff(Vu32.b)",
+"Deal Halfwords",
+    fSETBYTE(0, VdV.uh[i], VuV.ub[i]);
+    fSETBYTE(1, VdV.uh[i], VuV.ub[i+fVELEM(16)]))
+
+
+
+
+
+/***********************************************************
+* INSERT AND EXTRACT
+*********************************************************/
+EXTINSN(V6_extractw, "Rd32=vextract(Vu32,Rs32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_MEMLIKE,A_RESTRICT_SLOT0ONLY),
+"Extract an element from a vector to scalar",
+fHIDE(warn("RdN=%d VuN=%d RsN=%d RsV=0x%08x widx=%d",RdN,VuN,RsN,RsV,((RsV & (fVBYTES()-1)) >> 2));)
+RdV = VuV.uw[ (RsV & (fVBYTES()-1)) >> 2];
+fHIDE(warn("RdV=0x%08x",RdV);))
+
+EXTINSN(V6_vinsertwr, "Vx32.w=vinsert(Rt32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),
+"Insert Word Scalar into Vector",
+VxV.uw[0] = RtV;)
+
+
+
+
+ITERATOR_INSN_MPY_SLOT_LATE(32,lvsplatw, "Vd32=vsplat(Rt32)", "Replicates scalar accross words in vector", VdV.uw[i] = RtV)
+
+ITERATOR_INSN_MPY_SLOT_LATE(16,lvsplath, "Vd32.h=vsplat(Rt32)", "Replicates scalar accross halves in vector", VdV.uh[i] = RtV)
+
+ITERATOR_INSN_MPY_SLOT_LATE(8,lvsplatb, "Vd32.b=vsplat(Rt32)", "Replicates scalar accross bytes in vector", VdV.ub[i] = RtV)
+
+
+ITERATOR_INSN_ANY_SLOT(32,vassign,"Vd32=Vu32","Copy a vector",VdV.w[i]=VuV.w[i])
+
+
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8,vcombine,"Vdd32=vcombine(Vu32,Vv32)",
+"Vector assign, Any two to Vector Pair",
+    VddV.v[0].ub[i] = VvV.ub[i];
+    VddV.v[1].ub[i] = VuV.ub[i])
+
+
+
+///////////////////////////////////////////////////////////////////////////
+
+
+/*********************************************************
+* GENERAL PERMUTE NETWORKS
+*********************************************************/
+
+
+EXTINSN(V6_vdelta, "Vd32=vdelta(Vu32,Vv32)",    ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+"Reverse Benes Butterfly network ",
+{
+    fHIDE(int offset;)
+    fHIDE(int k;)
+    fHIDE(mmvector_t tmp;)
+    tmp = VuV;
+    for (offset=fVBYTES(); (offset>>=1)>0; ) {
+        for (k = 0; k<fVBYTES(); k++) {
+            VdV.ub[k] = (VvV.ub[k]&offset) ? tmp.ub[k^offset] : tmp.ub[k];
+        }
+        for (k = 0; k<fVBYTES(); k++) {
+            tmp.ub[k] = VdV.ub[k];
+        }
+    }
+})
+
+
+EXTINSN(V6_vrdelta, "Vd32=vrdelta(Vu32,Vv32)",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+"Forward Benes Butterfly network ",
+{
+	fHIDE(int offset;)
+    fHIDE(int k;)
+    fHIDE(mmvector_t tmp;)
+    tmp = VuV;
+    for (offset=1; offset<fVBYTES(); offset<<=1){
+        for (k = 0; k<fVBYTES(); k++) {
+            VdV.ub[k] = (VvV.ub[k]&offset) ? tmp.ub[k^offset] : tmp.ub[k];
+        }
+        for (k = 0; k<fVBYTES(); k++) {
+            tmp.ub[k] = VdV.ub[k];
+        }
+    }
+})
+
+
+
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vcl0w,"Vd32=vcl0w(Vu32)","Vd32.uw=vcl0(Vu32.uw)",         "Count Leading Zeros in Word",     VdV.uw[i]=fCL1_4(~VuV.uw[i]))
+ITERATOR_INSN2_SHIFT_SLOT(16,vcl0h,"Vd32=vcl0h(Vu32)","Vd32.uh=vcl0(Vu32.uh)",         "Count Leading Zeros in Word",    VdV.uh[i]=fCL1_2(~VuV.uh[i]))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vnormamtw,"Vd32=vnormamtw(Vu32)","Vd32.w=vnormamt(Vu32.w)","Norm Amount Word",
+VdV.w[i]=fMAX(fCL1_4(~VuV.w[i]),fCL1_4(VuV.w[i]))-1; fHIDE(IV1DEAD();))
+ITERATOR_INSN2_SHIFT_SLOT(16,vnormamth,"Vd32=vnormamth(Vu32)","Vd32.h=vnormamt(Vu32.h)","Norm Amount Halfword",
+VdV.h[i]=fMAX(fCL1_2(~VuV.h[i]),fCL1_2(VuV.h[i]))-1; fHIDE(IV1DEAD();))
+
+ITERATOR_INSN_SHIFT_SLOT_VV_LATE(32,vaddclbw,"Vd32.w=vadd(vclb(Vu32.w),Vv32.w)",
+"Count leading bits and add",
+VdV.w[i] = fMAX(fCL1_4(~VuV.w[i]),fCL1_4(VuV.w[i])) + VvV.w[i])
+
+ITERATOR_INSN_SHIFT_SLOT_VV_LATE(16,vaddclbh,"Vd32.h=vadd(vclb(Vu32.h),Vv32.h)",
+"Count leading bits and add",
+VdV.h[i] = fMAX(fCL1_2(~VuV.h[i]),fCL1_2(VuV.h[i])) + VvV.h[i])
+
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vpopcounth,"Vd32=vpopcounth(Vu32)","Vd32.h=vpopcount(Vu32.h)",   "Count Leading Zeros in Word",  VdV.uh[i]=fCOUNTONES_2(VuV.uh[i]))
+
+
+#define fHIST(INPUTVEC) \
+	fUARCH_NOTE_PUMP_4X(); \
+	fHIDE(int lane;) \
+	fHIDE(mmvector_t tmp;) \
+	fVFOREACH(128, lane) { \
+		for (fHIDE(int )i=0; i<128/8; ++i) { \
+			unsigned char value = INPUTVEC.ub[(128/8)*lane+i]; \
+			unsigned char regno = value>>3; \
+			unsigned char element = value & 7; \
+			READ_EXT_VREG(regno,tmp,0); \
+			tmp.uh[(128/16)*lane+(element)]++; \
+			WRITE_EXT_VREG(regno,tmp,EXT_NEW); \
+		} \
+	}
+
+#define fHISTQ(INPUTVEC,QVAL) \
+	fUARCH_NOTE_PUMP_4X(); \
+	fHIDE(int lane;) \
+	fHIDE(mmvector_t tmp;) \
+	fVFOREACH(128, lane) { \
+		for (fHIDE(int )i=0; i<128/8; ++i) { \
+			unsigned char value = INPUTVEC.ub[(128/8)*lane+i]; \
+			unsigned char regno = value>>3; \
+			unsigned char element = value & 7; \
+			READ_EXT_VREG(regno,tmp,0); \
+			if (fGETQBIT(QVAL,128/8*lane+i)) tmp.uh[(128/16)*lane+(element)]++; \
+			WRITE_EXT_VREG(regno,tmp,EXT_NEW); \
+		} \
+	}
+
+
+
+EXTINSN(V6_vhist, "vhist",ATTRIBS(A_EXTENSION,A_CVI,A_CVI_4SLOT), "vhist instruction",{ fHIDE(mmvector_t inputVec;) inputVec=fTMPVDATA(); fHIST(inputVec); })
+EXTINSN(V6_vhistq, "vhist(Qv4)",ATTRIBS(A_EXTENSION,A_CVI,A_CVI_4SLOT), "vhist instruction",{ fHIDE(mmvector_t inputVec;) inputVec=fTMPVDATA(); fHISTQ(inputVec,QvV); })
+
+#undef fHIST
+#undef fHISTQ
+
+
+/* **** WEIGHTED HISTOGRAM **** */
+
+
+#if 1
+#define WHIST(EL,MASK,BSHIFT,COND,SATF) \
+	fHIDE(unsigned int) bucket = fGETUBYTE(0,input.h[i]); \
+	fHIDE(unsigned int) weight = fGETUBYTE(1,input.h[i]); \
+	fHIDE(unsigned int) vindex = (bucket >> 3) & 0x1F; \
+	fHIDE(unsigned int) elindex = ((i>>BSHIFT) & (~MASK)) | ((bucket>>BSHIFT) & MASK); \
+	fHIDE(mmvector_t tmp;) \
+	READ_EXT_VREG(vindex,tmp,0); \
+	COND tmp.EL[elindex] = SATF(tmp.EL[elindex] + weight); \
+	WRITE_EXT_VREG(vindex,tmp,EXT_NEW); \
+	fUARCH_NOTE_PUMP_2X();
+
+ITERATOR_INSN_VHISTLIKE(16,vwhist256,"vwhist256","vector weighted histogram halfword counters", WHIST(uh,7,0,,))
+ITERATOR_INSN_VHISTLIKE(16,vwhist256q,"vwhist256(Qv4)","vector weighted histogram halfword counters", WHIST(uh,7,0,if (fGETQBIT(QvV,2*i)),))
+ITERATOR_INSN_VHISTLIKE(16,vwhist256_sat,"vwhist256:sat","vector weighted histogram halfword counters", WHIST(uh,7,0,,fVSATUH))
+ITERATOR_INSN_VHISTLIKE(16,vwhist256q_sat,"vwhist256(Qv4):sat","vector weighted histogram halfword counters", WHIST(uh,7,0,if (fGETQBIT(QvV,2*i)),fVSATUH))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128,"vwhist128","vector weighted histogram word counters", WHIST(uw,3,1,,))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128q,"vwhist128(Qv4)","vector weighted histogram word counters", WHIST(uw,3,1,if (fGETQBIT(QvV,2*i)),))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128m,"vwhist128(#u1)","vector weighted histogram word counters", WHIST(uw,3,1,if ((bucket & 1) == uiV),))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128qm,"vwhist128(Qv4,#u1)","vector weighted histogram word counters", WHIST(uw,3,1,if (((bucket & 1) == uiV) && fGETQBIT(QvV,2*i)),))
+
+
+#endif
+
+
+
+/* ******   lookup table instructions                          ***********  */
+
+/* Use low bits from idx to choose next-bigger elements from vector, then use LSB from idx to choose odd or even element */
+
+ITERATOR_INSN_PERMUTE_SLOT(8,vlutvvb,"Vd32.b=vlut32(Vu32.b,Vv32.b,Rt8)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = RtV & 0x7;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VdV.b[i] = ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(8,vlutvvb_oracc,"Vx32.b|=vlut32(Vu32.b,Vv32.b,Rt8)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = RtV & 0x7;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VxV.b[i] |= ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh,"Vdd32.h=vlut16(Vu32.b,Vv32.h,Rt8)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = RtV & 0xF;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VddV.v[0].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VddV.v[1].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh_oracc,"Vxx32.h|=vlut16(Vu32.b,Vv32.h,Rt8)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = fGETUBYTE(0,RtV) & 0xF;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VxxV.v[0].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VxxV.v[1].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT(8,vlutvvbi,"Vd32.b=vlut32(Vu32.b,Vv32.b,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0x7;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VdV.b[i] = ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(8,vlutvvb_oracci,"Vx32.b|=vlut32(Vu32.b,Vv32.b,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0x7;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VxV.b[i] |= ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwhi,"Vdd32.h=vlut16(Vu32.b,Vv32.h,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0xF;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VddV.v[0].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VddV.v[1].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh_oracci,"Vxx32.h|=vlut16(Vu32.b,Vv32.h,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0xF;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VxxV.v[0].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VxxV.v[1].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT(8,vlutvvb_nm,"Vd32.b=vlut32(Vu32.b,Vv32.b,Rt8):nomatch","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int oddhalf;) fHIDE(int matchval;)
+    matchval = RtV & 0x7;
+    oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+    idx = VuV.ub[i];
+    idx = (idx&0x1F) | (matchval<<5);
+    VdV.b[i] = fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]))
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh_nm,"Vdd32.h=vlut16(Vu32.b,Vv32.h,Rt8):nomatch","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int oddhalf;) fHIDE(int matchval;)
+    matchval = RtV & 0xF;
+    oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+    idx = fGETUBYTE(0,VuV.uh[i]);
+    idx = (idx&0x0F) | (matchval<<4);
+    VddV.v[0].h[i] = fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]);
+    idx = fGETUBYTE(1,VuV.uh[i]);
+    idx = (idx&0x0F) | (matchval<<4);
+    VddV.v[1].h[i] = fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]))
+
+
+
+
+/******************************************************************************
+NON LINEAR - V65
+ ******************************************************************************/
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vmpahhsat,"Vx32.h=vmpa(Vx32.h,Vu32.h,Rtt32.h):sat","piecewise linear approximation",
+    VxV.h[i]= fVSATH( ( ( fMPY16SS(VxV.h[i],VuV.h[i])<<1) + (fGETHALF(( (VuV.h[i]>>14)&0x3), RttV )<<15))>>16))
+
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vmpauhuhsat,"Vx32.h=vmpa(Vx32.h,Vu32.uh,Rtt32.uh):sat","piecewise linear approximation",
+    VxV.h[i]= fVSATH( (  fMPY16SU(VxV.h[i],VuV.uh[i]) + (fGETUHALF(((VuV.uh[i]>>14)&0x3), RttV )<<15))>>16))
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vmpsuhuhsat,"Vx32.h=vmps(Vx32.h,Vu32.uh,Rtt32.uh):sat","piecewise linear approximation",
+    VxV.h[i]= fVSATH( (  fMPY16SU(VxV.h[i],VuV.uh[i]) - (fGETUHALF(((VuV.uh[i]>>14)&0x3), RttV )<<15))>>16))
+
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vlut4,"Vd32.h=vlut4(Vu32.uh,Rtt32.h)","4 entry lookup table",
+    VdV.h[i]= fGETHALF(  ((VuV.h[i]>>14)&0x3), RttV ))
+
+
+
+/******************************************************************************
+V65
+ ******************************************************************************/
+
+ITERATOR_INSN_MPY_SLOT_NOV1(32,vmpyuhe,"Vd32.uw=vmpye(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VdV.uw[i] = fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV)))
+
+
+ITERATOR_INSN_MPY_SLOT_NOV1(32,vmpyuhe_acc,"Vx32.uw+=vmpye(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VxV.uw[i] += fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV)))
+
+
+
+
+EXTINSN(V6_vgathermw,  "vtmp.w=vgather(Rt32,Mu2,Vv32.w).w", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather Words",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_GATHER_WORD(EA, VvV.uw[i], i,MuV);
+    }
+    fGATHER_FINISH()
+})
+EXTINSN(V6_vgathermh,  "vtmp.h=vgather(Rt32,Mu2,Vv32.h).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather halfwords",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_GATHER_HALFWORD(EA, VvV.uh[i], i,MuV);
+    }
+    fGATHER_FINISH()
+})
+
+
+
+EXTINSN(V6_vgathermhw,  "vtmp.h=vgather(Rt32,Mu2,Vvv32.w).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA_DV,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+       for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_GATHER_HALFWORD_DV(EA, VvvV.v[j].uw[i], (2*i+j),i,j,MuV);
+        }
+    }
+     fGATHER_FINISH()
+})
+
+
+EXTINSN(V6_vgathermwq,  "if (Qs4) vtmp.w=vgather(Rt32,Mu2,Vv32.w).w", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather Words",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_GATHER_WORDQ(EA, VvV.uw[i], i,QsV,MuV);
+    }
+    fGATHER_FINISH()
+})
+EXTINSN(V6_vgathermhq,  "if (Qs4) vtmp.h=vgather(Rt32,Mu2,Vv32.h).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather halfwords",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_GATHER_HALFWORDQ(EA, VvV.uh[i], i,QsV,MuV);
+    }
+    fGATHER_FINISH()
+})
+
+
+
+EXTINSN(V6_vgathermhwq,  "if (Qs4) vtmp.h=vgather(Rt32,Mu2,Vvv32.w).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA_DV,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+       for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_GATHER_HALFWORDQ_DV(EA, VvvV.v[j].uw[i], (2*i+j),i,j,QsV,MuV);
+       }
+    }
+    fGATHER_FINISH()
+})
+
+
+
+EXTINSN(V6_vscattermw , "vscatter(Rt32,Mu2,Vv32.w).w=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter Words",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_WORD(EA, VvV.uw[i], VwV,i,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+
+EXTINSN(V6_vscattermh , "vscatter(Rt32,Mu2,Vv32.h).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter halfWords",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_HALFWORD(EA,VvV.uh[i],VwV,i,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+EXTINSN(V6_vscattermw_add,  "vscatter(Rt32,Mu2,Vv32.w).w+=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter Words-Add",
+{
+    fHIDE(int i;)
+    fHIDE(int ALIGNMENT=4;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = (RtV+fVALIGN(VvV.uw[i],ALIGNMENT));
+        fVLOG_VTCM_WORD_INCREMENT(EA,VvV.uw[i],VwV,i,ALIGNMENT,MuV);
+    }
+    fHIDE(fLOG_SCATTER_OP(4);)
+    fSCATTER_FINISH(1)
+})
+
+EXTINSN(V6_vscattermh_add,  "vscatter(Rt32,Mu2,Vv32.h).h+=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter halfword-Add",
+{
+    fHIDE(int i;)
+    fHIDE(int ALIGNMENT=2;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = (RtV+fVALIGN(VvV.uh[i],ALIGNMENT));
+        fVLOG_VTCM_HALFWORD_INCREMENT(EA,VvV.uh[i],VwV,i,ALIGNMENT,MuV);
+    }
+    fHIDE(fLOG_SCATTER_OP(2);)
+    fSCATTER_FINISH(1)
+})
+
+
+EXTINSN(V6_vscattermwq,  "if (Qs4) vscatter(Rt32,Mu2,Vv32.w).w=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter Words conditional",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 4;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_WORDQ(EA,VvV.uw[i], VwV,i,QsV,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+EXTINSN(V6_vscattermhq,  "if (Qs4) vscatter(Rt32,Mu2,Vv32.h).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter HalfWords conditional",
+{
+    fHIDE(int i;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_HALFWORDQ(EA,VvV.uh[i],VwV,i,QsV,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+
+
+EXTINSN(V6_vscattermhw , "vscatter(Rt32,Mu2,Vvv32.w).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA_DV,A_CVI_VM,A_MEMLIKE), "Scatter Words",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_HALFWORD_DV(EA,VvvV.v[j].uw[i],VwV,(2*i+j),i,j,MuV);
+        }
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+
+EXTINSN(V6_vscattermhwq,  "if (Qs4) vscatter(Rt32,Mu2,Vvv32.w).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA_DV,A_CVI_VM,A_MEMLIKE), "Scatter halfwords conditional",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_HALFWORDQ_DV(EA,VvvV.v[j].uw[i],VwV,(2*i+j),QsV,i,j,MuV);
+        }
+    }
+    fSCATTER_FINISH(0)
+})
+
+EXTINSN(V6_vscattermhw_add,  "vscatter(Rt32,Mu2,Vvv32.w).h+=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA_DV,A_CVI_VM,A_MEMLIKE), "Scatter halfwords-add",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+    fHIDE(int ALIGNMENT=2;)
+	fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for(j = 0; j < 2; j++) {
+             EA =  RtV + fVALIGN(VvvV.v[j].uw[i],ALIGNMENT);;
+             fVLOG_VTCM_HALFWORD_INCREMENT_DV(EA,VvvV.v[j].uw[i],VwV,(2*i+j),i,j,ALIGNMENT,MuV);
+        }
+    }
+    fHIDE(fLOG_SCATTER_OP(2);)
+    fSCATTER_FINISH(1)
+})
+
+EXTINSN(V6_vprefixqb,"Vd32.b=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  "parallel prefix sum of Q into byte",
+{
+    fHIDE(int i;)
+    fHIDE(size1u_t acc = 0;)
+    fVFOREACH(8, i) {
+        acc += fGETQBIT(QvV,i);
+        VdV.ub[i] = acc;
+    }
+    } )
+EXTINSN(V6_vprefixqh,"Vd32.h=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  "parallel prefix sum of Q into halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(size2u_t acc = 0;)
+    fVFOREACH(16, i) {
+        acc += fGETQBIT(QvV,i*2+0);
+        acc += fGETQBIT(QvV,i*2+1);
+        VdV.uh[i] = acc;
+    }
+    } )
+EXTINSN(V6_vprefixqw,"Vd32.w=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  "parallel prefix sum of Q into words",
+{
+    fHIDE(int i;)
+    fHIDE(size4u_t acc = 0;)
+    fVFOREACH(32, i) {
+        acc += fGETQBIT(QvV,i*4+0);
+        acc += fGETQBIT(QvV,i*4+1);
+        acc += fGETQBIT(QvV,i*4+2);
+        acc += fGETQBIT(QvV,i*4+3);
+        VdV.uw[i] = acc;
+    }
+    } )
+
+
+
+
+
+/******************************************************************************
+ DEBUG Vector/Register Printing
+ ******************************************************************************/
+
+#define PRINT_VU(TYPE, TYPE2, COUNT)\
+    int i;  \
+    size4u_t vec_len = fVBYTES();\
+    fprintf(stdout,"V%2d: ",VuN);  \
+    for (i=0;i<vec_len>>COUNT;i++) {         \
+        fprintf(stdout,TYPE2 " ", VuV.TYPE[i]); \
+    };  \
+    fprintf(stdout,"\\n");  \
+	fflush(stdout);\
+
+#undef ATTR_VMEM
+#undef ATTR_VMEMU
+#undef ATTR_VMEM_NT
+
+#endif /* NO_MMVEC */
+
+#ifdef __SELF_DEF_EXTINSN
+#undef EXTINSN
+#undef __SELF_DEF_EXTINSN
+#endif
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 15/20] Hexagon HVX (target/hexagon) instruction decoding
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (13 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 14/20] Hexagon HVX (target/hexagon) import semantics Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 16/20] Hexagon HVX (target/hexagon) import instruction encodings Taylor Simpson
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Add new file to target/hexagon/meson.build

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/mmvec/decode_ext_mmvec.h |  24 ++++
 target/hexagon/decode.c                 |  24 +++-
 target/hexagon/mmvec/decode_ext_mmvec.c | 235 ++++++++++++++++++++++++++++++++
 target/hexagon/meson.build              |   1 +
 4 files changed, 282 insertions(+), 2 deletions(-)
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c

diff --git a/target/hexagon/mmvec/decode_ext_mmvec.h b/target/hexagon/mmvec/decode_ext_mmvec.h
new file mode 100644
index 0000000..3664b68
--- /dev/null
+++ b/target/hexagon/mmvec/decode_ext_mmvec.h
@@ -0,0 +1,24 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_DECODE_EXT_MMVEC_H
+#define HEXAGON_DECODE_EXT_MMVEC_H
+
+void mmvec_ext_decode_checks(Packet *pkt, bool disas_only);
+SlotMask mmvec_ext_decode_find_iclass_slots(int opcode);
+
+#endif
diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index d424245..653bfd7 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -22,6 +22,7 @@
 #include "decode.h"
 #include "insn.h"
 #include "printinsn.h"
+#include "mmvec/decode_ext_mmvec.h"
 
 #define fZXTN(N, M, VAL) ((VAL) & ((1LL << (N)) - 1))
 
@@ -566,8 +567,12 @@ static void decode_remove_extenders(Packet *packet)
 
 static SlotMask get_valid_slots(const Packet *pkt, unsigned int slot)
 {
-    return find_iclass_slots(pkt->insn[slot].opcode,
-                             pkt->insn[slot].iclass);
+    if (GET_ATTRIB(pkt->insn[slot].opcode, A_EXTENSION)) {
+        return mmvec_ext_decode_find_iclass_slots(pkt->insn[slot].opcode);
+    } else {
+        return find_iclass_slots(pkt->insn[slot].opcode,
+                                 pkt->insn[slot].iclass);
+    }
 }
 
 #define DECODE_NEW_TABLE(TAG, SIZE, WHATNOT)     /* NOTHING */
@@ -728,6 +733,11 @@ decode_insns_tablewalk(Insn *insn, const DectreeTable *table,
         }
         decode_op(insn, opc, encoding);
         return 1;
+    } else if (table->table[i].type == DECTREE_EXTSPACE) {
+        /*
+         * For now, HVX will be the only coproc
+         */
+        return decode_insns_tablewalk(insn, ext_trees[EXT_IDX_mmvec], encoding);
     } else {
         return 0;
     }
@@ -874,6 +884,7 @@ int decode_packet(int max_words, const uint32_t *words, Packet *pkt,
     int words_read = 0;
     bool end_of_packet = false;
     int new_insns = 0;
+    int i;
     uint32_t encoding32;
 
     /* Initialize */
@@ -901,6 +912,11 @@ int decode_packet(int max_words, const uint32_t *words, Packet *pkt,
         return 0;
     }
     pkt->encod_pkt_size_in_bytes = words_read * 4;
+    pkt->pkt_has_hvx = false;
+    for (i = 0; i < num_insns; i++) {
+        pkt->pkt_has_hvx |=
+            GET_ATTRIB(pkt->insn[i].opcode, A_CVI);
+    }
 
     /*
      * Check for :endloop in the parse bits
@@ -931,6 +947,10 @@ int decode_packet(int max_words, const uint32_t *words, Packet *pkt,
     decode_set_slot_number(pkt);
     decode_fill_newvalue_regno(pkt);
 
+    if (pkt->pkt_has_hvx) {
+        mmvec_ext_decode_checks(pkt, disas_only);
+    }
+
     if (!disas_only) {
         decode_shuffle_for_execution(pkt);
         decode_split_cmpjump(pkt);
diff --git a/target/hexagon/mmvec/decode_ext_mmvec.c b/target/hexagon/mmvec/decode_ext_mmvec.c
new file mode 100644
index 0000000..b5b1bf1
--- /dev/null
+++ b/target/hexagon/mmvec/decode_ext_mmvec.c
@@ -0,0 +1,235 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "decode.h"
+#include "opcodes.h"
+#include "insn.h"
+#include "iclass.h"
+#include "mmvec/mmvec.h"
+#include "mmvec/decode_ext_mmvec.h"
+
+static void
+check_new_value(Packet *pkt)
+{
+    /* .new value for a MMVector store */
+    int i, j;
+    const char *reginfo;
+    const char *destletters;
+    const char *dststr = NULL;
+    uint16_t def_opcode;
+    char letter;
+    int def_regnum;
+
+    for (i = 1; i < pkt->num_insns; i++) {
+        uint16_t use_opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(use_opcode, A_DOTNEWVALUE) &&
+            GET_ATTRIB(use_opcode, A_CVI) &&
+            GET_ATTRIB(use_opcode, A_STORE)) {
+            int use_regidx = strchr(opcode_reginfo[use_opcode], 's') -
+                opcode_reginfo[use_opcode];
+            /*
+             * What's encoded at the N-field is the offset to who's producing
+             * the value.
+             * Shift off the LSB which indicates odd/even register.
+             */
+            int def_off = ((pkt->insn[i].regno[use_regidx]) >> 1);
+            int def_oreg = pkt->insn[i].regno[use_regidx] & 1;
+            int def_idx = -1;
+            for (j = i - 1; (j >= 0) && (def_off >= 0); j--) {
+                if (!GET_ATTRIB(pkt->insn[j].opcode, A_CVI)) {
+                    continue;
+                }
+                def_off--;
+                if (def_off == 0) {
+                    def_idx = j;
+                    break;
+                }
+            }
+            /*
+             * Check for a badly encoded N-field which points to an instruction
+             * out-of-range
+             */
+            g_assert(!((def_off != 0) || (def_idx < 0) ||
+                       (def_idx > (pkt->num_insns - 1))));
+
+            /* def_idx is the index of the producer */
+            def_opcode = pkt->insn[def_idx].opcode;
+            reginfo = opcode_reginfo[def_opcode];
+            destletters = "dexy";
+            for (j = 0; (letter = destletters[j]) != 0; j++) {
+                dststr = strchr(reginfo, letter);
+                if (dststr != NULL) {
+                    break;
+                }
+            }
+            if ((dststr == NULL)  && GET_ATTRIB(def_opcode, A_CVI_GATHER)) {
+                def_regnum = 0;
+                pkt->insn[i].regno[use_regidx] = def_oreg;
+                pkt->insn[i].new_value_producer_slot = pkt->insn[def_idx].slot;
+            } else {
+                if (dststr == NULL) {
+                    /* still not there, we have a bad packet */
+                    g_assert_not_reached();
+                }
+                def_regnum = pkt->insn[def_idx].regno[dststr - reginfo];
+                /* Now patch up the consumer with the register number */
+                pkt->insn[i].regno[use_regidx] = def_regnum ^ def_oreg;
+                /* special case for (Vx,Vy) */
+                dststr = strchr(reginfo, 'y');
+                if (def_oreg && strchr(reginfo, 'x') && dststr) {
+                    def_regnum = pkt->insn[def_idx].regno[dststr - reginfo];
+                    pkt->insn[i].regno[use_regidx] = def_regnum;
+                }
+                /*
+                 * We need to remember who produces this value to later
+                 * check if it was dynamically cancelled
+                 */
+                pkt->insn[i].new_value_producer_slot = pkt->insn[def_idx].slot;
+            }
+        }
+    }
+}
+
+/*
+ * We don't want to reorder slot1/slot0 with respect to each other.
+ * So in our shuffling, we don't want to move the .cur / .tmp vmem earlier
+ * Instead, we should move the producing instruction later
+ * But the producing instruction might feed a .new store!
+ * So we may need to move that even later.
+ */
+
+static void
+decode_mmvec_move_cvi_to_end(Packet *pkt, int max)
+{
+    int i;
+    for (i = 0; i < max; i++) {
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_CVI)) {
+            int last_inst = pkt->num_insns - 1;
+            uint16_t last_opcode = pkt->insn[last_inst].opcode;
+
+            /*
+             * If the last instruction is an endloop, move to the one before it
+             * Keep endloop as the last thing always
+             */
+            if ((last_opcode == J2_endloop0) ||
+                (last_opcode == J2_endloop1) ||
+                (last_opcode == J2_endloop01)) {
+                last_inst--;
+            }
+
+            decode_send_insn_to(pkt, i, last_inst);
+            max--;
+            i--;    /* Retry this index now that packet has rotated */
+        }
+    }
+}
+
+static void
+decode_shuffle_for_execution_vops(Packet *pkt)
+{
+    /*
+     * Sort for .new
+     */
+    int i;
+    for (i = 0; i < pkt->num_insns; i++) {
+        uint16_t opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(opcode, A_LOAD) &&
+            (GET_ATTRIB(opcode, A_CVI_NEW) ||
+             GET_ATTRIB(opcode, A_CVI_TMP))) {
+            /*
+             * Find prior consuming vector instructions
+             * Move to end of packet
+             */
+            decode_mmvec_move_cvi_to_end(pkt, i);
+            break;
+        }
+    }
+
+    /* Move HVX new value stores to the end of the packet */
+    for (i = 0; i < pkt->num_insns - 1; i++) {
+        uint16_t opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(opcode, A_STORE) &&
+            GET_ATTRIB(opcode, A_CVI_NEW) &&
+            !GET_ATTRIB(opcode, A_CVI_SCATTER_RELEASE)) {
+            int last_inst = pkt->num_insns - 1;
+            uint16_t last_opcode = pkt->insn[last_inst].opcode;
+
+            /*
+             * If the last instruction is an endloop, move to the one before it
+             * Keep endloop as the last thing always
+             */
+            if ((last_opcode == J2_endloop0) ||
+                (last_opcode == J2_endloop1) ||
+                (last_opcode == J2_endloop01)) {
+                last_inst--;
+            }
+
+            decode_send_insn_to(pkt, i, last_inst);
+            break;
+        }
+    }
+}
+
+static void
+check_for_vhist(Packet *pkt)
+{
+    pkt->pkt_has_vhist = false;
+    for (int i = 0; i < pkt->num_insns; i++) {
+        int opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(opcode, A_CVI) && GET_ATTRIB(opcode, A_CVI_4SLOT)) {
+                pkt->pkt_has_vhist = true;
+                return;
+        }
+    }
+}
+
+/*
+ * Public Functions
+ */
+
+SlotMask mmvec_ext_decode_find_iclass_slots(int opcode)
+{
+    if (GET_ATTRIB(opcode, A_CVI_VM)) {
+        /* HVX memory instruction */
+        if (GET_ATTRIB(opcode, A_RESTRICT_SLOT0ONLY)) {
+            return SLOTS_0;
+        } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT1ONLY)) {
+            return SLOTS_1;
+        }
+        return SLOTS_01;
+    } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT2ONLY)) {
+        return SLOTS_2;
+    } else if (GET_ATTRIB(opcode, A_CVI_VX)) {
+        /* HVX multiply instruction */
+        return SLOTS_23;
+    } else if (GET_ATTRIB(opcode, A_CVI_VS_VX)) {
+        /* HVX permute/shift instruction */
+        return SLOTS_23;
+    } else {
+        return SLOTS_0123;
+    }
+}
+
+void mmvec_ext_decode_checks(Packet *pkt, bool disas_only)
+{
+    check_new_value(pkt);
+    if (!disas_only) {
+        decode_shuffle_for_execution_vops(pkt);
+    }
+    check_for_vhist(pkt);
+}
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index ed292b4..e598c43 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -173,6 +173,7 @@ hexagon_ss.add(files(
     'printinsn.c',
     'arch.c',
     'fma_emu.c',
+    'mmvec/decode_ext_mmvec.c',
     'mmvec/system_ext_mmvec.c',
 ))
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 16/20] Hexagon HVX (target/hexagon) import instruction encodings
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (14 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 15/20] Hexagon HVX (target/hexagon) instruction decoding Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 17/20] Hexagon HVX (tests/tcg/hexagon) vector_add_int test Taylor Simpson
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/decode.c                      |   4 +
 target/hexagon/imported/allextenc.def        |  20 +
 target/hexagon/imported/encode.def           |   1 +
 target/hexagon/imported/mmvec/encode_ext.def | 794 +++++++++++++++++++++++++++
 4 files changed, 819 insertions(+)
 create mode 100644 target/hexagon/imported/allextenc.def
 create mode 100644 target/hexagon/imported/mmvec/encode_ext.def

diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index 653bfd7..6f0f27b 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -47,6 +47,7 @@ enum {
         /* Name   Num Table */
 DEF_REGMAP(R_16,  16, 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23)
 DEF_REGMAP(R__8,  8,  0, 2, 4, 6, 16, 18, 20, 22)
+DEF_REGMAP(R_8,   8,  0, 1, 2, 3, 4, 5, 6, 7)
 
 #define DECODE_MAPPED_REG(OPNUM, NAME) \
     insn->regno[OPNUM] = DECODE_REGISTER_##NAME[insn->regno[OPNUM]];
@@ -158,6 +159,9 @@ static void decode_ext_init(void)
     for (i = EXT_IDX_noext; i < EXT_IDX_noext_AFTER; i++) {
         ext_trees[i] = &dectree_table_DECODE_EXT_EXT_noext;
     }
+    for (i = EXT_IDX_mmvec; i < EXT_IDX_mmvec_AFTER; i++) {
+        ext_trees[i] = &dectree_table_DECODE_EXT_EXT_mmvec;
+    }
 }
 
 typedef struct {
diff --git a/target/hexagon/imported/allextenc.def b/target/hexagon/imported/allextenc.def
new file mode 100644
index 0000000..39a3e93
--- /dev/null
+++ b/target/hexagon/imported/allextenc.def
@@ -0,0 +1,20 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define EXTNAME mmvec
+#include "mmvec/encode_ext.def"
+#undef EXTNAME
diff --git a/target/hexagon/imported/encode.def b/target/hexagon/imported/encode.def
index b9368d1..e40e7fb 100644
--- a/target/hexagon/imported/encode.def
+++ b/target/hexagon/imported/encode.def
@@ -71,6 +71,7 @@
 
 #include "encode_pp.def"
 #include "encode_subinsn.def"
+#include "allextenc.def"
 
 #ifdef __SELF_DEF_FIELD32
 #undef __SELF_DEF_FIELD32
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
new file mode 100644
index 0000000..6fbbe2c
--- /dev/null
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -0,0 +1,794 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define CONCAT(A,B) A##B
+#define EXTEXTNAME(X) CONCAT(EXT_,X)
+#define DEF_ENC(TAG,STR) DEF_EXT_ENC(TAG,EXTEXTNAME(EXTNAME),STR)
+
+
+#ifndef NO_MMVEC
+DEF_ENC(V6_extractw,  ICLASS_LD" 001 0 000sssss  PP0uuuuu  --1ddddd") /* coproc insn, returns Rd */
+#endif
+
+
+#ifndef NO_MMVEC
+
+
+
+DEF_CLASS32(ICLASS_NCJ" 1--- -------- PP------ --------",COPROC_VMEM)
+DEF_CLASS32(ICLASS_NCJ" 1000 0-0ttttt PPi--iii ---ddddd",BaseOffset_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1000 1-0ttttt PPivviii ---ddddd",BaseOffset_if_Pv_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1000 0-1ttttt PPi--iii --------",BaseOffset_VMEM_Stores1)
+DEF_CLASS32(ICLASS_NCJ" 1000 1-0ttttt PPi--iii 00------",BaseOffset_VMEM_Stores2)
+DEF_CLASS32(ICLASS_NCJ" 1000 1-1ttttt PPivviii --------",BaseOffset_if_Pv_VMEM_Stores)
+
+DEF_CLASS32(ICLASS_NCJ" 1001 0-0xxxxx PP---iii ---ddddd",PostImm_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1001 1-0xxxxx PP-vviii ---ddddd",PostImm_if_Pv_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1001 0-1xxxxx PP---iii --------",PostImm_VMEM_Stores1)
+DEF_CLASS32(ICLASS_NCJ" 1001 1-0xxxxx PP---iii 00------",PostImm_VMEM_Stores2)
+DEF_CLASS32(ICLASS_NCJ" 1001 1-1xxxxx PP-vviii --------",PostImm_if_Pv_VMEM_Stores)
+
+DEF_CLASS32(ICLASS_NCJ" 1011 0-0xxxxx PPu----- ---ddddd",PostM_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1011 1-0xxxxx PPuvv--- ---ddddd",PostM_if_Pv_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1011 0-1xxxxx PPu----- --------",PostM_VMEM_Stores1)
+DEF_CLASS32(ICLASS_NCJ" 1011 1-0xxxxx PPu----- 00------",PostM_VMEM_Stores2)
+DEF_CLASS32(ICLASS_NCJ" 1011 1-1xxxxx PPuvv--- --------",PostM_if_Pv_VMEM_Stores)
+
+DEF_CLASS32(ICLASS_NCJ" 110- 0------- PP------ --------",Z_Load)
+DEF_CLASS32(ICLASS_NCJ" 110- 1------- PP------ --------",Z_Load_if_Pv)
+
+DEF_CLASS32(ICLASS_NCJ" 1111 000ttttt PPu--0-- ---vvvvv",Gather)
+DEF_CLASS32(ICLASS_NCJ" 1111 000ttttt PPu--1-- -ssvvvvv",Gather_if_Qs)
+DEF_CLASS32(ICLASS_NCJ" 1111 001ttttt PPuvvvvv ---wwwww",Scatter)
+DEF_CLASS32(ICLASS_NCJ" 1111 001ttttt PPuvvvvv -----sss",Scatter_New)
+DEF_CLASS32(ICLASS_NCJ" 1111 1--ttttt PPuvvvvv -sswwwww",Scatter_if_Qs)
+
+
+DEF_FIELD32(ICLASS_NCJ" 1--- -!------ PP------ --------",NT,"NonTemporal")
+
+
+
+DEF_FIELDROW_DESC32(                ICLASS_NCJ" 1 000 --- ----- PP i --iii ----- ---","[#0] vmem(Rt+#s4)[:nt]")
+
+#define LDST_ENC(TAG,MAJ3,MID3,RREG,TINY6,MIN3,VREG) DEF_ENC(TAG, ICLASS_NCJ "1" #MAJ3 #MID3 #RREG "PP" #TINY6 #MIN3 #VREG)
+
+#define LDST_BO(TAGPRE,MID3,PRED,MIN3,VREG) LDST_ENC(TAGPRE##_ai, 000,MID3,ttttt,i PRED iii,MIN3,VREG)
+#define LDST_PI(TAGPRE,MID3,PRED,MIN3,VREG) LDST_ENC(TAGPRE##_pi, 001,MID3,xxxxx,- PRED iii,MIN3,VREG)
+#define LDST_PM(TAGPRE,MID3,PRED,MIN3,VREG) LDST_ENC(TAGPRE##_ppu,011,MID3,xxxxx,u PRED ---,MIN3,VREG)
+
+#define LDST_BASICLD(OP,TAGPRE) \
+    OP(TAGPRE,                000,00,000,ddddd) \
+    OP(TAGPRE##_nt,           010,00,000,ddddd) \
+    OP(TAGPRE##_cur,          000,00,001,ddddd) \
+    OP(TAGPRE##_nt_cur,       010,00,001,ddddd) \
+    OP(TAGPRE##_tmp,          000,00,010,ddddd) \
+    OP(TAGPRE##_nt_tmp,       010,00,010,ddddd)
+
+#define LDST_BASICST(OP,TAGPRE) \
+    OP(TAGPRE,           001,--,000,sssss) \
+    OP(TAGPRE##_nt,      011,--,000,sssss) \
+    OP(TAGPRE##_new,     001,--,001,-0sss) \
+    OP(TAGPRE##_srls,    001,--,001,-1---) \
+    OP(TAGPRE##_nt_new,  011,--,001,--sss) \
+
+
+#define LDST_QPREDST(OP,TAGPRE) \
+    OP(TAGPRE##_qpred,    100,vv,000,sssss) \
+    OP(TAGPRE##_nt_qpred, 110,vv,000,sssss) \
+    OP(TAGPRE##_nqpred,   100,vv,001,sssss) \
+    OP(TAGPRE##_nt_nqpred,110,vv,001,sssss) \
+
+#define LDST_CONDLD(OP,TAGPRE) \
+    OP(TAGPRE##_pred,         100,vv,010,ddddd) \
+    OP(TAGPRE##_nt_pred,      110,vv,010,ddddd) \
+    OP(TAGPRE##_npred,        100,vv,011,ddddd) \
+    OP(TAGPRE##_nt_npred,     110,vv,011,ddddd) \
+    OP(TAGPRE##_cur_pred,     100,vv,100,ddddd) \
+    OP(TAGPRE##_nt_cur_pred,  110,vv,100,ddddd) \
+    OP(TAGPRE##_cur_npred,    100,vv,101,ddddd) \
+    OP(TAGPRE##_nt_cur_npred, 110,vv,101,ddddd) \
+    OP(TAGPRE##_tmp_pred,     100,vv,110,ddddd) \
+    OP(TAGPRE##_nt_tmp_pred,  110,vv,110,ddddd) \
+    OP(TAGPRE##_tmp_npred,    100,vv,111,ddddd) \
+    OP(TAGPRE##_nt_tmp_npred, 110,vv,111,ddddd) \
+
+#define LDST_PREDST(OP,TAGPRE,NT,MIN2) \
+    OP(TAGPRE##_pred,      1 NT 1,vv,MIN2 0,sssss) \
+    OP(TAGPRE##_npred,     1 NT 1,vv,MIN2 1,sssss)
+
+#define LDST_PREDSTNEW(OP,TAGPRE,NT,MIN2) \
+    OP(TAGPRE##_pred,      1 NT 1,vv,MIN2 0,NT 0 sss) \
+    OP(TAGPRE##_npred,     1 NT 1,vv,MIN2 1,NT 1 sss)
+
+// 0.0,vv,0--,sssss: pred st
+#define LDST_BASICPREDST(OP,TAGPRE) \
+    LDST_PREDST(OP,TAGPRE,             0,00) \
+    LDST_PREDST(OP,TAGPRE##_nt,        1,00) \
+    LDST_PREDSTNEW(OP,TAGPRE##_new,    0,01) \
+    LDST_PREDSTNEW(OP,TAGPRE##_nt_new, 1,01)
+
+
+
+LDST_BASICLD(LDST_BO,V6_vL32b)
+LDST_CONDLD(LDST_BO,V6_vL32b)
+LDST_BASICLD(LDST_PI,V6_vL32b)
+LDST_CONDLD(LDST_PI,V6_vL32b)
+LDST_BASICLD(LDST_PM,V6_vL32b)
+LDST_CONDLD(LDST_PM,V6_vL32b)
+
+// Loads
+
+LDST_BO(V6_vL32Ub,000,00,111,ddddd)
+//Stores
+LDST_BASICST(LDST_BO,V6_vS32b)
+
+
+LDST_BO(V6_vS32Ub,001,--,111,sssss)
+
+
+
+
+// Byte Enabled Stores
+LDST_QPREDST(LDST_BO,V6_vS32b)
+
+// Scalar Predicated Stores
+LDST_BASICPREDST(LDST_BO,V6_vS32b)
+
+
+LDST_PREDST(LDST_BO,V6_vS32Ub,0,11)
+
+
+
+
+DEF_FIELDROW_DESC32(                ICLASS_NCJ" 1 001 --- ----- PP - ----- ddddd ---","[#1] vmem(Rx++#s3)[:nt]")
+
+// Loads
+LDST_PI(V6_vL32Ub,000,00,111,ddddd)
+
+//Stores
+LDST_BASICST(LDST_PI,V6_vS32b)
+
+
+
+LDST_PI(V6_vS32Ub,001,--,111,sssss)
+
+
+// Byte Enabled Stores
+LDST_QPREDST(LDST_PI,V6_vS32b)
+
+
+// Scalar Predicated Stores
+LDST_BASICPREDST(LDST_PI,V6_vS32b)
+
+
+LDST_PREDST(LDST_PI,V6_vS32Ub,0,11)
+
+
+
+DEF_FIELDROW_DESC32(            ICLASS_NCJ" 1 011 --- ----- PP - ----- ----- ---","[#3] vmem(Rx++#M)[:nt]")
+
+// Loads
+LDST_PM(V6_vL32Ub,000,00,111,ddddd)
+
+//Stores
+LDST_BASICST(LDST_PM,V6_vS32b)
+
+
+
+LDST_PM(V6_vS32Ub,001,--,111,sssss)
+
+// Byte Enabled Stores
+LDST_QPREDST(LDST_PM,V6_vS32b)
+
+// Scalar Predicated Stores
+LDST_BASICPREDST(LDST_PM,V6_vS32b)
+
+
+LDST_PREDST(LDST_PM,V6_vS32Ub,0,11)
+
+
+
+DEF_ENC(V6_vaddcarrysat,    ICLASS_CJ" 1 101 100 vvvvv PP 1 uuuuu 0ss ddddd") //
+DEF_ENC(V6_vaddcarryo,        ICLASS_CJ" 1 101 101 vvvvv PP 1 uuuuu 0ee ddddd") //
+DEF_ENC(V6_vsubcarryo,        ICLASS_CJ" 1 101 101 vvvvv PP 1 uuuuu 1ee ddddd") //
+DEF_ENC(V6_vsatdw,          ICLASS_CJ" 1 101 100 vvvvv PP 1 uuuuu 111 ddddd") //
+
+DEF_FIELDROW_DESC32(           ICLASS_NCJ" 1 111 --- ----- PP - ----- ----- ---","[#6] vgather,vscatter")
+DEF_ENC(V6_vgathermw,         ICLASS_NCJ" 1 111 000 ttttt PP u --000 --- vvvvv")    // vtmp.w=vmem(Rt32,Mu2,Vv32.w).w
+DEF_ENC(V6_vgathermh,         ICLASS_NCJ" 1 111 000 ttttt PP u --001 --- vvvvv")    // vtmp.h=vmem(Rt32,Mu2,Vv32.h).h
+DEF_ENC(V6_vgathermhw,         ICLASS_NCJ" 1 111 000 ttttt PP u --010 --- vvvvv")    // vtmp.h=vmem(Rt32,Mu2,Vvv32.w).h
+
+
+DEF_ENC(V6_vgathermwq,         ICLASS_NCJ" 1 111 000 ttttt PP u --100 -ss vvvvv")    // if (Qs4) vtmp.w=vmem(Rt32,Mu2,Vv32.w).w
+DEF_ENC(V6_vgathermhq,         ICLASS_NCJ" 1 111 000 ttttt PP u --101 -ss vvvvv")    // if (Qs4) vtmp.h=vmem(Rt32,Mu2,Vv32.h).h
+DEF_ENC(V6_vgathermhwq,     ICLASS_NCJ" 1 111 000 ttttt PP u --110 -ss vvvvv")    // if (Qs4) vtmp.h=vmem(Rt32,Mu2,Vvv32.w).h
+
+
+
+DEF_ENC(V6_vscattermw,         ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 000 wwwww")    // vmem(Rt32,Mu2,Vv32.w)=Vw32.w
+DEF_ENC(V6_vscattermh,         ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 001 wwwww")    // vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+DEF_ENC(V6_vscattermhw,     ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 010 wwwww")    // vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+
+DEF_ENC(V6_vscattermw_add,     ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 100 wwwww")    // vmem(Rt32,Mu2,Vv32.w) += Vw32.w
+DEF_ENC(V6_vscattermh_add,     ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 101 wwwww")    // vmem(Rt32,Mu2,Vv32.h) += Vw32.h
+DEF_ENC(V6_vscattermhw_add, ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 110 wwwww")    // vmem(Rt32,Mu2,Vv32.h) += Vw32.h
+
+
+DEF_ENC(V6_vscattermwq,     ICLASS_NCJ" 1 111 100 ttttt PP u vvvvv 0ss wwwww")    // if (Qs4) vmem(Rt32,Mu2,Vv32.w)=Vw32.w
+DEF_ENC(V6_vscattermhq,     ICLASS_NCJ" 1 111 100 ttttt PP u vvvvv 1ss wwwww")    // if (Qs4) vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+DEF_ENC(V6_vscattermhwq,     ICLASS_NCJ" 1 111 101 ttttt PP u vvvvv 0ss wwwww")    // if (Qs4) vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+
+
+
+
+
+DEF_CLASS32(ICLASS_CJ" 1--- -------- PP------ --------",COPROC_VX)
+
+
+
+/***************************************************************
+*
+*  Group #0, Uses Q6 Rt8: new in v61
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(            ICLASS_CJ" 1 000 --- ----- PP - ----- ----- ---","[#1] Vd32=(Vu32, Vv32, Rt8)")
+DEF_ENC(V6_vasrhbsat,             ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vasruwuhrndsat,         ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vasrwuhrndsat,         ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vlutvvb_nm,             ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vlutvwh_nm,             ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vasruhubrndsat,         ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vasruwuhsat,         ICLASS_CJ" 1 000 vvv vvttt PP 1 uuuuu 100 ddddd") //
+DEF_ENC(V6_vasruhubsat,            ICLASS_CJ" 1 000 vvv vvttt PP 1 uuuuu 101 ddddd") //
+
+/***************************************************************
+*
+*  Group #1, Uses Q6 Rt32
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 001 --- ----- PP - ----- ----- ---","[#1] Vd32=(Vu32, Rt32)")
+DEF_ENC(V6_vtmpyb,             ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vtmpybus,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vdmpyhb,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vrmpyub,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vrmpybus,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vdsaduh,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vdmpybus,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vdmpybus_dv,     ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vdmpyhsusat,     ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vdmpyhsuisat,     ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vdmpyhsat,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vdmpyhisat,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vdmpyhb_dv,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmpybus,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmpabus,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpahb,             ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyh,             ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vmpyhss,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vmpyhsrs,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vmpyuh,             ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vrmpybusi,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 10i ddddd") //
+DEF_ENC(V6_vrsadubi,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 11i ddddd") //
+
+DEF_ENC(V6_vmpyihb,         ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vror,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vmpyuhe,         ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vmpabuu,         ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vlut4,            ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 100 ddddd") //
+
+
+DEF_ENC(V6_vasrw,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vasrh,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vaslw,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vaslh,             ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlsrw,             ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vlsrh,             ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vlsrb,            ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 011 ddddd") //
+
+DEF_ENC(V6_vmpauhb,            ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmpyiwub,         ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpyiwh,         ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyiwb,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_lvsplatw,         ICLASS_CJ" 1 001 101 ttttt PP 0 ----0 001 ddddd") //
+
+
+
+DEF_ENC(V6_pred_scalar2,     ICLASS_CJ" 1 001 101 ttttt PP 0 ----- 010 -01dd") //
+DEF_ENC(V6_vandvrt,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 010 -10dd") //
+DEF_ENC(V6_pred_scalar2v2,     ICLASS_CJ" 1 001 101 ttttt PP 0 ----- 010 -11dd") //
+
+DEF_ENC(V6_vtmpyhb,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vandqrt,         ICLASS_CJ" 1 001 101 ttttt PP 0 --0uu 101 ddddd") //
+DEF_ENC(V6_vandnqrt,         ICLASS_CJ" 1 001 101 ttttt PP 0 --1uu 101 ddddd") //
+
+DEF_ENC(V6_vrmpyubi,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 11i ddddd") //
+
+DEF_ENC(V6_vmpyub,             ICLASS_CJ" 1 001 110 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_lvsplath,         ICLASS_CJ" 1 001 110 ttttt PP 0 ----- 001 ddddd") //
+DEF_ENC(V6_lvsplatb,         ICLASS_CJ" 1 001 110 ttttt PP 0 ----- 010 ddddd") //
+
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 001 --- ----- PP - ----- ----- ---","[#1] Vx32=(Vu32, Rt32)")
+DEF_ENC(V6_vtmpyb_acc,         ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vtmpybus_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vtmpyhb_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vdmpyhb_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vrmpyub_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vrmpybus_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vdmpybus_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vdmpybus_dv_acc, ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vdmpyhsusat_acc, ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vdmpyhsuisat_acc,ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vdmpyhisat_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vdmpyhsat_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vdmpyhb_dv_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpybus_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpabus_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vmpahb_acc,         ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vmpyhsat_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyuh_acc,         ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vmpyiwb_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vmpyiwh_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vrmpybusi_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 10i xxxxx") //
+DEF_ENC(V6_vrsadubi_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 11i xxxxx") //
+
+DEF_ENC(V6_vdsaduh_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyihb_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaslw_acc,         ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vandqrt_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 --0uu 011 xxxxx") //
+DEF_ENC(V6_vandnqrt_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 --1uu 011 xxxxx") //
+DEF_ENC(V6_vandvrt_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 100 ---xx") //
+DEF_ENC(V6_vasrw_acc,         ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vrmpyubi_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 11i xxxxx") //
+
+DEF_ENC(V6_vmpyub_acc,         ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyiwub_acc,    ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vmpauhb_acc,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vmpyuhe_acc,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 011 xxxxx")
+DEF_ENC(V6_vmpahhsat,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpauhuhsat,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpsuhuhsat,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vasrh_acc,         ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 111 xxxxx") //
+
+
+
+
+DEF_ENC(V6_vinsertwr,        ICLASS_CJ" 1 001 101 ttttt PP 1 ----- 001 xxxxx")
+
+DEF_ENC(V6_vmpabuu_acc,        ICLASS_CJ" 1 001 101 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaslh_acc,        ICLASS_CJ" 1 001 101 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpyh_acc,        ICLASS_CJ" 1 001 101 ttttt PP 1 uuuuu 110 xxxxx") //
+
+
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 001 --- ----- PP - ----- ----- ---","[#1] (Vx32, Vy32, Rt32)")
+DEF_ENC(V6_vshuff,             ICLASS_CJ" 1 001 111 ttttt PP 1 yyyyy 001 xxxxx") //
+DEF_ENC(V6_vdeal,             ICLASS_CJ" 1 001 111 ttttt PP 1 yyyyy 010 xxxxx") //
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 010 --- ----- PP - ----- ----- ---","[#2] if (Ps) Vd=Vu")
+DEF_ENC(V6_vcmov,         ICLASS_CJ" 1 010 000 ----- PP - uuuuu -ss ddddd")
+DEF_ENC(V6_vncmov,         ICLASS_CJ" 1 010 001 ----- PP - uuuuu -ss ddddd")
+DEF_ENC(V6_vnccombine,     ICLASS_CJ" 1 010 010 vvvvv PP - uuuuu -ss ddddd")
+DEF_ENC(V6_vccombine,     ICLASS_CJ" 1 010 011 vvvvv PP - uuuuu -ss ddddd")
+
+DEF_ENC(V6_vrotr,       ICLASS_CJ" 1 010 100 vvvvv PP 1 uuuuu 111 ddddd")
+DEF_ENC(V6_vasr_into,   ICLASS_CJ" 1 010 101 vvvvv PP 1 uuuuu 111 xxxxx")
+
+/***************************************************************
+*
+*  Group #3, Uses Q6 Rt8
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 011 --- ----- PP - ----- ----- ---","[#3] Vd32=(Vu32, Vv32, Rt8)")
+DEF_ENC(V6_valignb,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlalignb,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vasrwh,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vasrwhsat,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vasrwhrndsat,     ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vasrwuhsat,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vasrhubsat,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vasrhubrndsat,     ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vasrhbrndsat,     ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlutvvb,            ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 001 ddddd")
+DEF_ENC(V6_vshuffvdd,         ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 011 ddddd") //
+DEF_ENC(V6_vdealvdd,         ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 100 ddddd") //
+DEF_ENC(V6_vlutvvb_oracc,    ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 101 xxxxx")
+DEF_ENC(V6_vlutvwh,            ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 110 ddddd")
+DEF_ENC(V6_vlutvwh_oracc,    ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 111 xxxxx")
+
+
+
+/***************************************************************
+*
+*  Group #4, No Q6 regs
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 100 --- ----- PP 0 ----- ----- ---","[#4] Vd32=(Vu32, Vv32)")
+DEF_ENC(V6_vrmpyubv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vrmpybv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vrmpybusv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vdmpyhvsat,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vmpybv,         ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmpyubv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmpybusv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpyhv,         ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyuhv,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vmpyhvsrs,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vmpyhus,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vmpabusv,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vmpyih,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vand,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vor,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vxor,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vaddw,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vaddubsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vadduhsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vaddhsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaddwsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vsubb,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsubh,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vsubw,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vsububsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vsubuhsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vsubhsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsubwsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaddb_dv,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vaddh_dv,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vaddw_dv,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vaddubsat_dv,ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vadduhsat_dv,ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vaddhsat_dv, ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vaddwsat_dv, ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsubb_dv,     ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vsubh_dv,     ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vsubw_dv,     ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsububsat_dv,ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vsubuhsat_dv,ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vsubhsat_dv,    ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vsubwsat_dv, ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vaddubh,     ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vadduhw,     ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaddhw,         ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vsububh,     ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsubuhw,        ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vsubhw,        ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vabsdiffub,    ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vabsdiffh,     ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vabsdiffuh,     ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vabsdiffw,     ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vavgub,         ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vavguh,         ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vavgh,        ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vavgw,        ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vnavgub,        ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vnavgh,         ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vnavgw,         ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vavgubrnd,     ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vavguhrnd,     ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vavghrnd,     ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vavgwrnd,    ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpabuuv,    ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 100 --- ----- PP 1 ----- ----- ---","[#4] Vx32=(Vu32, Vv32)")
+DEF_ENC(V6_vrmpyubv_acc,      ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vrmpybv_acc,       ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vrmpybusv_acc,    ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vdmpyhvsat_acc,    ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vmpybv_acc,         ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpyubv_acc,     ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpybusv_acc,    ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vmpyhv_acc,        ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vmpyuhv_acc,        ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyhus_acc,     ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaddhw_acc,        ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vmpyowh_64_acc,    ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 011 xxxxx")
+DEF_ENC(V6_vmpyih_acc,         ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpyiewuh_acc,    ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpyowh_sacc,    ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vmpyowh_rnd_sacc,ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vmpyiewh_acc,      ICLASS_CJ" 1 100 010 vvvvv PP 1 uuuuu 000 xxxxx") //
+
+DEF_ENC(V6_vadduhw_acc,          ICLASS_CJ" 1 100 010 vvvvv PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaddubh_acc,          ICLASS_CJ" 1 100 010 vvvvv PP 1 uuuuu 101 xxxxx") //
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 100 100 ----- PP 1 ----- ----- ---","[#4] Qx4=(Vu32, Vv32)")
+// Grouped by element size (lsbs), operation (next-lsbs) and operation (next-lsbs)
+DEF_ENC(V6_veqb_and,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 000xx") //
+DEF_ENC(V6_veqh_and,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 001xx") //
+DEF_ENC(V6_veqw_and,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 010xx") //
+
+DEF_ENC(V6_vgtb_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 100xx") //
+DEF_ENC(V6_vgth_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 101xx") //
+DEF_ENC(V6_vgtw_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 110xx") //
+
+DEF_ENC(V6_vgtub_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 001 000xx") //
+DEF_ENC(V6_vgtuh_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 001 001xx") //
+DEF_ENC(V6_vgtuw_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 001 010xx") //
+
+DEF_ENC(V6_veqb_or,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 000xx") //
+DEF_ENC(V6_veqh_or,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 001xx") //
+DEF_ENC(V6_veqw_or,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 010xx") //
+
+DEF_ENC(V6_vgtb_or,        ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 100xx") //
+DEF_ENC(V6_vgth_or,        ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 101xx") //
+DEF_ENC(V6_vgtw_or,        ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 110xx") //
+
+DEF_ENC(V6_vgtub_or,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 011 000xx") //
+DEF_ENC(V6_vgtuh_or,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 011 001xx") //
+DEF_ENC(V6_vgtuw_or,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 011 010xx") //
+
+DEF_ENC(V6_veqb_xor,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 000xx") //
+DEF_ENC(V6_veqh_xor,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 001xx") //
+DEF_ENC(V6_veqw_xor,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 010xx") //
+
+DEF_ENC(V6_vgtb_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 100xx") //
+DEF_ENC(V6_vgth_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 101xx") //
+DEF_ENC(V6_vgtw_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 110xx") //
+
+DEF_ENC(V6_vgtub_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 101 000xx") //
+DEF_ENC(V6_vgtuh_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 101 001xx") //
+DEF_ENC(V6_vgtuw_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 101 010xx") //
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 100 101 ----- PP 1 ----- ----- ---","[#4] Qx4,Vd32=(Vu32, Vv32)")
+DEF_ENC(V6_vaddcarry,    ICLASS_CJ" 1 100 101 vvvvv PP 1 uuuuu 0xx ddddd") //
+DEF_ENC(V6_vsubcarry,    ICLASS_CJ" 1 100 101 vvvvv PP 1 uuuuu 1xx ddddd") //
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 100 11- ----- PP 1 ----- ----- ---","[#4] Vx32|=(Vu32, Vv32,#)")
+DEF_ENC(V6_vlutvvb_oracci,    ICLASS_CJ" 1 100 110 vvvvv PP 1 uuuuu iii xxxxx") //
+DEF_ENC(V6_vlutvwh_oracci,    ICLASS_CJ" 1 100 111 vvvvv PP 1 uuuuu iii xxxxx") //
+
+
+
+/***************************************************************
+*
+*  Group #5, Reserved/Deprecated. Uses Q6 Rx. Stupid FFT.
+*
+****************************************************************/
+
+
+
+
+/***************************************************************
+*
+*  Group #6, No Q6 regs
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 110 --0 ----- PP 0 ----- ----- ---","[#6] Vd32=Vu32")
+DEF_ENC(V6_vabsh,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vabsh_sat,     ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vabsw,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vabsw_sat,     ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vnot,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vdealh,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vdealb,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vunpackub,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vunpackuh,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vunpackb,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vunpackh,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vabsb,         ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vabsb_sat,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vshuffh,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vshuffb,     ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vzb,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vzh,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsb,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vsh,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vcl0w,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vpopcounth,     ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vcl0h,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 111 ddddd") //
+
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 110 --0 ---11 PP 0 ----- ----- ---","[#6] Qd4=Qt4, Qs4")
+DEF_ENC(V6_pred_and,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 000dd") //
+DEF_ENC(V6_pred_or,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 001dd") //
+DEF_ENC(V6_pred_not,     ICLASS_CJ" 1 110 --0 ---11 PP 0 ---ss 000 010dd") //
+DEF_ENC(V6_pred_xor,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 011dd") //
+DEF_ENC(V6_pred_or_n,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 100dd") //
+DEF_ENC(V6_pred_and_n,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 101dd") //
+DEF_ENC(V6_shuffeqh,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 110dd") //
+DEF_ENC(V6_shuffeqw,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 111dd") //
+
+DEF_ENC(V6_vnormamtw,        ICLASS_CJ" 1 110 --0 ---11 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vnormamth,        ICLASS_CJ" 1 110 --0 ---11 PP 0 uuuuu 101 ddddd") //
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --1 ----- PP 0 ----- ----- ---","[#6] Vd32=Vu32,Vv32")
+DEF_ENC(V6_vlutvvbi,        ICLASS_CJ" 1 110 001 vvvvv PP 0 uuuuu iii ddddd")
+DEF_ENC(V6_vlutvwhi,        ICLASS_CJ" 1 110 011 vvvvv PP 0 uuuuu iii ddddd")
+
+DEF_ENC(V6_vaddbsat_dv,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 000 ddddd")
+DEF_ENC(V6_vsubbsat_dv,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 001 ddddd")
+DEF_ENC(V6_vadduwsat_dv,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 010 ddddd")
+DEF_ENC(V6_vsubuwsat_dv,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 011 ddddd")
+DEF_ENC(V6_vaddububb_sat,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 100 ddddd")
+DEF_ENC(V6_vsubububb_sat,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 101 ddddd")
+DEF_ENC(V6_vmpyewuh_64,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 110 ddddd")
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --0 ----- PP 1 ----- ----- ---","Vx32=Vu32")
+DEF_ENC(V6_vunpackob,         ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vunpackoh,         ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 001 xxxxx") //
+//DEF_ENC(V6_vunpackow,     ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 010 xxxxx") //
+
+DEF_ENC(V6_vhist,            ICLASS_CJ" 1 110 --0 ---00 PP 1 -000- 100 -----")
+DEF_ENC(V6_vwhist256,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -0010 100 -----")
+DEF_ENC(V6_vwhist256_sat,    ICLASS_CJ" 1 110 --0 ---00 PP 1 -0011 100 -----")
+DEF_ENC(V6_vwhist128,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -010- 100 -----")
+DEF_ENC(V6_vwhist128m,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -011i 100 -----")
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --0 ----- PP 1 ----- ----- ---","if (Qv4) Vx32=Vu32")
+DEF_ENC(V6_vaddbq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vaddhq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaddwq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vaddbnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vaddhnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaddwnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vsubbq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vsubhq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vsubwq,             ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vsubbnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vsubhnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vsubwnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 011 xxxxx") //
+
+DEF_ENC(V6_vhistq,            ICLASS_CJ" 1 110 vv0 ---10 PP 1 --00- 100 -----")
+DEF_ENC(V6_vwhist256q,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --010 100 -----")
+DEF_ENC(V6_vwhist256q_sat,    ICLASS_CJ" 1 110 vv0 ---10 PP 1 --011 100 -----")
+DEF_ENC(V6_vwhist128q,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --10- 100 -----")
+DEF_ENC(V6_vwhist128qm,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --11i 100 -----")
+
+
+DEF_ENC(V6_vandvqv,            ICLASS_CJ" 1 110 vv0 ---11 PP 1 uuuuu 000 ddddd")
+DEF_ENC(V6_vandvnqv,        ICLASS_CJ" 1 110 vv0 ---11 PP 1 uuuuu 001 ddddd")
+
+
+DEF_ENC(V6_vprefixqb,       ICLASS_CJ" 1 110 vv0 ---11 PP 1 --000 010 ddddd") //
+DEF_ENC(V6_vprefixqh,       ICLASS_CJ" 1 110 vv0 ---11 PP 1 --001 010 ddddd") //
+DEF_ENC(V6_vprefixqw,       ICLASS_CJ" 1 110 vv0 ---11 PP 1 --010 010 ddddd") //
+
+
+
+
+DEF_ENC(V6_vassign,            ICLASS_CJ" 1 110 --0 ---11 PP 1 uuuuu 111 ddddd")
+
+DEF_ENC(V6_valignbi,         ICLASS_CJ" 1 110 001 vvvvv PP 1 uuuuu iii ddddd")
+DEF_ENC(V6_vlalignbi,         ICLASS_CJ" 1 110 011 vvvvv PP 1 uuuuu iii ddddd")
+DEF_ENC(V6_vswap,             ICLASS_CJ" 1 110 101 vvvvv PP 1 uuuuu -tt ddddd") //
+DEF_ENC(V6_vmux,             ICLASS_CJ" 1 110 111 vvvvv PP 1 uuuuu -tt ddddd") //
+
+
+
+/***************************************************************
+*
+*  Group #7, No Q6 regs
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 111 --- ----- PP 0 ----- ----- ---","[#7] Vd32=(Vu32, Vv32)")
+DEF_ENC(V6_vaddbsat,    ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vminub,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vminuh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vminh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vminw,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmaxub,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmaxuh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmaxh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 111 ddddd") //
+
+
+DEF_ENC(V6_vaddclbh,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 000 ddddd") //
+DEF_ENC(V6_vaddclbw,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 001 ddddd") //
+
+DEF_ENC(V6_vavguw,        ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 010 ddddd") //
+DEF_ENC(V6_vavguwrnd,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 011 ddddd") //
+DEF_ENC(V6_vavgb,        ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 100 ddddd") //
+DEF_ENC(V6_vavgbrnd,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 101 ddddd") //
+DEF_ENC(V6_vnavgb,        ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 110 ddddd") //
+
+
+DEF_ENC(V6_vmaxw,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vdelta,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vsubbsat,    ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vrdelta,     ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vminb,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmaxb,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsatuwuh,    ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vdealb4w,     ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 111 ddddd") //
+
+
+DEF_ENC(V6_vmpyowh_rnd,     ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vshuffeb,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vshuffob,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vshufeh,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vshufoh,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vshufoeh,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vshufoeb,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vcombine,     ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyieoh,     ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vadduwsat,     ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vsathub,     ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsatwh,         ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vroundwh,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 100 ddddd")
+DEF_ENC(V6_vroundwuh,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 101 ddddd")
+DEF_ENC(V6_vroundhb,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 110 ddddd")
+DEF_ENC(V6_vroundhub,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 111 ddddd")
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 111 100 ----- PP - ----- ----- ---","[#7] Qd4=(Vu32, Vv32)")
+DEF_ENC(V6_veqb,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 000dd") //
+DEF_ENC(V6_veqh,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 001dd") //
+DEF_ENC(V6_veqw,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 010dd") //
+
+DEF_ENC(V6_vgtb,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 100dd") //
+DEF_ENC(V6_vgth,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 101dd") //
+DEF_ENC(V6_vgtw,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 110dd") //
+
+DEF_ENC(V6_vgtub,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 001 000dd") //
+DEF_ENC(V6_vgtuh,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 001 001dd") //
+DEF_ENC(V6_vgtuw,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 001 010dd") //
+
+
+DEF_ENC(V6_vasrwv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlsrwv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vlsrhv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vasrhv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaslwv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vaslhv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vaddb,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vaddh,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 111 ddddd") //
+
+
+DEF_ENC(V6_vmpyiewuh,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 000 ddddd")
+DEF_ENC(V6_vmpyiowh,    ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 001 ddddd")
+DEF_ENC(V6_vpackeb,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vpackeh,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vsubuwsat,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vpackhub_sat,ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vpackhb_sat, ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vpackwuh_sat,ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vpackwh_sat, ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vpackob,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vpackoh,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vrounduhub,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vrounduwuh,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmpyewuh,    ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 101 ddddd")
+DEF_ENC(V6_vmpyowh,        ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 111 ddddd")
+
+
+#endif /* NO MMVEC */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 17/20] Hexagon HVX (tests/tcg/hexagon) vector_add_int test
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (15 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 16/20] Hexagon HVX (target/hexagon) import instruction encodings Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 18/20] Hexagon HVX (tests/tcg/hexagon) hvx_misc test Taylor Simpson
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 tests/tcg/hexagon/vector_add_int.c | 61 ++++++++++++++++++++++++++++++++++++++
 tests/tcg/hexagon/Makefile.target  |  3 ++
 2 files changed, 64 insertions(+)
 create mode 100644 tests/tcg/hexagon/vector_add_int.c

diff --git a/tests/tcg/hexagon/vector_add_int.c b/tests/tcg/hexagon/vector_add_int.c
new file mode 100644
index 0000000..d6010ea
--- /dev/null
+++ b/tests/tcg/hexagon/vector_add_int.c
@@ -0,0 +1,61 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+
+int gA[401];
+int gB[401];
+int gC[401];
+
+void vector_add_int()
+{
+  int i;
+  for (i = 0; i < 400; i++) {
+    gA[i] = gB[i] + gC[i];
+  }
+}
+
+int main()
+{
+  int error = 0;
+  int i;
+  for (i = 0; i < 400; i++) {
+    gB[i] = i * 2;
+    gC[i] = i * 3;
+  }
+  gA[400] = 17;
+  vector_add_int();
+  for (i = 0; i < 400; i++) {
+    if (gA[i] != i * 5) {
+        error++;
+        printf("ERROR: gB[%d] = %d\t", i, gB[i]);
+        printf("gC[%d] = %d\t", i, gC[i]);
+        printf("gA[%d] = %d\n", i, gA[i]);
+    }
+  }
+  if (gA[400] != 17) {
+    error++;
+    printf("ERROR: Overran the buffer\n");
+  }
+  if (!error) {
+    printf("PASS\n");
+    return 0;
+  } else {
+    printf("FAIL\n");
+    return 1;
+  }
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 0992787..88db7dd 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -46,7 +46,10 @@ HEX_TESTS += circ
 HEX_TESTS += brev
 HEX_TESTS += load_unpack
 HEX_TESTS += load_align
+HEX_TESTS += vector_add_int
 HEX_TESTS += atomics
 HEX_TESTS += fpstuff
 
 TESTS += $(HEX_TESTS)
+
+vector_add_int: CFLAGS += -mhvx -fvectorize
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 18/20] Hexagon HVX (tests/tcg/hexagon) hvx_misc test
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (16 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 17/20] Hexagon HVX (tests/tcg/hexagon) vector_add_int test Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 19/20] Hexagon HVX (tests/tcg/hexagon) scatter_gather test Taylor Simpson
  2021-07-05 23:34 ` [PATCH 20/20] Hexagon HVX (tests/tcg/hexagon) histogram test Taylor Simpson
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Tests for
    packet semantics
    vector loads (aligned and unaligned)
    vector stores (aligned and unaligned)
    vector masked stores
    vector operations

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 tests/tcg/hexagon/hvx_misc.c      | 331 ++++++++++++++++++++++++++++++++++++++
 tests/tcg/hexagon/Makefile.target |   2 +
 2 files changed, 333 insertions(+)
 create mode 100644 tests/tcg/hexagon/hvx_misc.c

diff --git a/tests/tcg/hexagon/hvx_misc.c b/tests/tcg/hexagon/hvx_misc.c
new file mode 100644
index 0000000..b226134
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_misc.c
@@ -0,0 +1,331 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+
+int err;
+
+static void __check(int line, uint64_t result, uint64_t expect)
+{
+    if (result != expect) {
+        printf("ERROR at line %d: 0x%016llx != 0x%016llx\n",
+               line, result, expect);
+        err++;
+    }
+}
+
+#define check(RES, EXP) __check(__LINE__, RES, EXP)
+
+#define MAX_VEC_SIZE_BYTES         128
+
+typedef union {
+    uint64_t ud[MAX_VEC_SIZE_BYTES / 8];
+    int64_t   d[MAX_VEC_SIZE_BYTES / 8];
+    uint32_t uw[MAX_VEC_SIZE_BYTES / 4];
+    int32_t   w[MAX_VEC_SIZE_BYTES / 4];
+    uint16_t uh[MAX_VEC_SIZE_BYTES / 2];
+    int16_t   h[MAX_VEC_SIZE_BYTES / 2];
+    uint8_t  ub[MAX_VEC_SIZE_BYTES / 1];
+    int8_t    b[MAX_VEC_SIZE_BYTES / 1];
+} MMVector;
+
+#define BUFSIZE      16
+#define OUTSIZE      16
+#define MASKMOD      3
+
+MMVector buffer0[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector buffer1[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector mask[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector output[OUTSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector expect[OUTSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+
+#define CHECK_OUTPUT_FUNC(FIELD, FIELDSZ) \
+static void check_output_##FIELD(int line, size_t num_vectors) \
+{ \
+    for (int i = 0; i < num_vectors; i++) { \
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / FIELDSZ; j++) { \
+            __check(line, output[i].FIELD[j], expect[i].FIELD[j]); \
+        } \
+    } \
+}
+
+CHECK_OUTPUT_FUNC(d,  8)
+CHECK_OUTPUT_FUNC(w,  4)
+CHECK_OUTPUT_FUNC(h,  2)
+CHECK_OUTPUT_FUNC(b,  1)
+
+static void init_buffers(void)
+{
+    int counter0 = 0;
+    int counter1 = 17;
+    for (int i = 0; i < BUFSIZE; i++) {
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES; j++) {
+            buffer0[i].b[j] = counter0++;
+            buffer1[i].b[j] = counter1++;
+        }
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+            mask[i].w[j] = (i + j % MASKMOD == 0) ? 0 : 1;
+        }
+    }
+}
+
+static void test_load_tmp(void)
+{
+    void *p0 = buffer0;
+    void *p1 = buffer1;
+    void *pout = output;
+
+    for (int i = 0; i < BUFSIZE; i++) {
+        /*
+         * Load into v2 as .tmp, then ues it in the next packet
+         * Should get the new value within the same packet and
+         * the old value in the next packet
+         */
+        asm("v3 = vmem(%0 + #0)\n\t"
+            "r1 = #1\n\t"
+            "v2 = vsplat(r1)\n\t"
+            "{\n\t"
+            "    v2.tmp = vmem(%1 + #0)\n\t"
+            "    v4.w = vadd(v2.w, v3.w)\n\t"
+            "}\n\t"
+            "v4.w = vadd(v4.w, v2.w)\n\t"
+            "vmem(%2 + #0) = v4\n\t"
+            : : "r"(p0), "r"(p1), "r"(pout)
+            : "r1", "v2", "v3", "v4", "v6", "memory");
+        p0 += sizeof(MMVector);
+        p1 += sizeof(MMVector);
+        pout += sizeof(MMVector);
+
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+            expect[i].w[j] = buffer0[i].w[j] + buffer1[i].w[j] + 1;
+        }
+    }
+
+    check_output_w(__LINE__, BUFSIZE);
+}
+
+static void test_load_cur(void)
+{
+    void *p0 = buffer0;
+    void *pout = output;
+
+    for (int i = 0; i < BUFSIZE; i++) {
+        asm("{\n\t"
+            "    v2.cur = vmem(%0 + #0)\n\t"
+            "    vmem(%1 + #0) = v2\n\t"
+            "}\n\t"
+            : : "r"(p0), "r"(pout) : "v2", "memory");
+        p0 += sizeof(MMVector);
+        pout += sizeof(MMVector);
+
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+            expect[i].uw[j] = buffer0[i].uw[j];
+        }
+    }
+
+    check_output_w(__LINE__, BUFSIZE);
+}
+
+static void test_load_aligned(void)
+{
+    /* Aligned loads ignore the low bits of the address */
+    void *p0 = buffer0;
+    void *pout = output;
+    const size_t offset = 13;
+
+    p0 += offset;    /* Create an unaligned address */
+    asm("v2 = vmem(%0 + #0)\n\t"
+        "vmem(%1 + #0) = v2\n\t"
+        : : "r"(p0), "r"(pout) : "v2", "memory");
+
+    expect[0] = buffer0[0];
+
+    check_output_w(__LINE__, 1);
+}
+
+static void test_load_unaligned(void)
+{
+    void *p0 = buffer0;
+    void *pout = output;
+    const size_t offset = 12;
+
+    p0 += offset;    /* Create an unaligned address */
+    asm("v2 = vmemu(%0 + #0)\n\t"
+        "vmem(%1 + #0) = v2\n\t"
+        : : "r"(p0), "r"(pout) : "v2", "memory");
+
+    memcpy(expect, &buffer0[0].ub[offset], sizeof(MMVector));
+
+    check_output_w(__LINE__, 1);
+}
+
+static void test_store_aligned(void)
+{
+    /* Aligned stores ignore the low bits of the address */
+    void *p0 = buffer0;
+    void *pout = output;
+    const size_t offset = 13;
+
+    pout += offset;    /* Create an unaligned address */
+    asm("v2 = vmem(%0 + #0)\n\t"
+        "vmem(%1 + #0) = v2\n\t"
+        : : "r"(p0), "r"(pout) : "v2", "memory");
+
+    expect[0] = buffer0[0];
+
+    check_output_w(__LINE__, 1);
+}
+
+static void test_store_unaligned(void)
+{
+    void *p0 = buffer0;
+    void *pout = output;
+    const size_t offset = 12;
+
+    pout += offset;    /* Create an unaligned address */
+    asm("v2 = vmem(%0 + #0)\n\t"
+        "vmemu(%1 + #0) = v2\n\t"
+        : : "r"(p0), "r"(pout) : "v2", "memory");
+
+    memcpy(expect, buffer0, 2 * sizeof(MMVector));
+    memcpy(&expect[0].ub[offset], buffer0, sizeof(MMVector));
+
+    check_output_w(__LINE__, 2);
+}
+
+static void test_masked_store(void)
+{
+    void *p0 = buffer0;
+    void *pmask = mask;
+    void *pout = output;
+
+    memset(expect, 0xff, sizeof(expect));
+    memset(output, 0xff, sizeof(expect));
+
+    for (int i = 0; i < BUFSIZE; i++) {
+        asm("r4 = #0\n\t"
+            "v4 = vsplat(r4)\n\t"
+            "v5 = vmem(%0 + #0)\n\t"
+            "q0 = vcmp.eq(v4.w, v5.w)\n\t"
+            "v5 = vmem(%1)\n\t"
+            "if (q0) vmem(%2) = v5\n\t"
+            : : "r"(pmask), "r"(p0), "r"(pout)
+            : "r4", "v4", "v5", "q0", "memory");
+        p0 += sizeof(MMVector);
+        pmask += sizeof(MMVector);
+        pout += sizeof(MMVector);
+
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+            if (i + j % MASKMOD == 0) {
+                expect[i].w[j] = buffer0[i].w[j];
+            }
+        }
+    }
+
+    check_output_w(__LINE__, BUFSIZE);
+}
+
+#define OP1(ASM, EL, IN, OUT) \
+    asm("v2 = vmem(%0 + #0)\n\t" \
+        "v2" #EL " = " #ASM "(v2" #EL ")\n\t" \
+        "vmem(%1 + #0) = v2\n\t" \
+        : : "r"(IN), "r"(OUT) : "v2", "memory")
+
+#define OP2(ASM, EL, IN0, IN1, OUT) \
+    asm("v2 = vmem(%0 + #0)\n\t" \
+        "v3 = vmem(%1 + #0)\n\t" \
+        "v2" #EL " = " #ASM "(v2" #EL ", v3" #EL ")\n\t" \
+        "vmem(%2 + #0) = v2\n\t" \
+        : : "r"(IN0), "r"(IN1), "r"(OUT) : "v2", "v3", "memory")
+
+#define TEST_OP1(NAME, ASM, EL, FIELD, FIELDSZ, OP) \
+static void test_##NAME(void) \
+{ \
+    void *pin = buffer0; \
+    void *pout = output; \
+    for (int i = 0; i < BUFSIZE; i++) { \
+        OP1(ASM, EL, pin, pout); \
+        pin += sizeof(MMVector); \
+        pout += sizeof(MMVector); \
+    } \
+    for (int i = 0; i < BUFSIZE; i++) { \
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / FIELDSZ; j++) { \
+            expect[i].FIELD[j] = OP buffer0[i].FIELD[j]; \
+        } \
+    } \
+    check_output_##FIELD(__LINE__, BUFSIZE); \
+}
+
+#define TEST_OP2(NAME, ASM, EL, FIELD, FIELDSZ, OP) \
+static void test_##NAME(void) \
+{ \
+    void *p0 = buffer0; \
+    void *p1 = buffer1; \
+    void *pout = output; \
+    for (int i = 0; i < BUFSIZE; i++) { \
+        OP2(ASM, EL, p0, p1, pout); \
+        p0 += sizeof(MMVector); \
+        p1 += sizeof(MMVector); \
+        pout += sizeof(MMVector); \
+    } \
+    for (int i = 0; i < BUFSIZE; i++) { \
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / FIELDSZ; j++) { \
+            expect[i].FIELD[j] = buffer0[i].FIELD[j] OP buffer1[i].FIELD[j]; \
+        } \
+    } \
+    check_output_##FIELD(__LINE__, BUFSIZE); \
+}
+
+TEST_OP2(vadd_w, vadd, .w, w, 4, +)
+TEST_OP2(vadd_h, vadd, .h, h, 2, +)
+TEST_OP2(vadd_b, vadd, .b, b, 1, +)
+TEST_OP2(vsub_w, vsub, .w, w, 4, -)
+TEST_OP2(vsub_h, vsub, .h, h, 2, -)
+TEST_OP2(vsub_b, vsub, .b, b, 1, -)
+TEST_OP2(vxor, vxor, , d, 8, ^)
+TEST_OP2(vand, vand, , d, 8, &)
+TEST_OP2(vor, vor, , d, 8, |)
+TEST_OP1(vnot, vnot, , d, 8, ~)
+
+int main()
+{
+    init_buffers();
+
+    test_load_tmp();
+    test_load_cur();
+    test_load_aligned();
+    test_load_unaligned();
+    test_store_aligned();
+    test_store_unaligned();
+    test_masked_store();
+
+    test_vadd_w();
+    test_vadd_h();
+    test_vadd_b();
+    test_vsub_w();
+    test_vsub_h();
+    test_vsub_b();
+    test_vxor();
+    test_vand();
+    test_vor();
+    test_vnot();
+
+    puts(err ? "FAIL" : "PASS");
+    return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 88db7dd..6ac9ac6 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -49,7 +49,9 @@ HEX_TESTS += load_align
 HEX_TESTS += vector_add_int
 HEX_TESTS += atomics
 HEX_TESTS += fpstuff
+HEX_TESTS += hvx_misc
 
 TESTS += $(HEX_TESTS)
 
 vector_add_int: CFLAGS += -mhvx -fvectorize
+hvx_misc: CFLAGS += -mhvx
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 19/20] Hexagon HVX (tests/tcg/hexagon) scatter_gather test
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (17 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 18/20] Hexagon HVX (tests/tcg/hexagon) hvx_misc test Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  2021-07-05 23:34 ` [PATCH 20/20] Hexagon HVX (tests/tcg/hexagon) histogram test Taylor Simpson
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 tests/tcg/hexagon/scatter_gather.c | 1011 ++++++++++++++++++++++++++++++++++++
 tests/tcg/hexagon/Makefile.target  |    2 +
 2 files changed, 1013 insertions(+)
 create mode 100644 tests/tcg/hexagon/scatter_gather.c

diff --git a/tests/tcg/hexagon/scatter_gather.c b/tests/tcg/hexagon/scatter_gather.c
new file mode 100644
index 0000000..b93eb18
--- /dev/null
+++ b/tests/tcg/hexagon/scatter_gather.c
@@ -0,0 +1,1011 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This example tests the HVX scatter/gather instructions
+ *
+ * See section 5.13 of the V68 HVX Programmer's Reference
+ *
+ * There are 3 main classes operations
+ *     _16                 16-bit elements and 16-bit offsets
+ *     _32                 32-bit elements and 32-bit offsets
+ *     _16_32              16-bit elements and 32-bit offsets
+ *
+ * There are also masked and accumulate versions
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <inttypes.h>
+
+typedef long HVX_Vector       __attribute__((__vector_size__(128)))
+                              __attribute__((aligned(128)));
+typedef long HVX_VectorPair   __attribute__((__vector_size__(256)))
+                              __attribute__((aligned(128)));
+typedef long HVX_VectorPred   __attribute__((__vector_size__(128)))
+                              __attribute__((aligned(128)));
+
+#define VSCATTER_16(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermh_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_MASKED(MASK, BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermhq_128B(MASK, (int)BASE, RGN, OFF, VALS)
+#define VSCATTER_32(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermw_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_32_MASKED(MASK, BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermwq_128B(MASK, (int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_32(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermhw_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_32_MASKED(MASK, BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermhwq_128B(MASK, (int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_ACC(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermh_add_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_32_ACC(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermw_add_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_32_ACC(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermhw_add_128B((int)BASE, RGN, OFF, VALS)
+
+#define VGATHER_16(DSTADDR, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermh_128B(DSTADDR, (int)BASE, RGN, OFF)
+#define VGATHER_16_MASKED(DSTADDR, MASK, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermhq_128B(DSTADDR, MASK, (int)BASE, RGN, OFF)
+#define VGATHER_32(DSTADDR, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermw_128B(DSTADDR, (int)BASE, RGN, OFF)
+#define VGATHER_32_MASKED(DSTADDR, MASK, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermwq_128B(DSTADDR, MASK, (int)BASE, RGN, OFF)
+#define VGATHER_16_32(DSTADDR, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermhw_128B(DSTADDR, (int)BASE, RGN, OFF)
+#define VGATHER_16_32_MASKED(DSTADDR, MASK, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermhwq_128B(DSTADDR, MASK, (int)BASE, RGN, OFF)
+
+#define VSHUFF_H(V) \
+    __builtin_HEXAGON_V6_vshuffh_128B(V)
+#define VSPLAT_H(X) \
+    __builtin_HEXAGON_V6_lvsplath_128B(X)
+#define VAND_VAL(PRED, VAL) \
+    __builtin_HEXAGON_V6_vandvrt_128B(PRED, VAL)
+#define VDEAL_H(V) \
+    __builtin_HEXAGON_V6_vdealh_128B(V)
+
+int err;
+
+/* define the number of rows/cols in a square matrix */
+#define MATRIX_SIZE 64
+
+/* define the size of the scatter buffer */
+#define SCATTER_BUFFER_SIZE (MATRIX_SIZE * MATRIX_SIZE)
+
+/* fake vtcm - put buffers together and force alignment */
+static struct {
+    unsigned short vscatter16[SCATTER_BUFFER_SIZE];
+    unsigned short vgather16[MATRIX_SIZE];
+    unsigned int   vscatter32[SCATTER_BUFFER_SIZE];
+    unsigned int   vgather32[MATRIX_SIZE];
+    unsigned short vscatter16_32[SCATTER_BUFFER_SIZE];
+    unsigned short vgather16_32[MATRIX_SIZE];
+} vtcm __attribute__((aligned(0x10000)));
+
+/* declare the arrays of reference values */
+unsigned short vscatter16_ref[SCATTER_BUFFER_SIZE];
+unsigned short vgather16_ref[MATRIX_SIZE];
+unsigned int   vscatter32_ref[SCATTER_BUFFER_SIZE];
+unsigned int   vgather32_ref[MATRIX_SIZE];
+unsigned short vscatter16_32_ref[SCATTER_BUFFER_SIZE];
+unsigned short vgather16_32_ref[MATRIX_SIZE];
+
+/* declare the arrays of offsets */
+unsigned short half_offsets[MATRIX_SIZE];
+unsigned int   word_offsets[MATRIX_SIZE];
+
+/* declare the arrays of values */
+unsigned short half_values[MATRIX_SIZE];
+unsigned short half_values_acc[MATRIX_SIZE];
+unsigned short half_values_masked[MATRIX_SIZE];
+unsigned int   word_values[MATRIX_SIZE];
+unsigned int   word_values_acc[MATRIX_SIZE];
+unsigned int   word_values_masked[MATRIX_SIZE];
+
+/* declare the arrays of predicates */
+unsigned short half_predicates[MATRIX_SIZE];
+unsigned int   word_predicates[MATRIX_SIZE];
+
+/* make this big enough for all the intrinsics */
+const size_t region_len = sizeof(vtcm);
+
+/* optionally add sync instructions */
+#define SYNC_VECTOR 1
+
+static void sync_scatter(void *addr)
+{
+#if SYNC_VECTOR
+    /*
+     * Do the scatter release followed by a dummy load to complete the
+     * synchronization.  Normally the dummy load would be deferred as
+     * long as possible to minimize stalls.
+     */
+    asm volatile("vmem(%0 + #0):scatter_release\n" : : "r"(addr));
+    /* use volatile to force the load */
+    volatile HVX_Vector vDummy = *(HVX_Vector *)addr; vDummy = vDummy;
+#endif
+}
+
+static void sync_gather(void *addr)
+{
+#if SYNC_VECTOR
+    /* use volatile to force the load */
+    volatile HVX_Vector vDummy = *(HVX_Vector *)addr; vDummy = vDummy;
+#endif
+}
+
+/* optionally print the results */
+#define PRINT_DATA 0
+
+#define FILL_CHAR       '.'
+
+/* fill vtcm scratch with ee */
+void prefill_vtcm_scratch(void)
+{
+    memset(&vtcm, FILL_CHAR, sizeof(vtcm));
+}
+
+/* create byte offsets to be a diagonal of the matrix with 16 bit elements */
+void create_offsets_values_preds_16(void)
+{
+    unsigned short half_element = 0;
+    unsigned short half_element_masked = 0;
+    char letter = 'A';
+    char letter_masked = '@';
+
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        half_offsets[i] = i * (2 * MATRIX_SIZE + 2);
+
+        half_element = 0;
+        half_element_masked = 0;
+        for (int j = 0; j < 2; j++) {
+            half_element |= letter << j * 8;
+            half_element_masked |= letter_masked << j * 8;
+        }
+
+        half_values[i] = half_element;
+        half_values_acc[i] = ((i % 10) << 8) + (i % 10);
+        half_values_masked[i] = half_element_masked;
+
+        letter++;
+        /* reset to 'A' */
+        if (letter == 'M') {
+            letter = 'A';
+        }
+
+        half_predicates[i] = (i % 3 == 0 || i % 5 == 0) ? ~0 : 0;
+    }
+}
+
+/* create byte offsets to be a diagonal of the matrix with 32 bit elements */
+void create_offsets_values_preds_32(void)
+{
+    unsigned int word_element = 0;
+    unsigned int word_element_masked = 0;
+    char letter = 'A';
+    char letter_masked = '&';
+
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        word_offsets[i] = i * (4 * MATRIX_SIZE + 4);
+
+        word_element = 0;
+        word_element_masked = 0;
+        for (int j = 0; j < 4; j++) {
+            word_element |= letter << j * 8;
+            word_element_masked |= letter_masked << j * 8;
+        }
+
+        word_values[i] = word_element;
+        word_values_acc[i] = ((i % 10) << 8) + (i % 10);
+        word_values_masked[i] = word_element_masked;
+
+        letter++;
+        /* reset to 'A' */
+        if (letter == 'M') {
+            letter = 'A';
+        }
+
+        word_predicates[i] = (i % 4 == 0 || i % 7 == 0) ? ~0 : 0;
+    }
+}
+
+/*
+ * create byte offsets to be a diagonal of the matrix with 16 bit elements
+ * and 32 bit offsets
+ */
+void create_offsets_values_preds_16_32(void)
+{
+    unsigned short half_element = 0;
+    unsigned short half_element_masked = 0;
+    char letter = 'D';
+    char letter_masked = '$';
+
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        word_offsets[i] = i * (2 * MATRIX_SIZE + 2);
+
+        half_element = 0;
+        half_element_masked = 0;
+        for (int j = 0; j < 2; j++) {
+            half_element |= letter << j * 8;
+            half_element_masked |= letter_masked << j * 8;
+        }
+
+        half_values[i] = half_element;
+        half_values_acc[i] = ((i % 10) << 8) + (i % 10);
+        half_values_masked[i] = half_element_masked;
+
+        letter++;
+        /* reset to 'A' */
+        if (letter == 'P') {
+            letter = 'D';
+        }
+
+        half_predicates[i] = (i % 2 == 0 || i % 13 == 0) ? ~0 : 0;
+    }
+}
+
+/* scatter the 16 bit elements using intrinsics */
+void vector_scatter_16(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+    HVX_Vector values = *(HVX_Vector *)half_values;
+
+    VSCATTER_16(&vtcm.vscatter16, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16);
+}
+
+/* scatter-accumulate the 16 bit elements using intrinsics */
+void vector_scatter_16_acc(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+    HVX_Vector values = *(HVX_Vector *)half_values_acc;
+
+    VSCATTER_16_ACC(&vtcm.vscatter16, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16);
+}
+
+/* scatter the 16 bit elements using intrinsics */
+void vector_scatter_16_masked(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+    HVX_Vector values = *(HVX_Vector *)half_values_masked;
+    HVX_Vector pred_reg = *(HVX_Vector *)half_predicates;
+    HVX_VectorPred preds = VAND_VAL(pred_reg, ~0);
+
+    VSCATTER_16_MASKED(preds, &vtcm.vscatter16, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16);
+}
+
+/* scatter the 32 bit elements using intrinsics */
+void vector_scatter_32(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+    HVX_Vector valueslo = *(HVX_Vector *)word_values;
+    HVX_Vector valueshi = *(HVX_Vector *)&word_values[MATRIX_SIZE / 2];
+
+    VSCATTER_32(&vtcm.vscatter32, region_len, offsetslo, valueslo);
+    VSCATTER_32(&vtcm.vscatter32, region_len, offsetshi, valueshi);
+
+    sync_scatter(vtcm.vscatter32);
+}
+
+/* scatter-acc the 32 bit elements using intrinsics */
+void vector_scatter_32_acc(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+    HVX_Vector valueslo = *(HVX_Vector *)word_values_acc;
+    HVX_Vector valueshi = *(HVX_Vector *)&word_values_acc[MATRIX_SIZE / 2];
+
+    VSCATTER_32_ACC(&vtcm.vscatter32, region_len, offsetslo, valueslo);
+    VSCATTER_32_ACC(&vtcm.vscatter32, region_len, offsetshi, valueshi);
+
+    sync_scatter(vtcm.vscatter32);
+}
+
+/* scatter the 32 bit elements using intrinsics */
+void vector_scatter_32_masked(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+    HVX_Vector valueslo = *(HVX_Vector *)word_values_masked;
+    HVX_Vector valueshi = *(HVX_Vector *)&word_values_masked[MATRIX_SIZE / 2];
+    HVX_Vector pred_reglo = *(HVX_Vector *)word_predicates;
+    HVX_Vector pred_reghi = *(HVX_Vector *)&word_predicates[MATRIX_SIZE / 2];
+    HVX_VectorPred predslo = VAND_VAL(pred_reglo, ~0);
+    HVX_VectorPred predshi = VAND_VAL(pred_reghi, ~0);
+
+    VSCATTER_32_MASKED(predslo, &vtcm.vscatter32, region_len, offsetslo,
+                       valueslo);
+    VSCATTER_32_MASKED(predshi, &vtcm.vscatter32, region_len, offsetshi,
+                       valueshi);
+
+    sync_scatter(vtcm.vscatter16);
+}
+
+/* scatter the 16 bit elements with 32 bit offsets using intrinsics */
+void vector_scatter_16_32(void)
+{
+    HVX_VectorPair offsets;
+    HVX_Vector values;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+
+    /* these values need to be shuffled for the scatter */
+    values = *(HVX_Vector *)half_values;
+    values = VSHUFF_H(values);
+
+    VSCATTER_16_32(&vtcm.vscatter16_32, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16_32);
+}
+
+/* scatter-acc the 16 bit elements with 32 bit offsets using intrinsics */
+void vector_scatter_16_32_acc(void)
+{
+    HVX_VectorPair offsets;
+    HVX_Vector values;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+
+    /* these values need to be shuffled for the scatter */
+    values = *(HVX_Vector *)half_values_acc;
+    values = VSHUFF_H(values);
+
+    VSCATTER_16_32_ACC(&vtcm.vscatter16_32, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16_32);
+}
+
+/* masked scatter the 16 bit elements with 32 bit offsets using intrinsics */
+void vector_scatter_16_32_masked(void)
+{
+    HVX_VectorPair offsets;
+    HVX_Vector values;
+    HVX_Vector pred_reg;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+
+    /* these values need to be shuffled for the scatter */
+    values = *(HVX_Vector *)half_values_masked;
+    values = VSHUFF_H(values);
+
+    pred_reg = *(HVX_Vector *)half_predicates;
+    pred_reg = VSHUFF_H(pred_reg);
+    HVX_VectorPred preds = VAND_VAL(pred_reg, ~0);
+
+    VSCATTER_16_32_MASKED(preds, &vtcm.vscatter16_32, region_len, offsets,
+                          values);
+
+    sync_scatter(vtcm.vscatter16_32);
+}
+
+/* gather the elements from the scatter16 buffer */
+void vector_gather_16(void)
+{
+    HVX_Vector *vgather = (HVX_Vector *)&vtcm.vgather16;
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+
+    VGATHER_16(vgather, &vtcm.vscatter16, region_len, offsets);
+
+    sync_gather(vgather);
+}
+
+static unsigned short gather_16_masked_init(void)
+{
+    char letter = '?';
+    return letter | (letter << 8);
+}
+
+void vector_gather_16_masked(void)
+{
+    HVX_Vector *vgather = (HVX_Vector *)&vtcm.vgather16;
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+    HVX_Vector pred_reg = *(HVX_Vector *)half_predicates;
+    HVX_VectorPred preds = VAND_VAL(pred_reg, ~0);
+
+    *vgather = VSPLAT_H(gather_16_masked_init());
+    VGATHER_16_MASKED(vgather, preds, &vtcm.vscatter16, region_len, offsets);
+
+    sync_gather(vgather);
+}
+
+/* gather the elements from the scatter32 buffer */
+void vector_gather_32(void)
+{
+    HVX_Vector *vgatherlo = (HVX_Vector *)&vtcm.vgather32;
+    HVX_Vector *vgatherhi =
+        (HVX_Vector *)((int)&vtcm.vgather32 + (MATRIX_SIZE * 2));
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+
+    VGATHER_32(vgatherlo, &vtcm.vscatter32, region_len, offsetslo);
+    VGATHER_32(vgatherhi, &vtcm.vscatter32, region_len, offsetshi);
+
+    sync_gather(vgatherhi);
+}
+
+static unsigned int gather_32_masked_init(void)
+{
+    char letter = '?';
+    return letter | (letter << 8) | (letter << 16) | (letter << 24);
+}
+
+void vector_gather_32_masked(void)
+{
+    HVX_Vector *vgatherlo = (HVX_Vector *)&vtcm.vgather32;
+    HVX_Vector *vgatherhi =
+        (HVX_Vector *)((int)&vtcm.vgather32 + (MATRIX_SIZE * 2));
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+    HVX_Vector pred_reglo = *(HVX_Vector *)word_predicates;
+    HVX_VectorPred predslo = VAND_VAL(pred_reglo, ~0);
+    HVX_Vector pred_reghi = *(HVX_Vector *)&word_predicates[MATRIX_SIZE / 2];
+    HVX_VectorPred predshi = VAND_VAL(pred_reghi, ~0);
+
+    *vgatherlo = VSPLAT_H(gather_32_masked_init());
+    *vgatherhi = VSPLAT_H(gather_32_masked_init());
+    VGATHER_32_MASKED(vgatherlo, predslo, &vtcm.vscatter32, region_len,
+                      offsetslo);
+    VGATHER_32_MASKED(vgatherhi, predshi, &vtcm.vscatter32, region_len,
+                      offsetshi);
+
+    sync_gather(vgatherlo);
+    sync_gather(vgatherhi);
+}
+
+/* gather the elements from the scatter16_32 buffer */
+void vector_gather_16_32(void)
+{
+    HVX_Vector *vgather;
+    HVX_VectorPair offsets;
+    HVX_Vector values;
+
+    /* get the vtcm address to gather from */
+    vgather = (HVX_Vector *)&vtcm.vgather16_32;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+
+    VGATHER_16_32(vgather, &vtcm.vscatter16_32, region_len, offsets);
+
+    /* deal the elements to get the order back */
+    values = *(HVX_Vector *)vgather;
+    values = VDEAL_H(values);
+
+    /* write it back to vtcm address */
+    *(HVX_Vector *)vgather = values;
+}
+
+void vector_gather_16_32_masked(void)
+{
+    HVX_Vector *vgather;
+    HVX_VectorPair offsets;
+    HVX_Vector pred_reg;
+    HVX_VectorPred preds;
+    HVX_Vector values;
+
+    /* get the vtcm address to gather from */
+    vgather = (HVX_Vector *)&vtcm.vgather16_32;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+    pred_reg = *(HVX_Vector *)half_predicates;
+    pred_reg = VSHUFF_H(pred_reg);
+    preds = VAND_VAL(pred_reg, ~0);
+
+   *vgather = VSPLAT_H(gather_16_masked_init());
+   VGATHER_16_32_MASKED(vgather, preds, &vtcm.vscatter16_32, region_len,
+                        offsets);
+
+    /* deal the elements to get the order back */
+    values = *(HVX_Vector *)vgather;
+    values = VDEAL_H(values);
+
+    /* write it back to vtcm address */
+    *(HVX_Vector *)vgather = values;
+}
+
+static void check_buffer(const char *name, void *c, void *r, size_t size)
+{
+    char *check = (char *)c;
+    char *ref = (char *)r;
+    for (int i = 0; i < size; i++) {
+        if (check[i] != ref[i]) {
+            printf("ERROR %s [%d]: 0x%x (%c) != 0x%x (%c)\n", name, i,
+                   check[i], check[i], ref[i], ref[i]);
+            err++;
+        }
+    }
+}
+
+/*
+ * These scalar functions are the C equivalents of the vector functions that
+ * use HVX
+ */
+
+/* scatter the 16 bit elements using C */
+void scalar_scatter_16(unsigned short *vscatter16)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter16[half_offsets[i] / 2] = half_values[i];
+    }
+}
+
+void check_scatter_16()
+{
+    memset(vscatter16_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16(vscatter16_ref);
+    check_buffer(__func__, vtcm.vscatter16, vscatter16_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* scatter the 16 bit elements using C */
+void scalar_scatter_16_acc(unsigned short *vscatter16)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter16[half_offsets[i] / 2] += half_values_acc[i];
+    }
+}
+
+void check_scatter_16_acc()
+{
+    memset(vscatter16_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16(vscatter16_ref);
+    scalar_scatter_16_acc(vscatter16_ref);
+    check_buffer(__func__, vtcm.vscatter16, vscatter16_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* scatter the 16 bit elements using C */
+void scalar_scatter_16_masked(unsigned short *vscatter16)
+{
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        if (half_predicates[i]) {
+            vscatter16[half_offsets[i] / 2] = half_values_masked[i];
+        }
+    }
+
+}
+
+void check_scatter_16_masked()
+{
+    memset(vscatter16_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16(vscatter16_ref);
+    scalar_scatter_16_acc(vscatter16_ref);
+    scalar_scatter_16_masked(vscatter16_ref);
+    check_buffer(__func__, vtcm.vscatter16, vscatter16_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* scatter the 32 bit elements using C */
+void scalar_scatter_32(unsigned int *vscatter32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter32[word_offsets[i] / 4] = word_values[i];
+    }
+}
+
+void check_scatter_32()
+{
+    memset(vscatter32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+    scalar_scatter_32(vscatter32_ref);
+    check_buffer(__func__, vtcm.vscatter32, vscatter32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+}
+
+/* scatter the 32 bit elements using C */
+void scalar_scatter_32_acc(unsigned int *vscatter32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter32[word_offsets[i] / 4] += word_values_acc[i];
+    }
+}
+
+void check_scatter_32_acc()
+{
+    memset(vscatter32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+    scalar_scatter_32(vscatter32_ref);
+    scalar_scatter_32_acc(vscatter32_ref);
+    check_buffer(__func__, vtcm.vscatter32, vscatter32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+}
+
+/* scatter the 32 bit elements using C */
+void scalar_scatter_32_masked(unsigned int *vscatter32)
+{
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        if (word_predicates[i]) {
+            vscatter32[word_offsets[i] / 4] = word_values_masked[i];
+        }
+    }
+}
+
+void check_scatter_32_masked()
+{
+    memset(vscatter32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+    scalar_scatter_32(vscatter32_ref);
+    scalar_scatter_32_acc(vscatter32_ref);
+    scalar_scatter_32_masked(vscatter32_ref);
+    check_buffer(__func__, vtcm.vscatter32, vscatter32_ref,
+                  SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+}
+
+/* scatter the 32 bit elements using C */
+void scalar_scatter_16_32(unsigned short *vscatter16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter16_32[word_offsets[i] / 2] = half_values[i];
+    }
+}
+
+void check_scatter_16_32()
+{
+    memset(vscatter16_32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16_32(vscatter16_32_ref);
+    check_buffer(__func__, vtcm.vscatter16_32, vscatter16_32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* scatter the 32 bit elements using C */
+void scalar_scatter_16_32_acc(unsigned short *vscatter16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter16_32[word_offsets[i] / 2] += half_values_acc[i];
+    }
+}
+
+void check_scatter_16_32_acc()
+{
+    memset(vscatter16_32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16_32(vscatter16_32_ref);
+    scalar_scatter_16_32_acc(vscatter16_32_ref);
+    check_buffer(__func__, vtcm.vscatter16_32, vscatter16_32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+void scalar_scatter_16_32_masked(unsigned short *vscatter16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        if (half_predicates[i]) {
+            vscatter16_32[word_offsets[i] / 2] = half_values_masked[i];
+        }
+    }
+}
+
+void check_scatter_16_32_masked()
+{
+    memset(vscatter16_32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16_32(vscatter16_32_ref);
+    scalar_scatter_16_32_acc(vscatter16_32_ref);
+    scalar_scatter_16_32_masked(vscatter16_32_ref);
+    check_buffer(__func__, vtcm.vscatter16_32, vscatter16_32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* gather the elements from the scatter buffer using C */
+void scalar_gather_16(unsigned short *vgather16)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vgather16[i] = vtcm.vscatter16[half_offsets[i] / 2];
+    }
+}
+
+void check_gather_16()
+{
+      memset(vgather16_ref, 0, MATRIX_SIZE * sizeof(unsigned short));
+      scalar_gather_16(vgather16_ref);
+      check_buffer(__func__, vtcm.vgather16, vgather16_ref,
+                   MATRIX_SIZE * sizeof(unsigned short));
+}
+
+void scalar_gather_16_masked(unsigned short *vgather16)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        if (half_predicates[i]) {
+            vgather16[i] = vtcm.vscatter16[half_offsets[i] / 2];
+        }
+    }
+}
+
+void check_gather_16_masked()
+{
+    memset(vgather16_ref, gather_16_masked_init(),
+           MATRIX_SIZE * sizeof(unsigned short));
+    scalar_gather_16_masked(vgather16_ref);
+    check_buffer(__func__, vtcm.vgather16, vgather16_ref,
+                 MATRIX_SIZE * sizeof(unsigned short));
+}
+
+/* gather the elements from the scatter buffer using C */
+void scalar_gather_32(unsigned int *vgather32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vgather32[i] = vtcm.vscatter32[word_offsets[i] / 4];
+    }
+}
+
+void check_gather_32(void)
+{
+    memset(vgather32_ref, 0, MATRIX_SIZE * sizeof(unsigned int));
+    scalar_gather_32(vgather32_ref);
+    check_buffer(__func__, vtcm.vgather32, vgather32_ref,
+                 MATRIX_SIZE * sizeof(unsigned int));
+}
+
+void scalar_gather_32_masked(unsigned int *vgather32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        if (word_predicates[i]) {
+            vgather32[i] = vtcm.vscatter32[word_offsets[i] / 4];
+        }
+    }
+}
+
+
+void check_gather_32_masked(void)
+{
+    memset(vgather32_ref, gather_32_masked_init(),
+           MATRIX_SIZE * sizeof(unsigned int));
+    scalar_gather_32_masked(vgather32_ref);
+    check_buffer(__func__, vtcm.vgather32,
+                 vgather32_ref, MATRIX_SIZE * sizeof(unsigned int));
+}
+
+/* gather the elements from the scatter buffer using C */
+void scalar_gather_16_32(unsigned short *vgather16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vgather16_32[i] = vtcm.vscatter16_32[word_offsets[i] / 2];
+    }
+}
+
+void check_gather_16_32(void)
+{
+    memset(vgather16_32_ref, 0, MATRIX_SIZE * sizeof(unsigned short));
+    scalar_gather_16_32(vgather16_32_ref);
+    check_buffer(__func__, vtcm.vgather16_32, vgather16_32_ref,
+                 MATRIX_SIZE * sizeof(unsigned short));
+}
+
+void scalar_gather_16_32_masked(unsigned short *vgather16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        if (half_predicates[i]) {
+            vgather16_32[i] = vtcm.vscatter16_32[word_offsets[i] / 2];
+        }
+    }
+
+}
+
+void check_gather_16_32_masked(void)
+{
+    memset(vgather16_32_ref, gather_16_masked_init(),
+           MATRIX_SIZE * sizeof(unsigned short));
+    scalar_gather_16_32_masked(vgather16_32_ref);
+    check_buffer(__func__, vtcm.vgather16_32, vgather16_32_ref,
+                 MATRIX_SIZE * sizeof(unsigned short));
+}
+
+/* print scatter16 buffer */
+void print_scatter16_buffer(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 16 bit scatter buffer");
+
+        for (int i = 0; i < SCATTER_BUFFER_SIZE; i++) {
+            if ((i % MATRIX_SIZE) == 0) {
+                printf("\n");
+            }
+            for (int j = 0; j < 2; j++) {
+                printf("%c", (char)((vtcm.vscatter16[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the gather 16 buffer */
+void print_gather_result_16(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 16 bit gather result\n");
+
+        for (int i = 0; i < MATRIX_SIZE; i++) {
+            for (int j = 0; j < 2; j++) {
+                printf("%c", (char)((vtcm.vgather16[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the scatter32 buffer */
+void print_scatter32_buffer(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 32 bit scatter buffer");
+
+        for (int i = 0; i < SCATTER_BUFFER_SIZE; i++) {
+            if ((i % MATRIX_SIZE) == 0) {
+                printf("\n");
+            }
+            for (int j = 0; j < 4; j++) {
+                printf("%c", (char)((vtcm.vscatter32[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the gather 32 buffer */
+void print_gather_result_32(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 32 bit gather result\n");
+
+        for (int i = 0; i < MATRIX_SIZE; i++) {
+            for (int j = 0; j < 4; j++) {
+                printf("%c", (char)((vtcm.vgather32[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the scatter16_32 buffer */
+void print_scatter16_32_buffer(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 16_32 bit scatter buffer");
+
+        for (int i = 0; i < SCATTER_BUFFER_SIZE; i++) {
+            if ((i % MATRIX_SIZE) == 0) {
+                printf("\n");
+            }
+            for (int j = 0; j < 2; j++) {
+                printf("%c",
+                      (unsigned char)((vtcm.vscatter16_32[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the gather 16_32 buffer */
+void print_gather_result_16_32(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 16_32 bit gather result\n");
+
+        for (int i = 0; i < MATRIX_SIZE; i++) {
+            for (int j = 0; j < 2; j++) {
+                printf("%c",
+                       (unsigned char)((vtcm.vgather16_32[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+int main()
+{
+    prefill_vtcm_scratch();
+
+    /* 16 bit elements with 16 bit offsets */
+    create_offsets_values_preds_16();
+
+    vector_scatter_16();
+    print_scatter16_buffer();
+    check_scatter_16();
+
+    vector_gather_16();
+    print_gather_result_16();
+    check_gather_16();
+
+    vector_gather_16_masked();
+    print_gather_result_16();
+    check_gather_16_masked();
+
+    vector_scatter_16_acc();
+    print_scatter16_buffer();
+    check_scatter_16_acc();
+
+    vector_scatter_16_masked();
+    print_scatter16_buffer();
+    check_scatter_16_masked();
+
+    /* 32 bit elements with 32 bit offsets */
+    create_offsets_values_preds_32();
+
+    vector_scatter_32();
+    print_scatter32_buffer();
+    check_scatter_32();
+
+    vector_gather_32();
+    print_gather_result_32();
+    check_gather_32();
+
+    vector_gather_32_masked();
+    print_gather_result_32();
+    check_gather_32_masked();
+
+    vector_scatter_32_acc();
+    print_scatter32_buffer();
+    check_scatter_32_acc();
+
+    vector_scatter_32_masked();
+    print_scatter32_buffer();
+    check_scatter_32_masked();
+
+    /* 16 bit elements with 32 bit offsets */
+    create_offsets_values_preds_16_32();
+
+    vector_scatter_16_32();
+    print_scatter16_32_buffer();
+    check_scatter_16_32();
+
+    vector_gather_16_32();
+    print_gather_result_16_32();
+    check_gather_16_32();
+
+    vector_gather_16_32_masked();
+    print_gather_result_16_32();
+    check_gather_16_32_masked();
+
+    vector_scatter_16_32_acc();
+    print_scatter16_32_buffer();
+    check_scatter_16_32_acc();
+
+    vector_scatter_16_32_masked();
+    print_scatter16_32_buffer();
+    check_scatter_16_32_masked();
+
+    puts(err ? "FAIL" : "PASS");
+    return err;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 6ac9ac6..bde90e7 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -47,11 +47,13 @@ HEX_TESTS += brev
 HEX_TESTS += load_unpack
 HEX_TESTS += load_align
 HEX_TESTS += vector_add_int
+HEX_TESTS += scatter_gather
 HEX_TESTS += atomics
 HEX_TESTS += fpstuff
 HEX_TESTS += hvx_misc
 
 TESTS += $(HEX_TESTS)
 
+scatter_gather: CFLAGS += -mhvx
 vector_add_int: CFLAGS += -mhvx -fvectorize
 hvx_misc: CFLAGS += -mhvx
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 20/20] Hexagon HVX (tests/tcg/hexagon) histogram test
  2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (18 preceding siblings ...)
  2021-07-05 23:34 ` [PATCH 19/20] Hexagon HVX (tests/tcg/hexagon) scatter_gather test Taylor Simpson
@ 2021-07-05 23:34 ` Taylor Simpson
  19 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-05 23:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, peter.maydell, bcain, richard.henderson, tsimpson, philmd

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 tests/tcg/hexagon/hvx_histogram_input.h | 717 ++++++++++++++++++++++++++++++++
 tests/tcg/hexagon/hvx_histogram_row.h   |  24 ++
 tests/tcg/hexagon/hvx_histogram.c       |  88 ++++
 tests/tcg/hexagon/Makefile.target       |   5 +
 tests/tcg/hexagon/hvx_histogram_row.S   | 294 +++++++++++++
 5 files changed, 1128 insertions(+)
 create mode 100644 tests/tcg/hexagon/hvx_histogram_input.h
 create mode 100644 tests/tcg/hexagon/hvx_histogram_row.h
 create mode 100644 tests/tcg/hexagon/hvx_histogram.c
 create mode 100644 tests/tcg/hexagon/hvx_histogram_row.S

diff --git a/tests/tcg/hexagon/hvx_histogram_input.h b/tests/tcg/hexagon/hvx_histogram_input.h
new file mode 100644
index 0000000..2f91092
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_histogram_input.h
@@ -0,0 +1,717 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+    { 0x26, 0x32, 0x2e, 0x2e, 0x2d, 0x2c, 0x2d, 0x2d,
+      0x2c, 0x2e, 0x31, 0x33, 0x36, 0x39, 0x3b, 0x3f,
+      0x42, 0x46, 0x4a, 0x4c, 0x51, 0x53, 0x53, 0x54,
+      0x56, 0x57, 0x58, 0x57, 0x56, 0x52, 0x51, 0x4f,
+      0x4c, 0x49, 0x47, 0x42, 0x3e, 0x3b, 0x38, 0x35,
+      0x33, 0x30, 0x2e, 0x2c, 0x2b, 0x2a, 0x2a, 0x28,
+      0x28, 0x27, 0x27, 0x28, 0x29, 0x2a, 0x2c, 0x2e,
+      0x2f, 0x33, 0x36, 0x38, 0x3c, 0x3d, 0x40, 0x42,
+      0x43, 0x42, 0x43, 0x44, 0x43, 0x41, 0x40, 0x3b,
+      0x3b, 0x3a, 0x38, 0x35, 0x32, 0x2f, 0x2c, 0x29,
+      0x27, 0x26, 0x23, 0x21, 0x1e, 0x1c, 0x1a, 0x19,
+      0x17, 0x15, 0x15, 0x14, 0x13, 0x12, 0x11, 0x10,
+      0x0f, 0x0e, 0x0f, 0x0f, 0x0e, 0x0d, 0x0d, 0x0d,
+      0x0c, 0x0d, 0x0e, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+      0x0c, 0x0c, 0x0d, 0x0c, 0x0f, 0x0e, 0x0f, 0x0f,
+      0x0f, 0x10, 0x11, 0x12, 0x14, 0x16, 0x17, 0x19,
+      0x1c, 0x1d, 0x21, 0x25, 0x27, 0x29, 0x2b, 0x2f,
+      0x31, 0x33, 0x36, 0x38, 0x39, 0x3a, 0x3b, 0x3c,
+      0x3c, 0x3d, 0x3e, 0x3e, 0x3c, 0x3b, 0x3a, 0x39,
+      0x39, 0x3a, 0x3a, 0x3a, 0x3a, 0x3c, 0x3e, 0x43,
+      0x47, 0x4a, 0x4d, 0x51, 0x51, 0x54, 0x56, 0x56,
+      0x57, 0x56, 0x53, 0x4f, 0x4b, 0x47, 0x43, 0x41,
+      0x3e, 0x3c, 0x3a, 0x37, 0x36, 0x33, 0x32, 0x34,
+      0x34, 0x34, 0x34, 0x35, 0x36, 0x39, 0x3d, 0x3d,
+      0x3f, 0x40, 0x40, 0x40, 0x40, 0x3e, 0x40, 0x40,
+      0x42, 0x44, 0x47, 0x48, 0x4b, 0x4e, 0x56, 0x5c,
+      0x62, 0x68, 0x6f, 0x73, 0x76, 0x79, 0x7a, 0x7c,
+      0x7e, 0x7c, 0x78, 0x72, 0x6e, 0x69, 0x65, 0x60,
+      0x5b, 0x56, 0x52, 0x4d, 0x4a, 0x48, 0x47, 0x46,
+      0x44, 0x43, 0x42, 0x41, 0x41, 0x41, 0x40, 0x40,
+      0x3f, 0x3e, 0x3d, 0x3c, 0x3b, 0x3b, 0x38, 0x37,
+      0x36, 0x35, 0x36, 0x35, 0x36, 0x37, 0x38, 0x3c,
+      0x3d, 0x3f, 0x42, 0x44, 0x46, 0x48, 0x4b, 0x4c,
+      0x4e, 0x4e, 0x4d, 0x4c, 0x4a, 0x48, 0x49, 0x49,
+      0x4b, 0x4d, 0x4e, },
+    { 0x23, 0x2d, 0x29, 0x29, 0x28, 0x28, 0x29, 0x29,
+      0x28, 0x2b, 0x2d, 0x2f, 0x32, 0x34, 0x36, 0x3a,
+      0x3d, 0x41, 0x44, 0x47, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x51, 0x51, 0x4f, 0x4c, 0x4b, 0x48,
+      0x46, 0x44, 0x40, 0x3d, 0x39, 0x36, 0x34, 0x30,
+      0x2f, 0x2d, 0x2a, 0x29, 0x28, 0x27, 0x26, 0x25,
+      0x25, 0x24, 0x24, 0x24, 0x26, 0x28, 0x28, 0x2a,
+      0x2b, 0x2e, 0x32, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3d, 0x3d, 0x3e, 0x3e, 0x3e, 0x3c, 0x3b, 0x38,
+      0x37, 0x35, 0x33, 0x30, 0x2e, 0x2b, 0x27, 0x25,
+      0x24, 0x21, 0x20, 0x1d, 0x1b, 0x1a, 0x18, 0x16,
+      0x15, 0x14, 0x13, 0x12, 0x10, 0x11, 0x10, 0x0e,
+      0x0e, 0x0d, 0x0d, 0x0d, 0x0d, 0x0c, 0x0c, 0x0b,
+      0x0b, 0x0b, 0x0c, 0x0b, 0x0b, 0x09, 0x0a, 0x0b,
+      0x0b, 0x0a, 0x0a, 0x0c, 0x0c, 0x0c, 0x0d, 0x0e,
+      0x0e, 0x0f, 0x0f, 0x11, 0x12, 0x15, 0x15, 0x17,
+      0x1a, 0x1c, 0x1f, 0x22, 0x25, 0x26, 0x29, 0x2a,
+      0x2d, 0x30, 0x33, 0x34, 0x35, 0x35, 0x37, 0x37,
+      0x39, 0x3a, 0x39, 0x38, 0x37, 0x36, 0x36, 0x37,
+      0x35, 0x36, 0x35, 0x35, 0x36, 0x37, 0x3a, 0x3e,
+      0x40, 0x43, 0x48, 0x49, 0x4b, 0x4c, 0x4d, 0x4e,
+      0x4f, 0x4f, 0x4c, 0x48, 0x45, 0x41, 0x3e, 0x3b,
+      0x3a, 0x37, 0x36, 0x33, 0x32, 0x31, 0x30, 0x31,
+      0x32, 0x31, 0x31, 0x31, 0x31, 0x34, 0x37, 0x38,
+      0x3a, 0x3b, 0x3b, 0x3b, 0x3c, 0x3b, 0x3d, 0x3e,
+      0x3f, 0x40, 0x43, 0x44, 0x47, 0x4b, 0x4f, 0x56,
+      0x5a, 0x60, 0x66, 0x69, 0x6a, 0x6e, 0x71, 0x72,
+      0x73, 0x72, 0x6d, 0x69, 0x66, 0x60, 0x5c, 0x59,
+      0x54, 0x50, 0x4d, 0x48, 0x46, 0x44, 0x44, 0x43,
+      0x42, 0x41, 0x41, 0x40, 0x3f, 0x3f, 0x3e, 0x3d,
+      0x3d, 0x3d, 0x3c, 0x3a, 0x39, 0x38, 0x35, 0x35,
+      0x34, 0x34, 0x35, 0x34, 0x35, 0x36, 0x39, 0x3c,
+      0x3d, 0x3e, 0x41, 0x43, 0x44, 0x46, 0x48, 0x49,
+      0x4a, 0x49, 0x48, 0x47, 0x45, 0x43, 0x43, 0x44,
+      0x45, 0x47, 0x48, },
+    { 0x23, 0x2d, 0x2a, 0x2a, 0x29, 0x29, 0x2a, 0x2a,
+      0x29, 0x2c, 0x2d, 0x2f, 0x32, 0x34, 0x36, 0x3a,
+      0x3d, 0x40, 0x44, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x51, 0x51, 0x4f, 0x4c, 0x4b, 0x48,
+      0x46, 0x44, 0x40, 0x3d, 0x39, 0x36, 0x34, 0x30,
+      0x2f, 0x2d, 0x2a, 0x29, 0x28, 0x27, 0x26, 0x25,
+      0x25, 0x24, 0x24, 0x25, 0x26, 0x28, 0x29, 0x2a,
+      0x2b, 0x2e, 0x31, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3d, 0x3e, 0x3e, 0x3d, 0x3e, 0x3c, 0x3c, 0x3a,
+      0x37, 0x35, 0x33, 0x30, 0x2f, 0x2b, 0x28, 0x26,
+      0x24, 0x21, 0x20, 0x1e, 0x1c, 0x1b, 0x18, 0x17,
+      0x16, 0x14, 0x13, 0x12, 0x10, 0x10, 0x0f, 0x0e,
+      0x0f, 0x0e, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0c,
+      0x0b, 0x0b, 0x0c, 0x0c, 0x0c, 0x0b, 0x0b, 0x0c,
+      0x0c, 0x0b, 0x0b, 0x0c, 0x0d, 0x0c, 0x0e, 0x0e,
+      0x0e, 0x0f, 0x11, 0x11, 0x13, 0x14, 0x16, 0x18,
+      0x1a, 0x1d, 0x1f, 0x22, 0x25, 0x26, 0x29, 0x2b,
+      0x2d, 0x31, 0x33, 0x34, 0x36, 0x37, 0x38, 0x38,
+      0x39, 0x3a, 0x39, 0x38, 0x37, 0x36, 0x37, 0x37,
+      0x35, 0x36, 0x35, 0x36, 0x35, 0x38, 0x3a, 0x3e,
+      0x40, 0x41, 0x45, 0x47, 0x49, 0x4a, 0x4c, 0x4d,
+      0x4e, 0x4d, 0x4a, 0x47, 0x44, 0x40, 0x3d, 0x3b,
+      0x39, 0x37, 0x34, 0x34, 0x32, 0x31, 0x31, 0x33,
+      0x32, 0x31, 0x32, 0x33, 0x32, 0x36, 0x38, 0x39,
+      0x3b, 0x3c, 0x3c, 0x3c, 0x3d, 0x3d, 0x3e, 0x3e,
+      0x41, 0x42, 0x43, 0x45, 0x48, 0x4c, 0x50, 0x56,
+      0x5b, 0x5f, 0x62, 0x67, 0x69, 0x6c, 0x6e, 0x6e,
+      0x70, 0x6f, 0x6b, 0x67, 0x63, 0x5e, 0x5b, 0x58,
+      0x54, 0x51, 0x4e, 0x4a, 0x48, 0x46, 0x46, 0x46,
+      0x45, 0x46, 0x44, 0x43, 0x44, 0x43, 0x42, 0x42,
+      0x41, 0x40, 0x3f, 0x3e, 0x3c, 0x3b, 0x3a, 0x39,
+      0x39, 0x39, 0x38, 0x37, 0x37, 0x3a, 0x3e, 0x40,
+      0x42, 0x43, 0x47, 0x47, 0x48, 0x4a, 0x4b, 0x4c,
+      0x4c, 0x4b, 0x4a, 0x48, 0x46, 0x44, 0x43, 0x45,
+      0x45, 0x46, 0x47, },
+    { 0x21, 0x2b, 0x28, 0x28, 0x28, 0x28, 0x29, 0x29,
+      0x28, 0x2a, 0x2d, 0x30, 0x32, 0x34, 0x37, 0x3a,
+      0x3c, 0x40, 0x44, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x52, 0x51, 0x4f, 0x4b, 0x4b, 0x48,
+      0x45, 0x43, 0x3f, 0x3c, 0x39, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x2b, 0x2a, 0x28, 0x27, 0x26, 0x25,
+      0x24, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2d, 0x31, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3d, 0x3e, 0x3e, 0x3e, 0x3e, 0x3d, 0x3c, 0x3a,
+      0x37, 0x35, 0x33, 0x30, 0x2f, 0x2b, 0x28, 0x26,
+      0x25, 0x21, 0x20, 0x1e, 0x1c, 0x19, 0x19, 0x18,
+      0x17, 0x15, 0x15, 0x12, 0x11, 0x11, 0x11, 0x0f,
+      0x0e, 0x0e, 0x0e, 0x0e, 0x0d, 0x0d, 0x0d, 0x0c,
+      0x0c, 0x0c, 0x0b, 0x0b, 0x0b, 0x0b, 0x0b, 0x0b,
+      0x0c, 0x0c, 0x0c, 0x0c, 0x0e, 0x0e, 0x0f, 0x0f,
+      0x0f, 0x10, 0x11, 0x13, 0x13, 0x15, 0x16, 0x18,
+      0x1a, 0x1c, 0x1f, 0x22, 0x25, 0x28, 0x29, 0x2d,
+      0x2f, 0x32, 0x34, 0x35, 0x36, 0x37, 0x38, 0x38,
+      0x39, 0x3a, 0x39, 0x39, 0x37, 0x36, 0x37, 0x36,
+      0x35, 0x35, 0x37, 0x35, 0x36, 0x37, 0x3a, 0x3d,
+      0x3e, 0x41, 0x43, 0x46, 0x46, 0x47, 0x48, 0x49,
+      0x4a, 0x49, 0x47, 0x45, 0x42, 0x3f, 0x3d, 0x3b,
+      0x3a, 0x38, 0x36, 0x34, 0x32, 0x32, 0x32, 0x32,
+      0x32, 0x31, 0x33, 0x32, 0x34, 0x37, 0x38, 0x38,
+      0x3a, 0x3b, 0x3d, 0x3d, 0x3d, 0x3e, 0x3f, 0x41,
+      0x42, 0x44, 0x44, 0x46, 0x49, 0x4d, 0x50, 0x54,
+      0x58, 0x5c, 0x61, 0x63, 0x65, 0x69, 0x6a, 0x6c,
+      0x6d, 0x6c, 0x68, 0x64, 0x61, 0x5c, 0x59, 0x57,
+      0x53, 0x51, 0x4f, 0x4c, 0x4a, 0x48, 0x48, 0x49,
+      0x49, 0x48, 0x48, 0x48, 0x47, 0x47, 0x46, 0x46,
+      0x45, 0x44, 0x42, 0x41, 0x3f, 0x3e, 0x3c, 0x3c,
+      0x3c, 0x3d, 0x3c, 0x3c, 0x3c, 0x3e, 0x41, 0x43,
+      0x46, 0x48, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4e,
+      0x4e, 0x4d, 0x4b, 0x49, 0x47, 0x44, 0x44, 0x45,
+      0x45, 0x45, 0x46, },
+    { 0x22, 0x2b, 0x27, 0x27, 0x27, 0x27, 0x28, 0x28,
+      0x28, 0x2a, 0x2c, 0x2f, 0x30, 0x34, 0x37, 0x3b,
+      0x3d, 0x41, 0x45, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x52, 0x51, 0x4f, 0x4b, 0x4b, 0x47,
+      0x45, 0x43, 0x3f, 0x3c, 0x39, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x2b, 0x2a, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2e, 0x31, 0x34, 0x37, 0x39, 0x3a, 0x3b,
+      0x3d, 0x3e, 0x3e, 0x3f, 0x3f, 0x3d, 0x3c, 0x3a,
+      0x38, 0x36, 0x34, 0x31, 0x2e, 0x2c, 0x29, 0x26,
+      0x25, 0x22, 0x20, 0x1e, 0x1c, 0x1a, 0x19, 0x18,
+      0x16, 0x15, 0x14, 0x12, 0x10, 0x11, 0x11, 0x0f,
+      0x0e, 0x0e, 0x0e, 0x0e, 0x0d, 0x0c, 0x0d, 0x0c,
+      0x0c, 0x0c, 0x0b, 0x0b, 0x0b, 0x0b, 0x0b, 0x0b,
+      0x0c, 0x0c, 0x0c, 0x0d, 0x0d, 0x0e, 0x0f, 0x0f,
+      0x0f, 0x10, 0x11, 0x13, 0x13, 0x15, 0x15, 0x18,
+      0x19, 0x1d, 0x1f, 0x21, 0x24, 0x27, 0x2a, 0x2c,
+      0x30, 0x33, 0x35, 0x36, 0x37, 0x38, 0x39, 0x39,
+      0x3a, 0x3a, 0x39, 0x39, 0x37, 0x36, 0x37, 0x36,
+      0x36, 0x36, 0x36, 0x36, 0x36, 0x37, 0x39, 0x3a,
+      0x3d, 0x3e, 0x41, 0x43, 0x43, 0x45, 0x46, 0x46,
+      0x47, 0x46, 0x44, 0x42, 0x40, 0x3d, 0x3a, 0x39,
+      0x37, 0x36, 0x35, 0x34, 0x33, 0x32, 0x32, 0x32,
+      0x32, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38,
+      0x39, 0x3c, 0x3c, 0x3e, 0x3e, 0x3e, 0x41, 0x43,
+      0x44, 0x45, 0x46, 0x48, 0x49, 0x4c, 0x51, 0x54,
+      0x56, 0x5a, 0x5f, 0x61, 0x63, 0x65, 0x67, 0x69,
+      0x6a, 0x69, 0x67, 0x61, 0x5f, 0x5b, 0x58, 0x56,
+      0x54, 0x51, 0x50, 0x4e, 0x4c, 0x4a, 0x4b, 0x4c,
+      0x4c, 0x4b, 0x4b, 0x4b, 0x4b, 0x49, 0x4a, 0x49,
+      0x49, 0x48, 0x46, 0x44, 0x42, 0x41, 0x40, 0x3f,
+      0x3f, 0x40, 0x40, 0x40, 0x40, 0x42, 0x46, 0x49,
+      0x4b, 0x4c, 0x4f, 0x4f, 0x50, 0x52, 0x51, 0x51,
+      0x50, 0x4f, 0x4c, 0x4a, 0x48, 0x46, 0x45, 0x44,
+      0x44, 0x45, 0x46, },
+    { 0x21, 0x2a, 0x27, 0x27, 0x27, 0x27, 0x27, 0x27,
+      0x27, 0x29, 0x2d, 0x2f, 0x31, 0x34, 0x37, 0x3b,
+      0x3e, 0x41, 0x45, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x52, 0x51, 0x4f, 0x4b, 0x4b, 0x48,
+      0x45, 0x43, 0x3f, 0x3c, 0x39, 0x36, 0x33, 0x2f,
+      0x2f, 0x2d, 0x2a, 0x2a, 0x27, 0x26, 0x25, 0x24,
+      0x22, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x31, 0x34, 0x37, 0x39, 0x3a, 0x3c,
+      0x3d, 0x3e, 0x3f, 0x40, 0x3f, 0x3d, 0x3d, 0x3a,
+      0x38, 0x36, 0x34, 0x31, 0x2e, 0x2c, 0x29, 0x26,
+      0x25, 0x22, 0x21, 0x1f, 0x1d, 0x1b, 0x19, 0x18,
+      0x16, 0x14, 0x14, 0x13, 0x11, 0x11, 0x11, 0x0f,
+      0x0f, 0x0f, 0x0e, 0x0e, 0x0d, 0x0d, 0x0d, 0x0d,
+      0x0d, 0x0d, 0x0c, 0x0b, 0x0b, 0x0b, 0x0b, 0x0c,
+      0x0c, 0x0d, 0x0d, 0x0d, 0x0e, 0x0e, 0x0f, 0x0f,
+      0x0f, 0x10, 0x13, 0x13, 0x14, 0x15, 0x17, 0x19,
+      0x1a, 0x1d, 0x1f, 0x22, 0x25, 0x27, 0x2a, 0x2e,
+      0x31, 0x33, 0x35, 0x38, 0x39, 0x3a, 0x3b, 0x3b,
+      0x3c, 0x3c, 0x3b, 0x3a, 0x39, 0x38, 0x38, 0x37,
+      0x36, 0x36, 0x37, 0x36, 0x37, 0x38, 0x38, 0x3a,
+      0x3b, 0x3e, 0x40, 0x40, 0x41, 0x42, 0x43, 0x42,
+      0x43, 0x42, 0x40, 0x40, 0x3f, 0x3c, 0x3b, 0x39,
+      0x38, 0x37, 0x36, 0x35, 0x34, 0x33, 0x32, 0x33,
+      0x32, 0x32, 0x34, 0x35, 0x35, 0x36, 0x39, 0x39,
+      0x3a, 0x3c, 0x3c, 0x3f, 0x40, 0x41, 0x43, 0x45,
+      0x45, 0x47, 0x48, 0x4a, 0x4b, 0x4d, 0x50, 0x53,
+      0x56, 0x59, 0x5c, 0x5f, 0x60, 0x65, 0x64, 0x66,
+      0x68, 0x66, 0x64, 0x61, 0x5e, 0x5a, 0x59, 0x56,
+      0x54, 0x52, 0x51, 0x50, 0x4e, 0x4c, 0x4d, 0x4f,
+      0x4f, 0x4f, 0x50, 0x50, 0x4f, 0x4f, 0x4e, 0x4d,
+      0x4c, 0x4b, 0x49, 0x47, 0x45, 0x44, 0x43, 0x43,
+      0x42, 0x43, 0x44, 0x44, 0x46, 0x47, 0x49, 0x4d,
+      0x4f, 0x51, 0x53, 0x54, 0x53, 0x54, 0x54, 0x53,
+      0x53, 0x51, 0x4e, 0x4b, 0x4a, 0x47, 0x45, 0x44,
+      0x44, 0x45, 0x46, },
+    { 0x20, 0x28, 0x26, 0x26, 0x25, 0x24, 0x27, 0x27,
+      0x27, 0x29, 0x2c, 0x2e, 0x31, 0x34, 0x37, 0x3b,
+      0x3e, 0x41, 0x45, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x52, 0x51, 0x4f, 0x4b, 0x4a, 0x49,
+      0x45, 0x43, 0x3f, 0x3c, 0x3a, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2e, 0x31, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3d, 0x3e, 0x3f, 0x40, 0x3e, 0x3d, 0x3d, 0x3a,
+      0x38, 0x36, 0x34, 0x31, 0x2f, 0x2c, 0x29, 0x27,
+      0x25, 0x21, 0x21, 0x1f, 0x1c, 0x1d, 0x19, 0x18,
+      0x16, 0x15, 0x15, 0x13, 0x12, 0x11, 0x11, 0x0f,
+      0x0f, 0x0e, 0x0f, 0x0f, 0x0e, 0x0d, 0x0d, 0x0d,
+      0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+      0x0d, 0x0d, 0x0d, 0x0e, 0x0e, 0x0e, 0x0f, 0x10,
+      0x10, 0x10, 0x12, 0x13, 0x15, 0x16, 0x18, 0x1a,
+      0x1c, 0x1d, 0x20, 0x22, 0x25, 0x27, 0x2a, 0x2e,
+      0x30, 0x34, 0x38, 0x39, 0x3a, 0x3b, 0x3b, 0x3b,
+      0x3c, 0x3d, 0x3c, 0x3b, 0x3a, 0x39, 0x38, 0x37,
+      0x36, 0x36, 0x38, 0x37, 0x37, 0x37, 0x38, 0x3a,
+      0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 0x40, 0x40,
+      0x42, 0x40, 0x3f, 0x3e, 0x3d, 0x3b, 0x3a, 0x39,
+      0x37, 0x36, 0x36, 0x35, 0x34, 0x34, 0x33, 0x33,
+      0x33, 0x34, 0x35, 0x35, 0x35, 0x36, 0x38, 0x39,
+      0x3a, 0x3b, 0x3d, 0x3f, 0x42, 0x43, 0x45, 0x45,
+      0x46, 0x48, 0x49, 0x4b, 0x4b, 0x4d, 0x50, 0x53,
+      0x56, 0x57, 0x5a, 0x5c, 0x5e, 0x61, 0x63, 0x65,
+      0x66, 0x64, 0x62, 0x5f, 0x5c, 0x59, 0x58, 0x56,
+      0x55, 0x54, 0x52, 0x51, 0x50, 0x51, 0x51, 0x52,
+      0x52, 0x52, 0x52, 0x52, 0x51, 0x51, 0x51, 0x50,
+      0x4f, 0x4e, 0x4c, 0x4a, 0x47, 0x46, 0x45, 0x45,
+      0x45, 0x46, 0x46, 0x46, 0x4a, 0x4c, 0x4d, 0x52,
+      0x54, 0x56, 0x58, 0x58, 0x56, 0x57, 0x57, 0x56,
+      0x55, 0x53, 0x50, 0x4d, 0x49, 0x45, 0x44, 0x44,
+      0x43, 0x44, 0x45, },
+    { 0x1f, 0x27, 0x24, 0x23, 0x25, 0x24, 0x25, 0x26,
+      0x26, 0x28, 0x2b, 0x2e, 0x31, 0x34, 0x37, 0x3a,
+      0x3d, 0x41, 0x45, 0x48, 0x4b, 0x4d, 0x4f, 0x4e,
+      0x50, 0x51, 0x52, 0x50, 0x4f, 0x4b, 0x4a, 0x49,
+      0x45, 0x43, 0x3f, 0x3c, 0x3a, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x29, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x25, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x32, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3e, 0x3f, 0x3f, 0x40, 0x3e, 0x3d, 0x3c, 0x3a,
+      0x38, 0x36, 0x34, 0x31, 0x30, 0x2c, 0x29, 0x28,
+      0x25, 0x23, 0x22, 0x1f, 0x1c, 0x1c, 0x18, 0x18,
+      0x16, 0x14, 0x14, 0x13, 0x11, 0x11, 0x11, 0x0f,
+      0x0f, 0x0e, 0x0f, 0x0f, 0x0e, 0x0d, 0x0d, 0x0d,
+      0x0c, 0x0c, 0x0b, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+      0x0d, 0x0e, 0x0e, 0x0f, 0x0d, 0x0f, 0x10, 0x10,
+      0x10, 0x11, 0x13, 0x14, 0x15, 0x16, 0x19, 0x1a,
+      0x1c, 0x1f, 0x20, 0x23, 0x26, 0x28, 0x2a, 0x2e,
+      0x31, 0x35, 0x38, 0x39, 0x3a, 0x3c, 0x3d, 0x3d,
+      0x3e, 0x3e, 0x3d, 0x3c, 0x3a, 0x3a, 0x39, 0x39,
+      0x38, 0x37, 0x38, 0x38, 0x37, 0x38, 0x39, 0x3a,
+      0x3c, 0x3c, 0x3d, 0x3e, 0x3f, 0x3f, 0x40, 0x3f,
+      0x41, 0x40, 0x3e, 0x3e, 0x3d, 0x3b, 0x3b, 0x39,
+      0x37, 0x37, 0x35, 0x36, 0x34, 0x34, 0x34, 0x35,
+      0x35, 0x34, 0x34, 0x35, 0x35, 0x37, 0x38, 0x39,
+      0x3a, 0x3c, 0x3f, 0x3f, 0x43, 0x43, 0x45, 0x47,
+      0x48, 0x48, 0x4a, 0x4b, 0x4e, 0x4d, 0x51, 0x53,
+      0x56, 0x58, 0x59, 0x5b, 0x5d, 0x60, 0x62, 0x63,
+      0x64, 0x63, 0x61, 0x5e, 0x5c, 0x5a, 0x57, 0x56,
+      0x55, 0x54, 0x53, 0x52, 0x51, 0x51, 0x52, 0x52,
+      0x54, 0x54, 0x55, 0x55, 0x55, 0x54, 0x54, 0x53,
+      0x52, 0x50, 0x4e, 0x4d, 0x4b, 0x4a, 0x48, 0x48,
+      0x48, 0x48, 0x4a, 0x4b, 0x4d, 0x4f, 0x52, 0x55,
+      0x58, 0x5a, 0x5b, 0x5b, 0x5b, 0x5b, 0x5a, 0x59,
+      0x58, 0x55, 0x51, 0x4e, 0x4a, 0x46, 0x45, 0x44,
+      0x44, 0x44, 0x44, },
+    { 0x1e, 0x26, 0x23, 0x23, 0x25, 0x24, 0x25, 0x26,
+      0x26, 0x28, 0x2b, 0x2e, 0x31, 0x34, 0x37, 0x3a,
+      0x3e, 0x42, 0x45, 0x48, 0x4b, 0x4d, 0x4f, 0x4f,
+      0x50, 0x51, 0x52, 0x50, 0x4f, 0x4b, 0x4a, 0x48,
+      0x46, 0x44, 0x3f, 0x3b, 0x39, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x32, 0x34, 0x37, 0x39, 0x3b, 0x3d,
+      0x3e, 0x3f, 0x41, 0x41, 0x40, 0x3e, 0x3d, 0x3b,
+      0x38, 0x37, 0x34, 0x32, 0x30, 0x2c, 0x2a, 0x27,
+      0x26, 0x23, 0x22, 0x20, 0x1d, 0x1b, 0x1a, 0x19,
+      0x17, 0x15, 0x15, 0x13, 0x12, 0x12, 0x11, 0x0f,
+      0x11, 0x0f, 0x0e, 0x0e, 0x0d, 0x0d, 0x0d, 0x0c,
+      0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d,
+      0x0e, 0x0e, 0x0e, 0x0f, 0x10, 0x10, 0x11, 0x11,
+      0x11, 0x13, 0x16, 0x15, 0x15, 0x18, 0x1a, 0x1b,
+      0x1d, 0x20, 0x22, 0x24, 0x27, 0x29, 0x2c, 0x30,
+      0x33, 0x37, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3e,
+      0x40, 0x40, 0x40, 0x3f, 0x3e, 0x3d, 0x3c, 0x3a,
+      0x3a, 0x3a, 0x3a, 0x3a, 0x3a, 0x3a, 0x3b, 0x3d,
+      0x3d, 0x3f, 0x40, 0x40, 0x3f, 0x41, 0x41, 0x41,
+      0x41, 0x41, 0x40, 0x40, 0x3f, 0x3e, 0x3c, 0x3b,
+      0x3a, 0x39, 0x37, 0x36, 0x36, 0x35, 0x35, 0x36,
+      0x36, 0x35, 0x35, 0x36, 0x36, 0x38, 0x39, 0x39,
+      0x3b, 0x3c, 0x3e, 0x40, 0x41, 0x43, 0x45, 0x47,
+      0x48, 0x48, 0x4b, 0x4c, 0x4d, 0x4f, 0x51, 0x53,
+      0x56, 0x56, 0x59, 0x5b, 0x5d, 0x5f, 0x61, 0x62,
+      0x63, 0x63, 0x61, 0x5e, 0x5c, 0x5a, 0x59, 0x57,
+      0x56, 0x54, 0x54, 0x53, 0x52, 0x53, 0x53, 0x55,
+      0x56, 0x56, 0x57, 0x57, 0x57, 0x57, 0x56, 0x56,
+      0x55, 0x53, 0x51, 0x4f, 0x4d, 0x4b, 0x49, 0x4b,
+      0x4b, 0x4c, 0x4d, 0x4e, 0x51, 0x53, 0x55, 0x58,
+      0x5b, 0x5c, 0x60, 0x60, 0x5f, 0x5e, 0x5d, 0x5c,
+      0x5a, 0x57, 0x53, 0x4f, 0x4b, 0x46, 0x45, 0x44,
+      0x44, 0x44, 0x44, },
+    { 0x1d, 0x25, 0x22, 0x22, 0x23, 0x23, 0x24, 0x25,
+      0x25, 0x28, 0x2b, 0x2e, 0x31, 0x34, 0x37, 0x3a,
+      0x3e, 0x42, 0x45, 0x48, 0x4b, 0x4d, 0x4f, 0x4f,
+      0x50, 0x51, 0x52, 0x50, 0x4f, 0x4b, 0x4a, 0x47,
+      0x45, 0x43, 0x3f, 0x3c, 0x38, 0x35, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2b, 0x2f, 0x32, 0x34, 0x37, 0x39, 0x3c, 0x3d,
+      0x3e, 0x3f, 0x40, 0x41, 0x40, 0x3e, 0x3d, 0x3b,
+      0x39, 0x36, 0x34, 0x32, 0x30, 0x2d, 0x2a, 0x26,
+      0x26, 0x24, 0x22, 0x1f, 0x1d, 0x1c, 0x1a, 0x19,
+      0x18, 0x16, 0x15, 0x14, 0x12, 0x12, 0x12, 0x10,
+      0x10, 0x0f, 0x0e, 0x10, 0x0e, 0x0e, 0x0d, 0x0c,
+      0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0e, 0x0d, 0x0e,
+      0x0f, 0x0f, 0x0f, 0x10, 0x11, 0x11, 0x11, 0x12,
+      0x13, 0x14, 0x16, 0x16, 0x18, 0x1a, 0x1b, 0x1c,
+      0x1e, 0x21, 0x23, 0x25, 0x28, 0x2a, 0x2e, 0x32,
+      0x34, 0x38, 0x3a, 0x3c, 0x3d, 0x3f, 0x40, 0x42,
+      0x43, 0x43, 0x43, 0x42, 0x40, 0x3e, 0x3e, 0x3c,
+      0x3b, 0x3b, 0x3c, 0x3a, 0x3b, 0x3b, 0x3e, 0x3e,
+      0x40, 0x3f, 0x41, 0x41, 0x41, 0x42, 0x42, 0x43,
+      0x42, 0x41, 0x41, 0x41, 0x40, 0x3e, 0x3d, 0x3c,
+      0x3b, 0x3a, 0x39, 0x37, 0x36, 0x35, 0x36, 0x37,
+      0x35, 0x36, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b,
+      0x3b, 0x3d, 0x3e, 0x40, 0x41, 0x41, 0x44, 0x46,
+      0x48, 0x48, 0x4a, 0x4c, 0x4d, 0x4f, 0x51, 0x53,
+      0x55, 0x57, 0x59, 0x5a, 0x5b, 0x5e, 0x5f, 0x61,
+      0x62, 0x61, 0x60, 0x5e, 0x5c, 0x5a, 0x59, 0x58,
+      0x56, 0x55, 0x54, 0x53, 0x53, 0x54, 0x54, 0x55,
+      0x57, 0x57, 0x58, 0x59, 0x5a, 0x58, 0x59, 0x58,
+      0x57, 0x55, 0x53, 0x52, 0x4f, 0x4e, 0x4d, 0x4d,
+      0x4d, 0x4f, 0x51, 0x50, 0x54, 0x56, 0x59, 0x5c,
+      0x5f, 0x61, 0x64, 0x64, 0x63, 0x61, 0x5e, 0x5e,
+      0x5c, 0x59, 0x54, 0x50, 0x4c, 0x46, 0x45, 0x44,
+      0x44, 0x44, 0x44, },
+    { 0x1c, 0x24, 0x21, 0x21, 0x21, 0x22, 0x23, 0x23,
+      0x25, 0x27, 0x2a, 0x2e, 0x31, 0x33, 0x37, 0x3b,
+      0x3e, 0x42, 0x45, 0x48, 0x4b, 0x4c, 0x50, 0x4f,
+      0x50, 0x51, 0x52, 0x50, 0x4e, 0x4b, 0x4a, 0x49,
+      0x45, 0x42, 0x3f, 0x3c, 0x38, 0x35, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2b, 0x2f, 0x32, 0x34, 0x38, 0x39, 0x3c, 0x3d,
+      0x3e, 0x3e, 0x40, 0x41, 0x40, 0x3e, 0x3c, 0x3a,
+      0x39, 0x37, 0x35, 0x33, 0x30, 0x2d, 0x2b, 0x28,
+      0x26, 0x23, 0x23, 0x20, 0x1e, 0x1b, 0x19, 0x19,
+      0x17, 0x16, 0x15, 0x14, 0x12, 0x12, 0x11, 0x10,
+      0x0f, 0x0e, 0x0e, 0x10, 0x0e, 0x0d, 0x0c, 0x0c,
+      0x0c, 0x0d, 0x0d, 0x0d, 0x0d, 0x0e, 0x0d, 0x0e,
+      0x0f, 0x0f, 0x0f, 0x10, 0x11, 0x11, 0x12, 0x14,
+      0x14, 0x14, 0x16, 0x18, 0x19, 0x1b, 0x1c, 0x1e,
+      0x20, 0x23, 0x26, 0x27, 0x29, 0x2c, 0x2f, 0x33,
+      0x36, 0x38, 0x3b, 0x3e, 0x3e, 0x42, 0x43, 0x46,
+      0x46, 0x46, 0x46, 0x44, 0x42, 0x41, 0x3f, 0x3e,
+      0x3d, 0x3d, 0x3e, 0x3d, 0x3d, 0x3e, 0x3e, 0x40,
+      0x40, 0x40, 0x43, 0x43, 0x42, 0x43, 0x45, 0x43,
+      0x43, 0x43, 0x42, 0x42, 0x41, 0x40, 0x40, 0x3e,
+      0x3c, 0x3a, 0x3a, 0x38, 0x36, 0x36, 0x36, 0x36,
+      0x37, 0x37, 0x36, 0x38, 0x38, 0x39, 0x3b, 0x3b,
+      0x3e, 0x3e, 0x3e, 0x40, 0x41, 0x43, 0x45, 0x46,
+      0x46, 0x49, 0x4c, 0x4c, 0x4d, 0x4f, 0x51, 0x54,
+      0x56, 0x57, 0x58, 0x5a, 0x5c, 0x5e, 0x60, 0x60,
+      0x61, 0x61, 0x60, 0x5f, 0x5c, 0x5a, 0x59, 0x58,
+      0x57, 0x57, 0x55, 0x54, 0x53, 0x55, 0x55, 0x58,
+      0x58, 0x59, 0x5a, 0x5a, 0x5a, 0x5b, 0x5b, 0x5b,
+      0x5a, 0x59, 0x56, 0x54, 0x53, 0x4e, 0x4e, 0x50,
+      0x50, 0x51, 0x52, 0x52, 0x57, 0x59, 0x5d, 0x60,
+      0x63, 0x63, 0x66, 0x66, 0x66, 0x64, 0x63, 0x61,
+      0x60, 0x5b, 0x55, 0x51, 0x4d, 0x48, 0x45, 0x44,
+      0x43, 0x43, 0x43, },
+    { 0x1b, 0x23, 0x20, 0x21, 0x22, 0x22, 0x23, 0x24,
+      0x26, 0x27, 0x2a, 0x2e, 0x31, 0x33, 0x37, 0x3b,
+      0x3d, 0x42, 0x46, 0x49, 0x4a, 0x4c, 0x4f, 0x4f,
+      0x50, 0x50, 0x52, 0x50, 0x4e, 0x4b, 0x4b, 0x49,
+      0x45, 0x42, 0x3e, 0x3c, 0x38, 0x35, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x32, 0x35, 0x38, 0x3a, 0x3c, 0x3d,
+      0x3e, 0x3e, 0x40, 0x41, 0x40, 0x3f, 0x3d, 0x3b,
+      0x3a, 0x38, 0x36, 0x33, 0x30, 0x2d, 0x2b, 0x29,
+      0x27, 0x24, 0x24, 0x21, 0x1e, 0x1c, 0x1b, 0x1a,
+      0x18, 0x17, 0x16, 0x15, 0x13, 0x12, 0x10, 0x0f,
+      0x10, 0x0f, 0x0e, 0x0f, 0x0e, 0x0d, 0x0d, 0x0d,
+      0x0d, 0x0d, 0x0e, 0x0e, 0x0e, 0x0f, 0x0e, 0x0f,
+      0x10, 0x11, 0x11, 0x12, 0x13, 0x13, 0x14, 0x15,
+      0x15, 0x16, 0x17, 0x1a, 0x1b, 0x1d, 0x1e, 0x20,
+      0x21, 0x25, 0x27, 0x29, 0x2b, 0x2d, 0x31, 0x35,
+      0x37, 0x39, 0x3c, 0x3f, 0x40, 0x43, 0x46, 0x47,
+      0x4a, 0x49, 0x48, 0x46, 0x45, 0x43, 0x42, 0x41,
+      0x3f, 0x40, 0x3f, 0x3f, 0x40, 0x3f, 0x41, 0x43,
+      0x43, 0x43, 0x44, 0x45, 0x45, 0x45, 0x45, 0x45,
+      0x45, 0x45, 0x44, 0x43, 0x43, 0x42, 0x42, 0x40,
+      0x3e, 0x3d, 0x3c, 0x39, 0x38, 0x38, 0x38, 0x38,
+      0x38, 0x36, 0x38, 0x39, 0x39, 0x3a, 0x3c, 0x3d,
+      0x3e, 0x3e, 0x3f, 0x41, 0x42, 0x42, 0x43, 0x45,
+      0x46, 0x49, 0x4b, 0x4d, 0x4f, 0x50, 0x53, 0x54,
+      0x57, 0x58, 0x5a, 0x5c, 0x5b, 0x5e, 0x60, 0x61,
+      0x60, 0x60, 0x5f, 0x5f, 0x5d, 0x5b, 0x5b, 0x59,
+      0x58, 0x57, 0x56, 0x55, 0x55, 0x55, 0x57, 0x59,
+      0x5b, 0x5b, 0x5d, 0x5c, 0x5c, 0x5e, 0x5e, 0x5e,
+      0x5d, 0x5b, 0x59, 0x56, 0x54, 0x51, 0x51, 0x51,
+      0x52, 0x55, 0x56, 0x56, 0x5a, 0x5d, 0x5f, 0x63,
+      0x66, 0x68, 0x6b, 0x6b, 0x68, 0x67, 0x66, 0x64,
+      0x61, 0x5d, 0x57, 0x52, 0x4f, 0x49, 0x46, 0x45,
+      0x43, 0x43, 0x43, },
+    { 0x1a, 0x22, 0x1f, 0x20, 0x21, 0x22, 0x23, 0x24,
+      0x26, 0x27, 0x2a, 0x2d, 0x31, 0x33, 0x37, 0x3b,
+      0x3d, 0x41, 0x46, 0x49, 0x4a, 0x4d, 0x4f, 0x4f,
+      0x50, 0x51, 0x52, 0x50, 0x4e, 0x4b, 0x4b, 0x48,
+      0x44, 0x42, 0x3e, 0x3c, 0x39, 0x35, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2d, 0x2f, 0x32, 0x35, 0x39, 0x3a, 0x3c, 0x3d,
+      0x3e, 0x3f, 0x40, 0x41, 0x40, 0x3f, 0x3e, 0x3c,
+      0x3a, 0x38, 0x36, 0x33, 0x31, 0x2d, 0x2c, 0x29,
+      0x27, 0x26, 0x24, 0x21, 0x1f, 0x1d, 0x1c, 0x1a,
+      0x19, 0x18, 0x16, 0x15, 0x14, 0x13, 0x12, 0x10,
+      0x11, 0x10, 0x0f, 0x0f, 0x0f, 0x0e, 0x0e, 0x0e,
+      0x0f, 0x0f, 0x0e, 0x0e, 0x0e, 0x0f, 0x0f, 0x10,
+      0x11, 0x12, 0x12, 0x13, 0x15, 0x15, 0x16, 0x16,
+      0x17, 0x18, 0x1a, 0x1b, 0x1c, 0x1e, 0x1f, 0x21,
+      0x22, 0x25, 0x27, 0x2a, 0x2c, 0x2e, 0x33, 0x36,
+      0x39, 0x3a, 0x3d, 0x40, 0x41, 0x45, 0x47, 0x4a,
+      0x4c, 0x4d, 0x4c, 0x4a, 0x48, 0x45, 0x44, 0x41,
+      0x42, 0x42, 0x42, 0x42, 0x42, 0x43, 0x43, 0x44,
+      0x45, 0x47, 0x47, 0x48, 0x47, 0x48, 0x47, 0x47,
+      0x48, 0x48, 0x46, 0x46, 0x46, 0x43, 0x43, 0x41,
+      0x3f, 0x3e, 0x3b, 0x39, 0x38, 0x37, 0x37, 0x37,
+      0x38, 0x38, 0x37, 0x39, 0x39, 0x3a, 0x3c, 0x3e,
+      0x3e, 0x3f, 0x3f, 0x3f, 0x42, 0x43, 0x43, 0x45,
+      0x47, 0x48, 0x4b, 0x4c, 0x4e, 0x50, 0x51, 0x54,
+      0x56, 0x58, 0x5a, 0x5c, 0x5c, 0x5f, 0x5f, 0x5f,
+      0x61, 0x60, 0x5f, 0x5f, 0x5e, 0x5b, 0x5c, 0x5b,
+      0x59, 0x59, 0x57, 0x56, 0x55, 0x56, 0x57, 0x59,
+      0x5a, 0x5b, 0x5c, 0x5c, 0x5d, 0x5e, 0x5e, 0x5d,
+      0x5e, 0x5c, 0x5a, 0x57, 0x55, 0x52, 0x51, 0x52,
+      0x53, 0x55, 0x57, 0x58, 0x5c, 0x5e, 0x61, 0x65,
+      0x69, 0x6b, 0x6c, 0x6b, 0x6a, 0x69, 0x67, 0x64,
+      0x61, 0x5d, 0x59, 0x53, 0x4d, 0x48, 0x46, 0x45,
+      0x44, 0x44, 0x43, },
+    { 0x1a, 0x21, 0x1e, 0x1f, 0x20, 0x21, 0x23, 0x24,
+      0x25, 0x28, 0x2a, 0x2e, 0x31, 0x33, 0x37, 0x3b,
+      0x3e, 0x41, 0x46, 0x49, 0x4b, 0x4d, 0x4f, 0x4e,
+      0x50, 0x51, 0x51, 0x50, 0x4e, 0x4b, 0x4a, 0x48,
+      0x44, 0x42, 0x3e, 0x3c, 0x39, 0x35, 0x32, 0x30,
+      0x2f, 0x2d, 0x29, 0x27, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x26, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x32, 0x35, 0x38, 0x3b, 0x3c, 0x3e,
+      0x3f, 0x3f, 0x40, 0x41, 0x40, 0x3f, 0x3e, 0x3c,
+      0x3a, 0x39, 0x36, 0x34, 0x31, 0x2d, 0x2c, 0x29,
+      0x27, 0x26, 0x24, 0x21, 0x1f, 0x1d, 0x1c, 0x1a,
+      0x19, 0x17, 0x16, 0x15, 0x14, 0x13, 0x12, 0x10,
+      0x11, 0x10, 0x0f, 0x0f, 0x0f, 0x0e, 0x0e, 0x0e,
+      0x0e, 0x0e, 0x0e, 0x0e, 0x0e, 0x0f, 0x0f, 0x10,
+      0x11, 0x13, 0x14, 0x14, 0x15, 0x16, 0x17, 0x19,
+      0x19, 0x1a, 0x1c, 0x1d, 0x1e, 0x20, 0x22, 0x24,
+      0x25, 0x27, 0x29, 0x2c, 0x2e, 0x31, 0x35, 0x38,
+      0x3a, 0x3d, 0x41, 0x42, 0x45, 0x48, 0x4c, 0x4e,
+      0x4f, 0x4f, 0x4f, 0x4d, 0x4b, 0x49, 0x47, 0x47,
+      0x46, 0x45, 0x45, 0x45, 0x44, 0x44, 0x46, 0x47,
+      0x48, 0x49, 0x4b, 0x4b, 0x4a, 0x4b, 0x4b, 0x4a,
+      0x4b, 0x4a, 0x49, 0x49, 0x48, 0x46, 0x46, 0x44,
+      0x42, 0x41, 0x3d, 0x3b, 0x3a, 0x38, 0x38, 0x38,
+      0x37, 0x37, 0x39, 0x38, 0x3a, 0x3a, 0x3c, 0x3c,
+      0x3e, 0x40, 0x40, 0x41, 0x43, 0x43, 0x45, 0x46,
+      0x48, 0x49, 0x4b, 0x4e, 0x4f, 0x50, 0x53, 0x55,
+      0x57, 0x59, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60,
+      0x60, 0x60, 0x5f, 0x5f, 0x5e, 0x5c, 0x5b, 0x5a,
+      0x59, 0x58, 0x57, 0x57, 0x56, 0x56, 0x57, 0x58,
+      0x59, 0x5a, 0x5b, 0x5c, 0x5c, 0x5d, 0x5e, 0x5d,
+      0x5c, 0x5b, 0x58, 0x57, 0x54, 0x52, 0x52, 0x53,
+      0x54, 0x57, 0x58, 0x58, 0x5b, 0x5e, 0x62, 0x65,
+      0x69, 0x6b, 0x6d, 0x6c, 0x6a, 0x69, 0x67, 0x64,
+      0x62, 0x5e, 0x59, 0x54, 0x4d, 0x48, 0x47, 0x46,
+      0x45, 0x45, 0x44, },
+    { 0x1a, 0x21, 0x1e, 0x1f, 0x20, 0x21, 0x23, 0x24,
+      0x25, 0x28, 0x2a, 0x2e, 0x31, 0x34, 0x37, 0x3b,
+      0x3e, 0x42, 0x47, 0x49, 0x4b, 0x4d, 0x4f, 0x4f,
+      0x50, 0x51, 0x51, 0x50, 0x50, 0x4c, 0x4a, 0x47,
+      0x44, 0x42, 0x3e, 0x3c, 0x39, 0x35, 0x32, 0x31,
+      0x2f, 0x2d, 0x29, 0x27, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x25, 0x25, 0x26, 0x27, 0x29, 0x2b,
+      0x2c, 0x2f, 0x33, 0x35, 0x38, 0x3a, 0x3c, 0x3e,
+      0x40, 0x40, 0x41, 0x42, 0x41, 0x3f, 0x3f, 0x3d,
+      0x3b, 0x39, 0x36, 0x33, 0x32, 0x2e, 0x2d, 0x2a,
+      0x27, 0x26, 0x25, 0x22, 0x1f, 0x1d, 0x1c, 0x1b,
+      0x19, 0x17, 0x17, 0x16, 0x15, 0x14, 0x12, 0x11,
+      0x11, 0x11, 0x10, 0x10, 0x0f, 0x0f, 0x0f, 0x0f,
+      0x0f, 0x0f, 0x10, 0x11, 0x10, 0x11, 0x11, 0x12,
+      0x11, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1b,
+      0x1c, 0x1c, 0x1e, 0x20, 0x21, 0x22, 0x23, 0x25,
+      0x27, 0x2a, 0x2c, 0x2f, 0x31, 0x35, 0x38, 0x3b,
+      0x3d, 0x40, 0x44, 0x47, 0x49, 0x4c, 0x4f, 0x51,
+      0x53, 0x53, 0x53, 0x51, 0x50, 0x4e, 0x4c, 0x4b,
+      0x4a, 0x49, 0x49, 0x49, 0x49, 0x4a, 0x4a, 0x4d,
+      0x4e, 0x4e, 0x4f, 0x50, 0x4f, 0x50, 0x51, 0x50,
+      0x50, 0x4e, 0x4d, 0x4c, 0x4b, 0x48, 0x48, 0x47,
+      0x44, 0x42, 0x3f, 0x3d, 0x3b, 0x3a, 0x39, 0x39,
+      0x39, 0x38, 0x39, 0x3b, 0x3a, 0x3c, 0x3e, 0x3d,
+      0x40, 0x40, 0x40, 0x42, 0x42, 0x42, 0x45, 0x46,
+      0x47, 0x49, 0x4c, 0x4e, 0x50, 0x50, 0x53, 0x56,
+      0x58, 0x59, 0x5d, 0x5d, 0x5e, 0x60, 0x61, 0x61,
+      0x62, 0x61, 0x60, 0x60, 0x5e, 0x5d, 0x5d, 0x5b,
+      0x57, 0x58, 0x56, 0x55, 0x55, 0x56, 0x56, 0x59,
+      0x59, 0x58, 0x5a, 0x5a, 0x5a, 0x5c, 0x5c, 0x5c,
+      0x5b, 0x5b, 0x58, 0x57, 0x54, 0x53, 0x52, 0x53,
+      0x54, 0x57, 0x58, 0x59, 0x5c, 0x5f, 0x63, 0x67,
+      0x6b, 0x6d, 0x6e, 0x6e, 0x6b, 0x6a, 0x68, 0x64,
+      0x62, 0x5e, 0x58, 0x53, 0x4f, 0x49, 0x47, 0x46,
+      0x45, 0x45, 0x44, },
+    { 0x19, 0x20, 0x1e, 0x1e, 0x1f, 0x20, 0x22, 0x23,
+      0x25, 0x27, 0x2a, 0x2e, 0x31, 0x34, 0x37, 0x3a,
+      0x3e, 0x41, 0x46, 0x49, 0x4a, 0x4d, 0x4f, 0x4e,
+      0x50, 0x51, 0x51, 0x4f, 0x4f, 0x4d, 0x49, 0x47,
+      0x44, 0x42, 0x3e, 0x3c, 0x39, 0x36, 0x32, 0x31,
+      0x2f, 0x2d, 0x29, 0x27, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x25, 0x25, 0x26, 0x28, 0x29, 0x2b,
+      0x2c, 0x2f, 0x33, 0x35, 0x38, 0x3a, 0x3c, 0x3e,
+      0x3f, 0x3f, 0x41, 0x42, 0x41, 0x3f, 0x3f, 0x3d,
+      0x3c, 0x39, 0x36, 0x33, 0x32, 0x2e, 0x2d, 0x2a,
+      0x27, 0x26, 0x25, 0x22, 0x1f, 0x1e, 0x1d, 0x1b,
+      0x1a, 0x17, 0x17, 0x17, 0x14, 0x14, 0x12, 0x11,
+      0x11, 0x12, 0x11, 0x11, 0x10, 0x10, 0x10, 0x10,
+      0x10, 0x10, 0x11, 0x11, 0x11, 0x12, 0x13, 0x14,
+      0x14, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1c, 0x1e,
+      0x1e, 0x1f, 0x22, 0x23, 0x23, 0x24, 0x25, 0x27,
+      0x2a, 0x2d, 0x2f, 0x31, 0x35, 0x38, 0x3a, 0x3e,
+      0x41, 0x44, 0x48, 0x4b, 0x4d, 0x51, 0x53, 0x55,
+      0x57, 0x57, 0x56, 0x55, 0x54, 0x52, 0x52, 0x50,
+      0x4e, 0x50, 0x4e, 0x4d, 0x4d, 0x4d, 0x4f, 0x51,
+      0x51, 0x52, 0x54, 0x55, 0x55, 0x55, 0x57, 0x55,
+      0x54, 0x53, 0x52, 0x4e, 0x4d, 0x4b, 0x4a, 0x49,
+      0x46, 0x44, 0x41, 0x3f, 0x3d, 0x3b, 0x3a, 0x3a,
+      0x39, 0x39, 0x39, 0x39, 0x3a, 0x3b, 0x3d, 0x3e,
+      0x3f, 0x40, 0x41, 0x42, 0x44, 0x44, 0x45, 0x47,
+      0x49, 0x49, 0x4a, 0x4d, 0x50, 0x51, 0x53, 0x57,
+      0x5a, 0x5b, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x62,
+      0x63, 0x62, 0x60, 0x60, 0x5e, 0x5c, 0x5c, 0x59,
+      0x58, 0x56, 0x55, 0x55, 0x55, 0x55, 0x55, 0x54,
+      0x56, 0x56, 0x57, 0x58, 0x58, 0x59, 0x5a, 0x59,
+      0x58, 0x57, 0x56, 0x55, 0x54, 0x52, 0x53, 0x53,
+      0x53, 0x56, 0x57, 0x59, 0x5b, 0x5e, 0x62, 0x66,
+      0x6a, 0x6c, 0x6d, 0x6e, 0x6b, 0x69, 0x67, 0x64,
+      0x61, 0x5d, 0x58, 0x54, 0x50, 0x4a, 0x47, 0x46,
+      0x45, 0x45, 0x44, },
+    { 0x1a, 0x21, 0x1e, 0x1f, 0x1f, 0x20, 0x22, 0x23,
+      0x25, 0x27, 0x2b, 0x2e, 0x31, 0x34, 0x37, 0x3b,
+      0x3d, 0x42, 0x45, 0x49, 0x4a, 0x4d, 0x4e, 0x4e,
+      0x51, 0x52, 0x50, 0x4f, 0x4f, 0x4c, 0x49, 0x48,
+      0x45, 0x42, 0x3e, 0x3b, 0x39, 0x36, 0x32, 0x32,
+      0x2f, 0x2c, 0x2a, 0x28, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x25, 0x28, 0x29, 0x2b,
+      0x2d, 0x2f, 0x33, 0x35, 0x38, 0x3a, 0x3c, 0x3e,
+      0x3f, 0x3f, 0x41, 0x42, 0x41, 0x3f, 0x3e, 0x3c,
+      0x3c, 0x3a, 0x37, 0x33, 0x32, 0x2f, 0x2d, 0x2b,
+      0x28, 0x26, 0x25, 0x22, 0x20, 0x1e, 0x1d, 0x1b,
+      0x1a, 0x17, 0x17, 0x16, 0x14, 0x14, 0x12, 0x11,
+      0x12, 0x11, 0x11, 0x11, 0x11, 0x10, 0x10, 0x10,
+      0x10, 0x11, 0x12, 0x12, 0x12, 0x13, 0x14, 0x14,
+      0x16, 0x18, 0x19, 0x1a, 0x1b, 0x1d, 0x1e, 0x1f,
+      0x21, 0x22, 0x23, 0x25, 0x26, 0x26, 0x28, 0x2a,
+      0x2c, 0x2e, 0x32, 0x34, 0x39, 0x39, 0x3d, 0x41,
+      0x45, 0x47, 0x4c, 0x4e, 0x51, 0x54, 0x56, 0x58,
+      0x5b, 0x5c, 0x5a, 0x59, 0x58, 0x56, 0x55, 0x53,
+      0x53, 0x52, 0x52, 0x51, 0x52, 0x52, 0x53, 0x55,
+      0x57, 0x58, 0x5a, 0x5a, 0x59, 0x5b, 0x59, 0x59,
+      0x58, 0x57, 0x55, 0x53, 0x51, 0x4e, 0x4c, 0x4a,
+      0x48, 0x46, 0x43, 0x40, 0x3e, 0x3c, 0x3b, 0x3b,
+      0x38, 0x39, 0x38, 0x39, 0x3a, 0x3d, 0x3d, 0x3e,
+      0x3f, 0x40, 0x41, 0x43, 0x44, 0x45, 0x46, 0x48,
+      0x4a, 0x4b, 0x4d, 0x4e, 0x50, 0x52, 0x54, 0x56,
+      0x59, 0x5c, 0x5e, 0x5f, 0x60, 0x62, 0x62, 0x63,
+      0x63, 0x63, 0x61, 0x5f, 0x5e, 0x5d, 0x5c, 0x5b,
+      0x59, 0x56, 0x56, 0x55, 0x54, 0x53, 0x53, 0x54,
+      0x55, 0x54, 0x55, 0x55, 0x55, 0x57, 0x58, 0x57,
+      0x57, 0x56, 0x55, 0x54, 0x54, 0x52, 0x52, 0x53,
+      0x54, 0x55, 0x57, 0x58, 0x5b, 0x5e, 0x62, 0x65,
+      0x69, 0x6b, 0x6d, 0x6e, 0x6a, 0x69, 0x67, 0x63,
+      0x61, 0x5d, 0x58, 0x54, 0x4f, 0x4b, 0x48, 0x47,
+      0x46, 0x45, 0x45, },
+    { 0x1a, 0x21, 0x1e, 0x1f, 0x1f, 0x20, 0x22, 0x23,
+      0x25, 0x27, 0x2b, 0x2d, 0x31, 0x34, 0x37, 0x3b,
+      0x3d, 0x42, 0x45, 0x48, 0x4c, 0x4e, 0x4e, 0x4f,
+      0x51, 0x52, 0x50, 0x50, 0x4f, 0x4c, 0x4a, 0x48,
+      0x45, 0x42, 0x3f, 0x3b, 0x39, 0x36, 0x32, 0x31,
+      0x2f, 0x2c, 0x2a, 0x28, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x28, 0x29, 0x2b,
+      0x2d, 0x30, 0x33, 0x36, 0x39, 0x3b, 0x3d, 0x3f,
+      0x3f, 0x40, 0x42, 0x43, 0x42, 0x40, 0x3e, 0x3c,
+      0x3c, 0x3a, 0x37, 0x34, 0x32, 0x2f, 0x2d, 0x2c,
+      0x2a, 0x27, 0x26, 0x23, 0x20, 0x1e, 0x1d, 0x1c,
+      0x1a, 0x18, 0x18, 0x17, 0x15, 0x16, 0x14, 0x12,
+      0x12, 0x12, 0x12, 0x12, 0x12, 0x11, 0x11, 0x12,
+      0x12, 0x12, 0x13, 0x14, 0x14, 0x14, 0x15, 0x16,
+      0x17, 0x19, 0x1b, 0x1c, 0x1e, 0x20, 0x20, 0x22,
+      0x24, 0x25, 0x26, 0x27, 0x28, 0x2a, 0x2c, 0x2c,
+      0x2f, 0x32, 0x35, 0x37, 0x3b, 0x3c, 0x41, 0x45,
+      0x48, 0x4c, 0x50, 0x52, 0x54, 0x57, 0x5a, 0x5c,
+      0x5f, 0x5f, 0x5f, 0x5d, 0x5c, 0x5b, 0x5a, 0x58,
+      0x57, 0x57, 0x57, 0x56, 0x56, 0x57, 0x57, 0x5a,
+      0x5c, 0x5e, 0x5f, 0x61, 0x5f, 0x5f, 0x5f, 0x5e,
+      0x5d, 0x5c, 0x5a, 0x57, 0x55, 0x52, 0x4f, 0x4e,
+      0x4a, 0x47, 0x46, 0x42, 0x41, 0x3e, 0x3d, 0x3c,
+      0x3b, 0x3a, 0x39, 0x39, 0x3b, 0x3c, 0x3d, 0x3f,
+      0x40, 0x42, 0x42, 0x44, 0x45, 0x46, 0x49, 0x49,
+      0x4b, 0x4c, 0x4e, 0x4f, 0x51, 0x54, 0x57, 0x58,
+      0x5b, 0x5d, 0x61, 0x61, 0x61, 0x63, 0x65, 0x65,
+      0x64, 0x64, 0x62, 0x61, 0x60, 0x5e, 0x5d, 0x5c,
+      0x59, 0x58, 0x56, 0x54, 0x53, 0x53, 0x53, 0x54,
+      0x54, 0x53, 0x53, 0x54, 0x54, 0x54, 0x55, 0x55,
+      0x56, 0x55, 0x54, 0x53, 0x53, 0x52, 0x52, 0x53,
+      0x55, 0x56, 0x57, 0x58, 0x5b, 0x5e, 0x62, 0x66,
+      0x69, 0x6b, 0x6d, 0x6d, 0x6b, 0x69, 0x67, 0x64,
+      0x61, 0x5d, 0x58, 0x55, 0x50, 0x4b, 0x48, 0x47,
+      0x46, 0x46, 0x46, },
+    { 0x1a, 0x20, 0x1e, 0x1f, 0x1f, 0x21, 0x22, 0x23,
+      0x25, 0x27, 0x2b, 0x2d, 0x31, 0x34, 0x37, 0x3b,
+      0x3d, 0x42, 0x45, 0x48, 0x4c, 0x4e, 0x4f, 0x4f,
+      0x51, 0x52, 0x51, 0x50, 0x4e, 0x4b, 0x4a, 0x48,
+      0x45, 0x42, 0x3f, 0x3b, 0x38, 0x36, 0x32, 0x31,
+      0x2f, 0x2c, 0x2a, 0x28, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x28, 0x29, 0x2b,
+      0x2e, 0x30, 0x33, 0x36, 0x39, 0x3b, 0x3d, 0x3f,
+      0x3f, 0x40, 0x41, 0x42, 0x41, 0x40, 0x3e, 0x3c,
+      0x3c, 0x3a, 0x37, 0x34, 0x33, 0x30, 0x2e, 0x2b,
+      0x29, 0x26, 0x24, 0x24, 0x20, 0x1f, 0x1d, 0x1d,
+      0x1a, 0x19, 0x17, 0x16, 0x16, 0x16, 0x16, 0x14,
+      0x13, 0x12, 0x13, 0x13, 0x13, 0x12, 0x12, 0x13,
+      0x13, 0x14, 0x15, 0x15, 0x14, 0x15, 0x16, 0x18,
+      0x19, 0x1b, 0x1c, 0x1e, 0x20, 0x21, 0x22, 0x24,
+      0x27, 0x28, 0x29, 0x2a, 0x2c, 0x2c, 0x2d, 0x2f,
+      0x32, 0x35, 0x37, 0x3a, 0x3c, 0x3e, 0x44, 0x48,
+      0x4c, 0x50, 0x54, 0x56, 0x58, 0x5b, 0x5e, 0x60,
+      0x61, 0x63, 0x62, 0x61, 0x60, 0x5f, 0x5e, 0x5e,
+      0x5c, 0x5c, 0x5b, 0x5a, 0x5a, 0x5b, 0x5c, 0x5e,
+      0x60, 0x63, 0x64, 0x65, 0x63, 0x62, 0x63, 0x63,
+      0x61, 0x60, 0x5e, 0x5b, 0x58, 0x55, 0x51, 0x4f,
+      0x4c, 0x4a, 0x47, 0x44, 0x42, 0x41, 0x3e, 0x3c,
+      0x3b, 0x3a, 0x3a, 0x3b, 0x3b, 0x3c, 0x3e, 0x3f,
+      0x40, 0x42, 0x43, 0x45, 0x46, 0x47, 0x49, 0x4a,
+      0x4c, 0x4c, 0x4f, 0x51, 0x52, 0x55, 0x58, 0x5b,
+      0x5c, 0x5f, 0x61, 0x62, 0x63, 0x64, 0x64, 0x65,
+      0x66, 0x65, 0x63, 0x62, 0x5f, 0x5e, 0x5e, 0x5c,
+      0x5b, 0x58, 0x56, 0x55, 0x54, 0x53, 0x52, 0x53,
+      0x52, 0x52, 0x52, 0x52, 0x52, 0x53, 0x55, 0x55,
+      0x55, 0x53, 0x53, 0x53, 0x52, 0x51, 0x52, 0x52,
+      0x55, 0x55, 0x58, 0x58, 0x5b, 0x5d, 0x61, 0x65,
+      0x68, 0x6a, 0x6c, 0x6b, 0x69, 0x68, 0x67, 0x64,
+      0x61, 0x5e, 0x58, 0x54, 0x4f, 0x4b, 0x49, 0x48,
+      0x47, 0x46, 0x45, },
+    { 0x19, 0x20, 0x1d, 0x1f, 0x1f, 0x20, 0x23, 0x23,
+      0x25, 0x27, 0x2b, 0x2d, 0x31, 0x34, 0x37, 0x3b,
+      0x3d, 0x42, 0x45, 0x48, 0x4c, 0x4e, 0x4f, 0x4f,
+      0x51, 0x52, 0x51, 0x50, 0x4e, 0x4b, 0x4a, 0x48,
+      0x44, 0x42, 0x3f, 0x3a, 0x38, 0x36, 0x32, 0x30,
+      0x2f, 0x2c, 0x2a, 0x28, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x26, 0x28, 0x29, 0x2b,
+      0x2e, 0x30, 0x34, 0x36, 0x39, 0x3b, 0x3d, 0x3f,
+      0x3f, 0x40, 0x41, 0x42, 0x41, 0x40, 0x3e, 0x3c,
+      0x3c, 0x3a, 0x37, 0x34, 0x33, 0x30, 0x2e, 0x2b,
+      0x29, 0x27, 0x25, 0x24, 0x21, 0x1f, 0x1e, 0x1c,
+      0x1b, 0x19, 0x17, 0x16, 0x16, 0x16, 0x16, 0x14,
+      0x13, 0x12, 0x13, 0x13, 0x13, 0x13, 0x13, 0x13,
+      0x13, 0x14, 0x15, 0x14, 0x14, 0x14, 0x17, 0x19,
+      0x1a, 0x1c, 0x1e, 0x20, 0x21, 0x23, 0x24, 0x26,
+      0x29, 0x29, 0x2b, 0x2c, 0x2d, 0x2e, 0x30, 0x31,
+      0x34, 0x38, 0x3b, 0x3c, 0x3f, 0x42, 0x47, 0x4c,
+      0x50, 0x54, 0x57, 0x5b, 0x5c, 0x5e, 0x62, 0x63,
+      0x66, 0x66, 0x66, 0x65, 0x64, 0x63, 0x61, 0x62,
+      0x60, 0x60, 0x5f, 0x5e, 0x5e, 0x5f, 0x60, 0x62,
+      0x65, 0x67, 0x69, 0x6a, 0x69, 0x68, 0x69, 0x67,
+      0x66, 0x64, 0x62, 0x5f, 0x5c, 0x58, 0x54, 0x51,
+      0x4e, 0x4b, 0x49, 0x45, 0x43, 0x41, 0x40, 0x3e,
+      0x3c, 0x3a, 0x3b, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
+      0x41, 0x42, 0x44, 0x46, 0x46, 0x48, 0x49, 0x4b,
+      0x4d, 0x50, 0x51, 0x53, 0x55, 0x57, 0x58, 0x5c,
+      0x5f, 0x60, 0x63, 0x64, 0x64, 0x65, 0x66, 0x66,
+      0x66, 0x65, 0x65, 0x63, 0x61, 0x5f, 0x5e, 0x5c,
+      0x5a, 0x58, 0x56, 0x55, 0x54, 0x53, 0x52, 0x52,
+      0x53, 0x52, 0x52, 0x52, 0x52, 0x53, 0x53, 0x53,
+      0x54, 0x53, 0x53, 0x52, 0x53, 0x51, 0x53, 0x53,
+      0x55, 0x57, 0x58, 0x59, 0x5b, 0x5d, 0x62, 0x64,
+      0x68, 0x6a, 0x6c, 0x6b, 0x69, 0x68, 0x67, 0x64,
+      0x61, 0x5d, 0x57, 0x54, 0x50, 0x4a, 0x48, 0x47,
+      0x46, 0x45, 0x45, },
diff --git a/tests/tcg/hexagon/hvx_histogram_row.h b/tests/tcg/hexagon/hvx_histogram_row.h
new file mode 100644
index 0000000..6a4531a
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_histogram_row.h
@@ -0,0 +1,24 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HVX_HISTOGRAM_ROW_H
+#define HVX_HISTOGRAM_ROW_H
+
+void hvx_histogram_row(uint8_t *src, int stride, int width, int height,
+                       int *hist);
+
+#endif
diff --git a/tests/tcg/hexagon/hvx_histogram.c b/tests/tcg/hexagon/hvx_histogram.c
new file mode 100644
index 0000000..43377a9
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_histogram.c
@@ -0,0 +1,88 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include "hvx_histogram_row.h"
+
+const int vector_len = 128;
+const int width = 275;
+const int height = 20;
+const int stride = (width + vector_len - 1) & -vector_len;
+
+int err;
+
+static uint8_t input[height][stride] __attribute__((aligned(128))) = {
+#include "hvx_histogram_input.h"
+};
+
+static int result[256] __attribute__((aligned(128)));
+static int expect[256] __attribute__((aligned(128)));
+
+static void check(void)
+{
+    for (int i = 0; i < 256; i++) {
+        int res = result[i];
+        int exp = expect[i];
+        if (res != exp) {
+            printf("ERROR at %3d: 0x%04x != 0x%04x\n",
+                   i, res, exp);
+            err++;
+        }
+    }
+}
+
+static void ref_histogram(uint8_t *src, int stride, int width, int height,
+                          int *hist)
+{
+    for (int i = 0; i < 256; i++) {
+        hist[i] = 0;
+    }
+
+    for (int i = 0; i < height; i++) {
+        for (int j = 0; j < width; j++) {
+            hist[src[i * stride + j]]++;
+        }
+    }
+}
+
+static void hvx_histogram(uint8_t *src, int stride, int width, int height,
+                          int *hist)
+{
+    int n = 8192 / width;
+
+    for (int i = 0; i < 256; i++) {
+        hist[i] = 0;
+    }
+
+    for (int i = 0; i < height; i += n) {
+        int k = height - i > n ? n : height - i;
+        hvx_histogram_row(src, stride, width, k, hist);
+        src += n * stride;
+    }
+}
+
+int main()
+{
+    ref_histogram(&input[0][0], stride, width, height, expect);
+    hvx_histogram(&input[0][0], stride, width, height, result);
+    check();
+
+    puts(err ? "FAIL" : "PASS");
+    return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index bde90e7..5b0e2a2 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -51,9 +51,14 @@ HEX_TESTS += scatter_gather
 HEX_TESTS += atomics
 HEX_TESTS += fpstuff
 HEX_TESTS += hvx_misc
+HEX_TESTS += hvx_histogram
 
 TESTS += $(HEX_TESTS)
 
 scatter_gather: CFLAGS += -mhvx
 vector_add_int: CFLAGS += -mhvx -fvectorize
 hvx_misc: CFLAGS += -mhvx
+hvx_histogram: CFLAGS += -mhvx -Wno-gnu-folding-constant
+
+hvx_histogram: hvx_histogram.c hvx_histogram_row.S
+	$(CC) $(CFLAGS) $(CROSS_CC_GUEST_CFLAGS) $^ -o $@
diff --git a/tests/tcg/hexagon/hvx_histogram_row.S b/tests/tcg/hexagon/hvx_histogram_row.S
new file mode 100644
index 0000000..2372fb9
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_histogram_row.S
@@ -0,0 +1,294 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+
+/*
+ * void hvx_histogram_row(uint8_t *src,     => r0
+ *                        int stride,       => r1
+ *                        int width,        => r2
+ *                        int height,       => r3
+ *                        int *hist         => r4)
+ */
+    .text
+    .p2align 2
+    .global hvx_histogram_row
+    .type hvx_histogram_row, @function
+hvx_histogram_row:
+    { r2 = lsr(r2, #7)          /* size / VLEN */
+      r5 = and(r2, #127)        /* size % VLEN */
+      v1 = #0
+      v0 = #0
+    }
+    /*
+     * Step 1: Clean the whole vector register file
+     */
+    { v3:2 = v1:0
+      v5:4 = v1:0
+      p0 = cmp.gt(r2, #0)       /* P0 = (width / VLEN > 0) */
+      p1 = cmp.eq(r5, #0)       /* P1 = (width % VLEN == 0) */
+    }
+    { q0 = vsetq(r5)
+      v7:6 = v1:0
+    }
+    { v9:8   = v1:0
+      v11:10 = v1:0
+    }
+    { v13:12 = v1:0
+      v15:14 = v1:0
+    }
+    { v17:16 = v1:0
+      v19:18 = v1:0
+    }
+    { v21:20 = v1:0
+      v23:22 = v1:0
+    }
+    { v25:24 = v1:0
+      v27:26 = v1:0
+    }
+    { v29:28 = v1:0
+      v31:30 = v1:0
+      r10 = add(r0, r1)           /* R10 = &src[2 * stride] */
+      loop1(.outerloop, r3)
+    }
+
+    /*
+     * Step 2: vhist
+     */
+    .falign
+.outerloop:
+    { if (!p0) jump .loopend
+      loop0(.innerloop, r2)
+    }
+
+    .falign
+.innerloop:
+    { v0.tmp = vmem(R0++#1)
+      vhist
+    }:endloop0
+
+    .falign
+.loopend:
+    if (p1) jump .skip       /* if (width % VLEN == 0) done with current row */
+    { v0.tmp = vmem(r0 + #0)
+      vhist(q0)
+    }
+
+    .falign
+.skip:
+    { r0 = r10                    /* R0  = &src[(i + 1) * stride] */
+      r10 = add(r10, r1)          /* R10 = &src[(i + 2) * stride] */
+    }:endloop1
+
+
+    /*
+     * Step 3: Sum up the data
+     */
+    { v0.h = vshuff(v0.h)
+      r10 = ##0x00010001
+    }
+    v1.h = vshuff(v1.h)
+    { V2.h = vshuff(v2.h)
+      v0.w = vdmpy(v0.h, r10.h):sat
+    }
+    { v3.h = vshuff(v3.h)
+      v1.w = vdmpy(v1.h, r10.h):sat
+    }
+    { v4.h = vshuff(V4.h)
+      v2.w = vdmpy(v2.h, r10.h):sat
+    }
+    { v5.h = vshuff(v5.h)
+      v3.w = vdmpy(v3.h, r10.h):sat
+    }
+    { v6.h = vshuff(v6.h)
+      v4.w = vdmpy(v4.h, r10.h):sat
+    }
+    { v7.h = vshuff(v7.h)
+      v5.w = vdmpy(v5.h, r10.h):sat
+    }
+    { v8.h = vshuff(V8.h)
+      v6.w = vdmpy(v6.h, r10.h):sat
+    }
+    { v9.h = vshuff(V9.h)
+      v7.w = vdmpy(v7.h, r10.h):sat
+    }
+    { v10.h = vshuff(v10.h)
+      v8.w = vdmpy(v8.h, r10.h):sat
+    }
+    { v11.h = vshuff(v11.h)
+      v9.w = vdmpy(v9.h, r10.h):sat
+    }
+    { v12.h = vshuff(v12.h)
+      v10.w = vdmpy(v10.h, r10.h):sat
+    }
+    { v13.h = vshuff(V13.h)
+      v11.w = vdmpy(v11.h, r10.h):sat
+    }
+    { v14.h = vshuff(v14.h)
+      v12.w = vdmpy(v12.h, r10.h):sat
+    }
+    { v15.h = vshuff(v15.h)
+      v13.w = vdmpy(v13.h, r10.h):sat
+    }
+    { v16.h = vshuff(v16.h)
+      v14.w = vdmpy(v14.h, r10.h):sat
+    }
+    { v17.h = vshuff(v17.h)
+      v15.w = vdmpy(v15.h, r10.h):sat
+    }
+    { v18.h = vshuff(v18.h)
+      v16.w = vdmpy(v16.h, r10.h):sat
+    }
+    { v19.h = vshuff(v19.h)
+      v17.w = vdmpy(v17.h, r10.h):sat
+    }
+    { v20.h = vshuff(v20.h)
+      v18.W = vdmpy(v18.h, r10.h):sat
+    }
+    { v21.h = vshuff(v21.h)
+      v19.w = vdmpy(v19.h, r10.h):sat
+    }
+    { v22.h = vshuff(v22.h)
+      v20.w = vdmpy(v20.h, r10.h):sat
+    }
+    { v23.h = vshuff(v23.h)
+      v21.w = vdmpy(v21.h, r10.h):sat
+    }
+    { v24.h = vshuff(v24.h)
+      v22.w = vdmpy(v22.h, r10.h):sat
+    }
+    { v25.h = vshuff(v25.h)
+      v23.w = vdmpy(v23.h, r10.h):sat
+    }
+    { v26.h = vshuff(v26.h)
+      v24.w = vdmpy(v24.h, r10.h):sat
+    }
+    { v27.h = vshuff(V27.h)
+      v25.w = vdmpy(v25.h, r10.h):sat
+    }
+    { v28.h = vshuff(v28.h)
+      v26.w = vdmpy(v26.h, r10.h):sat
+    }
+    { v29.h = vshuff(v29.h)
+      v27.w = vdmpy(v27.h, r10.h):sat
+    }
+    { v30.h = vshuff(v30.h)
+      v28.w = vdmpy(v28.h, r10.h):sat
+    }
+    { v31.h = vshuff(v31.h)
+      v29.w = vdmpy(v29.h, r10.h):sat
+      r28 = #32
+    }
+    { vshuff(v1, v0, r28)
+      v30.w = vdmpy(v30.h, r10.h):sat
+    }
+    { vshuff(v3, v2, r28)
+      v31.w = vdmpy(v31.h, r10.h):sat
+    }
+    { vshuff(v5, v4, r28)
+      v0.w = vadd(v1.w, v0.w)
+      v2.w = vadd(v3.w, v2.w)
+    }
+    { vshuff(v7, v6, r28)
+      r7 = #64
+    }
+    { vshuff(v9, v8, r28)
+      v4.w = vadd(v5.w, v4.w)
+      v6.w = vadd(v7.w, v6.w)
+    }
+    vshuff(v11, v10, r28)
+    { vshuff(v13, v12, r28)
+      v8.w = vadd(v9.w, v8.w)
+      v10.w = vadd(v11.w, v10.w)
+    }
+    vshuff(v15, v14, r28)
+    { vshuff(v17, v16, r28)
+      v12.w = vadd(v13.w, v12.w)
+      v14.w = vadd(v15.w, v14.w)
+    }
+    vshuff(v19, v18, r28)
+    { vshuff(v21, v20, r28)
+      v16.w = vadd(v17.w, v16.w)
+      v18.w = vadd(v19.w, v18.w)
+    }
+    vshuff(v23, v22, r28)
+    { vshuff(v25, v24, r28)
+      v20.w = vadd(v21.w, v20.w)
+      v22.w = vadd(v23.w, v22.w)
+    }
+    vshuff(v27, v26, r28)
+    { vshuff(v29, v28, r28)
+      v24.w = vadd(v25.w, v24.w)
+      v26.w = vadd(v27.w, v26.w)
+    }
+    vshuff(v31, v30, r28)
+    { v28.w = vadd(v29.w, v28.w)
+      vshuff(v2, v0, r7)
+    }
+    { v30.w = vadd(v31.w, v30.w)
+      vshuff(v6, v4, r7)
+      v0.w  = vadd(v0.w, v2.w)
+    }
+    { vshuff(v10, v8, r7)
+      v1.tmp = vmem(r4 + #0)      /* update hist[0-31] */
+      v0.w  = vadd(v0.w, v1.w)
+      vmem(r4++#1) = v0.new
+    }
+    { vshuff(v14, v12, r7)
+      v4.w  = vadd(v4.w, v6.w)
+      v8.w  = vadd(v8.w, v10.w)
+    }
+    { vshuff(v18, v16, r7)
+      v1.tmp = vmem(r4 + #0)      /* update hist[32-63] */
+      v4.w  = vadd(v4.w, v1.w)
+      vmem(r4++#1) = v4.new
+    }
+    { vshuff(v22, v20, r7)
+      v12.w = vadd(v12.w, v14.w)
+      V16.w = vadd(v16.w, v18.w)
+    }
+    { vshuff(v26, v24, r7)
+      v1.tmp = vmem(r4 + #0)      /* update hist[64-95] */
+      v8.w  = vadd(v8.w, v1.w)
+      vmem(r4++#1) = v8.new
+    }
+    { vshuff(v30, v28, r7)
+      v1.tmp = vmem(r4 + #0)      /* update hist[96-127] */
+      v12.w  = vadd(v12.w, v1.w)
+      vmem(r4++#1) = v12.new
+    }
+
+    { v20.w = vadd(v20.w, v22.w)
+      v1.tmp = vmem(r4 + #0)      /* update hist[128-159] */
+      v16.w  = vadd(v16.w, v1.w)
+      vmem(r4++#1) = v16.new
+    }
+    { v24.w = vadd(v24.w, v26.w)
+      v1.tmp = vmem(r4 + #0)      /* update hist[160-191] */
+      v20.w  = vadd(v20.w, v1.w)
+      vmem(r4++#1) = v20.new
+    }
+    { v28.w = vadd(v28.w, v30.w)
+      v1.tmp = vmem(r4 + #0)      /* update hist[192-223] */
+      v24.w  = vadd(v24.w, v1.w)
+      vmem(r4++#1) = v24.new
+    }
+    { v1.tmp = vmem(r4 + #0)      /* update hist[224-255] */
+      v28.w  = vadd(v28.w, v1.w)
+      vmem(r4++#1) = v28.new
+    }
+    jumpr r31
+    .size hvx_histogram_row, .-hvx_histogram_row
-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-05 23:34 ` [PATCH 01/20] Hexagon HVX (target/hexagon) README Taylor Simpson
@ 2021-07-12  8:16   ` Rob Landley
  2021-07-12 13:42     ` Brian Cain
  0 siblings, 1 reply; 40+ messages in thread
From: Rob Landley @ 2021-07-12  8:16 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel
  Cc: ale, peter.maydell, philmd, richard.henderson, bcain

On 7/5/21 6:34 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> ---
>  target/hexagon/README | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 82 insertions(+), 1 deletion(-)

I'm poking at the hexagon toolchain build script you checked into the test
directory, which boils down to (starting with):

git clone https://github.com/llvm/llvm-project
mkdir build-llvm
cd build-llvm
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=$(readlink -f ../llvm) -DLLVM_ENABLE_LLD=ON \
  -DLLVM_TARGETS_TO_BUILD="Hexagon" -DLLVM_ENABLE_PROJECTS="clang;lld" \
  $(readlink -f ../llvm-project/llvm)

Except the LLVM_ENABLE_LLD part breaks with a standard debian/devuan x86-64 host
toolchain because it ONLY works with host llvm, and apparently only a pretty
current one at that:

  https://github.com/tensorflow/mlir-hlo/issues/4

(Devuan Beowulf only packages lld-7, not lld-10.)

I'm currently building:

cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=$(readlink -f ../llvm) -DLLVM_ENABLE_PROJECTS="lld" \
  $(readlink -f ../llvm-project/llvm)
ninja all install

On the theory that should give me an lld-git I can play $PATH games and re-run
the other build with, but my QUESTION is what does the LLVM_ENABLE_LLD=potato
actually accomplish here? Is it a sanitizing step or is there something about
building with gcc's lld that's known to break the hexagon toolchain? If I just
omit it (to avoid building lld _twice_) will I (probably) get a working hexagon
toolchain? (Assuming I do the musl and headers-install builds and so on?)

What's the _issue_ here that this config symbol addresses?

Thanks,

Rob


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-12  8:16   ` Rob Landley
@ 2021-07-12 13:42     ` Brian Cain
  2021-07-19  1:10       ` Rob Landley
  0 siblings, 1 reply; 40+ messages in thread
From: Brian Cain @ 2021-07-12 13:42 UTC (permalink / raw)
  To: Rob Landley, Taylor Simpson, qemu-devel
  Cc: ale, peter.maydell, richard.henderson, philmd



> -----Original Message-----
> From: Rob Landley <rob@landley.net>
> Sent: Monday, July 12, 2021 3:16 AM
...
> Except the LLVM_ENABLE_LLD part breaks with a standard debian/devuan x86-
> 64 host
> toolchain because it ONLY works with host llvm, and apparently only a pretty
> current one at that:
> 
>   https://github.com/tensorflow/mlir-hlo/issues/4
> 
> (Devuan Beowulf only packages lld-7, not lld-10.)

Sorry about that, we did test only with fairly current/recent lld.

> I'm currently building:
> 
> cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
>   -DCMAKE_INSTALL_PREFIX=$(readlink -f ../llvm) -
> DLLVM_ENABLE_PROJECTS="lld" \
>   $(readlink -f ../llvm-project/llvm)
> ninja all install
> 
> On the theory that should give me an lld-git I can play $PATH games and re-run
> the other build with, but my QUESTION is what does the
> LLVM_ENABLE_LLD=potato
> actually accomplish here? Is it a sanitizing step or is there something about
> building with gcc's lld that's known to break the hexagon toolchain? If I just
> omit it (to avoid building lld _twice_) will I (probably) get a working hexagon
> toolchain? (Assuming I do the musl and headers-install builds and so on?)
> 
> What's the _issue_ here that this config symbol addresses?

lld is not necessary to build the hexagon toolchain.  It just happens that the build process takes an enormous amount of time (and memory) using the gnu BFD ld.  I would expect building with either ld/gold  would work fine.

If you don't mind binaries, there are x86_64 linux binary toolchains with lld on releases.llvm.org and there's also a binary hexagon-linux cross toolchain that we shared for use by kernel developers.  The hexagon linux toolchain is built on Ubuntu 16.04.

But when building your toolchain, omitting LLVM_ENABLE_LLD should work just fine.

-Brian

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-12 13:42     ` Brian Cain
@ 2021-07-19  1:10       ` Rob Landley
  2021-07-19 13:39         ` Brian Cain
  0 siblings, 1 reply; 40+ messages in thread
From: Rob Landley @ 2021-07-19  1:10 UTC (permalink / raw)
  To: Brian Cain, Taylor Simpson, qemu-devel
  Cc: ale, peter.maydell, richard.henderson, philmd

On 7/12/21 8:42 AM, Brian Cain wrote:
> If you don't mind binaries, there are x86_64 linux binary toolchains with lld
> on releases.llvm.org

I've never managed to run those binaries, because they're dynamically linked
against some specific distro I'm not using:

  $ bin/clang --help
  /lib/ld-linux-aarch64.so.1: No such file or directory

All the toolchains I build for distribution are statically linked on the host.
(Back in the day I even wrote a
https://github.com/landley/aboriginal/blob/master/sources/toys/ccwrap.c wrapper
to feed --nostdinc --nostdlib to gcc and build up all the includes again
manually to stop gcc from leaking random host context into the cross compilers,
but these days I use
https://github.com/landley/toybox/blob/master/scripts/mcm-buildall.sh with Rich
Felker's musl-cross-make to create statically linked cross and native musl
toolchains which I would happily post binaries of if they weren't GPLv3 and thus
non-distributable. Oh well.)

Vaguely trying to make an llvm-buildall.sh for toybox, which might be a fourth
section to https://landley.net/toybox/faq.html#cross but first I'm trying to
make the hexagon-only one work based on your model. :)

> and there's also a binary hexagon-linux cross toolchain that
> we shared for use by kernel developers.  The hexagon linux
> toolchain is built on Ubuntu 16.04.

Where's that one?

> But when building your toolchain, omitting LLVM_ENABLE_LLD should work just fine.

It did, thanks.

Now I'm trying to figure out what all the extra CFLAGS are for.

The clang_rt build has CMAKE_ASM_FLAGS="-G0 -mlong-calls -fno-pic
--target=hexagon-unknown-linux-musl" which
https://clang.llvm.org/docs/ClangCommandLineReference.html defines as:

-G<size>
  Put objects of at most <size> bytes into small data section (MIPS / Hexagon)

-mlong-calls
  Generate branches with extended addressability, usually via indirect jumps.

I don't understand why your libcc build needs no-pic? (Are we only building
a static libgcc.a instead of a dynamic one? I'm fine with that if so, but
this needs to be specified in the MAKE_ASM_FLAGS why?)

Why is it saying --target=hexagon-random-nonsense-musl to a toolchain
we built with exactly one target type? How does it NOT default to hexagon?
(Is this related to the build writing a hexagon-potato-banana-musl.cfg file
in the bin directory, meaning the config file is in the $PATH? Does clang only
look for it in the same directory the clang executable lives in?)

And while we're at it, the CONTENTS of hexagon-gratuitous-gnu-format.cfg is:

cat <<EOF > hexagon-unknown-linux-musl.cfg
-G0 --sysroot=${HEX_SYSROOT}
EOF

Which is ALREADY saying -G0? (Why does it want to do that globally? Some sort of
bug workaround?) So why do we specify it again here?

Next up build_musl_headers does CROSS_CFLAGS="-G0 -O0 -mv65 -fno-builtin
-fno-rounding-math --target=hexagon-unknown-linux-musl" which:

-O0
  disable most of the optimizer

-mv65
  -mtune for a specific hexagon generation.
  (Why? Does qemu only support that but not newer?)

-fno-builtin
  musl's ./configure already probes for this and will add it if
  the compiler supports it.

-fno-rounding-math
  the docs MENTION this, but do not explain it.

And again with the -G0.

These flags probably aren't needed _here_ because this is just the headers
install (which is basically a cp -a isn't it?). This looks like it's copied
verbatim from the musl library build. But that library build happens in a bit,
so relevant-ish I guess...

(Also, why does building librt-but-not-really need the libc headers?
The libgcc build EXPLICITLY does not require that, because otherwise you
have this kind of BS circular dependency. Also, how do you EVER build a
bare metal ELF toolchain with that dependency in there?)

Next up build_kernel_headers has KBUILD_CFLAGS_KERNEL="-mlong-calls" which
again, A) why does the compiler not do by default, B) can't be needed here
because you don't even have to specify a cross compiler when doing
headers_install. (I just confirmed this by diffing installs with an without a
cross compiler specified: they were identical. I remembered this from
https://github.com/torvalds/linux/commit/e0e2fa4b515c but checked again to be
sure.) Presumably this is more "shared with full kernel build".

And then build_musl, covered above under the headers build: lotsa flags, not
sure why.

> -Brian
> 

Rob

P.S. It took me a while to figure out that clang_rt is NOT librt.a, I think it's
their libgcc? Especially confusing since librt.a has existed for decades and
was on solaris before it was on linux, and the OBVIOUS name is libcc
the same way "cc" is the generic compiler name instead of "gcc".
(In fact that was the posix compiler name until they decided to replace
it with "c99" and everybody ignored them the way tar->pax was ignored,
largely because make's $CC defaults to "cc" so it Just Works, and yes
the cross compiler should have that name but the prepackaged clang tarball
above does not. *shrug* I fix that up when making my prefix symlinks. The
android NDK guys at least have the excuse of shipping NINE different
x86_64-linux-android*-clang with API version numbers and thus not wanting to
pick a default to single out, so leave making the -cc link as an exercise for
the reader. I give instructions for doing so on the toybox cross compiling page
I linked above. :)


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-19  1:10       ` Rob Landley
@ 2021-07-19 13:39         ` Brian Cain
  2021-07-19 16:19           ` Sid Manning
  0 siblings, 1 reply; 40+ messages in thread
From: Brian Cain @ 2021-07-19 13:39 UTC (permalink / raw)
  To: Rob Landley, Taylor Simpson, qemu-devel, Sid Manning
  Cc: ale, peter.maydell, richard.henderson, philmd



> -----Original Message-----
> From: Rob Landley <rob@landley.net>
...
> On 7/12/21 8:42 AM, Brian Cain wrote:
...
> > and there's also a binary hexagon-linux cross toolchain that
> > we shared for use by kernel developers.  The hexagon linux
> > toolchain is built on Ubuntu 16.04.
> 
> Where's that one?

https://codelinaro.jfrog.io/artifactory/codelinaro-qemu/2021-05-12/clang+llvm-12.0.0-cross-hexagon-unknown-linux-musl.tar.xz - 
	- Built on Ubuntu 16.04, similar dynamic dependencies as releases.llvm.org binaries
	- Manifest:
		- llvm+clang 12.0.0 tag
		- Linux 5.6.18
		- github.com/qemu/qemu 15106f7dc3290ff3254611f265849a314a93eb0e
		- github.com/quic/musl aff74b395fbf59cd7e93b3691905aa1af6c0778c


> > But when building your toolchain, omitting LLVM_ENABLE_LLD should work
> just fine.
> 
> It did, thanks.
> 
> Now I'm trying to figure out what all the extra CFLAGS are for.

+Sid for some perspective on the rationale of these flags.  Some of these flags may be workarounds for toolchain issues.

> The clang_rt build has CMAKE_ASM_FLAGS="-G0 -mlong-calls -fno-pic
> --target=hexagon-unknown-linux-musl" which
> https://clang.llvm.org/docs/ClangCommandLineReference.html defines as:
> 
> -G<size>
>   Put objects of at most <size> bytes into small data section (MIPS / Hexagon)
> 
> -mlong-calls
>   Generate branches with extended addressability, usually via indirect jumps.
> 
> I don't understand why your libcc build needs no-pic? (Are we only building
> a static libgcc.a instead of a dynamic one? I'm fine with that if so, but
> this needs to be specified in the MAKE_ASM_FLAGS why?)
> 
> Why is it saying --target=hexagon-random-nonsense-musl to a toolchain
> we built with exactly one target type? How does it NOT default to hexagon?
> (Is this related to the build writing a hexagon-potato-banana-musl.cfg file
> in the bin directory, meaning the config file is in the $PATH? Does clang only
> look for it in the same directory the clang executable lives in?)
> 
> And while we're at it, the CONTENTS of hexagon-gratuitous-gnu-format.cfg is:
> 
> cat <<EOF > hexagon-unknown-linux-musl.cfg
> -G0 --sysroot=${HEX_SYSROOT}
> EOF
> 
> Which is ALREADY saying -G0? (Why does it want to do that globally? Some
> sort of
> bug workaround?) So why do we specify it again here?
> 
> Next up build_musl_headers does CROSS_CFLAGS="-G0 -O0 -mv65 -fno-builtin
> -fno-rounding-math --target=hexagon-unknown-linux-musl" which:
> 
> -O0
>   disable most of the optimizer
> 
> -mv65
>   -mtune for a specific hexagon generation.
>   (Why? Does qemu only support that but not newer?)
> 
> -fno-builtin
>   musl's ./configure already probes for this and will add it if
>   the compiler supports it.
> 
> -fno-rounding-math
>   the docs MENTION this, but do not explain it.
> 
> And again with the -G0.
> 
> These flags probably aren't needed _here_ because this is just the headers
> install (which is basically a cp -a isn't it?). This looks like it's copied
> verbatim from the musl library build. But that library build happens in a bit,
> so relevant-ish I guess...
> 
> (Also, why does building librt-but-not-really need the libc headers?
> The libgcc build EXPLICITLY does not require that, because otherwise you
> have this kind of BS circular dependency. Also, how do you EVER build a
> bare metal ELF toolchain with that dependency in there?)
> 
> Next up build_kernel_headers has KBUILD_CFLAGS_KERNEL="-mlong-calls"
> which
> again, A) why does the compiler not do by default, B) can't be needed here
> because you don't even have to specify a cross compiler when doing
> headers_install. (I just confirmed this by diffing installs with an without a
> cross compiler specified: they were identical. I remembered this from
> https://github.com/torvalds/linux/commit/e0e2fa4b515c but checked again to
> be
> sure.) Presumably this is more "shared with full kernel build".
> 
> And then build_musl, covered above under the headers build: lotsa flags, not
> sure why.
> 
> > -Brian
> >
> 
> Rob
> 
> P.S. It took me a while to figure out that clang_rt is NOT librt.a, I think it's
> their libgcc? Especially confusing since librt.a has existed for decades and
> was on solaris before it was on linux, and the OBVIOUS name is libcc
> the same way "cc" is the generic compiler name instead of "gcc".
> (In fact that was the posix compiler name until they decided to replace
> it with "c99" and everybody ignored them the way tar->pax was ignored,
> largely because make's $CC defaults to "cc" so it Just Works, and yes
> the cross compiler should have that name but the prepackaged clang tarball
> above does not. *shrug* I fix that up when making my prefix symlinks. The
> android NDK guys at least have the excuse of shipping NINE different
> x86_64-linux-android*-clang with API version numbers and thus not wanting to
> pick a default to single out, so leave making the -cc link as an exercise for
> the reader. I give instructions for doing so on the toybox cross compiling page
> I linked above. :)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-19 13:39         ` Brian Cain
@ 2021-07-19 16:19           ` Sid Manning
  2021-07-26  7:57             ` Rob Landley
  0 siblings, 1 reply; 40+ messages in thread
From: Sid Manning @ 2021-07-19 16:19 UTC (permalink / raw)
  To: Brian Cain, Rob Landley, Taylor Simpson, qemu-devel
  Cc: ale, peter.maydell, richard.henderson, philmd



> -----Original Message-----
> From: Brian Cain <bcain@quicinc.com>
> Sent: Monday, July 19, 2021 8:40 AM
> To: Rob Landley <rob@landley.net>; Taylor Simpson
> <tsimpson@quicinc.com>; qemu-devel@nongnu.org; Sid Manning
> <sidneym@quicinc.com>
> Cc: ale@rev.ng; peter.maydell@linaro.org; richard.henderson@linaro.org;
> philmd@redhat.com
> Subject: RE: [EXT] Re: [PATCH 01/20] Hexagon HVX (target/hexagon)
> README
> 
> 
> 
> > -----Original Message-----
> > From: Rob Landley <rob@landley.net>
> ...
> > On 7/12/21 8:42 AM, Brian Cain wrote:
> ...
> > > and there's also a binary hexagon-linux cross toolchain that we
> > > shared for use by kernel developers.  The hexagon linux toolchain is
> > > built on Ubuntu 16.04.
> >
> > Where's that one?
> 
> https://codelinaro.jfrog.io/artifactory/codelinaro-qemu/2021-05-
> 12/clang+llvm-12.0.0-cross-hexagon-unknown-linux-musl.tar.xz -
> 	- Built on Ubuntu 16.04, similar dynamic dependencies as
> releases.llvm.org binaries
> 	- Manifest:
> 		- llvm+clang 12.0.0 tag
> 		- Linux 5.6.18
> 		- github.com/qemu/qemu
> 15106f7dc3290ff3254611f265849a314a93eb0e
> 		- github.com/quic/musl
> aff74b395fbf59cd7e93b3691905aa1af6c0778c
> 
> 
> > > But when building your toolchain, omitting LLVM_ENABLE_LLD should
> > > work
> > just fine.
> >
> > It did, thanks.
> >
> > Now I'm trying to figure out what all the extra CFLAGS are for.
> 
> +Sid for some perspective on the rationale of these flags.  Some of these
> flags may be workarounds for toolchain issues.
> 
> > The clang_rt build has CMAKE_ASM_FLAGS="-G0 -mlong-calls -fno-pic
> > --target=hexagon-unknown-linux-musl" which
> > https://clang.llvm.org/docs/ClangCommandLineReference.html defines as:
> >
> > -G<size>
> >   Put objects of at most <size> bytes into small data section (MIPS /
> > Hexagon)
> >
> > -mlong-calls
> >   Generate branches with extended addressability, usually via indirect
> jumps.
> >
> > I don't understand why your libcc build needs no-pic? (Are we only
> > building a static libgcc.a instead of a dynamic one? I'm fine with
> > that if so, but this needs to be specified in the MAKE_ASM_FLAGS why?)
> >
> > Why is it saying --target=hexagon-random-nonsense-musl to a toolchain
> > we built with exactly one target type? How does it NOT default to
> hexagon?
> > (Is this related to the build writing a hexagon-potato-banana-musl.cfg
> > file in the bin directory, meaning the config file is in the $PATH?
> > Does clang only look for it in the same directory the clang executable
> > lives in?)
> >
> > And while we're at it, the CONTENTS of hexagon-gratuitous-gnu-format.cfg
> is:
> >
> > cat <<EOF > hexagon-unknown-linux-musl.cfg
> > -G0 --sysroot=${HEX_SYSROOT}
> > EOF
> >
> > Which is ALREADY saying -G0? (Why does it want to do that globally?
> > Some sort of bug workaround?) So why do we specify it again here?
> >
> > Next up build_musl_headers does CROSS_CFLAGS="-G0 -O0 -mv65
> > -fno-builtin -fno-rounding-math --target=hexagon-unknown-linux-musl"
> which:

I agree G0 is overplayed here.  G0 is the implied default on Linux.  On occasion we use a different configuration of clang where small data (G8) is the default so G0 is specified.


> >
> > -O0
> >   disable most of the optimizer

This should be changed.  This was added so I could factor out clang's floating point optimizations.  These optimizations caused musl-libc testsuite to fail some floating point accuracy tests.  I know at least some of those issues have now been resolved.

> >
> > -mv65
> >   -mtune for a specific hexagon generation.
> >   (Why? Does qemu only support that but not newer?)
Passing -mvXX it is now recommended practice.  A few years ago the default arch selected changed from the oldest support arch to the newest arch.  QEMU supports later architectures.

> >
> > -fno-builtin
> >   musl's ./configure already probes for this and will add it if
> >   the compiler supports it.
I did not know this, we can get rid of -fno-builtin if the driver is meeting musl's expectations.


> >
> > -fno-rounding-math
> >   the docs MENTION this, but do not explain it.

This was workaround and is no longer needed.  IIRC clang was asserting while building musl.

> >
> > And again with the -G0.
> >
> > These flags probably aren't needed _here_ because this is just the
> > headers install (which is basically a cp -a isn't it?). This looks
> > like it's copied verbatim from the musl library build. But that
> > library build happens in a bit, so relevant-ish I guess...
> >
> > (Also, why does building librt-but-not-really need the libc headers?
> > The libgcc build EXPLICITLY does not require that, because otherwise
> > you have this kind of BS circular dependency. Also, how do you EVER
> > build a bare metal ELF toolchain with that dependency in there?)

Getting cmake to agree to build compiler-rt might be better now.


> >
> > Next up build_kernel_headers has KBUILD_CFLAGS_KERNEL="-mlong-
> calls"
> > which
> > again, A) why does the compiler not do by default, B) can't be needed
> > here because you don't even have to specify a cross compiler when
> > doing headers_install. (I just confirmed this by diffing installs with
> > an without a cross compiler specified: they were identical. I
> > remembered this from
> > https://github.com/torvalds/linux/commit/e0e2fa4b515c but checked
> > again to be
> > sure.) Presumably this is more "shared with full kernel build".

-mlong-calls are not needed for header install.  -mlong-calls are needed when building the kernel source.  If this is removed the link step may fail with a relocation overflow depending on the version of the kernel source you are building.


> >
> > And then build_musl, covered above under the headers build: lotsa
> > flags, not sure why.
> >
> > > -Brian
> > >
> >
> > Rob
> >
> > P.S. It took me a while to figure out that clang_rt is NOT librt.a, I
> > think it's their libgcc? Especially confusing since librt.a has
> > existed for decades and was on solaris before it was on linux, and the
> > OBVIOUS name is libcc the same way "cc" is the generic compiler name
> instead of "gcc".
> > (In fact that was the posix compiler name until they decided to
> > replace it with "c99" and everybody ignored them the way tar->pax was
> > ignored, largely because make's $CC defaults to "cc" so it Just Works,
> > and yes the cross compiler should have that name but the prepackaged
> > clang tarball above does not. *shrug* I fix that up when making my
> > prefix symlinks. The android NDK guys at least have the excuse of
> > shipping NINE different x86_64-linux-android*-clang with API version
> > numbers and thus not wanting to pick a default to single out, so leave
> > making the -cc link as an exercise for the reader. I give instructions
> > for doing so on the toybox cross compiling page I linked above. :)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core
  2021-07-05 23:34 ` [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core Taylor Simpson
@ 2021-07-25 13:08   ` Richard Henderson
  2021-07-26  4:02     ` Taylor Simpson
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2021-07-25 13:08 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, philmd, peter.maydell

On 7/5/21 1:34 PM, Taylor Simpson wrote:
> HVX is a set of wide vector instructions.  Machine state includes
>      vector registers (VRegs)
>      vector predicate registers (QRegs)
>      temporary registers for intermediate values
>      store buffer (masked stores and scatter/gather)
> 
> Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> ---
>   target/hexagon/cpu.h            | 35 ++++++++++++++++-
>   target/hexagon/hex_arch_types.h |  5 +++
>   target/hexagon/insn.h           |  3 ++
>   target/hexagon/internal.h       |  3 ++
>   target/hexagon/mmvec/mmvec.h    | 83 +++++++++++++++++++++++++++++++++++++++++
>   target/hexagon/cpu.c            | 72 ++++++++++++++++++++++++++++++++++-
>   6 files changed, 198 insertions(+), 3 deletions(-)
>   create mode 100644 target/hexagon/mmvec/mmvec.h
> 
> diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
> index 2855dd3..0b377c3 100644
> --- a/target/hexagon/cpu.h
> +++ b/target/hexagon/cpu.h
> @@ -26,6 +26,7 @@ typedef struct CPUHexagonState CPUHexagonState;
>   #include "qemu-common.h"
>   #include "exec/cpu-defs.h"
>   #include "hex_regs.h"
> +#include "mmvec/mmvec.h"
>   
>   #define NUM_PREGS 4
>   #define TOTAL_PER_THREAD_REGS 64
> @@ -34,6 +35,7 @@ typedef struct CPUHexagonState CPUHexagonState;
>   #define STORES_MAX 2
>   #define REG_WRITES_MAX 32
>   #define PRED_WRITES_MAX 5                   /* 4 insns + endloop */
> +#define VSTORES_MAX 2
>   
>   #define TYPE_HEXAGON_CPU "hexagon-cpu"
>   
> @@ -52,6 +54,13 @@ typedef struct {
>       uint64_t data64;
>   } MemLog;
>   
> +typedef struct {
> +    target_ulong va;
> +    int size;
> +    MMVector mask QEMU_ALIGNED(16);
> +    MMVector data QEMU_ALIGNED(16);
> +} VStoreLog;

Do you really need a MMVector mask, or should this be a QRegMask?

> -    target_ulong gather_issued;
> +    bool gather_issued;

Surely unrelated to adding state.

> +    MMVector VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> +    MMVector future_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> +    MMVector tmp_VRegs[NUM_VREGS] QEMU_ALIGNED(16);

Ok, this is where I'm not keen on how you handle this for integer code, and for vector 
code has got to be past the realm of acceptable.

You have exactly 4 slots in your vliw packet.  You cannot possibly use 32 future or tmp 
slots.  For integers this wastage was at least small, but for vectors these waste just shy 
of 8k.

All you need to do is to be smarter about mapping e.g. 5 to 8 temp slots during translation.

> +    MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> +    MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);

Likewise.

> +    /* Temporaries used within instructions */
> +    MMVector zero_vector QEMU_ALIGNED(16);

You must be doing something wrong to need zero in memory.

> +/*
> + * The HVX register dump takes up a ton of space in the log
> + * Don't print it unless it is needed
> + */
> +#define DUMP_HVX 0
> +#if DUMP_HVX
> +    qemu_fprintf(f, "Vector Registers = {\n");
> +    for (int i = 0; i < NUM_VREGS; i++) {
> +        print_vreg(f, env, i);
> +    }
> +    for (int i = 0; i < NUM_QREGS; i++) {
> +        print_qreg(f, env, i);
> +    }
> +    qemu_fprintf(f, "}\n");
> +#endif

Use CPU_DUMP_FPU, controlled by -d fpu.

> -    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
> +    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS + NUM_QREGS;

They're not really core regs though are they?
Surely gdb_register_coprocessor is a better representation.


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/20] Hexagon HVX (target/hexagon) register names
  2021-07-05 23:34 ` [PATCH 03/20] Hexagon HVX (target/hexagon) register names Taylor Simpson
@ 2021-07-25 13:10   ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2021-07-25 13:10 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, philmd, peter.maydell

On 7/5/21 1:34 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/hex_regs.h | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/target/hexagon/hex_regs.h b/target/hexagon/hex_regs.h
> index f291911..e1b3149 100644
> --- a/target/hexagon/hex_regs.h
> +++ b/target/hexagon/hex_regs.h
> @@ -76,6 +76,7 @@ enum {
>       /* Use reserved control registers for qemu execution counts */
>       HEX_REG_QEMU_PKT_CNT      = 52,
>       HEX_REG_QEMU_INSN_CNT     = 53,
> +    HEX_REG_QEMU_HVX_CNT      = 54,
>       HEX_REG_UTIMERLO          = 62,
>       HEX_REG_UTIMERHI          = 63,
>   };

Maybe move the hunk in patch 2 adjusting hexagon_regnames to here?

Anyway,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/20] Hexagon HVX (target/hexagon) macros
  2021-07-05 23:34 ` [PATCH 06/20] Hexagon HVX (target/hexagon) macros Taylor Simpson
@ 2021-07-25 13:13   ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2021-07-25 13:13 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, philmd, peter.maydell

On 7/5/21 1:34 PM, Taylor Simpson wrote:
> +static inline MMVector mmvec_vtmp_data(CPUHexagonState *env)
> +{
> +    VRegMask vsel = env->VRegs_updated_tmp;
> +    MMVector ret;
> +    int idx = clo32(~revbit32(vsel));
> +    if (vsel == 0) {
> +        printf("[UNDEFINED] no .tmp load when implicitly required...");
> +    }

No random debugging printfs please.


r~



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 10/20] Hexagon HVX (target/hexagon) C preprocessor for decode tree
  2021-07-05 23:34 ` [PATCH 10/20] Hexagon HVX (target/hexagon) C preprocessor for decode tree Taylor Simpson
@ 2021-07-25 13:15   ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2021-07-25 13:15 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, philmd, peter.maydell

On 7/5/21 1:34 PM, Taylor Simpson wrote:
> +        char *name = (char *)opcode_names[opcode];
> +        if (strncmp(name, test, strlen(test)) == 0) {

Why did you cast away const here?


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 11/20] Hexagon HVX (target/hexagon) instruction utility functions
  2021-07-05 23:34 ` [PATCH 11/20] Hexagon HVX (target/hexagon) instruction utility functions Taylor Simpson
@ 2021-07-25 13:21   ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2021-07-25 13:21 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, philmd, peter.maydell

On 7/5/21 1:34 PM, Taylor Simpson wrote:
> Functions to support scatter/gather
...
> +uint32_t count_leading_ones_2(uint16_t src)
> +{
> +    int ret;
> +    for (ret = 0; src & 0x8000; src <<= 1) {
> +        ret++;
> +    }
> +    return ret;
> +}

Not related to scatter/gather.

Also, much better implemented as

   clz32(~src & 0xffff) - 16


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 12/20] Hexagon HVX (target/hexagon) helper functions
  2021-07-05 23:34 ` [PATCH 12/20] Hexagon HVX (target/hexagon) helper functions Taylor Simpson
@ 2021-07-25 13:22   ` Richard Henderson
  2021-07-26  4:02     ` Taylor Simpson
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2021-07-25 13:22 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, philmd, peter.maydell

On 7/5/21 1:34 PM, Taylor Simpson wrote:
> +                    put_user_u8(env->vstore[i].data.ub[j], va + j);

No put_user.


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH 12/20] Hexagon HVX (target/hexagon) helper functions
  2021-07-25 13:22   ` Richard Henderson
@ 2021-07-26  4:02     ` Taylor Simpson
  0 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-26  4:02 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: ale, Brian Cain, philmd, peter.maydell



> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Sunday, July 25, 2021 8:22 AM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: philmd@redhat.com; ale@rev.ng; Brian Cain <bcain@quicinc.com>;
> peter.maydell@linaro.org
> Subject: Re: [PATCH 12/20] Hexagon HVX (target/hexagon) helper functions
> 
> On 7/5/21 1:34 PM, Taylor Simpson wrote:
> > +                    put_user_u8(env->vstore[i].data.ub[j], va + j);
> 
> No put_user.

Yes, there's also an include of qemu.h that I'll remove.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core
  2021-07-25 13:08   ` Richard Henderson
@ 2021-07-26  4:02     ` Taylor Simpson
  2021-07-27 17:21       ` Taylor Simpson
  0 siblings, 1 reply; 40+ messages in thread
From: Taylor Simpson @ 2021-07-26  4:02 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: ale, Brian Cain, philmd, peter.maydell



> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Sunday, July 25, 2021 8:08 AM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: philmd@redhat.com; ale@rev.ng; Brian Cain <bcain@quicinc.com>;
> peter.maydell@linaro.org
> Subject: Re: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon
> Vector eXtensions (HVX) to core
> 
> On 7/5/21 1:34 PM, Taylor Simpson wrote:
> > HVX is a set of wide vector instructions.  Machine state includes
> >      vector registers (VRegs)
> >      vector predicate registers (QRegs)
> >      temporary registers for intermediate values
> >      store buffer (masked stores and scatter/gather)
> >
> > Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> > ---
> >   target/hexagon/cpu.h            | 35 ++++++++++++++++-
> >   target/hexagon/hex_arch_types.h |  5 +++
> >   target/hexagon/insn.h           |  3 ++
> >   target/hexagon/internal.h       |  3 ++
> >   target/hexagon/mmvec/mmvec.h    | 83
> +++++++++++++++++++++++++++++++++++++++++
> >   target/hexagon/cpu.c            | 72
> ++++++++++++++++++++++++++++++++++-
> >   6 files changed, 198 insertions(+), 3 deletions(-)
> >   create mode 100644 target/hexagon/mmvec/mmvec.h
> >
> > diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h index
> > 2855dd3..0b377c3 100644
> > --- a/target/hexagon/cpu.h
> > +++ b/target/hexagon/cpu.h
> > @@ -26,6 +26,7 @@ typedef struct CPUHexagonState CPUHexagonState;
> >   #include "qemu-common.h"
> >   #include "exec/cpu-defs.h"
> >   #include "hex_regs.h"
> > +#include "mmvec/mmvec.h"
> >
> >   #define NUM_PREGS 4
> >   #define TOTAL_PER_THREAD_REGS 64
> > @@ -34,6 +35,7 @@ typedef struct CPUHexagonState CPUHexagonState;
> >   #define STORES_MAX 2
> >   #define REG_WRITES_MAX 32
> >   #define PRED_WRITES_MAX 5                   /* 4 insns + endloop */
> > +#define VSTORES_MAX 2
> >
> >   #define TYPE_HEXAGON_CPU "hexagon-cpu"
> >
> > @@ -52,6 +54,13 @@ typedef struct {
> >       uint64_t data64;
> >   } MemLog;
> >
> > +typedef struct {
> > +    target_ulong va;
> > +    int size;
> > +    MMVector mask QEMU_ALIGNED(16);
> > +    MMVector data QEMU_ALIGNED(16);
> > +} VStoreLog;
> 
> Do you really need a MMVector mask, or should this be a QRegMask?

You are correct.  I'll change this.

> 
> > -    target_ulong gather_issued;
> > +    bool gather_issued;
> 
> Surely unrelated to adding state.

This was unintentionally included in the patch series for the scalar core.  Based on previous feedback, I know it should be a bool.  However, this can actually be removed altogether.  So, in the next iteration of this series, you'll see it removed.

> 
> > +    MMVector VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> > +    MMVector future_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> > +    MMVector tmp_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> 
> Ok, this is where I'm not keen on how you handle this for integer code, and
> for vector code has got to be past the realm of acceptable.
> 
> You have exactly 4 slots in your vliw packet.  You cannot possibly use 32
> future or tmp slots.  For integers this wastage was at least small, but for
> vectors these waste just shy of 8k.
> 
> All you need to do is to be smarter about mapping e.g. 5 to 8 temp slots
> during translation.

OK

> 
> > +    MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> > +    MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> 
> Likewise.
> 
> > +    /* Temporaries used within instructions */
> > +    MMVector zero_vector QEMU_ALIGNED(16);
> 
> You must be doing something wrong to need zero in memory.

The architecture specifies that if you use a .new in a packet where the vector register isn't  defined, it gets zero.  So, the generator produces the following for .new references
    const intptr_t OsN_off =
        test_bit(OsN_num, ctx->vregs_updated)
            ? offsetof(CPUHexagonState, future_VRegs[OsN_num])
            : offsetof(CPUHexagonState, zero_vector);

> 
> > +/*
> > + * The HVX register dump takes up a ton of space in the log
> > + * Don't print it unless it is needed  */ #define DUMP_HVX 0 #if
> > +DUMP_HVX
> > +    qemu_fprintf(f, "Vector Registers = {\n");
> > +    for (int i = 0; i < NUM_VREGS; i++) {
> > +        print_vreg(f, env, i);
> > +    }
> > +    for (int i = 0; i < NUM_QREGS; i++) {
> > +        print_qreg(f, env, i);
> > +    }
> > +    qemu_fprintf(f, "}\n");
> > +#endif
> 
> Use CPU_DUMP_FPU, controlled by -d fpu.

These aren't FP registers, but OK.


> 
> > -    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
> > +    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS
> +
> > + NUM_QREGS;
> 
> They're not really core regs though are they?
> Surely gdb_register_coprocessor is a better representation.

I'll look into this.


Thanks,
Taylor


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-19 16:19           ` Sid Manning
@ 2021-07-26  7:57             ` Rob Landley
  2021-07-26  8:54               ` Rob Landley
  0 siblings, 1 reply; 40+ messages in thread
From: Rob Landley @ 2021-07-26  7:57 UTC (permalink / raw)
  To: Sid Manning, Brian Cain, Taylor Simpson, qemu-devel
  Cc: ale, peter.maydell, richard.henderson, philmd

[-- Attachment #1: Type: text/plain, Size: 3844 bytes --]

On 7/19/21 11:19 AM, Sid Manning wrote:>> -----Original Message-----
>> From: Brian Cain <bcain@quicinc.com>
>> Sent: Monday, July 19, 2021 8:40 AM
>> To: Rob Landley <rob@landley.net>; Taylor Simpson
>> <tsimpson@quicinc.com>; qemu-devel@nongnu.org; Sid Manning
>> <sidneym@quicinc.com>
>> Cc: ale@rev.ng; peter.maydell@linaro.org; richard.henderson@linaro.org;
>> philmd@redhat.com
>> Subject: RE: [EXT] Re: [PATCH 01/20] Hexagon HVX (target/hexagon)
>> README
>> 
>> > -----Original Message-----
>> > From: Rob Landley <rob@landley.net>
>> ...
>> > On 7/12/21 8:42 AM, Brian Cain wrote:
>> ...
>> > > and there's also a binary hexagon-linux cross toolchain that we
>> > > shared for use by kernel developers.  The hexagon linux toolchain is
>> > > built on Ubuntu 16.04.
>> >
>> > Where's that one?
>> 
>> https://codelinaro.jfrog.io/artifactory/codelinaro-qemu/2021-05-
>> 12/clang+llvm-12.0.0-cross-hexagon-unknown-linux-musl.tar.xz -
>> 	- Built on Ubuntu 16.04, similar dynamic dependencies as
>> releases.llvm.org binaries

Indeed, in a "that also does not run on devuan, which is 99% stock debian" way. :(

Luckily, I built a working hexagon toolchain with the attached script, as in
"qemu-hexagon ran a statically linked toybox", and it even built a kernel.

I'm still trimming the build script down, that clang-rt section is WAY too big,
and I need to static link the binaires it produces so I can tar 'em up and use
them under a different distro, and I haven't even _started_ making a native
toolchain yet.[1]

Next question: is there a qemu-system-hexagon anywhere?

I mentioned I built a comet_defconfig kernel, ala:

LLVM_IAS=1
CROSS_COMPILE=~/toybox/hexagon/ccc/cross_bin/hexagon-unknown-linux-musl- make
ARCH=hexagon CC=~/toybox/hexagon/ccc/cross_bin/hexagon-unknown-linux-musl-cc

Which is kinda silly because:

1) Other packages figure out that ${CROSS}cc works but Linux insists on
${CROSS}gcc, and you can't even do "CC=cc make" because then it won't add the
cross compiler prefix. (And if I say LLVM=1 on the kernel command line, which I
shouldn't have to do, it uses _unprefixed_ clang as the $CC name, despite cross
compiling.)

2) If you don't set LLVM_IAS it tries to call the UNPREFIXED assembler, again
while cross compiling.

Anyway, I've got a compiler now and I (awkwardly) built a kernel and I'm sitting
down to try to figure out how to get qemu to invoke it: does this arch want
vmlinux or arch/hexagon/boot/$RANDOMFORMAT, is serial on console=ttyS0 or
/significant/dev/prefix/ttyasparagus0 or...

See https://github.com/landley/toybox/blob/master/scripts/mkroot.sh#L186 for the
other architectures I've already added to toybox's mkroot, yes I have a ~250
line bash script that builds bootable Linux systems for a bunch of different
architectures and adding a new architecture looks like:

elif [ "$TARGET" == m68k ]; then
QEMU="m68k -M q800" KARCH=m68k KARGS=ttyS0 VMLINUX=vmlinux
KCONF=MMU,M68040,M68KFPU_EMU,MAC,SCSI_MAC_ESP,MACINTOSH_DRIVERS,ADB,ADB_MACII,NET_CORE,MACSONIC,SERIAL_PMACZILOG,SERIAL_PMACZILOG_TTYS,SERIAL_PMACZILOG_CONSOLE

(There's a little documentation at https://landley.net/toybox/faq.html#mkroot if
you're curious.)

Anyway... it doesn't look like qemu-system-hexagon (softmmu) its currently in
vanilla qemu? Is there a public fork that has this somewhere?

Thanks,

Rob

[1] Why does https://llvm.org/docs/GettingStarted.html#cross-compiling-llvm talk
about osx? Dear compiler writers: a compiler is conceptually the same as an html
to pdf converter. It takes input files, it produces output files. Yes some of
the input files are common library stuff like fonts reused by multiple
input/output pairs... again like an html to pdf converter. This is not
unprecedented black magic. Sure it's clever. So was Quake, which has now been
genericized into a broad industry from WoW to Skyrim.

[-- Attachment #2: script.sh --]
[-- Type: application/x-shellscript, Size: 2327 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-26  7:57             ` Rob Landley
@ 2021-07-26  8:54               ` Rob Landley
  2021-07-26 13:59                 ` Taylor Simpson
  0 siblings, 1 reply; 40+ messages in thread
From: Rob Landley @ 2021-07-26  8:54 UTC (permalink / raw)
  To: Sid Manning, Brian Cain, Taylor Simpson, qemu-devel, linux-hexagon
  Cc: ale, peter.maydell, richard.henderson, philmd

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

On 7/26/21 2:57 AM, Rob Landley wrote:
> Anyway... it doesn't look like qemu-system-hexagon (softmmu) its currently in
> vanilla qemu? Is there a public fork that has this somewhere?

I did a little wild flailing to get ./configure to give me a qemu-system-hexagon
option (patch attached), I.E. just enough to get meson to shut up and quite
possibly still missing something important. (Is this python? It looks kind of
like python.)

Unfortunately after liberally cribbing from the cris architecture (which seems
like the simplest one) I'm left with several new C files to implement, all
currently zero length in the patch:

  hw/hexagon/boot.c
  hw/hexagon/hexagon_comet.c
  target/hexagon/machine.c
  target/hexagon/mmu.c

(In theory there's a "none" board on all the current qemu-system architectures,
but I've never figured out what to DO with those...)

All this raises two problems:

1) I dunno how the hexagon mmu works. (I can presumably read the kernel code and
reverse engineer what that's looking for, but it would be really nice not to
_have_ to?)

2) What's a comet board? (Memory layout? I/O devices? I guess all I need for
serial console on initramfs is a contiguous block of DRAM, timer interrupt to
drive the scheduler, and a serial port. I keep thinking there should be a way to
tell the "none" board to add that stuff from the command line... but dunno how.
"map DRAM here". "add this clock hardware at here". "add this kind of serial
port at here." "call elf_load on this file and start executing at its entry
point"...)

3) Reading the arch/hexagon kernel stuff ala "so what IS in a comet board"...
CONFIG_HEXAGON_COMET is only ever used to guard one #define in a header file:

  arch/hexagon/include/asm/timer-regs.h:#define RTOS_TIMER_REGS_ADDR

Which is then used to initialize structure members in arch/hexagon/kernel/time.c
without any sort of guard there, and no it isn't #defined to 0 by default
anywhere I can see? And of course obj-y += time.o in
arch/hexagon/kernel/Makefile has no config guard there either. So if it wasn't
set, the build would break. And that's currently all the symbol does?

Anyway, I still hope somebody else has already done most of this in a git tree
somewhere. :)

Rob

[-- Attachment #2: hexagon-softmmu-skeleton.patch --]
[-- Type: text/x-patch, Size: 2532 bytes --]

diff --git a/default-configs/devices/hexagon-softmmu.mak b/default-configs/devices/hexagon-softmmu.mak
new file mode 100644
index 0000000000..c07ed1132f
--- /dev/null
+++ b/default-configs/devices/hexagon-softmmu.mak
@@ -0,0 +1,5 @@
+# Default configuration for hexagon-softmmu
+
+# Boards:
+#
+CONFIG_HEXAGON_COMET=y
diff --git a/default-configs/targets/hexagon-softmmu.mak b/default-configs/targets/hexagon-softmmu.mak
new file mode 100644
index 0000000000..003ed0a408
--- /dev/null
+++ b/default-configs/targets/hexagon-softmmu.mak
@@ -0,0 +1 @@
+TARGET_ARCH=hexagon
diff --git a/hw/hexagon/Kconfig b/hw/hexagon/Kconfig
new file mode 100644
index 0000000000..9ae8a5ce30
--- /dev/null
+++ b/hw/hexagon/Kconfig
@@ -0,0 +1,2 @@
+config HEXAGON_COMET
+    bool
diff --git a/hw/hexagon/boot.c b/hw/hexagon/boot.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/hw/hexagon/hexagon_comet.c b/hw/hexagon/hexagon_comet.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/hw/hexagon/meson.build b/hw/hexagon/meson.build
new file mode 100644
index 0000000000..83f23f5368
--- /dev/null
+++ b/hw/hexagon/meson.build
@@ -0,0 +1,5 @@
+hexagon_ss = ss.source_set()
+hexagon_ss.add(files('boot.c'))
+hexagon_ss.add(when: 'CONFIG_HEXAGON_COMET', if_true: files('hexagon_comet.c'))
+
+hw_arch += {'hexagon': hexagon_ss}
diff --git a/target/hexagon/machine.c b/target/hexagon/machine.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/target/hexagon/mmu.c b/target/hexagon/mmu.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/hw/Kconfig b/hw/Kconfig
index 805860f564..7cfd7db690 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -62,6 +62,7 @@ source sparc/Kconfig
 source sparc64/Kconfig
 source tricore/Kconfig
 source xtensa/Kconfig
+source hexagon/Kconfig
 
 # Symbols used by multiple targets
 config TEST_DEVICES
diff --git a/hw/meson.build b/hw/meson.build
index ba0601e36e..f43c4bacdd 100644
--- a/hw/meson.build
+++ b/hw/meson.build
@@ -46,6 +46,7 @@ subdir('alpha')
 subdir('arm')
 subdir('avr')
 subdir('cris')
+subdir('hexagon')
 subdir('hppa')
 subdir('i386')
 subdir('m68k')
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index 6fd9360b74..aef434421f 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -176,3 +176,7 @@ hexagon_ss.add(files(
 ))
 
 target_arch += {'hexagon': hexagon_ss}
+
+hexagon_softmmu_ss = ss.source_set()
+hexagon_softmmu_ss.add(files('mmu.c', 'machine.c'))
+target_softmmu_arch += {'hexagon': hexagon_softmmu_ss}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-26  8:54               ` Rob Landley
@ 2021-07-26 13:59                 ` Taylor Simpson
  2021-07-28  8:11                   ` Rob Landley
  2021-11-25  6:26                   ` Rob Landley
  0 siblings, 2 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-26 13:59 UTC (permalink / raw)
  To: Rob Landley, Sid Manning, Brian Cain, qemu-devel, linux-hexagon
  Cc: ale, peter.maydell, richard.henderson, philmd

We're working on system mode support for Hexagon, and we plan to upstream it when it is ready.

Thanks,
Taylor



> -----Original Message-----
> From: Rob Landley <rob@landley.net>
> Sent: Monday, July 26, 2021 3:55 AM
> To: Sid Manning <sidneym@quicinc.com>; Brian Cain <bcain@quicinc.com>;
> Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org; linux-
> hexagon@vger.kernel.org
> Cc: ale@rev.ng; peter.maydell@linaro.org; richard.henderson@linaro.org;
> philmd@redhat.com
> Subject: Re: [PATCH 01/20] Hexagon HVX (target/hexagon) README
> 
> On 7/26/21 2:57 AM, Rob Landley wrote:
> > Anyway... it doesn't look like qemu-system-hexagon (softmmu) its
> > currently in vanilla qemu? Is there a public fork that has this somewhere?
> 
> I did a little wild flailing to get ./configure to give me a qemu-system-hexagon
> option (patch attached), I.E. just enough to get meson to shut up and quite
> possibly still missing something important. (Is this python? It looks kind of like
> python.)
> 
> Unfortunately after liberally cribbing from the cris architecture (which seems
> like the simplest one) I'm left with several new C files to implement, all
> currently zero length in the patch:
> 
>   hw/hexagon/boot.c
>   hw/hexagon/hexagon_comet.c
>   target/hexagon/machine.c
>   target/hexagon/mmu.c
> 
> (In theory there's a "none" board on all the current qemu-system
> architectures, but I've never figured out what to DO with those...)
> 
> All this raises two problems:
> 
> 1) I dunno how the hexagon mmu works. (I can presumably read the kernel
> code and reverse engineer what that's looking for, but it would be really nice
> not to _have_ to?)
> 
> 2) What's a comet board? (Memory layout? I/O devices? I guess all I need for
> serial console on initramfs is a contiguous block of DRAM, timer interrupt to
> drive the scheduler, and a serial port. I keep thinking there should be a way
> to tell the "none" board to add that stuff from the command line... but
> dunno how.
> "map DRAM here". "add this clock hardware at here". "add this kind of serial
> port at here." "call elf_load on this file and start executing at its entry
> point"...)
> 
> 3) Reading the arch/hexagon kernel stuff ala "so what IS in a comet board"...
> CONFIG_HEXAGON_COMET is only ever used to guard one #define in a
> header file:
> 
>   arch/hexagon/include/asm/timer-regs.h:#define
> RTOS_TIMER_REGS_ADDR
> 
> Which is then used to initialize structure members in
> arch/hexagon/kernel/time.c without any sort of guard there, and no it isn't
> #defined to 0 by default anywhere I can see? And of course obj-y += time.o
> in arch/hexagon/kernel/Makefile has no config guard there either. So if it
> wasn't set, the build would break. And that's currently all the symbol does?
> 
> Anyway, I still hope somebody else has already done most of this in a git tree
> somewhere. :)
> 
> Rob

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core
  2021-07-26  4:02     ` Taylor Simpson
@ 2021-07-27 17:21       ` Taylor Simpson
  0 siblings, 0 replies; 40+ messages in thread
From: Taylor Simpson @ 2021-07-27 17:21 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: ale, Brian Cain, philmd, peter.maydell



> -----Original Message-----
> From: Taylor Simpson
> Sent: Sunday, July 25, 2021 10:02 PM
> To: Richard Henderson <richard.henderson@linaro.org>; qemu-
> devel@nongnu.org
> Cc: philmd@redhat.com; ale@rev.ng; Brian Cain <bcain@quicinc.com>;
> peter.maydell@linaro.org
> Subject: RE: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon
> Vector eXtensions (HVX) to core
> 
> 
> 
> > -----Original Message-----
> > From: Richard Henderson <richard.henderson@linaro.org>
> > Sent: Sunday, July 25, 2021 8:08 AM
> > To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> > Cc: philmd@redhat.com; ale@rev.ng; Brian Cain <bcain@quicinc.com>;
> > peter.maydell@linaro.org
> > Subject: Re: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon
> > Vector eXtensions (HVX) to core
> >


> > > +    /* Temporaries used within instructions */
> > > +    MMVector zero_vector QEMU_ALIGNED(16);
> >
> > You must be doing something wrong to need zero in memory.
> 
> The architecture specifies that if you use a .new in a packet where the vector
> register isn't  defined, it gets zero.  So, the generator produces the following
> for .new references
>     const intptr_t OsN_off =
>         test_bit(OsN_num, ctx->vregs_updated)
>             ? offsetof(CPUHexagonState, future_VRegs[OsN_num])
>             : offsetof(CPUHexagonState, zero_vector);

Correction - the architecture does NOT allow a .new where the vector register isn't defined.

This code is being overly cautious, so I'll change it to always return future_VRegs and remove zero_vector from CPUHexagonState.

Thanks,
Taylor




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-26 13:59                 ` Taylor Simpson
@ 2021-07-28  8:11                   ` Rob Landley
  2021-11-25  6:26                   ` Rob Landley
  1 sibling, 0 replies; 40+ messages in thread
From: Rob Landley @ 2021-07-28  8:11 UTC (permalink / raw)
  To: Taylor Simpson, Sid Manning, Brian Cain, qemu-devel, linux-hexagon
  Cc: ale, peter.maydell, richard.henderson, philmd

On 7/26/21 8:59 AM, Taylor Simpson wrote:
>> Anyway, I still hope somebody else has already done most of this in a git
>> tree somewhere. :)
>
> We're working on system mode support for Hexagon, and we plan to upstream it when it is ready.

Yay! Thanks.

While you're at it, why is llvm's cmake config unable to do:

  $ cccnext/cross_bin/hexagon-unknown-linux-musl-cc \
    -Xpreprocessor -P -E - <<< __SIZEOF_POINTER__
  4

I'm trying to genericize that llvm build script to do all the targets musl and
llvm agree on supporting, which means not passing in -DCMAKE_SIZEOF_VOID_P=4
because the compiler ALREADY KNOWS THIS... but cmake/config-ix.cmake line 196 is
REALLY going to barf if we didn't explicitly specify it on the command line? Are
the llvm developers not _aware_ of the "cc -E -dM - < /dev/null" trick? Even if
they aren't, why couldn't they just sizeof(void *) in a header file?

*shrug* I can do the above trick in the wrapper script and then provide
-DCMAKE_SIZEOF_VOID_P=$BLAH on the command line, it just seems DEEPLY pointless
to go to all the trouble of having a ./configure that has to be manually told
stuff the compiler already knows.

Confused,

Rob


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 01/20] Hexagon HVX (target/hexagon) README
  2021-07-26 13:59                 ` Taylor Simpson
  2021-07-28  8:11                   ` Rob Landley
@ 2021-11-25  6:26                   ` Rob Landley
  1 sibling, 0 replies; 40+ messages in thread
From: Rob Landley @ 2021-11-25  6:26 UTC (permalink / raw)
  To: Taylor Simpson, Sid Manning, Brian Cain, qemu-devel, linux-hexagon
  Cc: ale, peter.maydell, richard.henderson, philmd

On 7/26/21 8:59 AM, Taylor Simpson wrote:
> We're working on system mode support for Hexagon, and we plan to upstream it when it is ready.
> 
> Thanks,
> Taylor

Any progress on this? (Is there a way for outsiders to track the status?)

Thanks,

Rob


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-11-25  6:29 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-05 23:34 [PATCH 00/20] Hexagon HVX (target/hexagon) patch series Taylor Simpson
2021-07-05 23:34 ` [PATCH 01/20] Hexagon HVX (target/hexagon) README Taylor Simpson
2021-07-12  8:16   ` Rob Landley
2021-07-12 13:42     ` Brian Cain
2021-07-19  1:10       ` Rob Landley
2021-07-19 13:39         ` Brian Cain
2021-07-19 16:19           ` Sid Manning
2021-07-26  7:57             ` Rob Landley
2021-07-26  8:54               ` Rob Landley
2021-07-26 13:59                 ` Taylor Simpson
2021-07-28  8:11                   ` Rob Landley
2021-11-25  6:26                   ` Rob Landley
2021-07-05 23:34 ` [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core Taylor Simpson
2021-07-25 13:08   ` Richard Henderson
2021-07-26  4:02     ` Taylor Simpson
2021-07-27 17:21       ` Taylor Simpson
2021-07-05 23:34 ` [PATCH 03/20] Hexagon HVX (target/hexagon) register names Taylor Simpson
2021-07-25 13:10   ` Richard Henderson
2021-07-05 23:34 ` [PATCH 04/20] Hexagon HVX (target/hexagon) support in gdbstub Taylor Simpson
2021-07-05 23:34 ` [PATCH 05/20] Hexagon HVX (target/hexagon) instruction attributes Taylor Simpson
2021-07-05 23:34 ` [PATCH 06/20] Hexagon HVX (target/hexagon) macros Taylor Simpson
2021-07-25 13:13   ` Richard Henderson
2021-07-05 23:34 ` [PATCH 07/20] Hexagon HVX (target/hexagon) import macro definitions Taylor Simpson
2021-07-05 23:34 ` [PATCH 08/20] Hexagon HVX (target/hexagon) semantics generator Taylor Simpson
2021-07-05 23:34 ` [PATCH 09/20] Hexagon HVX (target/hexagon) semantics generator - part 2 Taylor Simpson
2021-07-05 23:34 ` [PATCH 10/20] Hexagon HVX (target/hexagon) C preprocessor for decode tree Taylor Simpson
2021-07-25 13:15   ` Richard Henderson
2021-07-05 23:34 ` [PATCH 11/20] Hexagon HVX (target/hexagon) instruction utility functions Taylor Simpson
2021-07-25 13:21   ` Richard Henderson
2021-07-05 23:34 ` [PATCH 12/20] Hexagon HVX (target/hexagon) helper functions Taylor Simpson
2021-07-25 13:22   ` Richard Henderson
2021-07-26  4:02     ` Taylor Simpson
2021-07-05 23:34 ` [PATCH 13/20] Hexagon HVX (target/hexagon) TCG generation Taylor Simpson
2021-07-05 23:34 ` [PATCH 14/20] Hexagon HVX (target/hexagon) import semantics Taylor Simpson
2021-07-05 23:34 ` [PATCH 15/20] Hexagon HVX (target/hexagon) instruction decoding Taylor Simpson
2021-07-05 23:34 ` [PATCH 16/20] Hexagon HVX (target/hexagon) import instruction encodings Taylor Simpson
2021-07-05 23:34 ` [PATCH 17/20] Hexagon HVX (tests/tcg/hexagon) vector_add_int test Taylor Simpson
2021-07-05 23:34 ` [PATCH 18/20] Hexagon HVX (tests/tcg/hexagon) hvx_misc test Taylor Simpson
2021-07-05 23:34 ` [PATCH 19/20] Hexagon HVX (tests/tcg/hexagon) scatter_gather test Taylor Simpson
2021-07-05 23:34 ` [PATCH 20/20] Hexagon HVX (tests/tcg/hexagon) histogram test Taylor Simpson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).