* [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series
@ 2021-09-20 21:23 Taylor Simpson
  2021-09-20 21:23 ` [PATCH v3 01/30] Hexagon HVX (target/hexagon) README Taylor Simpson
                   ` (29 more replies)
  0 siblings, 30 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug


This series adds support for the Hexagon Vector eXtensions (HVX).

These instructions are documented here:
https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual

Hexagon HVX is a wide vector engine with 128-byte vectors.

See patch 01 Hexagon HVX README for more information.
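
To make the 128-byte vector width concrete, here is a minimal standalone sketch of a lane-wise word add over one vector, the operation that the README in patch 01 uses as its running example (V6_vaddw).  The names SketchVector and sketch_vaddw are hypothetical and exist only for this illustration; the series itself uses the MMVector union and generated TCG code, not this function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VEC_BYTES 128    /* vector width stated in the cover letter */

/* Hypothetical stand-in for the series' MMVector union (illustration only). */
typedef union {
    int32_t w[SKETCH_VEC_BYTES / 4];    /* 32 word lanes */
    uint8_t ub[SKETCH_VEC_BYTES];       /* 128 byte lanes */
} SketchVector;

static_assert(sizeof(SketchVector) == SKETCH_VEC_BYTES,
              "a vector must occupy exactly 128 bytes");

/* Lane-wise 32-bit add: d.w[i] = u.w[i] + v.w[i] for each of the 32 lanes. */
static void sketch_vaddw(SketchVector *d, const SketchVector *u,
                         const SketchVector *v)
{
    for (size_t i = 0; i < SKETCH_VEC_BYTES / 4; i++) {
        /*
         * Add as unsigned so that overflow is not undefined behavior; the
         * result wraps modulo 2^32 on the usual two's-complement targets.
         */
        d->w[i] = (int32_t)((uint32_t)u->w[i] + (uint32_t)v->w[i]);
    }
}
```

A single HVX instruction of this kind therefore touches 32 word lanes at once, which is why the operands are passed by address rather than by value.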


*** Known checkpatch issues ***

The following are known checkpatch errors in the series:
    target/hexagon/gen_semantics.c    Suspicious ; after while (0)
    tests/tcg/hexagon/hvx_misc.c      Spaces around operator in macro invocation


*** Changes in v3 ***
Clean up gen_log_vreg_write
- Remove has_vhist parameter
Remove VRegs_updated_tmp from runtime state
- Check there is exactly one tmp for vhist at TCG generation time
Remove VRegs_select from runtime state
Add test_max_temps test to tests/tcg/hexagon/hvx_misc.c
Don't pass slot to HVX helpers

*** Changes in v2 ***
Address feedback from Richard Henderson <richard.henderson@linaro.org>
- Remove zero_vector from CPUHexagonState
- Remove gather_issued from CPUHexagonState
- Remove is_gather_store_insn from DisasContext and CPUHexagonState
- Change VStoreLog.mask to a bitmap
- Change VTCMStoreLog.mask to a bitmap
- Convert future_VRegs, tmp_Vregs to allocate as-needed
- Don't cast away const
- Remove/simplify count_leading_ones_2
- Control HVX dump with CPU_DUMP_FPU
Remove HVX support from target/hexagon/gdbstub.c
- Hexagon uses lldb, which will require support for qRegisterInfo in the
  target-independent gdbstub.  Will contribute this separately.
Convert the histogram instructions to execute at the end of packet commit
- This is necessary to allocate future_VRegs as-needed
Additional tests added in tests/tcg/hexagon
Added helper overrides for several instructions
- As a result, cleaned up utility functions
Additional cleanup
- Change TCGv_ptr to not _local_
- Remove env argument from gen_commit_hvx



Taylor Simpson (30):
  Hexagon HVX (target/hexagon) README
  Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to
    core
  Hexagon HVX (target/hexagon) register names
  Hexagon HVX (target/hexagon) instruction attributes
  Hexagon HVX (target/hexagon) macros
  Hexagon HVX (target/hexagon) import macro definitions
  Hexagon HVX (target/hexagon) semantics generator
  Hexagon HVX (target/hexagon) semantics generator - part 2
  Hexagon HVX (target/hexagon) C preprocessor for decode tree
  Hexagon HVX (target/hexagon) instruction utility functions
  Hexagon HVX (target/hexagon) helper functions
  Hexagon HVX (target/hexagon) TCG generation
  Hexagon HVX (target/hexagon) helper overrides infrastructure
  Hexagon HVX (target/hexagon) helper overrides for histogram
    instructions
  Hexagon HVX (target/hexagon) helper overrides - vector assign & cmov
  Hexagon HVX (target/hexagon) helper overrides - vector add & sub
  Hexagon HVX (target/hexagon) helper overrides - vector shifts
  Hexagon HVX (target/hexagon) helper overrides - vector max/min
  Hexagon HVX (target/hexagon) helper overrides - vector logical ops
  Hexagon HVX (target/hexagon) helper overrides - vector compares
  Hexagon HVX (target/hexagon) helper overrides - vector splat and abs
  Hexagon HVX (target/hexagon) helper overrides - vector loads
  Hexagon HVX (target/hexagon) helper overrides - vector stores
  Hexagon HVX (target/hexagon) import semantics
  Hexagon HVX (target/hexagon) instruction decoding
  Hexagon HVX (target/hexagon) import instruction encodings
  Hexagon HVX (tests/tcg/hexagon) vector_add_int test
  Hexagon HVX (tests/tcg/hexagon) hvx_misc test
  Hexagon HVX (tests/tcg/hexagon) scatter_gather test
  Hexagon HVX (tests/tcg/hexagon) histogram test

 target/hexagon/cpu.h                         |   35 +-
 target/hexagon/gen_tcg_hvx.h                 |  915 +++++++++
 target/hexagon/helper.h                      |   14 +
 target/hexagon/hex_arch_types.h              |    5 +
 target/hexagon/hex_regs.h                    |    1 +
 target/hexagon/insn.h                        |    3 +
 target/hexagon/internal.h                    |    3 +
 target/hexagon/macros.h                      |   22 +
 target/hexagon/mmvec/decode_ext_mmvec.h      |   24 +
 target/hexagon/mmvec/macros.h                |  341 ++++
 target/hexagon/mmvec/mmvec.h                 |   83 +
 target/hexagon/mmvec/system_ext_mmvec.h      |   29 +
 target/hexagon/translate.h                   |   61 +
 tests/tcg/hexagon/hvx_histogram_input.h      |  717 +++++++
 tests/tcg/hexagon/hvx_histogram_row.h        |   24 +
 target/hexagon/attribs_def.h.inc             |   22 +
 target/hexagon/cpu.c                         |   80 +-
 target/hexagon/decode.c                      |   28 +-
 target/hexagon/gen_dectree_import.c          |   13 +
 target/hexagon/gen_semantics.c               |   33 +
 target/hexagon/genptr.c                      |  191 ++
 target/hexagon/mmvec/decode_ext_mmvec.c      |  236 +++
 target/hexagon/mmvec/system_ext_mmvec.c      |   66 +
 target/hexagon/op_helper.c                   |  223 ++-
 target/hexagon/translate.c                   |  213 ++-
 tests/tcg/hexagon/hvx_histogram.c            |   88 +
 tests/tcg/hexagon/hvx_misc.c                 |  414 ++++
 tests/tcg/hexagon/scatter_gather.c           | 1011 ++++++++++
 tests/tcg/hexagon/vector_add_int.c           |   61 +
 target/hexagon/README                        |   81 +-
 target/hexagon/gen_helper_funcs.py           |  115 +-
 target/hexagon/gen_helper_protos.py          |   19 +-
 target/hexagon/gen_tcg_funcs.py              |  261 ++-
 target/hexagon/hex_common.py                 |   13 +
 target/hexagon/imported/allext.idef          |   25 +
 target/hexagon/imported/allext_macros.def    |   25 +
 target/hexagon/imported/allextenc.def        |   20 +
 target/hexagon/imported/allidefs.def         |    1 +
 target/hexagon/imported/encode.def           |    1 +
 target/hexagon/imported/macros.def           |   88 +
 target/hexagon/imported/mmvec/encode_ext.def |  794 ++++++++
 target/hexagon/imported/mmvec/ext.idef       | 2606 ++++++++++++++++++++++++++
 target/hexagon/imported/mmvec/macros.def     |  842 +++++++++
 target/hexagon/meson.build                   |   15 +-
 tests/tcg/hexagon/Makefile.target            |   12 +
 tests/tcg/hexagon/hvx_histogram_row.S        |  294 +++
 46 files changed, 10123 insertions(+), 45 deletions(-)
 create mode 100644 target/hexagon/gen_tcg_hvx.h
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/macros.h
 create mode 100644 target/hexagon/mmvec/mmvec.h
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.h
 create mode 100644 tests/tcg/hexagon/hvx_histogram_input.h
 create mode 100644 tests/tcg/hexagon/hvx_histogram_row.h
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.c
 create mode 100644 tests/tcg/hexagon/hvx_histogram.c
 create mode 100644 tests/tcg/hexagon/hvx_misc.c
 create mode 100644 tests/tcg/hexagon/scatter_gather.c
 create mode 100644 tests/tcg/hexagon/vector_add_int.c
 create mode 100644 target/hexagon/imported/allext.idef
 create mode 100644 target/hexagon/imported/allext_macros.def
 create mode 100644 target/hexagon/imported/allextenc.def
 create mode 100644 target/hexagon/imported/mmvec/encode_ext.def
 create mode 100644 target/hexagon/imported/mmvec/ext.idef
 create mode 100755 target/hexagon/imported/mmvec/macros.def
 create mode 100644 tests/tcg/hexagon/hvx_histogram_row.S

-- 
2.7.4



* [PATCH v3 01/30] Hexagon HVX (target/hexagon) README
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
@ 2021-09-20 21:23 ` Taylor Simpson
  2021-09-20 21:23 ` [PATCH v3 02/30] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core Taylor Simpson
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/README | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/target/hexagon/README b/target/hexagon/README
index b0b2435..d09971a 100644
--- a/target/hexagon/README
+++ b/target/hexagon/README
@@ -1,9 +1,13 @@
 Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
-processor(DSP).
+processor(DSP).  We also support Hexagon Vector eXtensions (HVX).  HVX
+is a wide vector coprocessor designed for high performance computer vision,
+image processing, machine learning, and other workloads.
 
 The following versions of the Hexagon core are supported
     Scalar core: v67
     https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual
+    HVX extension: v66
+    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual
 
 We presented an overview of the project at the 2019 KVM Forum.
     https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center
@@ -124,6 +128,71 @@ There are also cases where we brute force the TCG code generation.
 Instructions with multiple definitions are examples.  These require special
 handling because qemu helpers can only return a single value.
 
+For HVX vectors, the generator behaves slightly differently.  The wide vectors
+won't fit in a TCGv or TCGv_i64, so we use TCGv_ptr variables to pass the
+address to helper functions.  Here's an example for an HVX vector-add-word
+instruction.
+    static void generate_V6_vaddw(
+                    CPUHexagonState *env,
+                    DisasContext *ctx,
+                    Insn *insn,
+                    Packet *pkt)
+    {
+        const int VdN = insn->regno[0];
+        const intptr_t VdV_off =
+            ctx_future_vreg_off(ctx, VdN, 1, true);
+        TCGv_ptr VdV = tcg_temp_local_new_ptr();
+        tcg_gen_addi_ptr(VdV, cpu_env, VdV_off);
+        const int VuN = insn->regno[1];
+        const intptr_t VuV_off =
+            vreg_src_off(ctx, VuN);
+        TCGv_ptr VuV = tcg_temp_local_new_ptr();
+        const int VvN = insn->regno[2];
+        const intptr_t VvV_off =
+            vreg_src_off(ctx, VvN);
+        TCGv_ptr VvV = tcg_temp_local_new_ptr();
+        tcg_gen_addi_ptr(VuV, cpu_env, VuV_off);
+        tcg_gen_addi_ptr(VvV, cpu_env, VvV_off);
+        TCGv slot = tcg_const_tl(insn->slot);
+        gen_helper_V6_vaddw(cpu_env, VdV, VuV, VvV, slot);
+        tcg_temp_free(slot);
+        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
+        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
+        tcg_temp_free_ptr(VdV);
+        tcg_temp_free_ptr(VuV);
+        tcg_temp_free_ptr(VvV);
+    }
+
+Notice that we also generate a variable named <operand>_off for each operand of
+the instruction.  This makes it easy to override the instruction semantics with
+functions from tcg-op-gvec.h.  Here's the override for this instruction.
+    #define fGEN_TCG_V6_vaddw(SHORTCODE) \
+        tcg_gen_gvec_add(MO_32, VdV_off, VuV_off, VvV_off, \
+                         sizeof(MMVector), sizeof(MMVector))
+
+Finally, we notice that the override doesn't use the TCGv_ptr variables, so
+we don't generate them when an override is present.  Here is what we generate
+when the override is present.
+    static void generate_V6_vaddw(
+                    CPUHexagonState *env,
+                    DisasContext *ctx,
+                    Insn *insn,
+                    Packet *pkt)
+    {
+        const int VdN = insn->regno[0];
+        const intptr_t VdV_off =
+            ctx_future_vreg_off(ctx, VdN, 1, true);
+        const int VuN = insn->regno[1];
+        const intptr_t VuV_off =
+            vreg_src_off(ctx, VuN);
+        const int VvN = insn->regno[2];
+        const intptr_t VvV_off =
+            vreg_src_off(ctx, VvN);
+        fGEN_TCG_V6_vaddw({ fHIDE(int i;) fVFOREACH(32, i) { VdV.w[i] = VuV.w[i] + VvV.w[i] ; } });
+        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
+        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
+    }
+
 In addition to instruction semantics, we use a generator to create the decode
 tree.  This generation is also a two step process.  The first step is to run
 target/hexagon/gen_dectree_import.c to produce
@@ -140,6 +209,7 @@ runtime information for each thread and contains stuff like the GPR and
 predicate registers.
 
 macros.h
+mmvec/macros.h
 
 The Hexagon arch lib relies heavily on macros for the instruction semantics.
 This is a great advantage for qemu because we can override them for different
@@ -203,6 +273,15 @@ During runtime, the following fields in CPUHexagonState (see cpu.h) are used
     pred_written          boolean indicating if predicate was written
     mem_log_stores        record of the stores (indexed by slot)
 
+For Hexagon Vector eXtensions (HVX), the following fields are used
+    VRegs                       Vector registers
+    future_VRegs                Registers to be stored during packet commit
+    tmp_VRegs                   Temporary registers *not* stored during commit
+    VRegs_updated               Mask of predicated vector writes
+    QRegs                       Q (vector predicate) registers
+    future_QRegs                Registers to be stored during packet commit
+    QRegs_updated               Mask of predicated vector writes
+
 *** Debugging ***
 
 You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in
-- 
2.7.4



* [PATCH v3 02/30] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
  2021-09-20 21:23 ` [PATCH v3 01/30] Hexagon HVX (target/hexagon) README Taylor Simpson
@ 2021-09-20 21:23 ` Taylor Simpson
  2021-09-20 22:55   ` Richard Henderson
  2021-09-20 21:23 ` [PATCH v3 03/30] Hexagon HVX (target/hexagon) register names Taylor Simpson
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

HVX is a set of wide vector instructions.  Machine state includes
    vector registers (VRegs)
    vector predicate registers (QRegs)
    temporary registers for intermediate values
    store buffer (masked stores and scatter/gather)
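
As a rough illustration of the masked-store part of that state, the sketch below logs a vector's worth of data together with a per-byte enable mask and writes back only the enabled bytes at commit.  Everything here is hypothetical (SketchStoreLog, sketch_mask_set, sketch_commit_store); it assumes one mask bit per data byte and is not the VStoreLog commit path from this series, which lives in later patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VEC_BYTES 128

/* Hypothetical, simplified log entry: one mask bit per data byte. */
typedef struct {
    uint8_t data[SKETCH_VEC_BYTES];
    uint8_t mask[SKETCH_VEC_BYTES / 8];
} SketchStoreLog;

/* Enable byte i of the pending store. */
static void sketch_mask_set(SketchStoreLog *log, size_t i)
{
    assert(i < SKETCH_VEC_BYTES);
    log->mask[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Report whether byte i of the pending store is enabled. */
static bool sketch_mask_test(const SketchStoreLog *log, size_t i)
{
    assert(i < SKETCH_VEC_BYTES);
    return (log->mask[i / 8] >> (i % 8)) & 1u;
}

/*
 * Commit the logged store: copy only the enabled bytes into mem, which must
 * point at SKETCH_VEC_BYTES writable bytes.  Returns the number of bytes
 * actually written.
 */
static size_t sketch_commit_store(const SketchStoreLog *log, uint8_t *mem)
{
    size_t written = 0;
    for (size_t i = 0; i < SKETCH_VEC_BYTES; i++) {
        if (sketch_mask_test(log, i)) {
            mem[i] = log->data[i];
            written++;
        }
    }
    return written;
}
```

Deferring the write until commit is what lets a packet be cancelled or replayed without having modified memory part-way through.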

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/cpu.h            | 35 ++++++++++++++++-
 target/hexagon/hex_arch_types.h |  5 +++
 target/hexagon/insn.h           |  3 ++
 target/hexagon/internal.h       |  3 ++
 target/hexagon/mmvec/mmvec.h    | 83 +++++++++++++++++++++++++++++++++++++++++
 target/hexagon/cpu.c            | 78 ++++++++++++++++++++++++++++++++++++--
 6 files changed, 201 insertions(+), 6 deletions(-)
 create mode 100644 target/hexagon/mmvec/mmvec.h

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 2855dd3..7e32c88 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -26,6 +26,7 @@ typedef struct CPUHexagonState CPUHexagonState;
 #include "qemu-common.h"
 #include "exec/cpu-defs.h"
 #include "hex_regs.h"
+#include "mmvec/mmvec.h"
 
 #define NUM_PREGS 4
 #define TOTAL_PER_THREAD_REGS 64
@@ -34,6 +35,7 @@ typedef struct CPUHexagonState CPUHexagonState;
 #define STORES_MAX 2
 #define REG_WRITES_MAX 32
 #define PRED_WRITES_MAX 5                   /* 4 insns + endloop */
+#define VSTORES_MAX 2
 
 #define TYPE_HEXAGON_CPU "hexagon-cpu"
 
@@ -52,6 +54,13 @@ typedef struct {
     uint64_t data64;
 } MemLog;
 
+typedef struct {
+    target_ulong va;
+    int size;
+    DECLARE_BITMAP(mask, MAX_VEC_SIZE_BYTES / 8) QEMU_ALIGNED(16);
+    MMVector data QEMU_ALIGNED(16);
+} VStoreLog;
+
 #define EXEC_STATUS_OK          0x0000
 #define EXEC_STATUS_STOP        0x0002
 #define EXEC_STATUS_REPLAY      0x0010
@@ -64,6 +73,9 @@ typedef struct {
 #define CLEAR_EXCEPTION         (env->status &= (~EXEC_STATUS_EXCEPTION))
 #define SET_EXCEPTION           (env->status |= EXEC_STATUS_EXCEPTION)
 
+/* Maximum number of vector temps in a packet */
+#define VECTOR_TEMPS_MAX            4
+
 struct CPUHexagonState {
     target_ulong gpr[TOTAL_PER_THREAD_REGS];
     target_ulong pred[NUM_PREGS];
@@ -97,8 +109,27 @@ struct CPUHexagonState {
     target_ulong llsc_val;
     uint64_t     llsc_val_i64;
 
-    target_ulong is_gather_store_insn;
-    target_ulong gather_issued;
+    MMVector VRegs[NUM_VREGS] QEMU_ALIGNED(16);
+    MMVector future_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
+    MMVector tmp_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
+
+    VRegMask VRegs_updated;
+
+    MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
+    MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
+    QRegMask QRegs_updated;
+
+    /* Temporaries used within instructions */
+    MMVectorPair VuuV QEMU_ALIGNED(16);
+    MMVectorPair VvvV QEMU_ALIGNED(16);
+    MMVectorPair VxxV QEMU_ALIGNED(16);
+    MMVector     vtmp QEMU_ALIGNED(16);
+    MMQReg       qtmp QEMU_ALIGNED(16);
+
+    VStoreLog vstore[VSTORES_MAX];
+    target_ulong vstore_pending[VSTORES_MAX];
+    bool vtcm_pending;
+    VTCMStoreLog vtcm_log;
 };
 
 #define HEXAGON_CPU_CLASS(klass) \
diff --git a/target/hexagon/hex_arch_types.h b/target/hexagon/hex_arch_types.h
index d721e1f..78ad607 100644
--- a/target/hexagon/hex_arch_types.h
+++ b/target/hexagon/hex_arch_types.h
@@ -19,6 +19,7 @@
 #define HEXAGON_ARCH_TYPES_H
 
 #include "qemu/osdep.h"
+#include "mmvec/mmvec.h"
 #include "qemu/int128.h"
 
 /*
@@ -35,4 +36,8 @@ typedef uint64_t    size8u_t;
 typedef int64_t     size8s_t;
 typedef Int128      size16s_t;
 
+typedef MMVector          mmvector_t;
+typedef MMVectorPair      mmvector_pair_t;
+typedef MMQReg            mmqret_t;
+
 #endif
diff --git a/target/hexagon/insn.h b/target/hexagon/insn.h
index 2e34591..aa26389 100644
--- a/target/hexagon/insn.h
+++ b/target/hexagon/insn.h
@@ -67,6 +67,9 @@ struct Packet {
     bool pkt_has_store_s0;
     bool pkt_has_store_s1;
 
+    bool pkt_has_hvx;
+    Insn *vhist_insn;
+
     Insn insn[INSTRUCTIONS_MAX];
 };
 
diff --git a/target/hexagon/internal.h b/target/hexagon/internal.h
index 6b20aff..82ac304 100644
--- a/target/hexagon/internal.h
+++ b/target/hexagon/internal.h
@@ -31,6 +31,9 @@
 
 int hexagon_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int hexagon_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+
+void hexagon_debug_vreg(CPUHexagonState *env, int regnum);
+void hexagon_debug_qreg(CPUHexagonState *env, int regnum);
 void hexagon_debug(CPUHexagonState *env);
 
 extern const char * const hexagon_regnames[TOTAL_PER_THREAD_REGS];
diff --git a/target/hexagon/mmvec/mmvec.h b/target/hexagon/mmvec/mmvec.h
new file mode 100644
index 0000000..6196c52
--- /dev/null
+++ b/target/hexagon/mmvec/mmvec.h
@@ -0,0 +1,83 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_MMVEC_H
+#define HEXAGON_MMVEC_H
+
+#define MAX_VEC_SIZE_LOGBYTES 7
+#define MAX_VEC_SIZE_BYTES  (1 << MAX_VEC_SIZE_LOGBYTES)
+
+#define NUM_VREGS           32
+#define NUM_QREGS           4
+
+typedef uint32_t VRegMask; /* at least NUM_VREGS bits */
+typedef uint32_t QRegMask; /* at least NUM_QREGS bits */
+
+#define VECTOR_SIZE_BYTE    (fVECSIZE())
+
+typedef union {
+    uint64_t ud[MAX_VEC_SIZE_BYTES / 8];
+    int64_t   d[MAX_VEC_SIZE_BYTES / 8];
+    uint32_t uw[MAX_VEC_SIZE_BYTES / 4];
+    int32_t   w[MAX_VEC_SIZE_BYTES / 4];
+    uint16_t uh[MAX_VEC_SIZE_BYTES / 2];
+    int16_t   h[MAX_VEC_SIZE_BYTES / 2];
+    uint8_t  ub[MAX_VEC_SIZE_BYTES / 1];
+    int8_t    b[MAX_VEC_SIZE_BYTES / 1];
+} MMVector;
+
+typedef union {
+    uint64_t ud[2 * MAX_VEC_SIZE_BYTES / 8];
+    int64_t   d[2 * MAX_VEC_SIZE_BYTES / 8];
+    uint32_t uw[2 * MAX_VEC_SIZE_BYTES / 4];
+    int32_t   w[2 * MAX_VEC_SIZE_BYTES / 4];
+    uint16_t uh[2 * MAX_VEC_SIZE_BYTES / 2];
+    int16_t   h[2 * MAX_VEC_SIZE_BYTES / 2];
+    uint8_t  ub[2 * MAX_VEC_SIZE_BYTES / 1];
+    int8_t    b[2 * MAX_VEC_SIZE_BYTES / 1];
+    MMVector v[2];
+} MMVectorPair;
+
+typedef union {
+    uint64_t ud[MAX_VEC_SIZE_BYTES / 8 / 8];
+    int64_t   d[MAX_VEC_SIZE_BYTES / 8 / 8];
+    uint32_t uw[MAX_VEC_SIZE_BYTES / 4 / 8];
+    int32_t   w[MAX_VEC_SIZE_BYTES / 4 / 8];
+    uint16_t uh[MAX_VEC_SIZE_BYTES / 2 / 8];
+    int16_t   h[MAX_VEC_SIZE_BYTES / 2 / 8];
+    uint8_t  ub[MAX_VEC_SIZE_BYTES / 1 / 8];
+    int8_t    b[MAX_VEC_SIZE_BYTES / 1 / 8];
+} MMQReg;
+
+typedef struct {
+    MMVector data;
+    DECLARE_BITMAP(mask, MAX_VEC_SIZE_BYTES);
+    int size;
+    target_ulong va[MAX_VEC_SIZE_BYTES];
+    bool op;
+    int op_size;
+} VTCMStoreLog;
+
+
+/* Types of vector register assignment */
+typedef enum {
+    EXT_DFL,      /* Default */
+    EXT_NEW,      /* New - value used in the same packet */
+    EXT_TMP       /* Temp - value used but not stored to register */
+} VRegWriteType;
+
+#endif
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index 3338365..989bd76 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -113,7 +113,66 @@ static void print_reg(FILE *f, CPUHexagonState *env, int regnum)
                  hexagon_regnames[regnum], value);
 }
 
-static void hexagon_dump(CPUHexagonState *env, FILE *f)
+static void print_vreg(FILE *f, CPUHexagonState *env, int regnum,
+                       bool skip_if_zero)
+{
+    if (skip_if_zero) {
+        bool nonzero_found = false;
+        for (int i = 0; i < MAX_VEC_SIZE_BYTES; i++) {
+            if (env->VRegs[regnum].ub[i] != 0) {
+                nonzero_found = true;
+                break;
+            }
+        }
+        if (!nonzero_found) {
+            return;
+        }
+    }
+
+    qemu_fprintf(f, "  v%d = ( ", regnum);
+    qemu_fprintf(f, "0x%02x", env->VRegs[regnum].ub[MAX_VEC_SIZE_BYTES - 1]);
+    for (int i = MAX_VEC_SIZE_BYTES - 2; i >= 0; i--) {
+        qemu_fprintf(f, ", 0x%02x", env->VRegs[regnum].ub[i]);
+    }
+    qemu_fprintf(f, " )\n");
+}
+
+void hexagon_debug_vreg(CPUHexagonState *env, int regnum)
+{
+    print_vreg(stdout, env, regnum, false);
+}
+
+static void print_qreg(FILE *f, CPUHexagonState *env, int regnum,
+                       bool skip_if_zero)
+{
+    if (skip_if_zero) {
+        bool nonzero_found = false;
+        for (int i = 0; i < MAX_VEC_SIZE_BYTES / 8; i++) {
+            if (env->QRegs[regnum].ub[i] != 0) {
+                nonzero_found = true;
+                break;
+            }
+        }
+        if (!nonzero_found) {
+            return;
+        }
+    }
+
+    qemu_fprintf(f, "  q%d = ( ", regnum);
+    qemu_fprintf(f, "0x%02x",
+                 env->QRegs[regnum].ub[MAX_VEC_SIZE_BYTES / 8 - 1]);
+    for (int i = MAX_VEC_SIZE_BYTES / 8 - 2; i >= 0; i--) {
+        qemu_fprintf(f, ", 0x%02x", env->QRegs[regnum].ub[i]);
+    }
+    qemu_fprintf(f, " )\n");
+}
+
+void hexagon_debug_qreg(CPUHexagonState *env, int regnum)
+{
+    print_qreg(stdout, env, regnum, false);
+}
+
+static void hexagon_dump(CPUHexagonState *env, FILE *f, int flags)
 {
     HexagonCPU *cpu = env_archcpu(env);
 
@@ -159,6 +218,17 @@ static void hexagon_dump(CPUHexagonState *env, FILE *f)
     print_reg(f, env, HEX_REG_CS1);
 #endif
     qemu_fprintf(f, "}\n");
+
+    if (flags & CPU_DUMP_FPU) {
+        qemu_fprintf(f, "Vector Registers = {\n");
+        for (int i = 0; i < NUM_VREGS; i++) {
+            print_vreg(f, env, i, true);
+        }
+        for (int i = 0; i < NUM_QREGS; i++) {
+            print_qreg(f, env, i, true);
+        }
+        qemu_fprintf(f, "}\n");
+    }
 }
 
 static void hexagon_dump_state(CPUState *cs, FILE *f, int flags)
@@ -166,12 +236,12 @@ static void hexagon_dump_state(CPUState *cs, FILE *f, int flags)
     HexagonCPU *cpu = HEXAGON_CPU(cs);
     CPUHexagonState *env = &cpu->env;
 
-    hexagon_dump(env, f);
+    hexagon_dump(env, f, flags);
 }
 
 void hexagon_debug(CPUHexagonState *env)
 {
-    hexagon_dump(env, stdout);
+    hexagon_dump(env, stdout, CPU_DUMP_FPU);
 }
 
 static void hexagon_cpu_set_pc(CPUState *cs, vaddr value)
@@ -292,7 +362,7 @@ static void hexagon_cpu_class_init(ObjectClass *c, void *data)
     cc->set_pc = hexagon_cpu_set_pc;
     cc->gdb_read_register = hexagon_gdb_read_register;
     cc->gdb_write_register = hexagon_gdb_write_register;
-    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
+    cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS + NUM_QREGS;
     cc->gdb_stop_before_watchpoint = true;
     cc->disas_set_info = hexagon_cpu_disas_set_info;
     cc->tcg_ops = &hexagon_tcg_ops;
-- 
2.7.4



* [PATCH v3 03/30] Hexagon HVX (target/hexagon) register names
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
  2021-09-20 21:23 ` [PATCH v3 01/30] Hexagon HVX (target/hexagon) README Taylor Simpson
  2021-09-20 21:23 ` [PATCH v3 02/30] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core Taylor Simpson
@ 2021-09-20 21:23 ` Taylor Simpson
  2021-09-20 21:23 ` [PATCH v3 04/30] Hexagon HVX (target/hexagon) instruction attributes Taylor Simpson
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/hex_regs.h | 1 +
 target/hexagon/cpu.c      | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/hexagon/hex_regs.h b/target/hexagon/hex_regs.h
index f291911..e1b3149 100644
--- a/target/hexagon/hex_regs.h
+++ b/target/hexagon/hex_regs.h
@@ -76,6 +76,7 @@ enum {
     /* Use reserved control registers for qemu execution counts */
     HEX_REG_QEMU_PKT_CNT      = 52,
     HEX_REG_QEMU_INSN_CNT     = 53,
+    HEX_REG_QEMU_HVX_CNT      = 54,
     HEX_REG_UTIMERLO          = 62,
     HEX_REG_UTIMERHI          = 63,
 };
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index 989bd76..3bd3f10 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -59,7 +59,7 @@ const char * const hexagon_regnames[TOTAL_PER_THREAD_REGS] = {
   "r24", "r25", "r26", "r27", "r28",  "r29", "r30", "r31",
   "sa0", "lc0", "sa1", "lc1", "p3_0", "c5",  "m0",  "m1",
   "usr", "pc",  "ugp", "gp",  "cs0",  "cs1", "c14", "c15",
-  "c16", "c17", "c18", "c19", "pkt_cnt",  "insn_cnt", "c22", "c23",
+  "c16", "c17", "c18", "c19", "pkt_cnt",  "insn_cnt", "hvx_cnt", "c23",
   "c24", "c25", "c26", "c27", "c28",  "c29", "c30", "c31",
 };
 
-- 
2.7.4



* [PATCH v3 04/30] Hexagon HVX (target/hexagon) instruction attributes
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (2 preceding siblings ...)
  2021-09-20 21:23 ` [PATCH v3 03/30] Hexagon HVX (target/hexagon) register names Taylor Simpson
@ 2021-09-20 21:23 ` Taylor Simpson
  2021-09-20 22:56   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 05/30] Hexagon HVX (target/hexagon) macros Taylor Simpson
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/attribs_def.h.inc | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 3815509..4138a7a 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -41,6 +41,27 @@ DEF_ATTRIB(STORE, "Stores to memory", "", "")
 DEF_ATTRIB(MEMLIKE, "Memory-like instruction", "", "")
 DEF_ATTRIB(MEMLIKE_PACKET_RULES, "follows Memory-like packet rules", "", "")
 
+/* V6 Vector attributes */
+DEF_ATTRIB(CVI, "Executes on the HVX extension", "", "")
+
+DEF_ATTRIB(CVI_NEW, "New value memory instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VM, "Memory instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VP, "Permute instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VP_VS, "Double vector permute/shft insn executes on HVX", "", "")
+DEF_ATTRIB(CVI_VX, "Multiply instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VX_DV, "Double vector multiply insn executes on HVX", "", "")
+DEF_ATTRIB(CVI_VS, "Shift instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VS_VX, "Permute/shift and multiply insn executes on HVX", "", "")
+DEF_ATTRIB(CVI_VA, "ALU instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VA_DV, "Double vector alu instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_4SLOT, "Consumes all the vector execution resources", "", "")
+DEF_ATTRIB(CVI_TMP, "Transient Memory Load not written to register", "", "")
+DEF_ATTRIB(CVI_GATHER, "CVI Gather operation", "", "")
+DEF_ATTRIB(CVI_SCATTER, "CVI Scatter operation", "", "")
+DEF_ATTRIB(CVI_SCATTER_RELEASE, "CVI Store Release for scatter", "", "")
+DEF_ATTRIB(CVI_TMP_DST, "CVI instruction that doesn't write a register", "", "")
+DEF_ATTRIB(CVI_SLOT23, "Can execute in slot 2 or slot 3 (HVX)", "", "")
+
 
 /* Change-of-flow attributes */
 DEF_ATTRIB(JUMP, "Jump-type instruction", "", "")
@@ -86,6 +107,7 @@ DEF_ATTRIB(HWLOOP1_END, "Ends HW loop1", "", "")
 DEF_ATTRIB(DCZEROA, "dczeroa type", "", "")
 DEF_ATTRIB(ICFLUSHOP, "icflush op type", "", "")
 DEF_ATTRIB(DCFLUSHOP, "dcflush op type", "", "")
+DEF_ATTRIB(L2FLUSHOP, "l2flush op type", "", "")
 DEF_ATTRIB(DCFETCH, "dcfetch type", "", "")
 
 DEF_ATTRIB(L2FETCH, "Instruction is l2fetch type", "", "")
-- 
2.7.4



* [PATCH v3 05/30] Hexagon HVX (target/hexagon) macros
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (3 preceding siblings ...)
  2021-09-20 21:23 ` [PATCH v3 04/30] Hexagon HVX (target/hexagon) instruction attributes Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 22:57   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 06/30] Hexagon HVX (target/hexagon) import macro definitions Taylor Simpson
                   ` (24 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Add the macros that interface with the generator and the macros
referenced in the HVX instruction semantics.

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
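
Note: the fVSAT* macros added to macros.h in this patch mirror the scalar
fSAT* macros but leave out the fSET_OVERFLOW() call. The standalone sketch
below illustrates that saturation logic in plain C. The helper names
(sxt_n, zxt_n, vsat_n, vsatu_n) are hypothetical and not part of the patch,
the sketch assumes 1 <= n <= 32, and it tests the sign on the full 64-bit
value, so it is an illustration rather than a drop-in equivalent of the
macros.

```c
#include <stdint.h>

/* Hypothetical stand-in for fSXTN: sign-extend the low n bits of val. */
static int64_t sxt_n(int n, int64_t val)
{
    uint64_t low = (uint64_t)val & ((1ULL << n) - 1);

    if (low & (1ULL << (n - 1))) {
        /* Sign bit of the n-bit field is set: value is negative. */
        return (int64_t)low - (int64_t)(1ULL << n);
    }
    return (int64_t)low;
}

/* Hypothetical stand-in for fZXTN: zero-extend the low n bits of val. */
static int64_t zxt_n(int n, int64_t val)
{
    return (int64_t)((uint64_t)val & ((1ULL << n) - 1));
}

/* Signed saturation to n bits, in the spirit of fVSATN (no overflow flag). */
static int64_t vsat_n(int n, int64_t val)
{
    if (sxt_n(n, val) == val) {
        return val;                      /* already representable */
    }
    return (val < 0) ? -(1LL << (n - 1)) : ((1LL << (n - 1)) - 1);
}

/* Unsigned saturation to n bits, in the spirit of fVSATUN. */
static int64_t vsatu_n(int n, int64_t val)
{
    if (zxt_n(n, val) == val) {
        return val;                      /* already representable */
    }
    return (val < 0) ? 0 : ((1LL << n) - 1);
}
```

With these helpers, vsat_n(8, 200) clamps to 127 and vsatu_n(8, -1) clamps
to 0, which corresponds to what fVSATB and fVSATUB are expected to produce
for the same inputs.
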
 target/hexagon/macros.h       |  22 +++
 target/hexagon/mmvec/macros.h | 341 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 363 insertions(+)
 create mode 100644 target/hexagon/mmvec/macros.h

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 094b8da..d1d0348 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -269,6 +269,10 @@ static inline void gen_pred_cancel(TCGv pred, int slot_num)
 
 #define fNEWREG_ST(VAL) (VAL)
 
+#define fVSATUVALN(N, VAL) \
+    ({ \
+        (((int)(VAL)) < 0) ? 0 : ((1LL << (N)) - 1); \
+    })
 #define fSATUVALN(N, VAL) \
     ({ \
         fSET_OVERFLOW(); \
@@ -279,10 +283,16 @@ static inline void gen_pred_cancel(TCGv pred, int slot_num)
         fSET_OVERFLOW(); \
         ((VAL) < 0) ? (-(1LL << ((N) - 1))) : ((1LL << ((N) - 1)) - 1); \
     })
+#define fVSATVALN(N, VAL) \
+    ({ \
+        ((VAL) < 0) ? (-(1LL << ((N) - 1))) : ((1LL << ((N) - 1)) - 1); \
+    })
 #define fZXTN(N, M, VAL) (((N) != 0) ? extract64((VAL), 0, (N)) : 0LL)
 #define fSXTN(N, M, VAL) (((N) != 0) ? sextract64((VAL), 0, (N)) : 0LL)
 #define fSATN(N, VAL) \
     ((fSXTN(N, 64, VAL) == (VAL)) ? (VAL) : fSATVALN(N, VAL))
+#define fVSATN(N, VAL) \
+    ((fSXTN(N, 64, VAL) == (VAL)) ? (VAL) : fVSATVALN(N, VAL))
 #define fADDSAT64(DST, A, B) \
     do { \
         uint64_t __a = fCAST8u(A); \
@@ -305,12 +315,18 @@ static inline void gen_pred_cancel(TCGv pred, int slot_num)
             DST = __sum; \
         } \
     } while (0)
+#define fVSATUN(N, VAL) \
+    ((fZXTN(N, 64, VAL) == (VAL)) ? (VAL) : fVSATUVALN(N, VAL))
 #define fSATUN(N, VAL) \
     ((fZXTN(N, 64, VAL) == (VAL)) ? (VAL) : fSATUVALN(N, VAL))
 #define fSATH(VAL) (fSATN(16, VAL))
 #define fSATUH(VAL) (fSATUN(16, VAL))
+#define fVSATH(VAL) (fVSATN(16, VAL))
+#define fVSATUH(VAL) (fVSATUN(16, VAL))
 #define fSATUB(VAL) (fSATUN(8, VAL))
 #define fSATB(VAL) (fSATN(8, VAL))
+#define fVSATUB(VAL) (fVSATUN(8, VAL))
+#define fVSATB(VAL) (fVSATN(8, VAL))
 #define fIMMEXT(IMM) (IMM = IMM)
 #define fMUST_IMMEXT(IMM) fIMMEXT(IMM)
 
@@ -417,6 +433,8 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #define fCAST4s(A) ((int32_t)(A))
 #define fCAST8u(A) ((uint64_t)(A))
 #define fCAST8s(A) ((int64_t)(A))
+#define fCAST2_2s(A) ((int16_t)(A))
+#define fCAST2_2u(A) ((uint16_t)(A))
 #define fCAST4_4s(A) ((int32_t)(A))
 #define fCAST4_4u(A) ((uint32_t)(A))
 #define fCAST4_8s(A) ((int64_t)((int32_t)(A)))
@@ -514,7 +532,9 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #define fPM_M(REG, MVAL)    do { REG = REG + (MVAL); } while (0)
 #endif
 #define fSCALE(N, A) (((int64_t)(A)) << N)
+#define fVSATW(A) fVSATN(32, ((long long)A))
 #define fSATW(A) fSATN(32, ((long long)A))
+#define fVSAT(A) fVSATN(32, (A))
 #define fSAT(A) fSATN(32, (A))
 #define fSAT_ORIG_SHL(A, ORIG_REG) \
     ((((int32_t)((fSAT(A)) ^ ((int32_t)(ORIG_REG)))) < 0) \
@@ -651,12 +671,14 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
             fSETBIT(j, DST, VAL); \
         } \
     } while (0)
+#define fCOUNTONES_2(VAL) ctpop16(VAL)
 #define fCOUNTONES_4(VAL) ctpop32(VAL)
 #define fCOUNTONES_8(VAL) ctpop64(VAL)
 #define fBREV_8(VAL) revbit64(VAL)
 #define fBREV_4(VAL) revbit32(VAL)
 #define fCL1_8(VAL) clo64(VAL)
 #define fCL1_4(VAL) clo32(VAL)
+#define fCL1_2(VAL) (clz32(~(uint16_t)(VAL) & 0xffff) - 16)
 #define fINTERLEAVE(ODD, EVEN) interleave(ODD, EVEN)
 #define fDEINTERLEAVE(MIXED) deinterleave(MIXED)
 #define fHIDE(A) A
diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
new file mode 100644
index 0000000..e2d1e65
--- /dev/null
+++ b/target/hexagon/mmvec/macros.h
@@ -0,0 +1,341 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_MMVEC_MACROS_H
+#define HEXAGON_MMVEC_MACROS_H
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "arch.h"
+#include "mmvec/system_ext_mmvec.h"
+
+#ifndef QEMU_GENERATE
+#define VdV      (*(MMVector *)(VdV_void))
+#define VsV      (*(MMVector *)(VsV_void))
+#define VuV      (*(MMVector *)(VuV_void))
+#define VvV      (*(MMVector *)(VvV_void))
+#define VwV      (*(MMVector *)(VwV_void))
+#define VxV      (*(MMVector *)(VxV_void))
+#define VyV      (*(MMVector *)(VyV_void))
+
+#define VddV     (*(MMVectorPair *)(VddV_void))
+#define VuuV     (*(MMVectorPair *)(VuuV_void))
+#define VvvV     (*(MMVectorPair *)(VvvV_void))
+#define VxxV     (*(MMVectorPair *)(VxxV_void))
+
+#define QeV      (*(MMQReg *)(QeV_void))
+#define QdV      (*(MMQReg *)(QdV_void))
+#define QsV      (*(MMQReg *)(QsV_void))
+#define QtV      (*(MMQReg *)(QtV_void))
+#define QuV      (*(MMQReg *)(QuV_void))
+#define QvV      (*(MMQReg *)(QvV_void))
+#define QxV      (*(MMQReg *)(QxV_void))
+#endif
+
+#define LOG_VTCM_BYTE(VA, MASK, VAL, IDX) \
+    do { \
+        env->vtcm_log.data.ub[IDX] = (VAL); \
+        if (MASK) { \
+            set_bit((IDX), env->vtcm_log.mask); \
+        } else { \
+            clear_bit((IDX), env->vtcm_log.mask); \
+        } \
+        env->vtcm_log.va[IDX] = (VA); \
+    } while (0)
+
+#define fNOTQ(VAL) \
+    ({ \
+        MMQReg _ret;  \
+        int _i_;  \
+        for (_i_ = 0; _i_ < fVECSIZE() / 64; _i_++) { \
+            _ret.ud[_i_] = ~VAL.ud[_i_]; \
+        } \
+        _ret;\
+     })
+#define fGETQBITS(REG, WIDTH, MASK, BITNO) \
+    ((MASK) & (REG.w[(BITNO) >> 5] >> ((BITNO) & 0x1f)))
+#define fGETQBIT(REG, BITNO) fGETQBITS(REG, 1, 1, BITNO)
+#define fGENMASKW(QREG, IDX) \
+    (((fGETQBIT(QREG, (IDX * 4 + 0)) ? 0xFF : 0x0) << 0)  | \
+     ((fGETQBIT(QREG, (IDX * 4 + 1)) ? 0xFF : 0x0) << 8)  | \
+     ((fGETQBIT(QREG, (IDX * 4 + 2)) ? 0xFF : 0x0) << 16) | \
+     ((fGETQBIT(QREG, (IDX * 4 + 3)) ? 0xFF : 0x0) << 24))
+#define fGETNIBBLE(IDX, SRC) (fSXTN(4, 8, (SRC >> (4 * IDX)) & 0xF))
+#define fGETCRUMB(IDX, SRC) (fSXTN(2, 8, (SRC >> (2 * IDX)) & 0x3))
+#define fGETCRUMB_SYMMETRIC(IDX, SRC) \
+    ((fGETCRUMB(IDX, SRC) >= 0 ? (2 - fGETCRUMB(IDX, SRC)) \
+                               : fGETCRUMB(IDX, SRC)))
+#define fGENMASKH(QREG, IDX) \
+    (((fGETQBIT(QREG, (IDX * 2 + 0)) ? 0xFF : 0x0) << 0) | \
+     ((fGETQBIT(QREG, (IDX * 2 + 1)) ? 0xFF : 0x0) << 8))
+#define fGETMASKW(VREG, QREG, IDX) (VREG.w[IDX] & fGENMASKW((QREG), IDX))
+#define fGETMASKH(VREG, QREG, IDX) (VREG.h[IDX] & fGENMASKH((QREG), IDX))
+#define fCONDMASK8(QREG, IDX, YESVAL, NOVAL) \
+    (fGETQBIT(QREG, IDX) ? (YESVAL) : (NOVAL))
+#define fCONDMASK16(QREG, IDX, YESVAL, NOVAL) \
+    ((fGENMASKH(QREG, IDX) & (YESVAL)) | \
+     (fGENMASKH(fNOTQ(QREG), IDX) & (NOVAL)))
+#define fCONDMASK32(QREG, IDX, YESVAL, NOVAL) \
+    ((fGENMASKW(QREG, IDX) & (YESVAL)) | \
+     (fGENMASKW(fNOTQ(QREG), IDX) & (NOVAL)))
+#define fSETQBITS(REG, WIDTH, MASK, BITNO, VAL) \
+    do { \
+        uint32_t __TMP = (VAL); \
+        REG.w[(BITNO) >> 5] &= ~((MASK) << ((BITNO) & 0x1f)); \
+        REG.w[(BITNO) >> 5] |= (((__TMP) & (MASK)) << ((BITNO) & 0x1f)); \
+    } while (0)
+#define fSETQBIT(REG, BITNO, VAL) fSETQBITS(REG, 1, 1, BITNO, VAL)
+#define fVBYTES() (fVECSIZE())
+#define fVALIGN(ADDR, LOG2_ALIGNMENT) (ADDR = ADDR & ~(LOG2_ALIGNMENT - 1))
+#define fVLASTBYTE(ADDR, LOG2_ALIGNMENT) (ADDR = ADDR | (LOG2_ALIGNMENT - 1))
+#define fVELEM(WIDTH) ((fVECSIZE() * 8) / WIDTH)
+#define fVECLOGSIZE() (7)
+#define fVECSIZE() (1 << fVECLOGSIZE())
+#define fSWAPB(A, B) do { uint8_t tmp = A; A = B; B = tmp; } while (0)
+#define fV_AL_CHECK(EA, MASK) \
+    if ((EA) & (MASK)) { \
+        warn("aligning misaligned vector. EA=%08x", (EA)); \
+    }
+#define fSCATTER_INIT(REGION_START, LENGTH, ELEMENT_SIZE) \
+    mem_vector_scatter_init(env, slot, REGION_START, LENGTH, ELEMENT_SIZE)
+#define fGATHER_INIT(REGION_START, LENGTH, ELEMENT_SIZE) \
+    mem_vector_gather_init(env, REGION_START, LENGTH, ELEMENT_SIZE)
+#define fSCATTER_FINISH(OP)
+#define fGATHER_FINISH()
+#define fLOG_SCATTER_OP(SIZE) \
+    do { \
+        env->vtcm_log.op = true; \
+        env->vtcm_log.op_size = SIZE; \
+    } while (0)
+#define fVLOG_VTCM_WORD_INCREMENT(EA, OFFSET, INC, IDX, ALIGNMENT, LEN) \
+    do { \
+        int log_byte = 0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        for (int i0 = 0; i0 < 4; i0++) { \
+            log_byte = (va + i0) <= va_high; \
+            LOG_VTCM_BYTE(va + i0, log_byte, INC.ub[4 * IDX + i0], \
+                          4 * IDX + i0); \
+        } \
+    } while (0)
+#define fVLOG_VTCM_HALFWORD_INCREMENT(EA, OFFSET, INC, IDX, ALIGNMENT, LEN) \
+    do { \
+        int log_byte = 0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        for (int i0 = 0; i0 < 2; i0++) { \
+            log_byte = (va + i0) <= va_high; \
+            LOG_VTCM_BYTE(va + i0, log_byte, INC.ub[2 * IDX + i0], \
+                          2 * IDX + i0); \
+        } \
+    } while (0)
+
+#define fVLOG_VTCM_HALFWORD_INCREMENT_DV(EA, OFFSET, INC, IDX, IDX2, IDX_H, \
+                                         ALIGNMENT, LEN) \
+    do { \
+        int log_byte = 0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        for (int i0 = 0; i0 < 2; i0++) { \
+            log_byte = (va + i0) <= va_high; \
+            LOG_VTCM_BYTE(va + i0, log_byte, INC.ub[2 * IDX + i0], \
+                          2 * IDX + i0); \
+        } \
+    } while (0)
+
+/* NOTE - Will this always be tmp_VRegs[0]? */
+#define GATHER_FUNCTION(EA, OFFSET, IDX, LEN, ELEMENT_SIZE, BANK_IDX, QVAL) \
+    do { \
+        int i0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        uintptr_t ra = GETPC(); \
+        int log_bank = 0; \
+        int log_byte = 0; \
+        for (i0 = 0; i0 < ELEMENT_SIZE; i0++) { \
+            log_byte = ((va + i0) <= va_high) && QVAL; \
+            log_bank |= (log_byte << i0); \
+            uint8_t B; \
+            B = cpu_ldub_data_ra(env, EA + i0, ra); \
+            env->tmp_VRegs[0].ub[ELEMENT_SIZE * IDX + i0] = B; \
+            LOG_VTCM_BYTE(va + i0, log_byte, B, ELEMENT_SIZE * IDX + i0); \
+        } \
+    } while (0)
+#define fVLOG_VTCM_GATHER_WORD(EA, OFFSET, IDX, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, 1); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORD(EA, OFFSET, IDX, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, 1); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORD_DV(EA, OFFSET, IDX, IDX2, IDX_H, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, (2 * IDX2 + IDX_H), 1); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_WORDQ(EA, OFFSET, IDX, Q, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, \
+                        fGETQBIT(QsV, 4 * IDX + i0)); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORDQ(EA, OFFSET, IDX, Q, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, \
+                        fGETQBIT(QsV, 2 * IDX + i0)); \
+    } while (0)
+#define fVLOG_VTCM_GATHER_HALFWORDQ_DV(EA, OFFSET, IDX, IDX2, IDX_H, Q, LEN) \
+    do { \
+        GATHER_FUNCTION(EA, OFFSET, IDX, LEN, 2, (2 * IDX2 + IDX_H), \
+                        fGETQBIT(QsV, 2 * IDX + i0)); \
+    } while (0)
+#define SCATTER_OP_WRITE_TO_MEM(TYPE) \
+    do { \
+        uintptr_t ra = GETPC(); \
+        for (int i = 0; i < env->vtcm_log.size; i += sizeof(TYPE)) { \
+            if (test_bit(i, env->vtcm_log.mask)) { \
+                TYPE dst = 0; \
+                TYPE inc = 0; \
+                for (int j = 0; j < sizeof(TYPE); j++) { \
+                    uint8_t val; \
+                    val = cpu_ldub_data_ra(env, env->vtcm_log.va[i + j], ra); \
+                    dst |= val << (8 * j); \
+                    inc |= env->vtcm_log.data.ub[j + i] << (8 * j); \
+                    clear_bit(j + i, env->vtcm_log.mask); \
+                    env->vtcm_log.data.ub[j + i] = 0; \
+                } \
+                dst += inc; \
+                for (int j = 0; j < sizeof(TYPE); j++) { \
+                    cpu_stb_data_ra(env, env->vtcm_log.va[i + j], \
+                                    (dst >> (8 * j)) & 0xFF, ra); \
+                } \
+            } \
+        } \
+    } while (0)
+#define SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, ELEM_SIZE, BANK_IDX, QVAL, IN) \
+    do { \
+        int i0; \
+        target_ulong va = EA; \
+        target_ulong va_high = EA + LEN; \
+        int log_bank = 0; \
+        int log_byte = 0; \
+        for (i0 = 0; i0 < ELEM_SIZE; i0++) { \
+            log_byte = ((va + i0) <= va_high) && QVAL; \
+            log_bank |= (log_byte << i0); \
+            LOG_VTCM_BYTE(va + i0, log_byte, IN.ub[ELEM_SIZE * IDX + i0], \
+                          ELEM_SIZE * IDX + i0); \
+        } \
+    } while (0)
+#define fVLOG_VTCM_HALFWORD(EA, OFFSET, IN, IDX, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, 1, IN); \
+    } while (0)
+#define fVLOG_VTCM_WORD(EA, OFFSET, IN, IDX, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, 1, IN); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORDQ(EA, OFFSET, IN, IDX, Q, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, IDX, \
+                         fGETQBIT(QsV, 2 * IDX + i0), IN); \
+    } while (0)
+#define fVLOG_VTCM_WORDQ(EA, OFFSET, IN, IDX, Q, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 4, IDX, \
+                         fGETQBIT(QsV, 4 * IDX + i0), IN); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORD_DV(EA, OFFSET, IN, IDX, IDX2, IDX_H, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, \
+                         (2 * IDX2 + IDX_H), 1, IN); \
+    } while (0)
+#define fVLOG_VTCM_HALFWORDQ_DV(EA, OFFSET, IN, IDX, Q, IDX2, IDX_H, LEN) \
+    do { \
+        SCATTER_FUNCTION(EA, OFFSET, IDX, LEN, 2, (2 * IDX2 + IDX_H), \
+                         fGETQBIT(QsV, 2 * IDX + i0), IN); \
+    } while (0)
+#define fSTORERELEASE(EA, TYPE) \
+    do { \
+        fV_AL_CHECK(EA, fVECSIZE() - 1); \
+    } while (0)
+#ifdef QEMU_GENERATE
+#define fLOADMMV(EA, DST) gen_vreg_load(ctx, DST##_off, EA, true)
+#endif
+#ifdef QEMU_GENERATE
+#define fLOADMMVU(EA, DST) gen_vreg_load(ctx, DST##_off, EA, false)
+#endif
+#ifdef QEMU_GENERATE
+#define fSTOREMMV(EA, SRC) \
+    gen_vreg_store(ctx, insn, pkt, EA, SRC##_off, insn->slot, true)
+#endif
+#ifdef QEMU_GENERATE
+#define fSTOREMMVQ(EA, SRC, MASK) \
+    gen_vreg_masked_store(ctx, EA, SRC##_off, MASK##_off, insn->slot, false)
+#endif
+#ifdef QEMU_GENERATE
+#define fSTOREMMVNQ(EA, SRC, MASK) \
+    gen_vreg_masked_store(ctx, EA, SRC##_off, MASK##_off, insn->slot, true)
+#endif
+#ifdef QEMU_GENERATE
+#define fSTOREMMVU(EA, SRC) \
+    gen_vreg_store(ctx, insn, pkt, EA, SRC##_off, insn->slot, false)
+#endif
+#define fVFOREACH(WIDTH, VAR) for (VAR = 0; VAR < fVELEM(WIDTH); VAR++)
+#define fVARRAY_ELEMENT_ACCESS(ARRAY, TYPE, INDEX) \
+    ARRAY.v[(INDEX) / (fVECSIZE() / (sizeof(ARRAY.TYPE[0])))].TYPE[(INDEX) % \
+    (fVECSIZE() / (sizeof(ARRAY.TYPE[0])))]
+
+#define fVSATDW(U, V) fVSATW(((((long long)U) << 32) | fZXTN(32, 64, V)))
+#define fVASL_SATHI(U, V) fVSATW(((U) << 1) | ((V) >> 31))
+#define fVUADDSAT(WIDTH, U, V) \
+    fVSATUN(WIDTH, fZXTN(WIDTH, 2 * WIDTH, U) + fZXTN(WIDTH, 2 * WIDTH, V))
+#define fVSADDSAT(WIDTH, U, V) \
+    fVSATN(WIDTH, fSXTN(WIDTH, 2 * WIDTH, U) + fSXTN(WIDTH, 2 * WIDTH, V))
+#define fVUSUBSAT(WIDTH, U, V) \
+    fVSATUN(WIDTH, fZXTN(WIDTH, 2 * WIDTH, U) - fZXTN(WIDTH, 2 * WIDTH, V))
+#define fVSSUBSAT(WIDTH, U, V) \
+    fVSATN(WIDTH, fSXTN(WIDTH, 2 * WIDTH, U) - fSXTN(WIDTH, 2 * WIDTH, V))
+#define fVAVGU(WIDTH, U, V) \
+    ((fZXTN(WIDTH, 2 * WIDTH, U) + fZXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVAVGURND(WIDTH, U, V) \
+    ((fZXTN(WIDTH, 2 * WIDTH, U) + fZXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1)
+#define fVNAVGU(WIDTH, U, V) \
+    ((fZXTN(WIDTH, 2 * WIDTH, U) - fZXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVNAVGURNDSAT(WIDTH, U, V) \
+    fVSATUN(WIDTH, ((fZXTN(WIDTH, 2 * WIDTH, U) - \
+                     fZXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1))
+#define fVAVGS(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) + fSXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVAVGSRND(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) + fSXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1)
+#define fVNAVGS(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) - fSXTN(WIDTH, 2 * WIDTH, V)) >> 1)
+#define fVNAVGSRND(WIDTH, U, V) \
+    ((fSXTN(WIDTH, 2 * WIDTH, U) - fSXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1)
+#define fVNAVGSRNDSAT(WIDTH, U, V) \
+    fVSATN(WIDTH, ((fSXTN(WIDTH, 2 * WIDTH, U) - \
+                    fSXTN(WIDTH, 2 * WIDTH, V) + 1) >> 1))
+#define fVNOROUND(VAL, SHAMT) VAL
+#define fVNOSAT(VAL) VAL
+#define fVROUND(VAL, SHAMT) \
+    ((VAL) + (((SHAMT) > 0) ? (1LL << ((SHAMT) - 1)) : 0))
+#define fCARRY_FROM_ADD32(A, B, C) \
+    (((fZXTN(32, 64, A) + fZXTN(32, 64, B) + C) >> 32) & 1)
+#define fUARCH_NOTE_PUMP_4X()
+#define fUARCH_NOTE_PUMP_2X()
+
+#define IV1DEAD()
+#endif
-- 
2.7.4



* [PATCH v3 06/30] Hexagon HVX (target/hexagon) import macro definitions
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (4 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 05/30] Hexagon HVX (target/hexagon) macros Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 22:58   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 07/30] Hexagon HVX (target/hexagon) semantics generator Taylor Simpson
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Imported from the Hexagon architecture library
    imported/allext_macros.def       Top level macro include for all extensions
    imported/macros.def              Scalar core macros (some HVX here)
    imported/mmvec/macros.def        HVX macro definitions
The macro definition files specify instruction attributes that are applied
to each instruction that references the macro.

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
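
Note: each DEF_MACRO entry in the imported files carries a name, a body and
an attribute list. One common way to consume entries of this shape is the
X-macro pattern, sketched below. This is only an illustration: the
DEMO_MACROS list, the DEMO_ENTRY expander and the macro_attribs() lookup are
hypothetical names, the bodies are placeholders, and the sketch does not
claim to be how the generator in this series processes the files.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical list standing in for a .def file. Each entry is
 * X(name, body, attribs), loosely modelled on DEF_MACRO. The attribute
 * lists are copied from entries in imported/mmvec/macros.def.
 */
#define DEMO_MACROS(X) \
    X(fVBYTES,       body_placeholder, ()) \
    X(fSCATTER_INIT, body_placeholder, (A_STORE,A_MEMLIKE,A_RESTRICT_SLOT0ONLY)) \
    X(fGATHER_INIT,  body_placeholder, (A_LOAD,A_MEMLIKE,A_RESTRICT_SLOT1ONLY))

struct demo_macro {
    const char *name;
    const char *attribs;
};

/* Expand every entry into a table row; the body is ignored here. */
#define DEMO_ENTRY(NAME, BODY, ATTRIBS) { #NAME, #ATTRIBS },
static const struct demo_macro demo_table[] = {
    DEMO_MACROS(DEMO_ENTRY)
};
#undef DEMO_ENTRY

/* Return the attribute string for a macro name, or NULL if unknown. */
static const char *macro_attribs(const char *name)
{
    size_t count = sizeof(demo_table) / sizeof(demo_table[0]);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(demo_table[i].name, name) == 0) {
            return demo_table[i].attribs;
        }
    }
    return NULL;
}
```

A tool that knows which macros an instruction's semantics reference could
then look up each macro's attribute list and apply those attributes to the
instruction.
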
 target/hexagon/imported/allext_macros.def |  25 +
 target/hexagon/imported/macros.def        |  88 ++++
 target/hexagon/imported/mmvec/macros.def  | 842 ++++++++++++++++++++++++++++++
 3 files changed, 955 insertions(+)
 create mode 100644 target/hexagon/imported/allext_macros.def
 create mode 100755 target/hexagon/imported/mmvec/macros.def

diff --git a/target/hexagon/imported/allext_macros.def b/target/hexagon/imported/allext_macros.def
new file mode 100644
index 0000000..9c91199
--- /dev/null
+++ b/target/hexagon/imported/allext_macros.def
@@ -0,0 +1,25 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Top level file for all instruction set extensions
+ */
+#define EXTNAME mmvec
+#define EXTSTR "mmvec"
+#include "mmvec/macros.def"
+#undef EXTNAME
+#undef EXTSTR
diff --git a/target/hexagon/imported/macros.def b/target/hexagon/imported/macros.def
index 32ed3bf..e23f915 100755
--- a/target/hexagon/imported/macros.def
+++ b/target/hexagon/imported/macros.def
@@ -177,6 +177,12 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSATUVALN,
+    ({ ((VAL) < 0) ? 0 : ((1LL<<(N))-1);}),
+    ()
+)
+
+DEF_MACRO(
     fSATUVALN,
     ({fSET_OVERFLOW(); ((VAL) < 0) ? 0 : ((1LL<<(N))-1);}),
     ()
@@ -189,6 +195,12 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSATVALN,
+    ({((VAL) < 0) ? (-(1LL<<((N)-1))) : ((1LL<<((N)-1))-1);}),
+    ()
+)
+
+DEF_MACRO(
     fZXTN, /* macro name */
     ((VAL) & ((1LL<<(N))-1)),
     /* attribs */
@@ -205,6 +217,11 @@ DEF_MACRO(
     ((fSXTN(N,64,VAL) == (VAL)) ? (VAL) : fSATVALN(N,VAL)),
     ()
 )
+DEF_MACRO(
+    fVSATN,
+    ((fSXTN(N,64,VAL) == (VAL)) ? (VAL) : fVSATVALN(N,VAL)),
+    ()
+)
 
 DEF_MACRO(
     fADDSAT64,
@@ -235,6 +252,12 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSATUN,
+    ((fZXTN(N,64,VAL) == (VAL)) ? (VAL) : fVSATUVALN(N,VAL)),
+    ()
+)
+
+DEF_MACRO(
     fSATUN,
     ((fZXTN(N,64,VAL) == (VAL)) ? (VAL) : fSATUVALN(N,VAL)),
     ()
@@ -254,6 +277,19 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSATH,
+    (fVSATN(16,VAL)),
+    ()
+)
+
+DEF_MACRO(
+    fVSATUH,
+    (fVSATUN(16,VAL)),
+    ()
+)
+
+
+DEF_MACRO(
     fSATUB,
     (fSATUN(8,VAL)),
     ()
@@ -265,6 +301,20 @@ DEF_MACRO(
 )
 
 
+DEF_MACRO(
+    fVSATUB,
+    (fVSATUN(8,VAL)),
+    ()
+)
+DEF_MACRO(
+    fVSATB,
+    (fVSATN(8,VAL)),
+    ()
+)
+
+
+
+
 /*************************************/
 /* immediate extension               */
 /*************************************/
@@ -557,6 +607,18 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fCAST2_2s, /* macro name */
+    ((size2s_t)(A)),
+    /* optional attributes */
+)
+
+DEF_MACRO(
+    fCAST2_2u, /* macro name */
+    ((size2u_t)(A)),
+    /* optional attributes */
+)
+
+DEF_MACRO(
     fCAST4_4s, /* macro name */
     ((size4s_t)(A)),
     /* optional attributes */
@@ -876,6 +938,11 @@ DEF_MACRO(
     (((size8s_t)(A))<<N),
     /* optional attributes */
 )
+DEF_MACRO(
+    fVSATW, /* saturating to 32-bits*/
+    fVSATN(32,((long long)A)),
+    ()
+)
 
 DEF_MACRO(
     fSATW, /* saturating to 32-bits*/
@@ -884,6 +951,12 @@ DEF_MACRO(
 )
 
 DEF_MACRO(
+    fVSAT, /* saturating to 32-bits*/
+    fVSATN(32,(A)),
+    ()
+)
+
+DEF_MACRO(
     fSAT, /* saturating to 32-bits*/
     fSATN(32,(A)),
     ()
@@ -1389,6 +1462,11 @@ DEF_MACRO(fSETBITS,
 /*************************************/
 /* Used for parity, etc........      */
 /*************************************/
+DEF_MACRO(fCOUNTONES_2,
+    count_ones_2(VAL),
+    /* nothing */
+)
+
 DEF_MACRO(fCOUNTONES_4,
     count_ones_4(VAL),
     /* nothing */
@@ -1419,6 +1497,11 @@ DEF_MACRO(fCL1_4,
     /* nothing */
 )
 
+DEF_MACRO(fCL1_2,
+    count_leading_ones_2(VAL),
+    /* nothing */
+)
+
 DEF_MACRO(fINTERLEAVE,
     interleave(ODD,EVEN),
     /* nothing */
@@ -1576,3 +1659,8 @@ DEF_MACRO(fBRANCH_SPECULATE_STALL,
     },
     ()
 )
+
+DEF_MACRO(IV1DEAD,
+    ,
+    ()
+)
diff --git a/target/hexagon/imported/mmvec/macros.def b/target/hexagon/imported/mmvec/macros.def
new file mode 100755
index 0000000..7e5438a
--- /dev/null
+++ b/target/hexagon/imported/mmvec/macros.def
@@ -0,0 +1,842 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+DEF_MACRO(fDUMPQ,
+	do {
+		printf(STR ":" #REG ": 0x%016llx\n",REG.ud[0]);
+	} while (0),
+	()
+)
+
+DEF_MACRO(fUSE_LOOKUP_ADDRESS_BY_REV,
+	PROC->arch_proc_options->mmvec_use_full_va_for_lookup,
+	()
+)
+
+DEF_MACRO(fUSE_LOOKUP_ADDRESS,
+	1,
+	()
+)
+
+DEF_MACRO(fNOTQ,
+	({mmqreg_t _ret = {0}; int _i_; for (_i_ = 0; _i_ < fVECSIZE()/64; _i_++) _ret.ud[_i_] = ~VAL.ud[_i_]; _ret;}),
+	()
+)
+
+DEF_MACRO(fGETQBITS,
+	((MASK) & (REG.w[(BITNO)>>5] >> ((BITNO) & 0x1f))),
+	()
+)
+
+DEF_MACRO(fGETQBIT,
+	fGETQBITS(REG,1,1,BITNO),
+	()
+)
+
+DEF_MACRO(fGENMASKW,
+	(((fGETQBIT(QREG,(IDX*4+0)) ? 0xFF : 0x0) << 0)
+	|((fGETQBIT(QREG,(IDX*4+1)) ? 0xFF : 0x0) << 8)
+	|((fGETQBIT(QREG,(IDX*4+2)) ? 0xFF : 0x0) << 16)
+	|((fGETQBIT(QREG,(IDX*4+3)) ? 0xFF : 0x0) << 24)),
+	()
+)
+DEF_MACRO(fGET10BIT,
+	{
+		COE = (((((fGETUBYTE(3,VAL) >> (2 * POS)) & 3) << 8) | fGETUBYTE(POS,VAL)) << 6);
+		COE >>= 6;
+	},
+	()
+)
+
+DEF_MACRO(fVMAX,
+	(X>Y) ? X : Y,
+	()
+)
+
+
+DEF_MACRO(fGETNIBBLE,
+    ( fSXTN(4,8,(SRC >> (4*IDX)) & 0xF) ),
+    ()
+)
+
+DEF_MACRO(fGETCRUMB,
+    ( fSXTN(2,8,(SRC >> (2*IDX)) & 0x3) ),
+    ()
+)
+
+DEF_MACRO(fGETCRUMB_SYMMETRIC,
+    ( (fGETCRUMB(IDX,SRC)>=0 ? (2-fGETCRUMB(IDX,SRC)) : fGETCRUMB(IDX,SRC) ) ),
+    ()
+)
+
+#define ZERO_OFFSET_2B +
+
+DEF_MACRO(fGENMASKH,
+	(((fGETQBIT(QREG,(IDX*2+0)) ? 0xFF : 0x0) << 0)
+	|((fGETQBIT(QREG,(IDX*2+1)) ? 0xFF : 0x0) << 8)),
+	()
+)
+
+DEF_MACRO(fGETMASKW,
+	(VREG.w[IDX] & fGENMASKW((QREG),IDX)),
+	()
+)
+
+DEF_MACRO(fGETMASKH,
+	(VREG.h[IDX] & fGENMASKH((QREG),IDX)),
+	()
+)
+
+DEF_MACRO(fCONDMASK8,
+	(fGETQBIT(QREG,IDX) ? (YESVAL) : (NOVAL)),
+	()
+)
+
+DEF_MACRO(fCONDMASK16,
+	((fGENMASKH(QREG,IDX) & (YESVAL)) | (fGENMASKH(fNOTQ(QREG),IDX) & (NOVAL))),
+	()
+)
+
+DEF_MACRO(fCONDMASK32,
+	((fGENMASKW(QREG,IDX) & (YESVAL)) | (fGENMASKW(fNOTQ(QREG),IDX) & (NOVAL))),
+	()
+)
+
+
+DEF_MACRO(fSETQBITS,
+	do {
+		size4u_t __TMP = (VAL);
+		REG.w[(BITNO)>>5] &= ~((MASK) << ((BITNO) & 0x1f));
+		REG.w[(BITNO)>>5] |= (((__TMP) & (MASK)) << ((BITNO) & 0x1f));
+	} while (0),
+	()
+)
+
+DEF_MACRO(fSETQBIT,
+	fSETQBITS(REG,1,1,BITNO,VAL),
+	()
+)
+
+DEF_MACRO(fVBYTES,
+	(fVECSIZE()),
+	()
+)
+
+DEF_MACRO(fVHALVES,
+	(fVECSIZE()/2),
+	()
+)
+
+DEF_MACRO(fVWORDS,
+	(fVECSIZE()/4),
+	()
+)
+
+DEF_MACRO(fVDWORDS,
+	(fVECSIZE()/8),
+	()
+)
+
+DEF_MACRO(fVALIGN,
+    ( ADDR = ADDR & ~(LOG2_ALIGNMENT-1)),
+    ()
+)
+
+DEF_MACRO(fVLASTBYTE,
+    ( ADDR = ADDR | (LOG2_ALIGNMENT-1)),
+    ()
+)
+
+
+DEF_MACRO(fVELEM,
+    ((fVECSIZE()*8)/WIDTH),
+    ()
+)
+
+DEF_MACRO(fVECLOGSIZE,
+    (mmvec_current_veclogsize(thread)),
+    ()
+)
+
+DEF_MACRO(fVECSIZE,
+    (1<<fVECLOGSIZE()),
+    ()
+)
+
+DEF_MACRO(fSWAPB,
+    {
+		size1u_t tmp = A;
+		A = B;
+		B = tmp;
+	},
+    /* NOTHING */
+)
+
+DEF_MACRO(
+	fVZERO,
+	mmvec_zero_vector(),
+	()
+)
+
+DEF_MACRO(
+    fNEWVREG,
+    ((THREAD2STRUCT->VRegs_updated & (((VRegMask)1)<<VNUM)) ? THREAD2STRUCT->future_VRegs[VNUM] : mmvec_zero_vector()),
+    (A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY)
+)
+
+DEF_MACRO(
+	fV_AL_CHECK,
+	if ((EA) & (MASK)) {
+		warn("aligning misaligned vector. PC=%08x EA=%08x",thread->Regs[REG_PC],(EA));
+	},
+	()
+)
+DEF_MACRO(fSCATTER_INIT,
+    {
+    mem_vector_scatter_init(thread, insn,   REGION_START, LENGTH, ELEMENT_SIZE);
+	if (EXCEPTION_DETECTED) return;
+    },
+    (A_STORE,A_MEMLIKE,A_RESTRICT_SLOT0ONLY)
+)
+
+DEF_MACRO(fGATHER_INIT,
+    {
+    mem_vector_gather_init(thread, insn,   REGION_START, LENGTH, ELEMENT_SIZE);
+	if (EXCEPTION_DETECTED) return;
+    },
+    (A_LOAD,A_MEMLIKE,A_RESTRICT_SLOT1ONLY)
+)
+
+DEF_MACRO(fSCATTER_FINISH,
+    {
+	if (EXCEPTION_DETECTED) return;
+    mem_vector_scatter_finish(thread, insn, OP);
+    },
+    ()
+)
+
+DEF_MACRO(fGATHER_FINISH,
+    {
+	if (EXCEPTION_DETECTED) return;
+    mem_vector_gather_finish(thread, insn);
+    },
+    ()
+)
+
+
+DEF_MACRO(CHECK_VTCM_PAGE,
+     {
+        int slot = insn->slot;
+        paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        pa = pa & ~(ALIGNMENT-1);
+        FLAG = (pa < (thread->mem_access[slot].paddr+LENGTH));
+     },
+    ()
+)
+DEF_MACRO(COUNT_OUT_OF_BOUNDS,
+     {
+        if (!FLAG)
+        {
+               THREAD2STRUCT->vtcm_log.oob_access += SIZE;
+               warn("Scatter/Gather out of bounds of region");
+        }
+     },
+    ()
+)
+
+DEF_MACRO(fLOG_SCATTER_OP,
+    {
+        // Log the size and indicate that the extension ext.c file needs to increment right before memory write
+        THREAD2STRUCT->vtcm_log.op = 1;
+        THREAD2STRUCT->vtcm_log.op_size = SIZE;
+    },
+    ()
+)
+
+
+
+DEF_MACRO(fVLOG_VTCM_WORD_INCREMENT,
+    {
+        int slot = insn->slot;
+        int log_bank = 0;
+        int log_byte =0;
+        paddr_t pa = thread->mem_access[slot].paddr+(OFFSET & ~(ALIGNMENT-1));
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        for(int i0 = 0; i0 < 4; i0++)
+        {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high));
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,INC.ub[4*IDX+i0],4*IDX+i0);
+        }
+        { LOG_VTCM_BANK(pa, log_bank, IDX); }
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD_INCREMENT,
+    {
+        int slot = insn->slot;
+        int log_bank = 0;
+        int log_byte = 0;
+        paddr_t pa = thread->mem_access[slot].paddr+(OFFSET & ~(ALIGNMENT-1));
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        for(int i0 = 0; i0 < 2; i0++) {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high));
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,INC.ub[2*IDX+i0],2*IDX+i0);
+        }
+        { LOG_VTCM_BANK(pa, log_bank,IDX); }
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD_INCREMENT_DV,
+    {
+        int slot = insn->slot;
+        int log_bank = 0;
+        int log_byte = 0;
+        paddr_t pa = thread->mem_access[slot].paddr+(OFFSET & ~(ALIGNMENT-1));
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        for(int i0 = 0; i0 < 2; i0++) {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high));
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,INC.ub[2*IDX+i0],2*IDX+i0);
+        }
+        { LOG_VTCM_BANK(pa, log_bank,(2*IDX2+IDX_H));}
+    },
+    ()
+)
+
+
+
+DEF_MACRO(GATHER_FUNCTION,
+{
+        int slot = insn->slot;
+        int i0;
+        paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        int log_bank = 0;
+        int log_byte = 0;
+        for(i0 = 0; i0 < ELEMENT_SIZE; i0++)
+        {
+            log_byte =  ((OFFSET>=0)&&((pa+i0)<=pa_high)) && QVAL;
+            log_bank |= (log_byte<<i0);
+            size1u_t B  = sim_mem_read1(thread->system_ptr, thread->threadId, thread->mem_access[slot].paddr+OFFSET+i0);
+            THREAD2STRUCT->tmp_VRegs[0].ub[ELEMENT_SIZE*IDX+i0] = B;
+            LOG_VTCM_BYTE(pa+i0,log_byte,B,ELEMENT_SIZE*IDX+i0);
+        }
+        LOG_VTCM_BANK(pa, log_bank,BANK_IDX);
+},
+()
+)
+
+
+
+DEF_MACRO(fVLOG_VTCM_GATHER_WORD,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 4, IDX, 1);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORD,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, IDX, 1);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORD_DV,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), 1);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_WORDQ,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 4, IDX, fGETQBIT(QsV,4*IDX+i0));
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORDQ,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, IDX, fGETQBIT(QsV,2*IDX+i0));
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_GATHER_HALFWORDQ_DV,
+    {
+		GATHER_FUNCTION(EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), fGETQBIT(QsV,2*IDX+i0));
+    },
+    ()
+)
+
+
+DEF_MACRO(DEBUG_LOG_ADDR,
+    {
+
+        if (thread->processor_ptr->arch_proc_options->mmvec_network_addr_log2)
+        {
+
+            int slot = insn->slot;
+            paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        }
+    },
+    ()
+)
+
+
+
+
+
+
+
+DEF_MACRO(SCATTER_OP_WRITE_TO_MEM,
+    {
+        for (int i = 0; i < mmvecx->vtcm_log.size; i+=sizeof(TYPE))
+        {
+            if ( mmvecx->vtcm_log.mask.ub[i] != 0) {
+                TYPE dst = 0;
+                TYPE inc = 0;
+                for(int j = 0; j < sizeof(TYPE); j++) {
+                    dst |= (sim_mem_read1(thread->system_ptr, thread->threadId, mmvecx->vtcm_log.pa[i+j]) << (8*j));
+                    inc |= mmvecx->vtcm_log.data.ub[j+i] << (8*j);
+
+                    mmvecx->vtcm_log.mask.ub[j+i] = 0;
+                    mmvecx->vtcm_log.data.ub[j+i] = 0;
+                    mmvecx->vtcm_log.offsets.ub[j+i] = 0;
+                }
+                dst += inc;
+                for(int j = 0; j < sizeof(TYPE); j++) {
+                    sim_mem_write1(thread->system_ptr,thread->threadId, mmvecx->vtcm_log.pa[i+j], (dst >> (8*j))& 0xFF );
+                }
+        }
+
+    }
+    },
+    ()
+)
+
+DEF_MACRO(SCATTER_FUNCTION,
+{
+        int slot = insn->slot;
+        int i0;
+        paddr_t pa = thread->mem_access[slot].paddr+OFFSET;
+        paddr_t pa_high = thread->mem_access[slot].paddr+LEN;
+        int log_bank = 0;
+        int log_byte = 0;
+        for(i0 = 0; i0 < ELEMENT_SIZE; i0++) {
+            log_byte = ((OFFSET>=0)&&((pa+i0)<=pa_high)) && QVAL;
+            log_bank |= (log_byte<<i0);
+            LOG_VTCM_BYTE(pa+i0,log_byte,IN.ub[ELEMENT_SIZE*IDX+i0],ELEMENT_SIZE*IDX+i0);
+        }
+        LOG_VTCM_BANK(pa, log_bank,BANK_IDX);
+
+},
+()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, IDX, 1, IN);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_WORD,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 4, IDX, 1, IN);
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORDQ,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, IDX, fGETQBIT(QsV,2*IDX+i0), IN);
+    },
+    ()
+)
+DEF_MACRO(fVLOG_VTCM_WORDQ,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 4, IDX, fGETQBIT(QsV,4*IDX+i0), IN);
+    },
+    ()
+)
+
+
+
+
+
+DEF_MACRO(fVLOG_VTCM_HALFWORD_DV,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), 1, IN);
+    },
+    ()
+)
+
+DEF_MACRO(fVLOG_VTCM_HALFWORDQ_DV,
+    {
+		SCATTER_FUNCTION (EA,OFFSET,IDX, LEN, 2, (2*IDX2+IDX_H), fGETQBIT(QsV,2*IDX+i0), IN);
+    },
+    ()
+)
+
+
+
+
+
+
+DEF_MACRO(fSTORERELEASE,
+    {
+        fV_AL_CHECK(EA,fVECSIZE()-1);
+
+        mem_store_release(thread, insn, fVECSIZE(), EA&~(fVECSIZE()-1), EA, TYPE, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+	(A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fVFETCH_AL,
+    {
+    fV_AL_CHECK(EA,fVECSIZE()-1);
+    mem_fetch_vector(thread, insn, EA&~(fVECSIZE()-1), insn->slot, fVECSIZE());
+    },
+    (A_LOAD,A_MEMLIKE)
+)
+
+
+DEF_MACRO(fLOADMMV_AL,
+    {
+    fV_AL_CHECK(EA,ALIGNMENT-1);
+	thread->last_pkt->double_access_vec = 0;
+    mem_load_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &DST.ub[0], LEN, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_LOAD,A_MEMLIKE)
+)
+
+DEF_MACRO(fLOADMMV,
+	fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST),
+	()
+)
+
+DEF_MACRO(fLOADMMVQ,
+	do {
+		int __i;
+		fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		fVFOREACH(8,__i) if (!fGETQBIT(QVAL,__i)) DST.b[__i] = 0;
+	} while (0),
+	()
+)
+
+DEF_MACRO(fLOADMMVNQ,
+	do {
+		int __i;
+		fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		fVFOREACH(8,__i) if (fGETQBIT(QVAL,__i)) DST.b[__i] = 0;
+	} while (0),
+	()
+)
+
+DEF_MACRO(fLOADMMVU_AL,
+    {
+    size4u_t size2 = (EA)&(ALIGNMENT-1);
+    size4u_t size1 = LEN-size2;
+	thread->last_pkt->double_access_vec = 1;
+    mem_load_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(), /* slot */ 1, size2, &DST.ub[size1], size2, fUSE_LOOKUP_ADDRESS());
+    mem_load_vector_oddva(thread, insn, EA, EA,/* slot */ 0, size1, &DST.ub[0], size1, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_LOAD,A_MEMLIKE)
+)
+
+DEF_MACRO(fLOADMMVU,
+	{
+		/* if address happens to be aligned, only do aligned load */
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+            thread->last_pkt->pkt_has_vmemu_access = 0;
+			thread->last_pkt->double_access = 0;
+
+			fLOADMMV_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		} else {
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			thread->last_pkt->double_access = 1;
+
+			fLOADMMVU_AL(EA,fVECSIZE(),fVECSIZE(),DST);
+		}
+	},
+	()
+)
+
+DEF_MACRO(fSTOREMMV_AL,
+    {
+    fV_AL_CHECK(EA,ALIGNMENT-1);
+    mem_store_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &SRC.ub[0], 0, 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMV,
+	fSTOREMMV_AL(EA,fVECSIZE(),fVECSIZE(),SRC),
+	()
+)
+
+DEF_MACRO(fSTOREMMVQ_AL,
+    do {
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+	mem_store_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &SRC.ub[0], &maskvec.ub[0], 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    } while (0),
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVQ,
+	fSTOREMMVQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK),
+	()
+)
+
+DEF_MACRO(fSTOREMMVNQ_AL,
+    {
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+        fV_AL_CHECK(EA,ALIGNMENT-1);
+	mem_store_vector_oddva(thread, insn, EA&~(ALIGNMENT-1), EA, insn->slot, LEN, &SRC.ub[0], &maskvec.ub[0], 1, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVNQ,
+	fSTOREMMVNQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK),
+	()
+)
+
+DEF_MACRO(fSTOREMMVU_AL,
+    {
+    size4u_t size1 = ALIGNMENT-((EA)&(ALIGNMENT-1));
+    size4u_t size2;
+    if (size1>LEN) size1 = LEN;
+    size2 = LEN-size1;
+    mem_store_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(), /* slot */ 1, size2, &SRC.ub[size1], 0, 0, fUSE_LOOKUP_ADDRESS());
+    mem_store_vector_oddva(thread, insn, EA, EA, /* slot */ 0, size1, &SRC.ub[0], 0, 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVU,
+	{
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+			thread->last_pkt->double_access = 0;
+			fSTOREMMV_AL(EA,fVECSIZE(),fVECSIZE(),SRC);
+		} else {
+			thread->last_pkt->double_access = 1;
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			fSTOREMMVU_AL(EA,fVECSIZE(),fVECSIZE(),SRC);
+		}
+	},
+	()
+)
+
+DEF_MACRO(fSTOREMMVQU_AL,
+    {
+	size4u_t size1 = ALIGNMENT-((EA)&(ALIGNMENT-1));
+	size4u_t size2;
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+	if (size1>LEN) size1 = LEN;
+	size2 = LEN-size1;
+	mem_store_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(),/* slot */ 1, size2, &SRC.ub[size1], &maskvec.ub[size1], 0, fUSE_LOOKUP_ADDRESS());
+	mem_store_vector_oddva(thread, insn, EA, EA, /* slot */ 0, size1, &SRC.ub[0], &maskvec.ub[0], 0, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVQU,
+	{
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+			thread->last_pkt->double_access = 0;
+			fSTOREMMVQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		} else {
+			thread->last_pkt->double_access = 1;
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			fSTOREMMVQU_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		}
+	},
+	()
+)
+
+DEF_MACRO(fSTOREMMVNQU_AL,
+    {
+	size4u_t size1 = ALIGNMENT-((EA)&(ALIGNMENT-1));
+	size4u_t size2;
+	mmvector_t maskvec;
+	int i;
+	for (i = 0; i < fVECSIZE(); i++) maskvec.ub[i] = fGETQBIT(MASK,i);
+	if (size1>LEN) size1 = LEN;
+	size2 = LEN-size1;
+	mem_store_vector_oddva(thread, insn, EA+size1, EA+fVECSIZE(), /* slot */ 1, size2, &SRC.ub[size1], &maskvec.ub[size1], 1, fUSE_LOOKUP_ADDRESS());
+	mem_store_vector_oddva(thread, insn, EA, EA, /* slot */ 0, size1, &SRC.ub[0], &maskvec.ub[0], 1, fUSE_LOOKUP_ADDRESS_BY_REV(thread->processor_ptr));
+    },
+    (A_STORE,A_MEMLIKE)
+)
+
+DEF_MACRO(fSTOREMMVNQU,
+	{
+        thread->last_pkt->pkt_has_vtcm_access = 0;
+        thread->last_pkt->pkt_access_count = 0;
+		if ( (EA & (fVECSIZE()-1)) == 0) {
+			thread->last_pkt->double_access = 0;
+			fSTOREMMVNQ_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		} else {
+			thread->last_pkt->double_access = 1;
+            thread->last_pkt->pkt_has_vmemu_access = 1;
+			fSTOREMMVNQU_AL(EA,fVECSIZE(),fVECSIZE(),SRC,MASK);
+		}
+	},
+	()
+)
+
+
+
+
+DEF_MACRO(fVFOREACH,
+    for (VAR = 0; VAR < fVELEM(WIDTH); VAR++),
+    /* NOTHING */
+)
+
+DEF_MACRO(fVARRAY_ELEMENT_ACCESS,
+    ARRAY.v[(INDEX) / (fVECSIZE()/(sizeof(ARRAY.TYPE[0])))].TYPE[(INDEX) % (fVECSIZE()/(sizeof(ARRAY.TYPE[0])))],
+    ()
+)
+
+DEF_MACRO(fVNEWCANCEL,
+	do { THREAD2STRUCT->VRegs_select &= ~(1<<(REGNUM)); } while (0),
+	()
+)
+
+DEF_MACRO(fTMPVDATA,
+	mmvec_vtmp_data(thread),
+	(A_CVI)
+)
+
+DEF_MACRO(fVSATDW,
+    fVSATW( ( ( ((long long)U)<<32 ) | fZXTN(32,64,V) ) ),
+    /* attribs */
+)
+
+DEF_MACRO(fVASL_SATHI,
+    fVSATW(((U)<<1) | ((V)>>31)),
+    /* attribs */
+)
+
+DEF_MACRO(fVUADDSAT,
+	fVSATUN( WIDTH, fZXTN(WIDTH, 2*WIDTH, U)  + fZXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVSADDSAT,
+	fVSATN(  WIDTH, fSXTN(WIDTH, 2*WIDTH, U)  + fSXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVUSUBSAT,
+	fVSATUN( WIDTH, fZXTN(WIDTH, 2*WIDTH, U)  - fZXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVSSUBSAT,
+	fVSATN(  WIDTH, fSXTN(WIDTH, 2*WIDTH, U)  - fSXTN(WIDTH, 2*WIDTH, V)),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGU,
+	((fZXTN(WIDTH, 2*WIDTH, U) + fZXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGURND,
+	((fZXTN(WIDTH, 2*WIDTH, U) + fZXTN(WIDTH, 2*WIDTH, V)+1)>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGU,
+	((fZXTN(WIDTH, 2*WIDTH, U) - fZXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGURNDSAT,
+	fVSATUN(WIDTH,((fZXTN(WIDTH, 2*WIDTH, U) - fZXTN(WIDTH, 2*WIDTH, V)+1)>>1)),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGS,
+	((fSXTN(WIDTH, 2*WIDTH, U) + fSXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVAVGSRND,
+	((fSXTN(WIDTH, 2*WIDTH, U) + fSXTN(WIDTH, 2*WIDTH, V)+1)>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGS,
+	((fSXTN(WIDTH, 2*WIDTH, U) - fSXTN(WIDTH, 2*WIDTH, V))>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGSRND,
+	((fSXTN(WIDTH, 2*WIDTH, U) - fSXTN(WIDTH, 2*WIDTH, V)+1)>>1),
+	/* attribs */
+)
+
+DEF_MACRO(fVNAVGSRNDSAT,
+	fVSATN(WIDTH,((fSXTN(WIDTH, 2*WIDTH, U) - fSXTN(WIDTH, 2*WIDTH, V)+1)>>1)),
+	/* attribs */
+)
+
+
+DEF_MACRO(fVNOROUND,
+	VAL,
+	/* NOTHING */
+)
+DEF_MACRO(fVNOSAT,
+	VAL,
+	/* NOTHING */
+)
+
+DEF_MACRO(fVROUND,
+	((VAL) + (((SHAMT)>0)?(1LL<<((SHAMT)-1)):0)),
+	/* NOTHING */
+)
+
+DEF_MACRO(fCARRY_FROM_ADD32,
+	(((fZXTN(32,64,A)+fZXTN(32,64,B)+C) >> 32) & 1),
+	/* NOTHING */
+)
+
+DEF_MACRO(fUARCH_NOTE_PUMP_4X,
+	,
+	()
+)
+
+DEF_MACRO(fUARCH_NOTE_PUMP_2X,
+	,
+	()
+)
-- 
2.7.4
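
For readers following the arithmetic macros at the end of the patch above
(fVUADDSAT, fVUSUBSAT, fVAVGU, fVAVGURND, fVROUND, fCARRY_FROM_ADD32), here
is a minimal Python model of their semantics. It is a sketch, not part of
the series: the helper names are made up for illustration, and the clamping
behaviour of fVSATUN is assumed (clamp to the unsigned range of WIDTH bits)
since its definition is not shown in this patch.

```python
def zxtn(width, val):
    """Keep the low `width` bits, modelling fZXTN(WIDTH, 2*WIDTH, VAL)."""
    return val & ((1 << width) - 1)


def sat_un(width, val):
    """Assumed meaning of fVSATUN: clamp to [0, 2**width - 1]."""
    return max(0, min(val, (1 << width) - 1))


def vuaddsat(width, u, v):
    # fVUADDSAT: widen both operands, add, then saturate back to `width` bits
    return sat_un(width, zxtn(width, u) + zxtn(width, v))


def vusubsat(width, u, v):
    # fVUSUBSAT: a negative difference saturates to 0
    return sat_un(width, zxtn(width, u) - zxtn(width, v))


def vavgu(width, u, v):
    # fVAVGU: truncating average, computed in the widened type
    return (zxtn(width, u) + zxtn(width, v)) >> 1


def vavgurnd(width, u, v):
    # fVAVGURND: same average, with +1 before the shift to round up
    return (zxtn(width, u) + zxtn(width, v) + 1) >> 1


def vround(val, shamt):
    # fVROUND: add half of the amount about to be shifted out (if any)
    return val + ((1 << (shamt - 1)) if shamt > 0 else 0)


def carry_from_add32(a, b, c):
    # fCARRY_FROM_ADD32: bit 32 of the 64-bit sum a + b + carry-in
    return ((zxtn(32, a) + zxtn(32, b) + c) >> 32) & 1
```

Widening before the add or subtract is what keeps the intermediate result
from wrapping, so the saturation and the carry bit can be observed; in the
C macros the widened value is then stored back into a WIDTH-bit element.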



* [PATCH v3 07/30] Hexagon HVX (target/hexagon) semantics generator
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (5 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 06/30] Hexagon HVX (target/hexagon) import macro definitions Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 22:59   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 08/30] Hexagon HVX (target/hexagon) semantics generator - part 2 Taylor Simpson
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Add HVX support to the semantics generator

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
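Note for reviewers: a minimal sketch of how the predicates added to
hex_common.py in this patch are expected to behave. The attribdict below is
a hypothetical stand-in (the real one is filled in from the generated
ATTRIBUTES output), and the tags are invented for illustration.

```python
# Hypothetical stand-in for hex_common.attribdict: tag -> set of attributes.
attribdict = {
    "EX_vec_tmp_load": {"A_EXTENSION", "A_CVI", "A_CVI_TMP"},
    "EX_vec_new_load": {"A_EXTENSION", "A_CVI", "A_CVI_NEW"},
    "EX_scalar_add": set(),
}


def is_hvx_reg(regtype):
    # V = HVX vector register, Q = HVX predicate vector
    return regtype in "VQ"


def is_tmp_result(tag):
    return ('A_CVI_TMP' in attribdict[tag] or
            'A_CVI_TMP_DST' in attribdict[tag])


def is_new_result(tag):
    return 'A_CVI_NEW' in attribdict[tag]


if __name__ == "__main__":
    for regtype in "RPVQO":
        print(regtype, is_hvx_reg(regtype))
    for tag in attribdict:
        print(tag, is_tmp_result(tag), is_new_result(tag))
```

Note that is_hvx_reg() as written does not match the "O" (HVX new vector
register) type, so callers that need to treat new-value vector operands
specially have to check for it separately.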
 target/hexagon/gen_semantics.c | 33 +++++++++++++++++++++++++++++++++
 target/hexagon/hex_common.py   | 13 +++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/target/hexagon/gen_semantics.c b/target/hexagon/gen_semantics.c
index c5fccec..4a2bdd7 100644
--- a/target/hexagon/gen_semantics.c
+++ b/target/hexagon/gen_semantics.c
@@ -44,6 +44,11 @@ int main(int argc, char *argv[])
  *         Q6INSN(A2_add,"Rd32=add(Rs32,Rt32)",ATTRIBS(),
  *         "Add 32-bit registers",
  *         { RdV=RsV+RtV;})
+ *     HVX instructions have the following form
+ *         EXTINSN(V6_vinsertwr, "Vx32.w=vinsert(Rt32)",
+ *         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),
+ *         "Insert Word Scalar into Vector",
+ *         VxV.uw[0] = RtV;)
  */
 #define Q6INSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
     do { \
@@ -59,8 +64,23 @@ int main(int argc, char *argv[])
                          ")\n", \
                 #TAG, STRINGIZE(ATTRIBS)); \
     } while (0);
+#define EXTINSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
+    do { \
+        fprintf(outfile, "SEMANTICS( \\\n" \
+                         "    \"%s\", \\\n" \
+                         "    %s, \\\n" \
+                         "    \"\"\"%s\"\"\" \\\n" \
+                         ")\n", \
+                #TAG, STRINGIZE(BEH), STRINGIZE(SEM)); \
+        fprintf(outfile, "ATTRIBUTES( \\\n" \
+                         "    \"%s\", \\\n" \
+                         "    \"%s\" \\\n" \
+                         ")\n", \
+                #TAG, STRINGIZE(ATTRIBS)); \
+    } while (0);
 #include "imported/allidefs.def"
 #undef Q6INSN
+#undef EXTINSN
 
 /*
  * Process the macro definitions
@@ -83,6 +103,19 @@ int main(int argc, char *argv[])
 #include "imported/macros.def"
 #undef DEF_MACRO
 
+/*
+ * Process the macros for HVX
+ */
+#define DEF_MACRO(MNAME, BEH, ATTRS) \
+    fprintf(outfile, "MACROATTRIB( \\\n" \
+                     "    \"%s\", \\\n" \
+                     "    \"\"\"%s\"\"\", \\\n" \
+                     "    \"%s\" \\\n" \
+                     ")\n", \
+            #MNAME, STRINGIZE(BEH), STRINGIZE(ATTRS));
+#include "imported/allext_macros.def"
+#undef DEF_MACRO
+
     fclose(outfile);
     return 0;
 }
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index b3b5340..47fb628 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -143,6 +143,9 @@ def compute_tag_immediates(tag):
 ##          P                predicate register
 ##          R                GPR register
 ##          M                modifier register
+##          Q                HVX predicate vector
+##          V                HVX vector register
+##          O                HVX new vector register
 ##      regid can be one of the following
 ##          d, e             destination register
 ##          dd               destination register pair
@@ -178,6 +181,9 @@ def is_readwrite(regid):
 def is_scalar_reg(regtype):
     return regtype in "RPC"
 
+def is_hvx_reg(regtype):
+    return regtype in "VQ"
+
 def is_old_val(regtype, regid, tag):
     return regtype+regid+'V' in semdict[tag]
 
@@ -201,6 +207,13 @@ def need_ea(tag):
 def skip_qemu_helper(tag):
     return tag in overrides.keys()
 
+def is_tmp_result(tag):
+    return ('A_CVI_TMP' in attribdict[tag] or
+            'A_CVI_TMP_DST' in attribdict[tag])
+
+def is_new_result(tag):
+    return ('A_CVI_NEW' in attribdict[tag])
+
 def imm_name(immlett):
     return "%siV" % immlett
 
-- 
2.7.4



* [PATCH v3 08/30] Hexagon HVX (target/hexagon) semantics generator - part 2
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (6 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 07/30] Hexagon HVX (target/hexagon) semantics generator Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:03   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 09/30] Hexagon HVX (target/hexagon) C preprocessor for decode tree Taylor Simpson
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
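Note for reviewers: the helper argument ordering produced by the
gen_helper_funcs.py changes below can be summarised with a small model.
This is a simplified sketch, not the generator itself: it ignores new-value
operands and special-cased tags, the register-id predicates are assumed to
match the existing ones in hex_common.py, and the tags in the usage lines
are invented.

```python
def is_pair(regid):
    return len(regid) == 2


def is_written(regid):       # assumed to mirror hex_common.is_written
    return regid[0] in "dexy"


def is_read(regid):          # assumed to mirror hex_common.is_read
    return regid[0] in "stuvwxy"


def is_readwrite(regid):     # assumed to mirror hex_common.is_readwrite
    return regid[0] in "xy"


def is_hvx_reg(regtype):
    return regtype in "VQ"


def helper_signature(tag, regs, imms=(), need_slot=False):
    """Build the C signature for a helper; regs is a list of (regtype, regid)."""
    ret = "void"
    args = ["CPUHexagonState *env"]
    # A scalar destination becomes the return value of the helper.
    for regtype, regid in regs:
        if is_written(regid) and not is_hvx_reg(regtype):
            ret = "int64_t" if is_pair(regid) else "int32_t"
    # HVX destinations are passed by pointer, directly after env.
    for regtype, regid in regs:
        if is_written(regid) and is_hvx_reg(regtype):
            args.append("void *%s%sV_void" % (regtype, regid))
    # Sources follow; a read/write HVX operand was already passed above.
    for regtype, regid in regs:
        if not is_read(regid):
            continue
        if is_hvx_reg(regtype):
            if is_readwrite(regid):
                continue
            args.append("void *%s%sV_void" % (regtype, regid))
        else:
            ctype = "int64_t" if is_pair(regid) else "int32_t"
            args.append("%s %s%sV" % (ctype, regtype, regid))
    for immlett in imms:
        args.append("int32_t %siV" % immlett)
    if need_slot:
        args.append("uint32_t slot")
    return "%s HELPER(%s)(%s)" % (ret, tag, ", ".join(args))


if __name__ == "__main__":
    print(helper_signature("EX_vadd", [("V", "d"), ("V", "u"), ("V", "v")]))
    print(helper_signature("EX_vacc", [("V", "x"), ("V", "u"), ("R", "t")]))
```

Passing the vector operands as void * keeps the helper prototypes
independent of the MMVector, MMVectorPair and MMQReg types; the generated
comments in the helper body record which type each pointer refers to.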
 target/hexagon/gen_helper_funcs.py  | 112 ++++++++++++++--
 target/hexagon/gen_helper_protos.py |  16 ++-
 target/hexagon/gen_tcg_funcs.py     | 258 ++++++++++++++++++++++++++++++++++--
 3 files changed, 364 insertions(+), 22 deletions(-)

diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index 2b1c5d8..ac5ce10 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -48,12 +48,26 @@ def gen_helper_arg_pair(f,regtype,regid,regno):
     if regno >= 0 : f.write(", ")
     f.write("int64_t %s%sV" % (regtype,regid))
 
+def gen_helper_arg_ext(f,regtype,regid,regno):
+    if regno > 0 : f.write(", ")
+    f.write("void *%s%sV_void" % (regtype,regid))
+
+def gen_helper_arg_ext_pair(f,regtype,regid,regno):
+    if regno > 0 : f.write(", ")
+    f.write("void *%s%sV_void" % (regtype,regid))
+
 def gen_helper_arg_opn(f,regtype,regid,i,tag):
     if (hex_common.is_pair(regid)):
-        gen_helper_arg_pair(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_arg_ext_pair(f,regtype,regid,i)
+        else:
+            gen_helper_arg_pair(f,regtype,regid,i)
     elif (hex_common.is_single(regid)):
         if hex_common.is_old_val(regtype, regid, tag):
-            gen_helper_arg(f,regtype,regid,i)
+            if (hex_common.is_hvx_reg(regtype)):
+                gen_helper_arg_ext(f,regtype,regid,i)
+            else:
+                gen_helper_arg(f,regtype,regid,i)
         elif hex_common.is_new_val(regtype, regid, tag):
             gen_helper_arg_new(f,regtype,regid,i)
         else:
@@ -72,25 +86,67 @@ def gen_helper_dest_decl_pair(f,regtype,regid,regno,subfield=""):
     f.write("    int64_t %s%sV%s = 0;\n" % \
         (regtype,regid,subfield))
 
+def gen_helper_dest_decl_ext(f,regtype,regid):
+    if (regtype == "Q"):
+        f.write("    /* %s%sV is *(MMQReg *)(%s%sV_void) */\n" % \
+            (regtype,regid,regtype,regid))
+    else:
+        f.write("    /* %s%sV is *(MMVector *)(%s%sV_void) */\n" % \
+            (regtype,regid,regtype,regid))
+
+def gen_helper_dest_decl_ext_pair(f,regtype,regid,regno):
+    f.write("    /* %s%sV is *(MMVectorPair *)(%s%sV_void) */\n" % \
+        (regtype,regid,regtype, regid))
+
 def gen_helper_dest_decl_opn(f,regtype,regid,i):
     if (hex_common.is_pair(regid)):
-        gen_helper_dest_decl_pair(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_dest_decl_ext_pair(f,regtype,regid, i)
+        else:
+            gen_helper_dest_decl_pair(f,regtype,regid,i)
     elif (hex_common.is_single(regid)):
-        gen_helper_dest_decl(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_dest_decl_ext(f,regtype,regid)
+        else:
+            gen_helper_dest_decl(f,regtype,regid,i)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
+def gen_helper_src_var_ext(f,regtype,regid):
+    if (regtype == "Q"):
+       f.write("    /* %s%sV is *(MMQReg *)(%s%sV_void) */\n" % \
+           (regtype,regid,regtype,regid))
+    else:
+       f.write("    /* %s%sV is *(MMVector *)(%s%sV_void) */\n" % \
+           (regtype,regid,regtype,regid))
+
+def gen_helper_src_var_ext_pair(f,regtype,regid,regno):
+    f.write("    /* %s%sV is *(MMVectorPair *)(%s%sV_void) */\n" % \
+        (regtype,regid,regtype,regid))
+
 def gen_helper_return(f,regtype,regid,regno):
     f.write("    return %s%sV;\n" % (regtype,regid))
 
 def gen_helper_return_pair(f,regtype,regid,regno):
     f.write("    return %s%sV;\n" % (regtype,regid))
 
+def gen_helper_dst_write_ext(f,regtype,regid):
+    return
+
+def gen_helper_dst_write_ext_pair(f,regtype,regid):
+    return
+
 def gen_helper_return_opn(f, regtype, regid, i):
     if (hex_common.is_pair(regid)):
-        gen_helper_return_pair(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_dst_write_ext_pair(f,regtype,regid)
+        else:
+            gen_helper_return_pair(f,regtype,regid,i)
     elif (hex_common.is_single(regid)):
-        gen_helper_return(f,regtype,regid,i)
+        if (hex_common.is_hvx_reg(regtype)):
+            gen_helper_dst_write_ext(f,regtype,regid)
+        else:
+            gen_helper_return(f,regtype,regid,i)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
@@ -129,14 +185,20 @@ def gen_helper_function(f, tag, tagregs, tagimms):
                 % (tag, tag))
     else:
         ## The return type of the function is the type of the destination
-        ## register
+        ## register (if scalar)
         i=0
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_written(regid)):
                 if (hex_common.is_pair(regid)):
-                    gen_helper_return_type_pair(f,regtype,regid,i)
+                    if (hex_common.is_hvx_reg(regtype)):
+                        continue
+                    else:
+                        gen_helper_return_type_pair(f,regtype,regid,i)
                 elif (hex_common.is_single(regid)):
-                    gen_helper_return_type(f,regtype,regid,i)
+                    if (hex_common.is_hvx_reg(regtype)):
+                        continue
+                    else:
+                        gen_helper_return_type(f,regtype,regid,i)
                 else:
                     print("Bad register parse: ",regtype,regid,toss,numregs)
             i += 1
@@ -145,16 +207,37 @@ def gen_helper_function(f, tag, tagregs, tagimms):
             f.write("void")
         f.write(" HELPER(%s)(CPUHexagonState *env" % tag)
 
+        ## Arguments include the vector destination operands
         i = 1
+        for regtype,regid,toss,numregs in regs:
+            if (hex_common.is_written(regid)):
+                if (hex_common.is_pair(regid)):
+                    if (hex_common.is_hvx_reg(regtype)):
+                        gen_helper_arg_ext_pair(f,regtype,regid,i)
+                    else:
+                        continue
+                elif (hex_common.is_single(regid)):
+                    if (hex_common.is_hvx_reg(regtype)):
+                        gen_helper_arg_ext(f,regtype,regid,i)
+                    else:
+                        # This is the return value of the function
+                        continue
+                else:
+                    print("Bad register parse: ",regtype,regid,toss,numregs)
+                i += 1
 
         ## Arguments to the helper function are the source regs and immediates
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_read(regid)):
+                if (hex_common.is_hvx_reg(regtype) and
+                    hex_common.is_readwrite(regid)):
+                    continue
                 gen_helper_arg_opn(f,regtype,regid,i,tag)
                 i += 1
         for immlett,bits,immshift in imms:
             gen_helper_arg_imm(f,immlett)
             i += 1
+
         if hex_common.need_slot(tag):
             if i > 0: f.write(", ")
             f.write("uint32_t slot")
@@ -173,6 +256,17 @@ def gen_helper_function(f, tag, tagregs, tagimms):
                 gen_helper_dest_decl_opn(f,regtype,regid,i)
             i += 1
 
+        for regtype,regid,toss,numregs in regs:
+            if (hex_common.is_read(regid)):
+                if (hex_common.is_pair(regid)):
+                    if (hex_common.is_hvx_reg(regtype)):
+                        gen_helper_src_var_ext_pair(f,regtype,regid,i)
+                elif (hex_common.is_single(regid)):
+                    if (hex_common.is_hvx_reg(regtype)):
+                        gen_helper_src_var_ext(f,regtype,regid)
+                else:
+                    print("Bad register parse: ",regtype,regid,toss,numregs)
+
         if 'A_FPOP' in hex_common.attribdict[tag]:
             f.write('    arch_fpop_start(env);\n');
 
diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
index ea41007..229ef8d 100755
--- a/target/hexagon/gen_helper_protos.py
+++ b/target/hexagon/gen_helper_protos.py
@@ -94,19 +94,33 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
             f.write('DEF_HELPER_%s(%s' % (def_helper_size, tag))
 
         ## Generate the qemu DEF_HELPER type for each result
+        ## Iterate over this list twice
+        ## - Emit the scalar result
+        ## - Emit the vector result
         i=0
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_written(regid)):
-                gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
+                if (not hex_common.is_hvx_reg(regtype)):
+                    gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
 
         ## Put the env between the outputs and inputs
         f.write(', env' )
         i += 1
 
+        # Second pass
+        for regtype,regid,toss,numregs in regs:
+            if (hex_common.is_written(regid)):
+                if (hex_common.is_hvx_reg(regtype)):
+                    gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
+                    i += 1
+
         ## Generate the qemu type for each input operand (regs and immediates)
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_read(regid)):
+                if (hex_common.is_hvx_reg(regtype) and
+                    hex_common.is_readwrite(regid)):
+                    continue
                 gen_def_helper_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
         for immlett,bits,immshift in imms:
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index 7ceb25b..1abe59d 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -119,10 +119,95 @@ def genptr_decl(f, tag, regtype, regid, regno):
                 (regtype, regid, regtype, regid))
         else:
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "V"):
+        if (regid in {"dd"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" %\
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" %\
+                 (regtype, regid))
+            if (hex_common.is_tmp_result(tag)):
+                f.write("        ctx_tmp_vreg_off(ctx, %s%sN, 2, true);\n" % \
+                     (regtype, regid))
+            else:
+                f.write("        ctx_future_vreg_off(ctx, %s%sN," % \
+                     (regtype, regid))
+                f.write(" 2, true);\n")
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_new_ptr();\n" % \
+                    (regtype, regid))
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        elif (regid in {"uu", "vv", "xx"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" %\
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" % \
+                 (regtype, regid))
+            f.write("        offsetof(CPUHexagonState, %s%sV);\n" % \
+                 (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_new_ptr();\n" % \
+                    (regtype, regid))
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        elif (regid in {"s", "u", "v", "w"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" % \
+                              (regtype, regid))
+            f.write("        vreg_src_off(ctx, %s%sN);\n" % \
+                              (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_new_ptr();\n" % \
+                    (regtype, regid))
+        elif (regid in {"d", "x", "y"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" % \
+                (regtype, regid))
+            if (hex_common.is_tmp_result(tag)):
+                f.write("        ctx_tmp_vreg_off(ctx, %s%sN, 1, true);\n" % \
+                    (regtype, regid))
+            else:
+                f.write("        ctx_future_vreg_off(ctx, %s%sN," %\
+                    (regtype, regid))
+                f.write(" 1, true);\n");
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_new_ptr();\n" % \
+                    (regtype, regid))
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    elif (regtype == "Q"):
+        if (regid in {"d", "e", "x"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" % \
+                (regtype, regid))
+            f.write("        offsetof(CPUHexagonState,\n")
+            f.write("                 future_QRegs[%s%sN]);\n" % \
+                (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_new_ptr();\n" % \
+                    (regtype, regid))
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        elif (regid in {"s", "t", "u", "v"}):
+            f.write("    const int %s%sN = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            f.write("    const intptr_t %s%sV_off =\n" %\
+                (regtype, regid))
+            f.write("        offsetof(CPUHexagonState, QRegs[%s%sN]);\n" % \
+                (regtype, regid))
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    TCGv_ptr %s%sV = tcg_temp_new_ptr();\n" % \
+                    (regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
-def genptr_decl_new(f,regtype,regid,regno):
+def genptr_decl_new(f, tag, regtype, regid, regno):
     if (regtype == "N"):
         if (regid in {"s", "t"}):
             f.write("    TCGv %s%sN = hex_new_value[insn->regno[%d]];\n" % \
@@ -135,6 +220,21 @@ def genptr_decl_new(f,regtype,regid,regno):
                 (regtype, regid, regno))
         else:
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "O"):
+        if (regid == "s"):
+            f.write("    const intptr_t %s%sN_num = insn->regno[%d];\n" % \
+                (regtype, regid, regno))
+            if (hex_common.skip_qemu_helper(tag)):
+                f.write("    const intptr_t %s%sN_off =\n" % \
+                    (regtype, regid))
+                f.write("         ctx_future_vreg_off(ctx, %s%sN_num," % \
+                    (regtype, regid))
+                f.write(" 1, true);\n")
+            else:
+                f.write("    TCGv %s%sN = tcg_const_tl(%s%sN_num);\n" % \
+                    (regtype, regid, regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
@@ -145,7 +245,7 @@ def genptr_decl_opn(f, tag, regtype, regid, toss, numregs, i):
         if hex_common.is_old_val(regtype, regid, tag):
             genptr_decl(f,tag, regtype, regid, i)
         elif hex_common.is_new_val(regtype, regid, tag):
-            genptr_decl_new(f,regtype,regid,i)
+            genptr_decl_new(f, tag, regtype, regid, i)
         else:
             print("Bad register parse: ",regtype,regid,toss,numregs)
     else:
@@ -159,7 +259,7 @@ def genptr_decl_imm(f,immlett):
     f.write("    int %s = insn->immed[%d];\n" % \
         (hex_common.imm_name(immlett), i))
 
-def genptr_free(f,regtype,regid,regno):
+def genptr_free(f, tag, regtype, regid, regno):
     if (regtype == "R"):
         if (regid in {"dd", "ss", "tt", "xx", "yy"}):
             f.write("    tcg_temp_free_i64(%s%sV);\n" % (regtype, regid))
@@ -182,33 +282,55 @@ def genptr_free(f,regtype,regid,regno):
     elif (regtype == "M"):
         if (regid != "u"):
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "V"):
+        if (regid in {"dd", "uu", "vv", "xx", \
+                      "d", "s", "u", "v", "w", "x", "y"}):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_temp_free_ptr(%s%sV);\n" % \
+                    (regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    elif (regtype == "Q"):
+        if (regid in {"d", "e", "s", "t", "u", "v", "x"}):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_temp_free_ptr(%s%sV);\n" % \
+                    (regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
-def genptr_free_new(f,regtype,regid,regno):
+def genptr_free_new(f, tag, regtype, regid, regno):
     if (regtype == "N"):
         if (regid not in {"s", "t"}):
             print("Bad register parse: ", regtype, regid)
     elif (regtype == "P"):
         if (regid not in {"t", "u", "v"}):
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "O"):
+        if (regid == "s"):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_temp_free(%s%sN);\n" % \
+                    (regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
 def genptr_free_opn(f,regtype,regid,i,tag):
     if (hex_common.is_pair(regid)):
-        genptr_free(f,regtype,regid,i)
+        genptr_free(f, tag, regtype, regid, i)
     elif (hex_common.is_single(regid)):
         if hex_common.is_old_val(regtype, regid, tag):
-            genptr_free(f,regtype,regid,i)
+            genptr_free(f, tag, regtype, regid, i)
         elif hex_common.is_new_val(regtype, regid, tag):
-            genptr_free_new(f,regtype,regid,i)
+            genptr_free_new(f, tag, regtype, regid, i)
         else:
             print("Bad register parse: ",regtype,regid,toss,numregs)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
-def genptr_src_read(f,regtype,regid):
+def genptr_src_read(f, tag, regtype, regid):
     if (regtype == "R"):
         if (regid in {"ss", "tt", "xx", "yy"}):
             f.write("    tcg_gen_concat_i32_i64(%s%sV, hex_gpr[%s%sN],\n" % \
@@ -238,6 +360,47 @@ def genptr_src_read(f,regtype,regid):
     elif (regtype == "M"):
         if (regid != "u"):
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "V"):
+        if (regid in {"uu", "vv", "xx"}):
+            f.write("    tcg_gen_gvec_mov(MO_64, %s%sV_off,\n" % \
+                (regtype, regid))
+            f.write("        vreg_src_off(ctx, %s%sN),\n" % \
+                (regtype, regid))
+            f.write("        sizeof(MMVector), sizeof(MMVector));\n")
+            f.write("    tcg_gen_gvec_mov(MO_64,\n")
+            f.write("        %s%sV_off + sizeof(MMVector),\n" % \
+                (regtype, regid))
+            f.write("        vreg_src_off(ctx, %s%sN ^ 1),\n" % \
+                (regtype, regid))
+            f.write("        sizeof(MMVector), sizeof(MMVector));\n")
+        elif (regid in {"s", "u", "v", "w"}):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                                 (regtype, regid, regtype, regid))
+        elif (regid in {"x", "y"}):
+            f.write("    tcg_gen_gvec_mov(MO_64, %s%sV_off,\n" % \
+                             (regtype, regid))
+            f.write("        vreg_src_off(ctx, %s%sN),\n" % \
+                             (regtype, regid))
+            f.write("        sizeof(MMVector), sizeof(MMVector));\n")
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                                 (regtype, regid, regtype, regid))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    elif (regtype == "Q"):
+        if (regid in {"s", "t", "u", "v"}):
+            if (not hex_common.skip_qemu_helper(tag)):
+                f.write("    tcg_gen_addi_ptr(%s%sV, cpu_env, %s%sV_off);\n" % \
+                    (regtype, regid, regtype, regid))
+        elif (regid in {"x"}):
+            f.write("    tcg_gen_gvec_mov(MO_64, %s%sV_off,\n" % \
+                (regtype, regid))
+            f.write("        offsetof(CPUHexagonState, QRegs[%s%sN]),\n" % \
+                (regtype, regid))
+            f.write("        sizeof(MMQReg), sizeof(MMQReg));\n")
+        else:
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
@@ -248,15 +411,18 @@ def genptr_src_read_new(f,regtype,regid):
     elif (regtype == "P"):
         if (regid not in {"t", "u", "v"}):
             print("Bad register parse: ", regtype, regid)
+    elif (regtype == "O"):
+        if (regid != "s"):
+            print("Bad register parse: ", regtype, regid)
     else:
         print("Bad register parse: ", regtype, regid)
 
 def genptr_src_read_opn(f,regtype,regid,tag):
     if (hex_common.is_pair(regid)):
-        genptr_src_read(f,regtype,regid)
+        genptr_src_read(f, tag, regtype, regid)
     elif (hex_common.is_single(regid)):
         if hex_common.is_old_val(regtype, regid, tag):
-            genptr_src_read(f,regtype,regid)
+            genptr_src_read(f, tag, regtype, regid)
         elif hex_common.is_new_val(regtype, regid, tag):
             genptr_src_read_new(f,regtype,regid)
         else:
@@ -334,11 +500,68 @@ def genptr_dst_write(f, tag, regtype, regid):
     else:
         print("Bad register parse: ", regtype, regid)
 
+def genptr_dst_write_ext(f, tag, regtype, regid, newv="0"):
+    if (regtype == "V"):
+        if (regid in {"dd", "xx", "yy"}):
+            if ('A_CONDEXEC' in hex_common.attribdict[tag]):
+                is_predicated = "true"
+            else:
+                is_predicated = "false"
+            f.write("    gen_log_vreg_write_pair(ctx, %s%sV_off, %s%sN, " % \
+                (regtype, regid, regtype, regid))
+            f.write("%s, insn->slot, %s);\n" % \
+                (newv, is_predicated))
+            f.write("    ctx_log_vreg_write_pair(ctx, %s%sN, %s,\n" % \
+                (regtype, regid, newv))
+            f.write("        %s);\n" % (is_predicated))
+        elif (regid in {"d", "x", "y"}):
+            if ('A_CONDEXEC' in hex_common.attribdict[tag]):
+                is_predicated = "true"
+            else:
+                is_predicated = "false"
+            f.write("    gen_log_vreg_write(ctx, %s%sV_off, %s%sN, %s, " % \
+                (regtype, regid, regtype, regid, newv))
+            f.write("insn->slot, %s);\n" % \
+                (is_predicated))
+            f.write("    ctx_log_vreg_write(ctx, %s%sN, %s, %s);\n" % \
+                (regtype, regid, newv, is_predicated))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    elif (regtype == "Q"):
+        if (regid in {"d", "e", "x"}):
+            if ('A_CONDEXEC' in hex_common.attribdict[tag]):
+                is_predicated = "true"
+            else:
+                is_predicated = "false"
+            f.write("    gen_log_qreg_write(%s%sV_off, %s%sN, %s, " % \
+                (regtype, regid, regtype, regid, newv))
+            f.write("insn->slot, %s);\n" % (is_predicated))
+            f.write("    ctx_log_qreg_write(ctx, %s%sN, %s);\n" % \
+                (regtype, regid, is_predicated))
+        else:
+            print("Bad register parse: ", regtype, regid)
+    else:
+        print("Bad register parse: ", regtype, regid)
+
 def genptr_dst_write_opn(f,regtype, regid, tag):
     if (hex_common.is_pair(regid)):
-        genptr_dst_write(f, tag, regtype, regid)
+        if (hex_common.is_hvx_reg(regtype)):
+            if (hex_common.is_tmp_result(tag)):
+                genptr_dst_write_ext(f, tag, regtype, regid, "EXT_TMP")
+            else:
+                genptr_dst_write_ext(f, tag, regtype, regid)
+        else:
+            genptr_dst_write(f, tag, regtype, regid)
     elif (hex_common.is_single(regid)):
-        genptr_dst_write(f, tag, regtype, regid)
+        if (hex_common.is_hvx_reg(regtype)):
+            if (hex_common.is_new_result(tag)):
+                genptr_dst_write_ext(f, tag, regtype, regid, "EXT_NEW")
+            elif (hex_common.is_tmp_result(tag)):
+                genptr_dst_write_ext(f, tag, regtype, regid, "EXT_TMP")
+            else:
+                genptr_dst_write_ext(f, tag, regtype, regid, "EXT_DFL")
+        else:
+            genptr_dst_write(f, tag, regtype, regid)
     else:
         print("Bad register parse: ",regtype,regid,toss,numregs)
 
@@ -409,13 +632,24 @@ def gen_tcg_func(f, tag, regs, imms):
         ## If there is a scalar result, it is the return type
         for regtype,regid,toss,numregs in regs:
             if (hex_common.is_written(regid)):
+                if (hex_common.is_hvx_reg(regtype)):
+                    continue
                 gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
         if (i > 0): f.write(", ")
         f.write("cpu_env")
         i=1
         for regtype,regid,toss,numregs in regs:
+            if (hex_common.is_written(regid)):
+                if (not hex_common.is_hvx_reg(regtype)):
+                    continue
+                gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
+                i += 1
+        for regtype,regid,toss,numregs in regs:
             if (hex_common.is_read(regid)):
+                if (hex_common.is_hvx_reg(regtype) and
+                    hex_common.is_readwrite(regid)):
+                    continue
                 gen_helper_call_opn(f, tag, regtype, regid, toss, numregs, i)
                 i += 1
         for immlett,bits,immshift in imms:
-- 
2.7.4
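
The argument order that gen_tcg_func builds for a helper call in the hunk above can be sketched as a small Python model. This is only a sketch: helper_call_order is a hypothetical name, and the four predicates are assumptions standing in for the hex_common functions of the same names, which this patch does not show.

```python
# Simplified model of the helper-call argument order built by gen_tcg_func.
# The predicates below are assumptions standing in for hex_common.

def is_written(regid):
    return regid[0] in "dexy"

def is_read(regid):
    return regid[0] in "stuvwxy"

def is_readwrite(regid):
    return regid[0] in "xy"

def is_hvx_reg(regtype):
    return regtype in "VQ"

def helper_call_order(regs):
    """Return operand names in the order they are passed to the helper.

    regs is a list of (regtype, regid) tuples, e.g. [("V", "d"), ("R", "t")].
    """
    args = []
    # A written scalar register comes first: it receives the helper result.
    for regtype, regid in regs:
        if is_written(regid) and not is_hvx_reg(regtype):
            args.append(regtype + regid)
    args.append("cpu_env")
    # Written HVX registers are passed by pointer, right after cpu_env.
    for regtype, regid in regs:
        if is_written(regid) and is_hvx_reg(regtype):
            args.append(regtype + regid)
    # Read operands follow; HVX read/write operands were already passed.
    for regtype, regid in regs:
        if is_read(regid):
            if is_hvx_reg(regtype) and is_readwrite(regid):
                continue
            args.append(regtype + regid)
    return args
```

For an operand list such as Vx, Rt the model yields cpu_env, Vx, Rt: the vector is passed once, as a pointer that serves as both source and destination.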



* [PATCH v3 09/30] Hexagon HVX (target/hexagon) C preprocessor for decode tree
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (7 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 08/30] Hexagon HVX (target/hexagon) semantics generator - part 2 Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:04   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 10/30] Hexagon HVX (target/hexagon) instruction utility functions Taylor Simpson
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_dectree_import.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/target/hexagon/gen_dectree_import.c b/target/hexagon/gen_dectree_import.c
index 5b7ecfc..ee35467 100644
--- a/target/hexagon/gen_dectree_import.c
+++ b/target/hexagon/gen_dectree_import.c
@@ -40,6 +40,11 @@ const char * const opcode_names[] = {
  *         Q6INSN(A2_add,"Rd32=add(Rs32,Rt32)",ATTRIBS(),
  *         "Add 32-bit registers",
  *         { RdV=RsV+RtV;})
+ *     HVX instructions have the following form
+ *         EXTINSN(V6_vinsertwr, "Vx32.w=vinsert(Rt32)",
+ *         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX,A_CVI_LATE),
+ *         "Insert Word Scalar into Vector",
+ *         VxV.uw[0] = RtV;)
  */
 const char * const opcode_syntax[XX_LAST_OPCODE] = {
 #define Q6INSN(TAG, BEH, ATTRIBS, DESCR, SEM) \
@@ -105,6 +110,14 @@ static const char *get_opcode_enc(int opcode)
 
 static const char *get_opcode_enc_class(int opcode)
 {
+    const char *tmp = opcode_encodings[opcode].encoding;
+    if (tmp == NULL) {
+        const char *test = "V6_";        /* HVX */
+        const char *name = opcode_names[opcode];
+        if (strncmp(name, test, strlen(test)) == 0) {
+            return "EXT_mmvec";
+        }
+    }
     return opcode_enc_class_names[opcode_encodings[opcode].enc_class];
 }
 
-- 
2.7.4
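
The lookup order added to get_opcode_enc_class above can be modelled in a few lines of Python. This is a sketch under assumptions: enc_class is a hypothetical name, and the three parameters stand in for opcode_names[opcode], opcode_encodings[opcode].encoding and the class name from the table.

```python
def enc_class(name, encoding, table_class):
    """Model of the lookup order in get_opcode_enc_class.

    An opcode with no encoding string whose name starts with "V6_" is
    treated as an HVX instruction; everything else uses the table entry.
    """
    if encoding is None and name.startswith("V6_"):
        return "EXT_mmvec"
    return table_class
```

Note that the name prefix is consulted only when the encoding is missing, so an HVX opcode that does have an encoding still takes its class from the table.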



* [PATCH v3 10/30] Hexagon HVX (target/hexagon) instruction utility functions
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (8 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 09/30] Hexagon HVX (target/hexagon) C preprocessor for decode tree Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 11/30] Hexagon HVX (target/hexagon) helper functions Taylor Simpson
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Add functions to support scatter/gather
Add the new file to target/hexagon/meson.build

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/mmvec/system_ext_mmvec.h | 29 +++++++++++++++
 target/hexagon/mmvec/system_ext_mmvec.c | 66 +++++++++++++++++++++++++++++++++
 target/hexagon/meson.build              |  1 +
 3 files changed, 96 insertions(+)
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.c

diff --git a/target/hexagon/mmvec/system_ext_mmvec.h b/target/hexagon/mmvec/system_ext_mmvec.h
new file mode 100644
index 0000000..2963061
--- /dev/null
+++ b/target/hexagon/mmvec/system_ext_mmvec.h
@@ -0,0 +1,29 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_SYSTEM_EXT_MMVEC_H
+#define HEXAGON_SYSTEM_EXT_MMVEC_H
+
+void mem_gather_store(CPUHexagonState *env, target_ulong vaddr, int slot);
+void mem_vector_scatter_init(CPUHexagonState *env, int slot,
+                             target_ulong base_vaddr, int length,
+                             int element_size);
+void mem_vector_gather_init(CPUHexagonState *env,
+                            target_ulong base_vaddr, int length,
+                            int element_size);
+
+#endif
diff --git a/target/hexagon/mmvec/system_ext_mmvec.c b/target/hexagon/mmvec/system_ext_mmvec.c
new file mode 100644
index 0000000..9de1a25
--- /dev/null
+++ b/target/hexagon/mmvec/system_ext_mmvec.c
@@ -0,0 +1,66 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "mmvec/system_ext_mmvec.h"
+
+void mem_gather_store(CPUHexagonState *env, target_ulong vaddr, int slot)
+{
+    size_t size = sizeof(MMVector);
+
+    env->vstore_pending[slot] = 1;
+    env->vstore[slot].va   = vaddr;
+    env->vstore[slot].size = size;
+    memcpy(&env->vstore[slot].data.ub[0], &env->tmp_VRegs[0], size);
+
+    /* On a gather store, overwrite the store mask to emulate dropped gathers */
+    bitmap_copy(env->vstore[slot].mask, env->vtcm_log.mask, size);
+}
+
+void mem_vector_scatter_init(CPUHexagonState *env, int slot,
+                             target_ulong base_vaddr,
+                             int length, int element_size)
+{
+    int i;
+
+    for (i = 0; i < sizeof(MMVector); i++) {
+        env->vtcm_log.data.ub[i] = 0;
+    }
+    bitmap_zero(env->vtcm_log.mask, MAX_VEC_SIZE_BYTES);
+
+    env->vtcm_pending = true;
+    env->vtcm_log.op = false;
+    env->vtcm_log.op_size = 0;
+    env->vtcm_log.size = sizeof(MMVector);
+}
+
+void mem_vector_gather_init(CPUHexagonState *env,
+                            target_ulong base_vaddr,
+                            int length, int element_size)
+{
+    int i;
+
+    for (i = 0; i < sizeof(MMVector); i++) {
+        env->vtcm_log.data.ub[i] = 0;
+        env->vtcm_log.va[i] = 0;
+        env->tmp_VRegs[0].ub[i] = 0;
+    }
+    bitmap_zero(env->vtcm_log.mask, MAX_VEC_SIZE_BYTES);
+    env->vtcm_log.op = false;
+    env->vtcm_log.op_size = 0;
+}
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index 6fd9360..ed292b4 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -173,6 +173,7 @@ hexagon_ss.add(files(
     'printinsn.c',
     'arch.c',
     'fma_emu.c',
+    'mmvec/system_ext_mmvec.c',
 ))
 
 target_arch += {'hexagon': hexagon_ss}
-- 
2.7.4
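
The way mem_gather_store fills the vector store log, and how a byte mask then limits what reaches memory, can be sketched in Python. This is a model rather than the QEMU code: StoreLog, gather_store and commit are hypothetical names, VEC_BYTES assumes sizeof(MMVector) is 128, and the commit step is a guess at how a consumer of the log would apply the mask.

```python
VEC_BYTES = 128  # assumed sizeof(MMVector)

class StoreLog:
    """Model of one vstore[] slot: address, data and a per-byte mask."""
    def __init__(self):
        self.pending = False
        self.va = 0
        self.data = bytearray(VEC_BYTES)
        self.mask = [False] * VEC_BYTES  # one flag per vector byte

def gather_store(log, vaddr, tmp_vreg, gather_mask):
    """Log a gather store: copy the temp vector and adopt the gather mask."""
    log.pending = True
    log.va = vaddr
    log.data[:] = tmp_vreg
    # Overwriting the store mask means dropped gathers are never written.
    log.mask = list(gather_mask)

def commit(log, memory):
    """Write only the bytes whose mask flag is set, then clear pending."""
    if not log.pending:
        return
    log.pending = False
    for j in range(VEC_BYTES):
        if log.mask[j]:
            memory[log.va + j] = log.data[j]
```

Keeping one mask entry per byte is what lets a partially completed gather update some bytes of the destination while leaving the rest of memory untouched.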



* [PATCH v3 11/30] Hexagon HVX (target/hexagon) helper functions
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (9 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 10/30] Hexagon HVX (target/hexagon) instruction utility functions Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 12/30] Hexagon HVX (target/hexagon) TCG generation Taylor Simpson
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Commit vector stores (masked and scatter/gather)
Log vector register writes
Add the HVX execution counter to the debug log
Histogram instructions

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/helper.h    |  13 +++
 target/hexagon/op_helper.c | 218 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 229 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index ca201fb..c99c1c1 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -23,6 +23,7 @@ DEF_HELPER_1(debug_start_packet, void, env)
 DEF_HELPER_FLAGS_3(debug_check_store_width, TCG_CALL_NO_WG, void, env, int, int)
 DEF_HELPER_FLAGS_3(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int)
 DEF_HELPER_2(commit_store, void, env, int)
+DEF_HELPER_1(commit_hvx_stores, void, env)
 DEF_HELPER_FLAGS_4(fcircadd, TCG_CALL_NO_RWG_SE, s32, s32, s32, s32, s32)
 DEF_HELPER_FLAGS_1(fbrev, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
@@ -89,3 +90,15 @@ DEF_HELPER_4(sffms_lib, f32, env, f32, f32, f32)
 
 DEF_HELPER_3(dfmpyfix, f64, env, f64, f64)
 DEF_HELPER_4(dfmpyhh, f64, env, f64, f64, f64)
+
+/* Histogram instructions */
+DEF_HELPER_1(vhist, void, env)
+DEF_HELPER_1(vhistq, void, env)
+DEF_HELPER_1(vwhist256, void, env)
+DEF_HELPER_1(vwhist256q, void, env)
+DEF_HELPER_1(vwhist256_sat, void, env)
+DEF_HELPER_1(vwhist256q_sat, void, env)
+DEF_HELPER_1(vwhist128, void, env)
+DEF_HELPER_1(vwhist128q, void, env)
+DEF_HELPER_2(vwhist128m, void, env, s32)
+DEF_HELPER_2(vwhist128qm, void, env, s32)
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 61d5cde..a0c50a3 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -27,6 +27,8 @@
 #include "arch.h"
 #include "hex_arch_types.h"
 #include "fma_emu.h"
+#include "mmvec/mmvec.h"
+#include "mmvec/macros.h"
 
 #define SF_BIAS        127
 #define SF_MANTBITS    23
@@ -164,6 +166,52 @@ void HELPER(commit_store)(CPUHexagonState *env, int slot_num)
     }
 }
 
+void HELPER(commit_hvx_stores)(CPUHexagonState *env)
+{
+    uintptr_t ra = GETPC();
+    int i;
+
+    /* Normal (possibly masked) vector store */
+    for (i = 0; i < VSTORES_MAX; i++) {
+        if (env->vstore_pending[i]) {
+            env->vstore_pending[i] = 0;
+            target_ulong va = env->vstore[i].va;
+            int size = env->vstore[i].size;
+            for (int j = 0; j < size; j++) {
+                if (test_bit(j, env->vstore[i].mask)) {
+                    cpu_stb_data_ra(env, va + j, env->vstore[i].data.ub[j], ra);
+                }
+            }
+        }
+    }
+
+    /* Scatter store */
+    if (env->vtcm_pending) {
+        env->vtcm_pending = false;
+        if (env->vtcm_log.op) {
+            /* Need to perform the scatter read/modify/write at commit time */
+            if (env->vtcm_log.op_size == 2) {
+                SCATTER_OP_WRITE_TO_MEM(uint16_t);
+            } else if (env->vtcm_log.op_size == 4) {
+                /* Word Scatter += */
+                SCATTER_OP_WRITE_TO_MEM(uint32_t);
+            } else {
+                g_assert_not_reached();
+            }
+        } else {
+            for (i = 0; i < env->vtcm_log.size; i++) {
+                if (test_bit(i, env->vtcm_log.mask)) {
+                    cpu_stb_data_ra(env, env->vtcm_log.va[i],
+                                    env->vtcm_log.data.ub[i], ra);
+                    clear_bit(i, env->vtcm_log.mask);
+                    env->vtcm_log.data.ub[i] = 0;
+                }
+
+            }
+        }
+    }
+}
+
 static void print_store(CPUHexagonState *env, int slot)
 {
     if (!(env->slot_cancelled & (1 << slot))) {
@@ -242,9 +290,10 @@ void HELPER(debug_commit_end)(CPUHexagonState *env, int has_st0, int has_st1)
     HEX_DEBUG_LOG("Next PC = " TARGET_FMT_lx "\n", env->next_PC);
     HEX_DEBUG_LOG("Exec counters: pkt = " TARGET_FMT_lx
                   ", insn = " TARGET_FMT_lx
-                  "\n",
+                  ", hvx = " TARGET_FMT_lx "\n",
                   env->gpr[HEX_REG_QEMU_PKT_CNT],
-                  env->gpr[HEX_REG_QEMU_INSN_CNT]);
+                  env->gpr[HEX_REG_QEMU_INSN_CNT],
+                  env->gpr[HEX_REG_QEMU_HVX_CNT]);
 
 }
 
@@ -1165,6 +1214,171 @@ float64 HELPER(dfmpyhh)(CPUHexagonState *env, float64 RxxV,
     return RxxV;
 }
 
+/* Histogram instructions */
+
+void HELPER(vhist)(CPUHexagonState *env)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int lane = 0; lane < 8; lane++) {
+        for (int i = 0; i < sizeof(MMVector) / 8; ++i) {
+            unsigned char value = input->ub[(sizeof(MMVector) / 8) * lane + i];
+            unsigned char regno = value >> 3;
+            unsigned char element = value & 7;
+
+            env->VRegs[regno].uh[(sizeof(MMVector) / 16) * lane + element]++;
+        }
+    }
+}
+
+void HELPER(vhistq)(CPUHexagonState *env)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int lane = 0; lane < 8; lane++) {
+        for (int i = 0; i < sizeof(MMVector) / 8; ++i) {
+            unsigned char value = input->ub[(sizeof(MMVector) / 8) * lane + i];
+            unsigned char regno = value >> 3;
+            unsigned char element = value & 7;
+
+            if (fGETQBIT(env->qtmp, sizeof(MMVector) / 8 * lane + i)) {
+                env->VRegs[regno].uh[
+                    (sizeof(MMVector) / 16) * lane + element]++;
+            }
+        }
+    }
+}
+
+void HELPER(vwhist256)(CPUHexagonState *env)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int i = 0; i < (sizeof(MMVector) / 2); i++) {
+        unsigned int bucket = fGETUBYTE(0, input->h[i]);
+        unsigned int weight = fGETUBYTE(1, input->h[i]);
+        unsigned int vindex = (bucket >> 3) & 0x1F;
+        unsigned int elindex = ((i >> 0) & (~7)) | ((bucket >> 0) & 7);
+
+        env->VRegs[vindex].uh[elindex] =
+            env->VRegs[vindex].uh[elindex] + weight;
+    }
+}
+
+void HELPER(vwhist256q)(CPUHexagonState *env)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int i = 0; i < (sizeof(MMVector) / 2); i++) {
+        unsigned int bucket = fGETUBYTE(0, input->h[i]);
+        unsigned int weight = fGETUBYTE(1, input->h[i]);
+        unsigned int vindex = (bucket >> 3) & 0x1F;
+        unsigned int elindex = ((i >> 0) & (~7)) | ((bucket >> 0) & 7);
+
+        if (fGETQBIT(env->qtmp, 2 * i)) {
+            env->VRegs[vindex].uh[elindex] =
+                env->VRegs[vindex].uh[elindex] + weight;
+        }
+    }
+}
+
+void HELPER(vwhist256_sat)(CPUHexagonState *env)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int i = 0; i < (sizeof(MMVector) / 2); i++) {
+        unsigned int bucket = fGETUBYTE(0, input->h[i]);
+        unsigned int weight = fGETUBYTE(1, input->h[i]);
+        unsigned int vindex = (bucket >> 3) & 0x1F;
+        unsigned int elindex = ((i >> 0) & (~7)) | ((bucket >> 0) & 7);
+
+        env->VRegs[vindex].uh[elindex] =
+            fVSATUH(env->VRegs[vindex].uh[elindex] + weight);
+    }
+}
+
+void HELPER(vwhist256q_sat)(CPUHexagonState *env)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int i = 0; i < (sizeof(MMVector) / 2); i++) {
+        unsigned int bucket = fGETUBYTE(0, input->h[i]);
+        unsigned int weight = fGETUBYTE(1, input->h[i]);
+        unsigned int vindex = (bucket >> 3) & 0x1F;
+        unsigned int elindex = ((i >> 0) & (~7)) | ((bucket >> 0) & 7);
+
+        if (fGETQBIT(env->qtmp, 2 * i)) {
+            env->VRegs[vindex].uh[elindex] =
+                fVSATUH(env->VRegs[vindex].uh[elindex] + weight);
+        }
+    }
+}
+
+void HELPER(vwhist128)(CPUHexagonState *env)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int i = 0; i < (sizeof(MMVector) / 2); i++) {
+        unsigned int bucket = fGETUBYTE(0, input->h[i]);
+        unsigned int weight = fGETUBYTE(1, input->h[i]);
+        unsigned int vindex = (bucket >> 3) & 0x1F;
+        unsigned int elindex = ((i >> 1) & (~3)) | ((bucket >> 1) & 3);
+
+        env->VRegs[vindex].uw[elindex] =
+            env->VRegs[vindex].uw[elindex] + weight;
+    }
+}
+
+void HELPER(vwhist128q)(CPUHexagonState *env)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int i = 0; i < (sizeof(MMVector) / 2); i++) {
+        unsigned int bucket = fGETUBYTE(0, input->h[i]);
+        unsigned int weight = fGETUBYTE(1, input->h[i]);
+        unsigned int vindex = (bucket >> 3) & 0x1F;
+        unsigned int elindex = ((i >> 1) & (~3)) | ((bucket >> 1) & 3);
+
+        if (fGETQBIT(env->qtmp, 2 * i)) {
+            env->VRegs[vindex].uw[elindex] =
+                env->VRegs[vindex].uw[elindex] + weight;
+        }
+    }
+}
+
+void HELPER(vwhist128m)(CPUHexagonState *env, int32_t uiV)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int i = 0; i < (sizeof(MMVector) / 2); i++) {
+        unsigned int bucket = fGETUBYTE(0, input->h[i]);
+        unsigned int weight = fGETUBYTE(1, input->h[i]);
+        unsigned int vindex = (bucket >> 3) & 0x1F;
+        unsigned int elindex = ((i >> 1) & (~3)) | ((bucket >> 1) & 3);
+
+        if ((bucket & 1) == uiV) {
+            env->VRegs[vindex].uw[elindex] =
+                env->VRegs[vindex].uw[elindex] + weight;
+        }
+    }
+}
+
+void HELPER(vwhist128qm)(CPUHexagonState *env, int32_t uiV)
+{
+    MMVector *input = &env->tmp_VRegs[0];
+
+    for (int i = 0; i < (sizeof(MMVector) / 2); i++) {
+        unsigned int bucket = fGETUBYTE(0, input->h[i]);
+        unsigned int weight = fGETUBYTE(1, input->h[i]);
+        unsigned int vindex = (bucket >> 3) & 0x1F;
+        unsigned int elindex = ((i >> 1) & (~3)) | ((bucket >> 1) & 3);
+
+        if (((bucket & 1) == uiV) && fGETQBIT(env->qtmp, 2 * i)) {
+            env->VRegs[vindex].uw[elindex] =
+                env->VRegs[vindex].uw[elindex] + weight;
+        }
+    }
+}
+
 static void cancel_slot(CPUHexagonState *env, uint32_t slot)
 {
     HEX_DEBUG_LOG("Slot %d cancelled\n", slot);
-- 
2.7.4
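
The index arithmetic in HELPER(vhist) and HELPER(vhistq) above can be checked with a small Python model. It is a sketch only: it assumes sizeof(MMVector) is 128, represents each vector register as a list of 64 halfword counters, and uses an optional qmask list in place of fGETQBIT(env->qtmp, ...).

```python
VEC_BYTES = 128  # assumed sizeof(MMVector)
LANES = 8

def vhist(vregs, inp, qmask=None):
    """Model of vhist (qmask is None) and vhistq (qmask given).

    vregs: 32 lists of 64 halfword counters, updated in place.
    inp:   128 input bytes (the temp vector register).
    qmask: optional 128 booleans, one per input byte.
    """
    lane_bytes = VEC_BYTES // LANES  # 16 input bytes per lane
    lane_bins = VEC_BYTES // 16      # 8 halfword bins per lane
    for lane in range(LANES):
        for i in range(lane_bytes):
            pos = lane_bytes * lane + i
            value = inp[pos]
            regno = value >> 3       # upper 5 bits select the register
            element = value & 7      # lower 3 bits select the bin
            if qmask is None or qmask[pos]:
                idx = lane_bins * lane + element
                # Halfword counters wrap at 16 bits, like uint16_t.
                vregs[regno][idx] = (vregs[regno][idx] + 1) & 0xFFFF
```

With an input vector in which every byte is 9, each lane adds 16 to bin 1 of register 1, that is, to halfword indices 1, 9, 17 and so on.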



* [PATCH v3 12/30] Hexagon HVX (target/hexagon) TCG generation
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (10 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 11/30] Hexagon HVX (target/hexagon) helper functions Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 13/30] Hexagon HVX (target/hexagon) helper overrides infrastructure Taylor Simpson
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/translate.h |  61 +++++++++++++
 target/hexagon/genptr.c    |  15 ++++
 target/hexagon/translate.c | 213 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 287 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 703fd13..fccfb94 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -29,6 +29,7 @@ typedef struct DisasContext {
     uint32_t mem_idx;
     uint32_t num_packets;
     uint32_t num_insns;
+    uint32_t num_hvx_insns;
     int reg_log[REG_WRITES_MAX];
     int reg_log_idx;
     DECLARE_BITMAP(regs_written, TOTAL_PER_THREAD_REGS);
@@ -37,6 +38,20 @@ typedef struct DisasContext {
     DECLARE_BITMAP(pregs_written, NUM_PREGS);
     uint8_t store_width[STORES_MAX];
     bool s1_store_processed;
+    int future_vregs_idx;
+    int future_vregs_num[VECTOR_TEMPS_MAX];
+    int tmp_vregs_idx;
+    int tmp_vregs_num[VECTOR_TEMPS_MAX];
+    int vreg_log[NUM_VREGS];
+    bool vreg_is_predicated[NUM_VREGS];
+    int vreg_log_idx;
+    DECLARE_BITMAP(vregs_updated_tmp, NUM_VREGS);
+    DECLARE_BITMAP(vregs_updated, NUM_VREGS);
+    DECLARE_BITMAP(vregs_select, NUM_VREGS);
+    int qreg_log[NUM_QREGS];
+    bool qreg_is_predicated[NUM_QREGS];
+    int qreg_log_idx;
+    bool pre_commit;
 } DisasContext;
 
 static inline void ctx_log_reg_write(DisasContext *ctx, int rnum)
@@ -67,6 +82,46 @@ static inline bool is_preloaded(DisasContext *ctx, int num)
     return test_bit(num, ctx->regs_written);
 }
 
+intptr_t ctx_future_vreg_off(DisasContext *ctx, int regnum,
+                             int num, bool alloc_ok);
+intptr_t ctx_tmp_vreg_off(DisasContext *ctx, int regnum,
+                          int num, bool alloc_ok);
+
+static inline void ctx_log_vreg_write(DisasContext *ctx,
+                                      int rnum, VRegWriteType type,
+                                      bool is_predicated)
+{
+    if (type != EXT_TMP) {
+        ctx->vreg_log[ctx->vreg_log_idx] = rnum;
+        ctx->vreg_is_predicated[ctx->vreg_log_idx] = is_predicated;
+        ctx->vreg_log_idx++;
+
+        set_bit(rnum, ctx->vregs_updated);
+    }
+    if (type == EXT_NEW) {
+        set_bit(rnum, ctx->vregs_select);
+    }
+    if (type == EXT_TMP) {
+        set_bit(rnum, ctx->vregs_updated_tmp);
+    }
+}
+
+static inline void ctx_log_vreg_write_pair(DisasContext *ctx,
+                                           int rnum, VRegWriteType type,
+                                           bool is_predicated)
+{
+    ctx_log_vreg_write(ctx, rnum ^ 0, type, is_predicated);
+    ctx_log_vreg_write(ctx, rnum ^ 1, type, is_predicated);
+}
+
+static inline void ctx_log_qreg_write(DisasContext *ctx,
+                                      int rnum, bool is_predicated)
+{
+    ctx->qreg_log[ctx->qreg_log_idx] = rnum;
+    ctx->qreg_is_predicated[ctx->qreg_log_idx] = is_predicated;
+    ctx->qreg_log_idx++;
+}
+
 extern TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_pred[NUM_PREGS];
 extern TCGv hex_next_PC;
@@ -85,6 +140,12 @@ extern TCGv hex_dczero_addr;
 extern TCGv hex_llsc_addr;
 extern TCGv hex_llsc_val;
 extern TCGv_i64 hex_llsc_val_i64;
+extern TCGv hex_VRegs_updated;
+extern TCGv hex_QRegs_updated;
+extern TCGv hex_vstore_addr[VSTORES_MAX];
+extern TCGv hex_vstore_size[VSTORES_MAX];
+extern TCGv hex_vstore_pending[VSTORES_MAX];
 
+bool is_gather_store_insn(Insn *insn, Packet *pkt);
 void process_store(DisasContext *ctx, Packet *pkt, int slot_num);
 #endif
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 7333299..da8527d 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -167,6 +167,9 @@ static inline void gen_read_ctrl_reg(DisasContext *ctx, const int reg_num,
     } else if (reg_num == HEX_REG_QEMU_INSN_CNT) {
         tcg_gen_addi_tl(dest, hex_gpr[HEX_REG_QEMU_INSN_CNT],
                         ctx->num_insns);
+    } else if (reg_num == HEX_REG_QEMU_HVX_CNT) {
+        tcg_gen_addi_tl(dest, hex_gpr[HEX_REG_QEMU_HVX_CNT],
+                        ctx->num_hvx_insns);
     } else {
         tcg_gen_mov_tl(dest, hex_gpr[reg_num]);
     }
@@ -194,6 +197,12 @@ static inline void gen_read_ctrl_reg_pair(DisasContext *ctx, const int reg_num,
         tcg_gen_concat_i32_i64(dest, pkt_cnt, insn_cnt);
         tcg_temp_free(pkt_cnt);
         tcg_temp_free(insn_cnt);
+    } else if (reg_num == HEX_REG_QEMU_HVX_CNT) {
+        TCGv hvx_cnt = tcg_temp_new();
+        tcg_gen_addi_tl(hvx_cnt, hex_gpr[HEX_REG_QEMU_HVX_CNT],
+                        ctx->num_hvx_insns);
+        tcg_gen_concat_i32_i64(dest, hvx_cnt, hex_gpr[reg_num + 1]);
+        tcg_temp_free(hvx_cnt);
     } else {
         tcg_gen_concat_i32_i64(dest,
             hex_gpr[reg_num],
@@ -229,6 +238,9 @@ static inline void gen_write_ctrl_reg(DisasContext *ctx, int reg_num,
         if (reg_num == HEX_REG_QEMU_INSN_CNT) {
             ctx->num_insns = 0;
         }
+        if (reg_num == HEX_REG_QEMU_HVX_CNT) {
+            ctx->num_hvx_insns = 0;
+        }
     }
 }
 
@@ -250,6 +262,9 @@ static inline void gen_write_ctrl_reg_pair(DisasContext *ctx, int reg_num,
             ctx->num_packets = 0;
             ctx->num_insns = 0;
         }
+        if (reg_num == HEX_REG_QEMU_HVX_CNT) {
+            ctx->num_hvx_insns = 0;
+        }
     }
 }
 
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 6fb4e68..915a541 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -19,6 +19,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
 #include "exec/cpu_ldst.h"
 #include "exec/log.h"
 #include "internal.h"
@@ -47,11 +48,60 @@ TCGv hex_dczero_addr;
 TCGv hex_llsc_addr;
 TCGv hex_llsc_val;
 TCGv_i64 hex_llsc_val_i64;
+TCGv hex_VRegs_updated;
+TCGv hex_QRegs_updated;
+TCGv hex_vstore_addr[VSTORES_MAX];
+TCGv hex_vstore_size[VSTORES_MAX];
+TCGv hex_vstore_pending[VSTORES_MAX];
 
 static const char * const hexagon_prednames[] = {
   "p0", "p1", "p2", "p3"
 };
 
+intptr_t ctx_future_vreg_off(DisasContext *ctx, int regnum,
+                             int num, bool alloc_ok)
+{
+    intptr_t offset;
+
+    /* See if it is already allocated */
+    for (int i = 0; i < ctx->future_vregs_idx; i++) {
+        if (ctx->future_vregs_num[i] == regnum) {
+            return offsetof(CPUHexagonState, future_VRegs[i]);
+        }
+    }
+
+    g_assert(alloc_ok);
+    offset = offsetof(CPUHexagonState, future_VRegs[ctx->future_vregs_idx]);
+    for (int i = 0; i < num; i++) {
+        ctx->future_vregs_num[ctx->future_vregs_idx + i] = regnum++;
+    }
+    ctx->future_vregs_idx += num;
+    g_assert(ctx->future_vregs_idx <= VECTOR_TEMPS_MAX);
+    return offset;
+}
+
+intptr_t ctx_tmp_vreg_off(DisasContext *ctx, int regnum,
+                          int num, bool alloc_ok)
+{
+    intptr_t offset;
+
+    /* See if it is already allocated */
+    for (int i = 0; i < ctx->tmp_vregs_idx; i++) {
+        if (ctx->tmp_vregs_num[i] == regnum) {
+            return offsetof(CPUHexagonState, tmp_VRegs[i]);
+        }
+    }
+
+    g_assert(alloc_ok);
+    offset = offsetof(CPUHexagonState, tmp_VRegs[ctx->tmp_vregs_idx]);
+    for (int i = 0; i < num; i++) {
+        ctx->tmp_vregs_num[ctx->tmp_vregs_idx + i] = regnum++;
+    }
+    ctx->tmp_vregs_idx += num;
+    g_assert(ctx->tmp_vregs_idx <= VECTOR_TEMPS_MAX);
+    return offset;
+}
+
 static void gen_exception_raw(int excp)
 {
     TCGv_i32 helper_tmp = tcg_const_i32(excp);
@@ -65,6 +115,8 @@ static void gen_exec_counters(DisasContext *ctx)
                     hex_gpr[HEX_REG_QEMU_PKT_CNT], ctx->num_packets);
     tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_INSN_CNT],
                     hex_gpr[HEX_REG_QEMU_INSN_CNT], ctx->num_insns);
+    tcg_gen_addi_tl(hex_gpr[HEX_REG_QEMU_HVX_CNT],
+                    hex_gpr[HEX_REG_QEMU_HVX_CNT], ctx->num_hvx_insns);
 }
 
 static void gen_end_tb(DisasContext *ctx)
@@ -173,11 +225,19 @@ static void gen_start_packet(DisasContext *ctx, Packet *pkt)
     bitmap_zero(ctx->regs_written, TOTAL_PER_THREAD_REGS);
     ctx->preg_log_idx = 0;
     bitmap_zero(ctx->pregs_written, NUM_PREGS);
+    ctx->future_vregs_idx = 0;
+    ctx->tmp_vregs_idx = 0;
+    ctx->vreg_log_idx = 0;
+    bitmap_zero(ctx->vregs_updated_tmp, NUM_VREGS);
+    bitmap_zero(ctx->vregs_updated, NUM_VREGS);
+    bitmap_zero(ctx->vregs_select, NUM_VREGS);
+    ctx->qreg_log_idx = 0;
     for (i = 0; i < STORES_MAX; i++) {
         ctx->store_width[i] = 0;
     }
     tcg_gen_movi_tl(hex_pkt_has_store_s1, pkt->pkt_has_store_s1);
     ctx->s1_store_processed = false;
+    ctx->pre_commit = true;
 
     if (HEX_DEBUG) {
         /* Handy place to set a breakpoint before the packet executes */
@@ -199,6 +259,26 @@ static void gen_start_packet(DisasContext *ctx, Packet *pkt)
     if (need_pred_written(pkt)) {
         tcg_gen_movi_tl(hex_pred_written, 0);
     }
+
+    if (pkt->pkt_has_hvx) {
+        tcg_gen_movi_tl(hex_VRegs_updated, 0);
+        tcg_gen_movi_tl(hex_QRegs_updated, 0);
+    }
+}
+
+bool is_gather_store_insn(Insn *insn, Packet *pkt)
+{
+    if (GET_ATTRIB(insn->opcode, A_CVI_NEW) &&
+        insn->new_value_producer_slot == 1) {
+        /* Look for gather instruction */
+        for (int i = 0; i < pkt->num_insns; i++) {
+            Insn *in = &pkt->insn[i];
+            if (GET_ATTRIB(in->opcode, A_CVI_GATHER) && in->slot == 1) {
+                return true;
+            }
+        }
+    }
+    return false;
 }
 
 /*
@@ -452,10 +532,102 @@ static void process_dczeroa(DisasContext *ctx, Packet *pkt)
     }
 }
 
+static bool pkt_has_hvx_store(Packet *pkt)
+{
+    int i;
+    for (i = 0; i < pkt->num_insns; i++) {
+        int opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(opcode, A_CVI) && GET_ATTRIB(opcode, A_STORE)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static void gen_commit_hvx(DisasContext *ctx, Packet *pkt)
+{
+    int i;
+
+    /*
+     *    for (i = 0; i < ctx->vreg_log_idx; i++) {
+     *        int rnum = ctx->vreg_log[i];
+     *        if (ctx->vreg_is_predicated[i]) {
+     *            if (env->VRegs_updated & (1 << rnum)) {
+     *                env->VRegs[rnum] = env->future_VRegs[rnum];
+     *            }
+     *        } else {
+     *            env->VRegs[rnum] = env->future_VRegs[rnum];
+     *        }
+     *    }
+     */
+    for (i = 0; i < ctx->vreg_log_idx; i++) {
+        int rnum = ctx->vreg_log[i];
+        bool is_predicated = ctx->vreg_is_predicated[i];
+        intptr_t dstoff = offsetof(CPUHexagonState, VRegs[rnum]);
+        intptr_t srcoff = ctx_future_vreg_off(ctx, rnum, 1, false);
+        size_t size = sizeof(MMVector);
+
+        if (is_predicated) {
+            TCGv cmp = tcg_temp_local_new();
+            TCGLabel *label_skip = gen_new_label();
+
+            tcg_gen_andi_tl(cmp, hex_VRegs_updated, 1 << rnum);
+            tcg_gen_brcondi_tl(TCG_COND_EQ, cmp, 0, label_skip);
+            {
+                tcg_gen_gvec_mov(MO_64, dstoff, srcoff, size, size);
+            }
+            gen_set_label(label_skip);
+            tcg_temp_free(cmp);
+        } else {
+            tcg_gen_gvec_mov(MO_64, dstoff, srcoff, size, size);
+        }
+    }
+
+    /*
+     *    for (i = 0; i < ctx->qreg_log_idx; i++) {
+     *        int rnum = ctx->qreg_log[i];
+     *        if (ctx->qreg_is_predicated[i]) {
+     *            if (env->QRegs_updated & (1 << rnum)) {
+     *                env->QRegs[rnum] = env->future_QRegs[rnum];
+     *            }
+     *        } else {
+     *            env->QRegs[rnum] = env->future_QRegs[rnum];
+     *        }
+     *    }
+     */
+    for (i = 0; i < ctx->qreg_log_idx; i++) {
+        int rnum = ctx->qreg_log[i];
+        bool is_predicated = ctx->qreg_is_predicated[i];
+        intptr_t dstoff = offsetof(CPUHexagonState, QRegs[rnum]);
+        intptr_t srcoff = offsetof(CPUHexagonState, future_QRegs[rnum]);
+        size_t size = sizeof(MMQReg);
+
+        if (is_predicated) {
+            TCGv cmp = tcg_temp_local_new();
+            TCGLabel *label_skip = gen_new_label();
+
+            tcg_gen_andi_tl(cmp, hex_QRegs_updated, 1 << rnum);
+            tcg_gen_brcondi_tl(TCG_COND_EQ, cmp, 0, label_skip);
+            {
+                tcg_gen_gvec_mov(MO_64, dstoff, srcoff, size, size);
+            }
+            gen_set_label(label_skip);
+            tcg_temp_free(cmp);
+        } else {
+            tcg_gen_gvec_mov(MO_64, dstoff, srcoff, size, size);
+        }
+    }
+
+    if (pkt_has_hvx_store(pkt)) {
+        gen_helper_commit_hvx_stores(cpu_env);
+    }
+}
+
 static void update_exec_counters(DisasContext *ctx, Packet *pkt)
 {
     int num_insns = pkt->num_insns;
     int num_real_insns = 0;
+    int num_hvx_insns = 0;
 
     for (int i = 0; i < num_insns; i++) {
         if (!pkt->insn[i].is_endloop &&
@@ -463,18 +635,26 @@ static void update_exec_counters(DisasContext *ctx, Packet *pkt)
             !GET_ATTRIB(pkt->insn[i].opcode, A_IT_NOP)) {
             num_real_insns++;
         }
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_CVI)) {
+            num_hvx_insns++;
+        }
     }
 
     ctx->num_packets++;
     ctx->num_insns += num_real_insns;
+    ctx->num_hvx_insns += num_hvx_insns;
 }
 
-static void gen_commit_packet(DisasContext *ctx, Packet *pkt)
+static void gen_commit_packet(CPUHexagonState *env, DisasContext *ctx,
+                              Packet *pkt)
 {
     gen_reg_writes(ctx);
     gen_pred_writes(ctx, pkt);
     process_store_log(ctx, pkt);
     process_dczeroa(ctx, pkt);
+    if (pkt->pkt_has_hvx) {
+        gen_commit_hvx(ctx, pkt);
+    }
     update_exec_counters(ctx, pkt);
     if (HEX_DEBUG) {
         TCGv has_st0 =
@@ -489,6 +669,11 @@ static void gen_commit_packet(DisasContext *ctx, Packet *pkt)
         tcg_temp_free(has_st1);
     }
 
+    if (pkt->vhist_insn != NULL) {
+        ctx->pre_commit = false;
+        pkt->vhist_insn->generate(env, ctx, pkt->vhist_insn, pkt);
+    }
+
     if (pkt->pkt_has_cof) {
         gen_end_tb(ctx);
     }
@@ -513,7 +698,7 @@ static void decode_and_translate_packet(CPUHexagonState *env, DisasContext *ctx)
         for (i = 0; i < pkt.num_insns; i++) {
             gen_insn(env, ctx, &pkt.insn[i], &pkt);
         }
-        gen_commit_packet(ctx, &pkt);
+        gen_commit_packet(env, ctx, &pkt);
         ctx->base.pc_next += pkt.encod_pkt_size_in_bytes;
     } else {
         gen_exception_end_tb(ctx, HEX_EXCP_INVALID_PACKET);
@@ -528,6 +713,7 @@ static void hexagon_tr_init_disas_context(DisasContextBase *dcbase,
     ctx->mem_idx = MMU_USER_IDX;
     ctx->num_packets = 0;
     ctx->num_insns = 0;
+    ctx->num_hvx_insns = 0;
 }
 
 static void hexagon_tr_tb_start(DisasContextBase *db, CPUState *cpu)
@@ -636,6 +822,9 @@ static char store_addr_names[STORES_MAX][NAME_LEN];
 static char store_width_names[STORES_MAX][NAME_LEN];
 static char store_val32_names[STORES_MAX][NAME_LEN];
 static char store_val64_names[STORES_MAX][NAME_LEN];
+static char vstore_addr_names[VSTORES_MAX][NAME_LEN];
+static char vstore_size_names[VSTORES_MAX][NAME_LEN];
+static char vstore_pending_names[VSTORES_MAX][NAME_LEN];
 
 void hexagon_translate_init(void)
 {
@@ -698,6 +887,10 @@ void hexagon_translate_init(void)
         offsetof(CPUHexagonState, llsc_val), "llsc_val");
     hex_llsc_val_i64 = tcg_global_mem_new_i64(cpu_env,
         offsetof(CPUHexagonState, llsc_val_i64), "llsc_val_i64");
+    hex_VRegs_updated = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, VRegs_updated), "VRegs_updated");
+    hex_QRegs_updated = tcg_global_mem_new(cpu_env,
+        offsetof(CPUHexagonState, QRegs_updated), "QRegs_updated");
     for (i = 0; i < STORES_MAX; i++) {
         snprintf(store_addr_names[i], NAME_LEN, "store_addr_%d", i);
         hex_store_addr[i] = tcg_global_mem_new(cpu_env,
@@ -719,4 +912,20 @@ void hexagon_translate_init(void)
             offsetof(CPUHexagonState, mem_log_stores[i].data64),
             store_val64_names[i]);
     }
+    for (int i = 0; i < VSTORES_MAX; i++) {
+        snprintf(vstore_addr_names[i], NAME_LEN, "vstore_addr_%d", i);
+        hex_vstore_addr[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, vstore[i].va),
+            vstore_addr_names[i]);
+
+        snprintf(vstore_size_names[i], NAME_LEN, "vstore_size_%d", i);
+        hex_vstore_size[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, vstore[i].size),
+            vstore_size_names[i]);
+
+        snprintf(vstore_pending_names[i], NAME_LEN, "vstore_pending_%d", i);
+        hex_vstore_pending[i] = tcg_global_mem_new(cpu_env,
+            offsetof(CPUHexagonState, vstore_pending[i]),
+            vstore_pending_names[i]);
+    }
 }
-- 
2.7.4
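
Note on the as-needed allocation in this patch: ctx_future_vreg_off() and
ctx_tmp_vreg_off() hand out slots in future_VRegs[] / tmp_VRegs[] on first
use and return the same offset on later lookups. A minimal standalone sketch
of that bookkeeping follows. Vec, Env, Ctx, slot_off() and the value chosen
for VECTOR_TEMPS_MAX are stand-ins invented for illustration, not QEMU
definitions. The sketch also checks the bound before writing the slot table,
whereas the patch asserts after the loop.

```c
#include <assert.h>
#include <stddef.h>

#define VECTOR_TEMPS_MAX 4              /* assumed value, illustration only */

typedef struct { unsigned char b[128]; } Vec;           /* stand-in for MMVector */
typedef struct { Vec future[VECTOR_TEMPS_MAX]; } Env;   /* stand-in for CPUHexagonState */

typedef struct {
    int idx;                            /* number of slots handed out so far */
    int num[VECTOR_TEMPS_MAX];          /* register number held by each slot */
} Ctx;

/*
 * Return the byte offset (within Env) of the slot that holds regnum.
 * On first use, reserve `count` consecutive slots for regnum, regnum + 1, ...
 * Lookups that must not allocate pass alloc_ok == 0.
 */
static ptrdiff_t slot_off(Ctx *ctx, int regnum, int count, int alloc_ok)
{
    /* See if it is already allocated */
    for (int i = 0; i < ctx->idx; i++) {
        if (ctx->num[i] == regnum) {
            return (ptrdiff_t)(offsetof(Env, future) + i * sizeof(Vec));
        }
    }

    assert(alloc_ok);
    assert(ctx->idx + count <= VECTOR_TEMPS_MAX);   /* check before writing */

    ptrdiff_t off = (ptrdiff_t)(offsetof(Env, future) + ctx->idx * sizeof(Vec));
    for (int i = 0; i < count; i++) {
        ctx->num[ctx->idx + i] = regnum + i;
    }
    ctx->idx += count;
    return off;
}
```

With a zeroed Ctx, slot_off(&ctx, 6, 2, 1) reserves two consecutive slots for
a register pair starting at 6, and a later slot_off(&ctx, 7, 1, 0) finds the
second slot without allocating, which mirrors how gen_commit_hvx() looks up
the source offset with alloc_ok == false.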



* [PATCH v3 13/30] Hexagon HVX (target/hexagon) helper overrides infrastructure
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (11 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 12/30] Hexagon HVX (target/hexagon) TCG generation Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 14/30] Hexagon HVX (target/hexagon) helper overrides for histogram instructions Taylor Simpson
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Build the infrastructure to create overrides for HVX instructions.
We create a new empty file (gen_tcg_hvx.h) that will be populated
in subsequent patches.

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_tcg_hvx.h        | 21 +++++++++++++++++++++
 target/hexagon/genptr.c             |  1 +
 target/hexagon/gen_helper_funcs.py  |  3 ++-
 target/hexagon/gen_helper_protos.py |  3 ++-
 target/hexagon/gen_tcg_funcs.py     |  3 ++-
 target/hexagon/meson.build          | 13 +++++++------
 6 files changed, 35 insertions(+), 9 deletions(-)
 create mode 100644 target/hexagon/gen_tcg_hvx.h

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
new file mode 100644
index 0000000..b5c6cad
--- /dev/null
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -0,0 +1,21 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_GEN_TCG_HVX_H
+#define HEXAGON_GEN_TCG_HVX_H
+
+#endif
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index da8527d..5a9a7df 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -26,6 +26,7 @@
 #include "macros.h"
 #undef QEMU_GENERATE
 #include "gen_tcg.h"
+#include "gen_tcg_hvx.h"
 
 static inline void gen_log_predicated_reg_write(int rnum, TCGv val, int slot)
 {
diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index ac5ce10..a446c45 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -286,11 +286,12 @@ def main():
     hex_common.read_semantics_file(sys.argv[1])
     hex_common.read_attribs_file(sys.argv[2])
     hex_common.read_overrides_file(sys.argv[3])
+    hex_common.read_overrides_file(sys.argv[4])
     hex_common.calculate_attribs()
     tagregs = hex_common.get_tagregs()
     tagimms = hex_common.get_tagimms()
 
-    with open(sys.argv[4], 'w') as f:
+    with open(sys.argv[5], 'w') as f:
         for tag in hex_common.tags:
             ## Skip the priv instructions
             if ( "A_PRIV" in hex_common.attribdict[tag] ) :
diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
index 229ef8d..3b4e993 100755
--- a/target/hexagon/gen_helper_protos.py
+++ b/target/hexagon/gen_helper_protos.py
@@ -135,11 +135,12 @@ def main():
     hex_common.read_semantics_file(sys.argv[1])
     hex_common.read_attribs_file(sys.argv[2])
     hex_common.read_overrides_file(sys.argv[3])
+    hex_common.read_overrides_file(sys.argv[4])
     hex_common.calculate_attribs()
     tagregs = hex_common.get_tagregs()
     tagimms = hex_common.get_tagimms()
 
-    with open(sys.argv[4], 'w') as f:
+    with open(sys.argv[5], 'w') as f:
         for tag in hex_common.tags:
             ## Skip the priv instructions
             if ( "A_PRIV" in hex_common.attribdict[tag] ) :
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index 1abe59d..479d6be 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -688,11 +688,12 @@ def main():
     hex_common.read_semantics_file(sys.argv[1])
     hex_common.read_attribs_file(sys.argv[2])
     hex_common.read_overrides_file(sys.argv[3])
+    hex_common.read_overrides_file(sys.argv[4])
     hex_common.calculate_attribs()
     tagregs = hex_common.get_tagregs()
     tagimms = hex_common.get_tagimms()
 
-    with open(sys.argv[4], 'w') as f:
+    with open(sys.argv[5], 'w') as f:
         f.write("#ifndef HEXAGON_TCG_FUNCS_H\n")
         f.write("#define HEXAGON_TCG_FUNCS_H\n\n")
 
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index ed292b4..cae366c 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -20,6 +20,7 @@ hexagon_ss = ss.source_set()
 hex_common_py = 'hex_common.py'
 attribs_def = meson.current_source_dir() / 'attribs_def.h.inc'
 gen_tcg_h = meson.current_source_dir() / 'gen_tcg.h'
+gen_tcg_hvx_h = meson.current_source_dir() / 'gen_tcg_hvx.h'
 
 #
 #  Step 1
@@ -63,8 +64,8 @@ helper_protos_generated = custom_target(
     'helper_protos_generated.h.inc',
     output: 'helper_protos_generated.h.inc',
     depends: [semantics_generated],
-    depend_files: [hex_common_py, attribs_def, gen_tcg_h],
-    command: [python, files('gen_helper_protos.py'), semantics_generated, attribs_def, gen_tcg_h, '@OUTPUT@'],
+    depend_files: [hex_common_py, attribs_def, gen_tcg_h, gen_tcg_hvx_h],
+    command: [python, files('gen_helper_protos.py'), semantics_generated, attribs_def, gen_tcg_h, gen_tcg_hvx_h, '@OUTPUT@'],
 )
 hexagon_ss.add(helper_protos_generated)
 
@@ -72,8 +73,8 @@ tcg_funcs_generated = custom_target(
     'tcg_funcs_generated.c.inc',
     output: 'tcg_funcs_generated.c.inc',
     depends: [semantics_generated],
-    depend_files: [hex_common_py, attribs_def, gen_tcg_h],
-    command: [python, files('gen_tcg_funcs.py'), semantics_generated, attribs_def, gen_tcg_h, '@OUTPUT@'],
+    depend_files: [hex_common_py, attribs_def, gen_tcg_h, gen_tcg_hvx_h],
+    command: [python, files('gen_tcg_funcs.py'), semantics_generated, attribs_def, gen_tcg_h, gen_tcg_hvx_h, '@OUTPUT@'],
 )
 hexagon_ss.add(tcg_funcs_generated)
 
@@ -90,8 +91,8 @@ helper_funcs_generated = custom_target(
     'helper_funcs_generated.c.inc',
     output: 'helper_funcs_generated.c.inc',
     depends: [semantics_generated],
-    depend_files: [hex_common_py, attribs_def, gen_tcg_h],
-    command: [python, files('gen_helper_funcs.py'), semantics_generated, attribs_def, gen_tcg_h, '@OUTPUT@'],
+    depend_files: [hex_common_py, attribs_def, gen_tcg_h, gen_tcg_hvx_h],
+    command: [python, files('gen_helper_funcs.py'), semantics_generated, attribs_def, gen_tcg_h, gen_tcg_hvx_h, '@OUTPUT@'],
 )
 hexagon_ss.add(helper_funcs_generated)
 
-- 
2.7.4
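
Note on the override lookup: the generator scripts now read two override
headers (gen_tcg.h, then gen_tcg_hvx.h) ahead of the output path.
hex_common.read_overrides_file() is not part of this patch, so the following
is only a guess at the kind of matching it performs: treat any line of the
form "#define fGEN_TCG_<tag>(" as an override for <tag>. The function name
override_tag() is hypothetical, and the sketch is written in C only to match
the rest of the series.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` starts with "#define fGEN_TCG_<tag>(",
 * copy <tag> into `tag` (capacity `cap`, NUL-terminated) and return 1.
 * Return 0 for any other line, or if the tag does not fit.
 */
static int override_tag(const char *line, char *tag, size_t cap)
{
    static const char prefix[] = "#define fGEN_TCG_";
    size_t plen = sizeof(prefix) - 1;
    size_t n = 0;

    if (strncmp(line, prefix, plen) != 0) {
        return 0;
    }
    line += plen;

    /* A tag is a run of letters, digits and underscores */
    while (n + 1 < cap &&
           (isalnum((unsigned char)line[n]) || line[n] == '_')) {
        n++;
    }

    /* Require a non-empty tag followed directly by the macro's '(' */
    if (n == 0 || line[n] != '(') {
        return 0;
    }
    memcpy(tag, line, n);
    tag[n] = '\0';
    return 1;
}
```

Under this assumption the include guard in the new, empty gen_tcg_hvx.h
contributes no overrides, so generated output should be unchanged until later
patches add fGEN_TCG_* macros to it.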



* [PATCH v3 14/30] Hexagon HVX (target/hexagon) helper overrides for histogram instructions
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (12 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 13/30] Hexagon HVX (target/hexagon) helper overrides infrastructure Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 15/30] Hexagon HVX (target/hexagon) helper overrides - vector assign & cmov Taylor Simpson
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_tcg_hvx.h | 108 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index b5c6cad..eb29566 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -18,4 +18,112 @@
 #ifndef HEXAGON_GEN_TCG_HVX_H
 #define HEXAGON_GEN_TCG_HVX_H
 
+/*
+ * Histogram instructions
+ *
+ * Note that these instructions operate directly on the vector registers
+ * and therefore happen after commit.
+ *
+ * The generate_<tag> function is called twice
+ *     The first time is during the normal TCG generation
+ *         ctx->pre_commit is true
+ *         In the masked cases, we save the mask to the qtmp temporary
+ *         Otherwise, there is nothing to do
+ *     The second call is at the end of gen_commit_packet
+ *         ctx->pre_commit is false
+ *         Generate the call to the helper
+ */
+
+static inline void assert_vhist_tmp(DisasContext *ctx)
+{
+    /* vhist instructions require exactly one .tmp to be defined */
+    g_assert(ctx->tmp_vregs_idx == 1);
+}
+
+#define fGEN_TCG_V6_vhist(SHORTCODE) \
+    if (!ctx->pre_commit) { \
+        assert_vhist_tmp(ctx); \
+        gen_helper_vhist(cpu_env); \
+    }
+#define fGEN_TCG_V6_vhistq(SHORTCODE) \
+    do { \
+        if (ctx->pre_commit) { \
+            intptr_t dstoff = offsetof(CPUHexagonState, qtmp); \
+            tcg_gen_gvec_mov(MO_64, dstoff, QvV_off, \
+                             sizeof(MMVector), sizeof(MMVector)); \
+        } else { \
+            assert_vhist_tmp(ctx); \
+            gen_helper_vhistq(cpu_env); \
+        } \
+    } while (0)
+#define fGEN_TCG_V6_vwhist256(SHORTCODE) \
+    if (!ctx->pre_commit) { \
+        assert_vhist_tmp(ctx); \
+        gen_helper_vwhist256(cpu_env); \
+    }
+#define fGEN_TCG_V6_vwhist256q(SHORTCODE) \
+    do { \
+        if (ctx->pre_commit) { \
+            intptr_t dstoff = offsetof(CPUHexagonState, qtmp); \
+            tcg_gen_gvec_mov(MO_64, dstoff, QvV_off, \
+                             sizeof(MMVector), sizeof(MMVector)); \
+        } else { \
+            assert_vhist_tmp(ctx); \
+            gen_helper_vwhist256q(cpu_env); \
+        } \
+    } while (0)
+#define fGEN_TCG_V6_vwhist256_sat(SHORTCODE) \
+    if (!ctx->pre_commit) { \
+        assert_vhist_tmp(ctx); \
+        gen_helper_vwhist256_sat(cpu_env); \
+    }
+#define fGEN_TCG_V6_vwhist256q_sat(SHORTCODE) \
+    do { \
+        if (ctx->pre_commit) { \
+            intptr_t dstoff = offsetof(CPUHexagonState, qtmp); \
+            tcg_gen_gvec_mov(MO_64, dstoff, QvV_off, \
+                             sizeof(MMVector), sizeof(MMVector)); \
+        } else { \
+            assert_vhist_tmp(ctx); \
+            gen_helper_vwhist256q_sat(cpu_env); \
+        } \
+    } while (0)
+#define fGEN_TCG_V6_vwhist128(SHORTCODE) \
+    if (!ctx->pre_commit) { \
+        assert_vhist_tmp(ctx); \
+        gen_helper_vwhist128(cpu_env); \
+    }
+#define fGEN_TCG_V6_vwhist128q(SHORTCODE) \
+    do { \
+        if (ctx->pre_commit) { \
+            intptr_t dstoff = offsetof(CPUHexagonState, qtmp); \
+            tcg_gen_gvec_mov(MO_64, dstoff, QvV_off, \
+                             sizeof(MMVector), sizeof(MMVector)); \
+        } else { \
+            assert_vhist_tmp(ctx); \
+            gen_helper_vwhist128q(cpu_env); \
+        } \
+    } while (0)
+#define fGEN_TCG_V6_vwhist128m(SHORTCODE) \
+    if (!ctx->pre_commit) { \
+        TCGv tcgv_uiV = tcg_const_tl(uiV); \
+        assert_vhist_tmp(ctx); \
+        gen_helper_vwhist128m(cpu_env, tcgv_uiV); \
+        tcg_temp_free(tcgv_uiV); \
+    }
+#define fGEN_TCG_V6_vwhist128qm(SHORTCODE) \
+    do { \
+        if (ctx->pre_commit) { \
+            intptr_t dstoff = offsetof(CPUHexagonState, qtmp); \
+            tcg_gen_gvec_mov(MO_64, dstoff, QvV_off, \
+                             sizeof(MMVector), sizeof(MMVector)); \
+        } else { \
+            TCGv tcgv_uiV = tcg_const_tl(uiV); \
+            assert_vhist_tmp(ctx); \
+            gen_helper_vwhist128qm(cpu_env, tcgv_uiV); \
+            tcg_temp_free(tcgv_uiV); \
+        } \
+    } while (0)
+
+
 #endif
-- 
2.7.4
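
Note on the two-pass scheme described in the comment above: the same
generate function runs once during normal TCG generation and once more from
the end of gen_commit_packet, with ctx->pre_commit telling the two calls
apart. A small standalone model of that control flow is sketched below. The
Ctx fields pre_commit and tmp_vregs_idx mirror the patch; the op log, emit()
and translate_packet() are invented here to make the ordering observable and
do not correspond to QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>

enum { OP_SAVE_MASK = 1, OP_CALL_HELPER = 2 };

typedef struct {
    bool pre_commit;        /* true during normal generation, false after commit */
    int tmp_vregs_idx;      /* number of .tmp vector registers in the packet */
    int ops[4];             /* illustrative log of what was "generated" */
    int nops;
} Ctx;

static void emit(Ctx *ctx, int op)
{
    ctx->ops[ctx->nops++] = op;
}

/* Model of a masked histogram override such as fGEN_TCG_V6_vhistq */
static void generate_vhistq(Ctx *ctx)
{
    if (ctx->pre_commit) {
        /* First call: only stash the mask (qtmp in the patch) */
        emit(ctx, OP_SAVE_MASK);
    } else {
        /* Second call: exactly one .tmp must exist, then call the helper */
        assert(ctx->tmp_vregs_idx == 1);
        emit(ctx, OP_CALL_HELPER);
    }
}

static void translate_packet(Ctx *ctx)
{
    ctx->nops = 0;
    ctx->pre_commit = true;
    generate_vhistq(ctx);       /* normal TCG generation pass */

    /* ... register, predicate, store and HVX commit would be emitted here ... */

    ctx->pre_commit = false;
    generate_vhistq(ctx);       /* second call, after commit */
}
```

The unmasked variants (for example fGEN_TCG_V6_vhist) would emit nothing in
the first pass, since there is no mask to save.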



* [PATCH v3 15/30] Hexagon HVX (target/hexagon) helper overrides - vector assign & cmov
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (13 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 14/30] Hexagon HVX (target/hexagon) helper overrides for histogram instructions Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:59   ` Philippe Mathieu-Daudé
  2021-09-20 23:19   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 16/30] Hexagon HVX (target/hexagon) helper overrides - vector add & sub Taylor Simpson
                   ` (14 subsequent siblings)
  29 siblings, 2 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_tcg_hvx.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index eb29566..bcd53d4 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -126,4 +126,35 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
     } while (0)
 
 
+#define fGEN_TCG_V6_vassign(SHORTCODE) \
+    tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+/* Vector conditional move */
+#define fGEN_TCG_VEC_CMOV(PRED) \
+    do { \
+        TCGv lsb = tcg_temp_new(); \
+        TCGLabel *false_label = gen_new_label(); \
+        TCGLabel *end_label = gen_new_label(); \
+        tcg_gen_andi_tl(lsb, PsV, 1); \
+        tcg_gen_brcondi_tl(TCG_COND_NE, lsb, PRED, false_label); \
+        tcg_temp_free(lsb); \
+        tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
+                         sizeof(MMVector), sizeof(MMVector)); \
+        tcg_gen_br(end_label); \
+        gen_set_label(false_label); \
+        tcg_gen_ori_tl(hex_slot_cancelled, hex_slot_cancelled, \
+                       1 << insn->slot); \
+        gen_set_label(end_label); \
+    } while (0)
+
+
+/* Vector conditional move (true) */
+#define fGEN_TCG_V6_vcmov(SHORTCODE) \
+    fGEN_TCG_VEC_CMOV(1)
+
+/* Vector conditional move (false) */
+#define fGEN_TCG_V6_vncmov(SHORTCODE) \
+    fGEN_TCG_VEC_CMOV(0)
+
 #endif
-- 
2.7.4
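
Note on fGEN_TCG_VEC_CMOV: the macro tests the least significant bit of the
scalar predicate against PRED, copies the whole vector when they match, and
otherwise sets the instruction's bit in hex_slot_cancelled. A plain C model
of that runtime behaviour is sketched here, assuming a 128-byte vector. Vec
and vec_cmov() are illustrative stand-ins, not QEMU names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t ub[128]; } Vec;    /* stand-in for MMVector */

/*
 * sense == 1 models vcmov  (move if predicate bit 0 is set),
 * sense == 0 models vncmov (move if predicate bit 0 is clear).
 * When the move is not taken, the slot is marked cancelled instead.
 */
static void vec_cmov(Vec *vd, const Vec *vu, uint32_t ps, int sense,
                     uint32_t *slot_cancelled, int slot)
{
    if ((ps & 1) == (uint32_t)sense) {
        *vd = *vu;                          /* whole-vector copy */
    } else {
        *slot_cancelled |= 1u << slot;      /* no write takes place */
    }
}
```

Only bit 0 of the predicate takes part in the decision, which matches the
tcg_gen_andi_tl(lsb, PsV, 1) step in the macro.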



* [PATCH v3 16/30] Hexagon HVX (target/hexagon) helper overrides - vector add & sub
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (14 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 15/30] Hexagon HVX (target/hexagon) helper overrides - vector assign & cmov Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:18   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 17/30] Hexagon HVX (target/hexagon) helper overrides - vector shifts Taylor Simpson
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_tcg_hvx.h | 50 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index bcd53d4..c2db0ad 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -157,4 +157,54 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
 #define fGEN_TCG_V6_vncmov(SHORTCODE) \
     fGEN_TCG_VEC_CMOV(0)
 
+/* Vector add - various forms */
+#define fGEN_TCG_V6_vaddb(SHORTCODE) \
+    tcg_gen_gvec_add(MO_8, VdV_off, VuV_off, VvV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vaddh(SHORTCODE) \
+    tcg_gen_gvec_add(MO_16, VdV_off, VuV_off, VvV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vaddw(SHORTCODE) \
+    tcg_gen_gvec_add(MO_32, VdV_off, VuV_off, VvV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vaddb_dv(SHORTCODE) \
+    tcg_gen_gvec_add(MO_8, VddV_off, VuuV_off, VvvV_off, \
+                     sizeof(MMVector) * 2, sizeof(MMVector) * 2)
+
+#define fGEN_TCG_V6_vaddh_dv(SHORTCODE) \
+    tcg_gen_gvec_add(MO_16, VddV_off, VuuV_off, VvvV_off, \
+                     sizeof(MMVector) * 2, sizeof(MMVector) * 2)
+
+#define fGEN_TCG_V6_vaddw_dv(SHORTCODE) \
+    tcg_gen_gvec_add(MO_32, VddV_off, VuuV_off, VvvV_off, \
+                     sizeof(MMVector) * 2, sizeof(MMVector) * 2)
+
+/* Vector sub - various forms */
+#define fGEN_TCG_V6_vsubb(SHORTCODE) \
+    tcg_gen_gvec_sub(MO_8, VdV_off, VuV_off, VvV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vsubh(SHORTCODE) \
+    tcg_gen_gvec_sub(MO_16, VdV_off, VuV_off, VvV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vsubw(SHORTCODE) \
+    tcg_gen_gvec_sub(MO_32, VdV_off, VuV_off, VvV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vsubb_dv(SHORTCODE) \
+    tcg_gen_gvec_sub(MO_8, VddV_off, VuuV_off, VvvV_off, \
+                     sizeof(MMVector) * 2, sizeof(MMVector) * 2)
+
+#define fGEN_TCG_V6_vsubh_dv(SHORTCODE) \
+    tcg_gen_gvec_sub(MO_16, VddV_off, VuuV_off, VvvV_off, \
+                     sizeof(MMVector) * 2, sizeof(MMVector) * 2)
+
+#define fGEN_TCG_V6_vsubw_dv(SHORTCODE) \
+    tcg_gen_gvec_sub(MO_32, VddV_off, VuuV_off, VvvV_off, \
+                     sizeof(MMVector) * 2, sizeof(MMVector) * 2)
+
 #endif
-- 
2.7.4



* [PATCH v3 17/30] Hexagon HVX (target/hexagon) helper overrides - vector shifts
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (15 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 16/30] Hexagon HVX (target/hexagon) helper overrides - vector add & sub Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:20   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 18/30] Hexagon HVX (target/hexagon) helper overrides - vector max/min Taylor Simpson
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
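Note (illustrative only, not part of the patch): every shift override
first masks the count in RtV to the lane width (7, 15 or 31) and then
issues one gvec shift. The _acc forms shift into the vtmp scratch area and
add the result into VxV. A minimal scalar sketch of the halfword case,
assuming 64 halfword lanes per 128 byte vector. The names asr16, ref_vasrh
and ref_vasrh_acc are hypothetical; lanes are held as uint16_t so the
wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

#define HALFWORDS_PER_VEC 64 /* 128 byte vector / 2 byte lanes (assumed) */

/* Arithmetic shift right of a 16-bit lane, n in the range 0..15. */
static uint16_t asr16(uint16_t x, unsigned n)
{
    /* Replicate the sign bit into the vacated high bits. */
    uint16_t sign_fill = (x & 0x8000u) ? (uint16_t)~(0xffffu >> n) : 0;
    return (uint16_t)((x >> n) | sign_fill);
}

/* Hypothetical model of vasrh: the count is masked to 4 bits first. */
static void ref_vasrh(uint16_t *vd, const uint16_t *vu, uint32_t rt)
{
    unsigned shift = rt & 15;
    for (int i = 0; i < HALFWORDS_PER_VEC; i++) {
        vd[i] = asr16(vu[i], shift);
    }
}

/* Hypothetical model of vasrh_acc: shift into a scratch, then add into vx. */
static void ref_vasrh_acc(uint16_t *vx, const uint16_t *vu, uint32_t rt)
{
    uint16_t tmp[HALFWORDS_PER_VEC]; /* stands in for the vtmp scratch */
    assert(vx != 0 && vu != 0);
    ref_vasrh(tmp, vu, rt);
    for (int i = 0; i < HALFWORDS_PER_VEC; i++) {
        vx[i] = (uint16_t)(vx[i] + tmp[i]); /* wraps within the lane */
    }
}
```

The scratch is needed because the shift and the add are two separate
operations and the intermediate result must not overwrite the accumulator.
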
 target/hexagon/gen_tcg_hvx.h | 122 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index c2db0ad..a7748aa 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -207,4 +207,126 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
     tcg_gen_gvec_sub(MO_32, VddV_off, VuuV_off, VvvV_off, \
                      sizeof(MMVector) * 2, sizeof(MMVector) * 2)
 
+/* Vector shift right - various forms */
+#define fGEN_TCG_V6_vasrh(SHORTCODE) \
+    do { \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 15); \
+        tcg_gen_gvec_sars(MO_16, VdV_off, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vasrh_acc(SHORTCODE) \
+    do { \
+        intptr_t tmpoff = offsetof(CPUHexagonState, vtmp); \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 15); \
+        tcg_gen_gvec_sars(MO_16, tmpoff, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_gen_gvec_add(MO_16, VxV_off, VxV_off, tmpoff, \
+                         sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vasrw(SHORTCODE) \
+    do { \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 31); \
+        tcg_gen_gvec_sars(MO_32, VdV_off, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vasrw_acc(SHORTCODE) \
+    do { \
+        intptr_t tmpoff = offsetof(CPUHexagonState, vtmp); \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 31); \
+        tcg_gen_gvec_sars(MO_32, tmpoff, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_gen_gvec_add(MO_32, VxV_off, VxV_off, tmpoff, \
+                         sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vlsrb(SHORTCODE) \
+    do { \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 7); \
+        tcg_gen_gvec_shrs(MO_8, VdV_off, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vlsrh(SHORTCODE) \
+    do { \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 15); \
+        tcg_gen_gvec_shrs(MO_16, VdV_off, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vlsrw(SHORTCODE) \
+    do { \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 31); \
+        tcg_gen_gvec_shrs(MO_32, VdV_off, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+/* Vector shift left - various forms */
+#define fGEN_TCG_V6_vaslb(SHORTCODE) \
+    do { \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 7); \
+        tcg_gen_gvec_shls(MO_8, VdV_off, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vaslh(SHORTCODE) \
+    do { \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 15); \
+        tcg_gen_gvec_shls(MO_16, VdV_off, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vaslh_acc(SHORTCODE) \
+    do { \
+        intptr_t tmpoff = offsetof(CPUHexagonState, vtmp); \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 15); \
+        tcg_gen_gvec_shls(MO_16, tmpoff, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_gen_gvec_add(MO_16, VxV_off, VxV_off, tmpoff, \
+                         sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vaslw(SHORTCODE) \
+    do { \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 31); \
+        tcg_gen_gvec_shls(MO_32, VdV_off, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
+#define fGEN_TCG_V6_vaslw_acc(SHORTCODE) \
+    do { \
+        intptr_t tmpoff = offsetof(CPUHexagonState, vtmp); \
+        TCGv shift = tcg_temp_new(); \
+        tcg_gen_andi_tl(shift, RtV, 31); \
+        tcg_gen_gvec_shls(MO_32, tmpoff, VuV_off, shift, \
+                          sizeof(MMVector), sizeof(MMVector)); \
+        tcg_gen_gvec_add(MO_32, VxV_off, VxV_off, tmpoff, \
+                         sizeof(MMVector), sizeof(MMVector)); \
+        tcg_temp_free(shift); \
+    } while (0)
+
 #endif
-- 
2.7.4



* [PATCH v3 18/30] Hexagon HVX (target/hexagon) helper overrides - vector max/min
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (16 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 17/30] Hexagon HVX (target/hexagon) helper overrides - vector shifts Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:20   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 19/30] Hexagon HVX (target/hexagon) helper overrides - vector logical ops Taylor Simpson
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_tcg_hvx.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index a7748aa..006ba74 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -329,4 +329,38 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
         tcg_temp_free(shift); \
     } while (0)
 
+/* Vector max - various forms */
+#define fGEN_TCG_V6_vmaxw(SHORTCODE) \
+    tcg_gen_gvec_smax(MO_32, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+#define fGEN_TCG_V6_vmaxh(SHORTCODE) \
+    tcg_gen_gvec_smax(MO_16, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+#define fGEN_TCG_V6_vmaxuh(SHORTCODE) \
+    tcg_gen_gvec_umax(MO_16, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+#define fGEN_TCG_V6_vmaxb(SHORTCODE) \
+    tcg_gen_gvec_smax(MO_8, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+#define fGEN_TCG_V6_vmaxub(SHORTCODE) \
+    tcg_gen_gvec_umax(MO_8, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+
+/* Vector min - various forms */
+#define fGEN_TCG_V6_vminw(SHORTCODE) \
+    tcg_gen_gvec_smin(MO_32, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+#define fGEN_TCG_V6_vminh(SHORTCODE) \
+    tcg_gen_gvec_smin(MO_16, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+#define fGEN_TCG_V6_vminuh(SHORTCODE) \
+    tcg_gen_gvec_umin(MO_16, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+#define fGEN_TCG_V6_vminb(SHORTCODE) \
+    tcg_gen_gvec_smin(MO_8, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+#define fGEN_TCG_V6_vminub(SHORTCODE) \
+    tcg_gen_gvec_umin(MO_8, VdV_off, VuV_off, VvV_off, \
+                      sizeof(MMVector), sizeof(MMVector))
+
 #endif
-- 
2.7.4



* [PATCH v3 19/30] Hexagon HVX (target/hexagon) helper overrides - vector logical ops
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (17 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 18/30] Hexagon HVX (target/hexagon) helper overrides - vector max/min Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:22   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 20/30] Hexagon HVX (target/hexagon) helper overrides - vector compares Taylor Simpson
                   ` (10 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
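Note (illustrative only, not part of the patch): pred_or_n and pred_and_n
have no single gvec equivalent here, so the overrides complement QtV into
the qtmp scratch and then combine it with QsV. A minimal scalar sketch of
pred_or_n, assuming a Q register holds one bit per vector byte (16 bytes
for a 128 byte vector). The name ref_pred_or_n is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define QREG_BYTES 16 /* assumed: 128 predicate bits, one per vector byte */

/* Hypothetical model of pred_or_n: qd = qs | ~qt, via a scratch copy. */
static void ref_pred_or_n(uint8_t *qd, const uint8_t *qs, const uint8_t *qt)
{
    uint8_t tmp[QREG_BYTES]; /* stands in for the qtmp scratch */
    assert(qd != 0 && qs != 0 && qt != 0);
    for (int i = 0; i < QREG_BYTES; i++) {
        tmp[i] = (uint8_t)~qt[i];
    }
    for (int i = 0; i < QREG_BYTES; i++) {
        qd[i] = (uint8_t)(qs[i] | tmp[i]);
    }
}
```

Computing the complement into a scratch first keeps the result correct even
when the destination is the same register as one of the sources.
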
 target/hexagon/gen_tcg_hvx.h | 52 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index 006ba74..bd0abc6 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -363,4 +363,56 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
     tcg_gen_gvec_umin(MO_8, VdV_off, VuV_off, VvV_off, \
                       sizeof(MMVector), sizeof(MMVector))
 
+/* Vector logical ops */
+#define fGEN_TCG_V6_vxor(SHORTCODE) \
+    tcg_gen_gvec_xor(MO_64, VdV_off, VuV_off, VvV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vand(SHORTCODE) \
+    tcg_gen_gvec_and(MO_64, VdV_off, VuV_off, VvV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vor(SHORTCODE) \
+    tcg_gen_gvec_or(MO_64, VdV_off, VuV_off, VvV_off, \
+                    sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vnot(SHORTCODE) \
+    tcg_gen_gvec_not(MO_64, VdV_off, VuV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+/* Q register logical ops */
+#define fGEN_TCG_V6_pred_or(SHORTCODE) \
+    tcg_gen_gvec_or(MO_64, QdV_off, QsV_off, QtV_off, \
+                    sizeof(MMQReg), sizeof(MMQReg))
+
+#define fGEN_TCG_V6_pred_and(SHORTCODE) \
+    tcg_gen_gvec_and(MO_64, QdV_off, QsV_off, QtV_off, \
+                     sizeof(MMQReg), sizeof(MMQReg))
+
+#define fGEN_TCG_V6_pred_xor(SHORTCODE) \
+    tcg_gen_gvec_xor(MO_64, QdV_off, QsV_off, QtV_off, \
+                     sizeof(MMQReg), sizeof(MMQReg))
+
+#define fGEN_TCG_V6_pred_or_n(SHORTCODE) \
+    do { \
+        intptr_t tmpoff = offsetof(CPUHexagonState, qtmp); \
+        tcg_gen_gvec_not(MO_64, tmpoff, QtV_off, \
+                         sizeof(MMQReg), sizeof(MMQReg)); \
+        tcg_gen_gvec_or(MO_64, QdV_off, QsV_off, tmpoff, \
+                        sizeof(MMQReg), sizeof(MMQReg)); \
+    } while (0)
+
+#define fGEN_TCG_V6_pred_and_n(SHORTCODE) \
+    do { \
+        intptr_t tmpoff = offsetof(CPUHexagonState, qtmp); \
+        tcg_gen_gvec_not(MO_64, tmpoff, QtV_off, \
+                         sizeof(MMQReg), sizeof(MMQReg)); \
+        tcg_gen_gvec_and(MO_64, QdV_off, QsV_off, tmpoff, \
+                         sizeof(MMQReg), sizeof(MMQReg)); \
+    } while (0)
+
+#define fGEN_TCG_V6_pred_not(SHORTCODE) \
+    tcg_gen_gvec_not(MO_64, QdV_off, QsV_off, \
+                     sizeof(MMQReg), sizeof(MMQReg))
+
 #endif
-- 
2.7.4



* [PATCH v3 20/30] Hexagon HVX (target/hexagon) helper overrides - vector compares
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (18 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 19/30] Hexagon HVX (target/hexagon) helper overrides - vector logical ops Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:23   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 21/30] Hexagon HVX (target/hexagon) helper overrides - vector splat and abs Taylor Simpson
                   ` (9 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
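Note (illustrative only, not part of the patch): the compare overrides work
in two steps. tcg_gen_gvec_cmp writes an all-ones or all-zeros lane into
vtmp, and vec_to_qvec (defined elsewhere in the series) then packs that
into a Q register. A minimal scalar sketch for the signed word compare,
assuming a Q register holds one predicate bit per vector byte, so a true
word lane sets 4 adjacent bits. The names ref_cmp_gt_w and ref_mask_to_q
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 128
#define WORDS_PER_VEC (VEC_BYTES / 4)
#define QREG_BYTES (VEC_BYTES / 8) /* assumed: one bit per vector byte */

/* Step 1: signed compare, each lane becomes all ones (true) or zero. */
static void ref_cmp_gt_w(uint32_t *mask, const int32_t *vu, const int32_t *vv)
{
    for (int i = 0; i < WORDS_PER_VEC; i++) {
        mask[i] = (vu[i] > vv[i]) ? 0xffffffffu : 0;
    }
}

/* Step 2: pack the lane mask down to one predicate bit per vector byte. */
static void ref_mask_to_q(uint8_t *q, const uint32_t *mask)
{
    assert(q != 0 && mask != 0);
    memset(q, 0, QREG_BYTES);
    for (int byte = 0; byte < VEC_BYTES; byte++) {
        if (mask[byte / 4] != 0) {
            q[byte / 8] |= (uint8_t)(1u << (byte % 8));
        }
    }
}
```

The _and, _or and _xor variants in the patch pack into qtmp instead and then
combine that with the existing QxV using the matching gvec logical op.
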
 target/hexagon/gen_tcg_hvx.h | 103 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index bd0abc6..aa38398 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -415,4 +415,107 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
     tcg_gen_gvec_not(MO_64, QdV_off, QsV_off, \
                      sizeof(MMQReg), sizeof(MMQReg))
 
+/* Vector compares */
+#define fGEN_TCG_VEC_CMP(COND, TYPE, SIZE) \
+    do { \
+        intptr_t tmpoff = offsetof(CPUHexagonState, vtmp); \
+        tcg_gen_gvec_cmp(COND, TYPE, tmpoff, VuV_off, VvV_off, \
+                         sizeof(MMVector), sizeof(MMVector)); \
+        vec_to_qvec(SIZE, QdV_off, tmpoff); \
+    } while (0)
+
+#define fGEN_TCG_V6_vgtw(SHORTCODE) \
+    fGEN_TCG_VEC_CMP(TCG_COND_GT, MO_32, 4)
+#define fGEN_TCG_V6_vgth(SHORTCODE) \
+    fGEN_TCG_VEC_CMP(TCG_COND_GT, MO_16, 2)
+#define fGEN_TCG_V6_vgtb(SHORTCODE) \
+    fGEN_TCG_VEC_CMP(TCG_COND_GT, MO_8, 1)
+
+#define fGEN_TCG_V6_vgtuw(SHORTCODE) \
+    fGEN_TCG_VEC_CMP(TCG_COND_GTU, MO_32, 4)
+#define fGEN_TCG_V6_vgtuh(SHORTCODE) \
+    fGEN_TCG_VEC_CMP(TCG_COND_GTU, MO_16, 2)
+#define fGEN_TCG_V6_vgtub(SHORTCODE) \
+    fGEN_TCG_VEC_CMP(TCG_COND_GTU, MO_8, 1)
+
+#define fGEN_TCG_V6_veqw(SHORTCODE) \
+    fGEN_TCG_VEC_CMP(TCG_COND_EQ, MO_32, 4)
+#define fGEN_TCG_V6_veqh(SHORTCODE) \
+    fGEN_TCG_VEC_CMP(TCG_COND_EQ, MO_16, 2)
+#define fGEN_TCG_V6_veqb(SHORTCODE) \
+    fGEN_TCG_VEC_CMP(TCG_COND_EQ, MO_8, 1)
+
+#define fGEN_TCG_VEC_CMP_OP(COND, TYPE, SIZE, OP) \
+    do { \
+        intptr_t tmpoff = offsetof(CPUHexagonState, vtmp); \
+        intptr_t qoff = offsetof(CPUHexagonState, qtmp); \
+        tcg_gen_gvec_cmp(COND, TYPE, tmpoff, VuV_off, VvV_off, \
+                         sizeof(MMVector), sizeof(MMVector)); \
+        vec_to_qvec(SIZE, qoff, tmpoff); \
+        OP(MO_64, QxV_off, QxV_off, qoff, sizeof(MMQReg), sizeof(MMQReg)); \
+    } while (0)
+
+#define fGEN_TCG_V6_vgtw_and(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GT, MO_32, 4, tcg_gen_gvec_and)
+#define fGEN_TCG_V6_vgtw_or(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GT, MO_32, 4, tcg_gen_gvec_or)
+#define fGEN_TCG_V6_vgtw_xor(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GT, MO_32, 4, tcg_gen_gvec_xor)
+
+#define fGEN_TCG_V6_vgtuw_and(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GTU, MO_32, 4, tcg_gen_gvec_and)
+#define fGEN_TCG_V6_vgtuw_or(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GTU, MO_32, 4, tcg_gen_gvec_or)
+#define fGEN_TCG_V6_vgtuw_xor(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GTU, MO_32, 4, tcg_gen_gvec_xor)
+
+#define fGEN_TCG_V6_vgth_and(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GT, MO_16, 2, tcg_gen_gvec_and)
+#define fGEN_TCG_V6_vgth_or(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GT, MO_16, 2, tcg_gen_gvec_or)
+#define fGEN_TCG_V6_vgth_xor(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GT, MO_16, 2, tcg_gen_gvec_xor)
+
+#define fGEN_TCG_V6_vgtuh_and(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GTU, MO_16, 2, tcg_gen_gvec_and)
+#define fGEN_TCG_V6_vgtuh_or(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GTU, MO_16, 2, tcg_gen_gvec_or)
+#define fGEN_TCG_V6_vgtuh_xor(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GTU, MO_16, 2, tcg_gen_gvec_xor)
+
+#define fGEN_TCG_V6_vgtb_and(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GT, MO_8, 1, tcg_gen_gvec_and)
+#define fGEN_TCG_V6_vgtb_or(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GT, MO_8, 1, tcg_gen_gvec_or)
+#define fGEN_TCG_V6_vgtb_xor(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GT, MO_8, 1, tcg_gen_gvec_xor)
+
+#define fGEN_TCG_V6_vgtub_and(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GTU, MO_8, 1, tcg_gen_gvec_and)
+#define fGEN_TCG_V6_vgtub_or(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GTU, MO_8, 1, tcg_gen_gvec_or)
+#define fGEN_TCG_V6_vgtub_xor(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_GTU, MO_8, 1, tcg_gen_gvec_xor)
+
+#define fGEN_TCG_V6_veqw_and(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_32, 4, tcg_gen_gvec_and)
+#define fGEN_TCG_V6_veqw_or(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_32, 4, tcg_gen_gvec_or)
+#define fGEN_TCG_V6_veqw_xor(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_32, 4, tcg_gen_gvec_xor)
+
+#define fGEN_TCG_V6_veqh_and(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_16, 2, tcg_gen_gvec_and)
+#define fGEN_TCG_V6_veqh_or(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_16, 2, tcg_gen_gvec_or)
+#define fGEN_TCG_V6_veqh_xor(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_16, 2, tcg_gen_gvec_xor)
+
+#define fGEN_TCG_V6_veqb_and(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_8, 1, tcg_gen_gvec_and)
+#define fGEN_TCG_V6_veqb_or(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_8, 1, tcg_gen_gvec_or)
+#define fGEN_TCG_V6_veqb_xor(SHORTCODE) \
+    fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_8, 1, tcg_gen_gvec_xor)
+
 #endif
-- 
2.7.4



* [PATCH v3 21/30] Hexagon HVX (target/hexagon) helper overrides - vector splat and abs
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (19 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 20/30] Hexagon HVX (target/hexagon) helper overrides - vector compares Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:24   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 22/30] Hexagon HVX (target/hexagon) helper overrides - vector loads Taylor Simpson
                   ` (8 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
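Note (illustrative only, not part of the patch): the splat overrides pass
RtV to tcg_gen_gvec_dup_i32, which replicates the low 8, 16 or 32 bits of
the scalar into every lane. A minimal scalar sketch, assuming 128 byte
vectors and little-endian lanes. The name ref_splat is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 128

/* Hypothetical model: copy the low lane_bytes of rt into every lane. */
static void ref_splat(uint8_t *vd, uint32_t rt, size_t lane_bytes)
{
    assert(lane_bytes == 1 || lane_bytes == 2 || lane_bytes == 4);
    for (size_t i = 0; i < VEC_BYTES; i++) {
        /* Byte i holds byte (i % lane_bytes) of the scalar. */
        vd[i] = (uint8_t)(rt >> (8 * (i % lane_bytes)));
    }
}
```

For example, with rt = 0x11223344 a byte splat fills the vector with 0x44,
while a halfword splat alternates 0x44 and 0x33.
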
 target/hexagon/gen_tcg_hvx.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index aa38398..e10e410 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -518,4 +518,30 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
 #define fGEN_TCG_V6_veqb_xor(SHORTCODE) \
     fGEN_TCG_VEC_CMP_OP(TCG_COND_EQ, MO_8, 1, tcg_gen_gvec_xor)
 
+/* Vector splat - various forms */
+#define fGEN_TCG_V6_lvsplatw(SHORTCODE) \
+    tcg_gen_gvec_dup_i32(MO_32, VdV_off, \
+                         sizeof(MMVector), sizeof(MMVector), RtV)
+
+#define fGEN_TCG_V6_lvsplath(SHORTCODE) \
+    tcg_gen_gvec_dup_i32(MO_16, VdV_off, \
+                         sizeof(MMVector), sizeof(MMVector), RtV)
+
+#define fGEN_TCG_V6_lvsplatb(SHORTCODE) \
+    tcg_gen_gvec_dup_i32(MO_8, VdV_off, \
+                         sizeof(MMVector), sizeof(MMVector), RtV)
+
+/* Vector absolute value - various forms */
+#define fGEN_TCG_V6_vabsb(SHORTCODE) \
+    tcg_gen_gvec_abs(MO_8, VdV_off, VuV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vabsh(SHORTCODE) \
+    tcg_gen_gvec_abs(MO_16, VdV_off, VuV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vabsw(SHORTCODE) \
+    tcg_gen_gvec_abs(MO_32, VdV_off, VuV_off, \
+                     sizeof(MMVector), sizeof(MMVector))
+
 #endif
-- 
2.7.4



* [PATCH v3 22/30] Hexagon HVX (target/hexagon) helper overrides - vector loads
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (20 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 21/30] Hexagon HVX (target/hexagon) helper overrides - vector splat and abs Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:26   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 23/30] Hexagon HVX (target/hexagon) helper overrides - vector stores Taylor Simpson
                   ` (7 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
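Note (illustrative only, not part of the patch): the predicated load macro
computes the address, tests bit 0 of the predicate, and then either loads
and post-increments or records the slot as cancelled. A minimal scalar
sketch of the post-increment (_pi) form. ToyState and ref_pred_load_pi are
hypothetical, memory is a flat byte array, and the sketch ignores any
address alignment that gen_vreg_load may apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 128

/* Hypothetical, much reduced stand-in for the CPU state. */
typedef struct {
    uint8_t vd[VEC_BYTES];   /* destination vector register */
    uint32_t rx;             /* address register */
    uint32_t slot_cancelled; /* one bit per cancelled slot */
} ToyState;

/* negate selects the "if (!Pv)" form, which inverts the predicate bit. */
static void ref_pred_load_pi(ToyState *s, const uint8_t *mem, size_t mem_size,
                             uint32_t pv, int negate, int slot,
                             uint32_t inc_bytes)
{
    uint32_t ea = s->rx;  /* address is taken before any increment */
    uint32_t lsb = pv & 1;
    if (negate) {
        lsb ^= 1;
    }
    if (lsb == 0) {
        /* Predicate false: no load, no increment, mark the slot. */
        s->slot_cancelled |= 1u << slot;
        return;
    }
    assert((size_t)ea + VEC_BYTES <= mem_size);
    memcpy(s->vd, mem + ea, VEC_BYTES);
    s->rx = ea + inc_bytes; /* post-increment only on the taken path */
}
```

The _ai form would differ only in computing the address as base plus
immediate and skipping the increment; the _ppu form increments by a
modifier register instead of an immediate.
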
 target/hexagon/gen_tcg_hvx.h | 150 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index e10e410..76fb0cc 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -544,4 +544,154 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
     tcg_gen_gvec_abs(MO_32, VdV_off, VuV_off, \
                      sizeof(MMVector), sizeof(MMVector))
 
+/* Vector loads */
+#define fGEN_TCG_V6_vL32b_pi(SHORTCODE)                    SHORTCODE
+#define fGEN_TCG_V6_vL32Ub_pi(SHORTCODE)                   SHORTCODE
+#define fGEN_TCG_V6_vL32b_cur_pi(SHORTCODE)                SHORTCODE
+#define fGEN_TCG_V6_vL32b_tmp_pi(SHORTCODE)                SHORTCODE
+#define fGEN_TCG_V6_vL32b_nt_pi(SHORTCODE)                 SHORTCODE
+#define fGEN_TCG_V6_vL32b_nt_cur_pi(SHORTCODE)             SHORTCODE
+#define fGEN_TCG_V6_vL32b_nt_tmp_pi(SHORTCODE)             SHORTCODE
+#define fGEN_TCG_V6_vL32b_ai(SHORTCODE)                    SHORTCODE
+#define fGEN_TCG_V6_vL32Ub_ai(SHORTCODE)                   SHORTCODE
+#define fGEN_TCG_V6_vL32b_cur_ai(SHORTCODE)                SHORTCODE
+#define fGEN_TCG_V6_vL32b_tmp_ai(SHORTCODE)                SHORTCODE
+#define fGEN_TCG_V6_vL32b_nt_ai(SHORTCODE)                 SHORTCODE
+#define fGEN_TCG_V6_vL32b_nt_cur_ai(SHORTCODE)             SHORTCODE
+#define fGEN_TCG_V6_vL32b_nt_tmp_ai(SHORTCODE)             SHORTCODE
+#define fGEN_TCG_V6_vL32b_ppu(SHORTCODE)                   SHORTCODE
+#define fGEN_TCG_V6_vL32Ub_ppu(SHORTCODE)                  SHORTCODE
+#define fGEN_TCG_V6_vL32b_cur_ppu(SHORTCODE)               SHORTCODE
+#define fGEN_TCG_V6_vL32b_tmp_ppu(SHORTCODE)               SHORTCODE
+#define fGEN_TCG_V6_vL32b_nt_ppu(SHORTCODE)                SHORTCODE
+#define fGEN_TCG_V6_vL32b_nt_cur_ppu(SHORTCODE)            SHORTCODE
+#define fGEN_TCG_V6_vL32b_nt_tmp_ppu(SHORTCODE)            SHORTCODE
+
+/* Predicated vector loads */
+#define fGEN_TCG_PRED_VEC_LOAD(GET_EA, PRED, DSTOFF, INC) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        TCGLabel *false_label = gen_new_label(); \
+        TCGLabel *end_label = gen_new_label(); \
+        GET_EA; \
+        PRED; \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, false_label); \
+        tcg_temp_free(LSB); \
+        gen_vreg_load(ctx, DSTOFF, EA, true); \
+        INC; \
+        tcg_gen_br(end_label); \
+        gen_set_label(false_label); \
+        tcg_gen_ori_tl(hex_slot_cancelled, hex_slot_cancelled, \
+                       1 << insn->slot); \
+        gen_set_label(end_label); \
+    } while (0)
+
+#define fGEN_TCG_PRED_VEC_LOAD_pred_pi \
+    fGEN_TCG_PRED_VEC_LOAD(fLSBOLD(PvV), \
+                           fEA_REG(RxV), \
+                           VdV_off, \
+                           fPM_I(RxV, siV * sizeof(MMVector)))
+#define fGEN_TCG_PRED_VEC_LOAD_npred_pi \
+    fGEN_TCG_PRED_VEC_LOAD(fLSBOLDNOT(PvV), \
+                           fEA_REG(RxV), \
+                           VdV_off, \
+                           fPM_I(RxV, siV * sizeof(MMVector)))
+
+#define fGEN_TCG_V6_vL32b_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_pi
+#define fGEN_TCG_V6_vL32b_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_pi
+#define fGEN_TCG_V6_vL32b_cur_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_pi
+#define fGEN_TCG_V6_vL32b_cur_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_pi
+#define fGEN_TCG_V6_vL32b_tmp_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_pi
+#define fGEN_TCG_V6_vL32b_tmp_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_pi
+#define fGEN_TCG_V6_vL32b_nt_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_pi
+#define fGEN_TCG_V6_vL32b_nt_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_pi
+#define fGEN_TCG_V6_vL32b_nt_cur_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_pi
+#define fGEN_TCG_V6_vL32b_nt_cur_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_pi
+#define fGEN_TCG_V6_vL32b_nt_tmp_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_pi
+#define fGEN_TCG_V6_vL32b_nt_tmp_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_pi
+
+#define fGEN_TCG_PRED_VEC_LOAD_pred_ai \
+    fGEN_TCG_PRED_VEC_LOAD(fLSBOLD(PvV), \
+                           fEA_RI(RtV, siV * sizeof(MMVector)), \
+                           VdV_off, \
+                           do {} while (0))
+#define fGEN_TCG_PRED_VEC_LOAD_npred_ai \
+    fGEN_TCG_PRED_VEC_LOAD(fLSBOLDNOT(PvV), \
+                           fEA_RI(RtV, siV * sizeof(MMVector)), \
+                           VdV_off, \
+                           do {} while (0))
+
+#define fGEN_TCG_V6_vL32b_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ai
+#define fGEN_TCG_V6_vL32b_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ai
+#define fGEN_TCG_V6_vL32b_cur_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ai
+#define fGEN_TCG_V6_vL32b_cur_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ai
+#define fGEN_TCG_V6_vL32b_tmp_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ai
+#define fGEN_TCG_V6_vL32b_tmp_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ai
+#define fGEN_TCG_V6_vL32b_nt_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ai
+#define fGEN_TCG_V6_vL32b_nt_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ai
+#define fGEN_TCG_V6_vL32b_nt_cur_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ai
+#define fGEN_TCG_V6_vL32b_nt_cur_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ai
+#define fGEN_TCG_V6_vL32b_nt_tmp_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ai
+#define fGEN_TCG_V6_vL32b_nt_tmp_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ai
+
+#define fGEN_TCG_PRED_VEC_LOAD_pred_ppu \
+    fGEN_TCG_PRED_VEC_LOAD(fLSBOLD(PvV), \
+                           fEA_REG(RxV), \
+                           VdV_off, \
+                           fPM_M(RxV, MuV))
+#define fGEN_TCG_PRED_VEC_LOAD_npred_ppu \
+    fGEN_TCG_PRED_VEC_LOAD(fLSBOLDNOT(PvV), \
+                           fEA_REG(RxV), \
+                           VdV_off, \
+                           fPM_M(RxV, MuV))
+
+#define fGEN_TCG_V6_vL32b_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ppu
+#define fGEN_TCG_V6_vL32b_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ppu
+#define fGEN_TCG_V6_vL32b_cur_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ppu
+#define fGEN_TCG_V6_vL32b_cur_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ppu
+#define fGEN_TCG_V6_vL32b_tmp_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ppu
+#define fGEN_TCG_V6_vL32b_tmp_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ppu
+#define fGEN_TCG_V6_vL32b_nt_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ppu
+#define fGEN_TCG_V6_vL32b_nt_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ppu
+#define fGEN_TCG_V6_vL32b_nt_cur_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ppu
+#define fGEN_TCG_V6_vL32b_nt_cur_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ppu
+#define fGEN_TCG_V6_vL32b_nt_tmp_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_pred_ppu
+#define fGEN_TCG_V6_vL32b_nt_tmp_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_LOAD_npred_ppu
+
 #endif
-- 
2.7.4



* [PATCH v3 23/30] Hexagon HVX (target/hexagon) helper overrides - vector stores
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (21 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 22/30] Hexagon HVX (target/hexagon) helper overrides - vector loads Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:27   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 24/30] Hexagon HVX (target/hexagon) import semantics Taylor Simpson
                   ` (6 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_tcg_hvx.h | 218 +++++++++++++++++++++++++++++++++++++++++++
 target/hexagon/helper.h      |   1 +
 target/hexagon/op_helper.c   |   5 +
 3 files changed, 224 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index 76fb0cc..9dba29d 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -694,4 +694,222 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
 #define fGEN_TCG_V6_vL32b_nt_tmp_npred_ppu(SHORTCODE) \
     fGEN_TCG_PRED_VEC_LOAD_npred_ppu
 
+/* Vector stores */
+#define fGEN_TCG_V6_vS32b_pi(SHORTCODE)                    SHORTCODE
+#define fGEN_TCG_V6_vS32Ub_pi(SHORTCODE)                   SHORTCODE
+#define fGEN_TCG_V6_vS32b_nt_pi(SHORTCODE)                 SHORTCODE
+#define fGEN_TCG_V6_vS32b_ai(SHORTCODE)                    SHORTCODE
+#define fGEN_TCG_V6_vS32Ub_ai(SHORTCODE)                   SHORTCODE
+#define fGEN_TCG_V6_vS32b_nt_ai(SHORTCODE)                 SHORTCODE
+#define fGEN_TCG_V6_vS32b_ppu(SHORTCODE)                   SHORTCODE
+#define fGEN_TCG_V6_vS32Ub_ppu(SHORTCODE)                  SHORTCODE
+#define fGEN_TCG_V6_vS32b_nt_ppu(SHORTCODE)                SHORTCODE
+
+/* New value vector stores */
+#define fGEN_TCG_NEWVAL_VEC_STORE(GET_EA, INC) \
+    do { \
+        GET_EA; \
+        gen_vreg_store(ctx, insn, pkt, EA, OsN_off, insn->slot, true); \
+        INC; \
+    } while (0)
+
+#define fGEN_TCG_NEWVAL_VEC_STORE_pi \
+    fGEN_TCG_NEWVAL_VEC_STORE(fEA_REG(RxV), fPM_I(RxV, siV * sizeof(MMVector)))
+
+#define fGEN_TCG_V6_vS32b_new_pi(SHORTCODE) \
+    fGEN_TCG_NEWVAL_VEC_STORE_pi
+#define fGEN_TCG_V6_vS32b_nt_new_pi(SHORTCODE) \
+    fGEN_TCG_NEWVAL_VEC_STORE_pi
+
+#define fGEN_TCG_NEWVAL_VEC_STORE_ai \
+    fGEN_TCG_NEWVAL_VEC_STORE(fEA_RI(RtV, siV * sizeof(MMVector)), \
+                              do { } while (0))
+
+#define fGEN_TCG_V6_vS32b_new_ai(SHORTCODE) \
+    fGEN_TCG_NEWVAL_VEC_STORE_ai
+#define fGEN_TCG_V6_vS32b_nt_new_ai(SHORTCODE) \
+    fGEN_TCG_NEWVAL_VEC_STORE_ai
+
+#define fGEN_TCG_NEWVAL_VEC_STORE_ppu \
+    fGEN_TCG_NEWVAL_VEC_STORE(fEA_REG(RxV), fPM_M(RxV, MuV))
+
+#define fGEN_TCG_V6_vS32b_new_ppu(SHORTCODE) \
+    fGEN_TCG_NEWVAL_VEC_STORE_ppu
+#define fGEN_TCG_V6_vS32b_nt_new_ppu(SHORTCODE) \
+    fGEN_TCG_NEWVAL_VEC_STORE_ppu
+
+/* Predicated vector stores */
+#define fGEN_TCG_PRED_VEC_STORE(PRED, GET_EA, SRCOFF, ALIGN, INC) \
+    do { \
+        TCGv LSB = tcg_temp_new(); \
+        TCGLabel *false_label = gen_new_label(); \
+        TCGLabel *end_label = gen_new_label(); \
+        GET_EA; \
+        PRED; \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, false_label); \
+        tcg_temp_free(LSB); \
+        gen_vreg_store(ctx, insn, pkt, EA, SRCOFF, insn->slot, ALIGN); \
+        INC; \
+        tcg_gen_br(end_label); \
+        gen_set_label(false_label); \
+        tcg_gen_ori_tl(hex_slot_cancelled, hex_slot_cancelled, \
+                       1 << insn->slot); \
+        gen_set_label(end_label); \
+    } while (0)
+
+#define fGEN_TCG_PRED_VEC_STORE_pred_pi(ALIGN) \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLD(PvV), \
+                            fEA_REG(RxV), \
+                            VsV_off, ALIGN, \
+                            fPM_I(RxV, siV * sizeof(MMVector)))
+#define fGEN_TCG_PRED_VEC_STORE_npred_pi(ALIGN) \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLDNOT(PvV), \
+                            fEA_REG(RxV), \
+                            VsV_off, ALIGN, \
+                            fPM_I(RxV, siV * sizeof(MMVector)))
+#define fGEN_TCG_PRED_VEC_STORE_new_pred_pi \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLD(PvV), \
+                            fEA_REG(RxV), \
+                            OsN_off, true, \
+                            fPM_I(RxV, siV * sizeof(MMVector)))
+#define fGEN_TCG_PRED_VEC_STORE_new_npred_pi \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLDNOT(PvV), \
+                            fEA_REG(RxV), \
+                            OsN_off, true, \
+                            fPM_I(RxV, siV * sizeof(MMVector)))
+
+#define fGEN_TCG_V6_vS32b_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_pred_pi(true)
+#define fGEN_TCG_V6_vS32b_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_npred_pi(true)
+#define fGEN_TCG_V6_vS32Ub_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_pred_pi(false)
+#define fGEN_TCG_V6_vS32Ub_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_npred_pi(false)
+#define fGEN_TCG_V6_vS32b_nt_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_pred_pi(true)
+#define fGEN_TCG_V6_vS32b_nt_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_npred_pi(true)
+#define fGEN_TCG_V6_vS32b_new_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_pred_pi
+#define fGEN_TCG_V6_vS32b_new_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_npred_pi
+#define fGEN_TCG_V6_vS32b_nt_new_pred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_pred_pi
+#define fGEN_TCG_V6_vS32b_nt_new_npred_pi(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_npred_pi
+
+#define fGEN_TCG_PRED_VEC_STORE_pred_ai(ALIGN) \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLD(PvV), \
+                            fEA_RI(RtV, siV * sizeof(MMVector)), \
+                            VsV_off, ALIGN, \
+                            do { } while (0))
+#define fGEN_TCG_PRED_VEC_STORE_npred_ai(ALIGN) \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLDNOT(PvV), \
+                            fEA_RI(RtV, siV * sizeof(MMVector)), \
+                            VsV_off, ALIGN, \
+                            do { } while (0))
+#define fGEN_TCG_PRED_VEC_STORE_new_pred_ai \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLD(PvV), \
+                            fEA_RI(RtV, siV * sizeof(MMVector)), \
+                            OsN_off, true, \
+                            do { } while (0))
+#define fGEN_TCG_PRED_VEC_STORE_new_npred_ai \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLDNOT(PvV), \
+                            fEA_RI(RtV, siV * sizeof(MMVector)), \
+                            OsN_off, true, \
+                            do { } while (0))
+
+#define fGEN_TCG_V6_vS32b_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_pred_ai(true)
+#define fGEN_TCG_V6_vS32b_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_npred_ai(true)
+#define fGEN_TCG_V6_vS32Ub_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_pred_ai(false)
+#define fGEN_TCG_V6_vS32Ub_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_npred_ai(false)
+#define fGEN_TCG_V6_vS32b_nt_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_pred_ai(true)
+#define fGEN_TCG_V6_vS32b_nt_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_npred_ai(true)
+#define fGEN_TCG_V6_vS32b_new_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_pred_ai
+#define fGEN_TCG_V6_vS32b_new_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_npred_ai
+#define fGEN_TCG_V6_vS32b_nt_new_pred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_pred_ai
+#define fGEN_TCG_V6_vS32b_nt_new_npred_ai(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_npred_ai
+
+#define fGEN_TCG_PRED_VEC_STORE_pred_ppu(ALIGN) \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLD(PvV), \
+                            fEA_REG(RxV), \
+                            VsV_off, ALIGN, \
+                            fPM_M(RxV, MuV))
+#define fGEN_TCG_PRED_VEC_STORE_npred_ppu(ALIGN) \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLDNOT(PvV), \
+                            fEA_REG(RxV), \
+                            VsV_off, ALIGN, \
+                            fPM_M(RxV, MuV))
+#define fGEN_TCG_PRED_VEC_STORE_new_pred_ppu \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLD(PvV), \
+                            fEA_REG(RxV), \
+                            OsN_off, true, \
+                            fPM_M(RxV, MuV))
+#define fGEN_TCG_PRED_VEC_STORE_new_npred_ppu \
+    fGEN_TCG_PRED_VEC_STORE(fLSBOLDNOT(PvV), \
+                            fEA_REG(RxV), \
+                            OsN_off, true, \
+                            fPM_M(RxV, MuV))
+
+#define fGEN_TCG_V6_vS32b_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_pred_ppu(true)
+#define fGEN_TCG_V6_vS32b_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_npred_ppu(true)
+#define fGEN_TCG_V6_vS32Ub_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_pred_ppu(false)
+#define fGEN_TCG_V6_vS32Ub_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_npred_ppu(false)
+#define fGEN_TCG_V6_vS32b_nt_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_pred_ppu(true)
+#define fGEN_TCG_V6_vS32b_nt_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_npred_ppu(true)
+#define fGEN_TCG_V6_vS32b_new_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_pred_ppu
+#define fGEN_TCG_V6_vS32b_new_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_npred_ppu
+#define fGEN_TCG_V6_vS32b_nt_new_pred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_pred_ppu
+#define fGEN_TCG_V6_vS32b_nt_new_npred_ppu(SHORTCODE) \
+    fGEN_TCG_PRED_VEC_STORE_new_npred_ppu
+
+/* Masked vector stores */
+#define fGEN_TCG_V6_vS32b_qpred_pi(SHORTCODE)              SHORTCODE
+#define fGEN_TCG_V6_vS32b_nt_qpred_pi(SHORTCODE)           SHORTCODE
+#define fGEN_TCG_V6_vS32b_qpred_ai(SHORTCODE)              SHORTCODE
+#define fGEN_TCG_V6_vS32b_nt_qpred_ai(SHORTCODE)           SHORTCODE
+#define fGEN_TCG_V6_vS32b_qpred_ppu(SHORTCODE)             SHORTCODE
+#define fGEN_TCG_V6_vS32b_nt_qpred_ppu(SHORTCODE)          SHORTCODE
+#define fGEN_TCG_V6_vS32b_nqpred_pi(SHORTCODE)             SHORTCODE
+#define fGEN_TCG_V6_vS32b_nt_nqpred_pi(SHORTCODE)          SHORTCODE
+#define fGEN_TCG_V6_vS32b_nqpred_ai(SHORTCODE)             SHORTCODE
+#define fGEN_TCG_V6_vS32b_nt_nqpred_ai(SHORTCODE)          SHORTCODE
+#define fGEN_TCG_V6_vS32b_nqpred_ppu(SHORTCODE)            SHORTCODE
+#define fGEN_TCG_V6_vS32b_nt_nqpred_ppu(SHORTCODE)         SHORTCODE
+
+/* Store release not modelled in qemu, but need to suppress compiler warnings */
+#define fGEN_TCG_V6_vS32b_srls_pi(SHORTCODE) \
+    do { \
+        siV = siV; \
+    } while (0)
+#define fGEN_TCG_V6_vS32b_srls_ai(SHORTCODE) \
+    do { \
+        RtV = RtV; \
+        siV = siV; \
+    } while (0)
+#define fGEN_TCG_V6_vS32b_srls_ppu(SHORTCODE) \
+    do { \
+        MuV = MuV; \
+    } while (0)
+
 #endif
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index c99c1c1..e3262f9 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -23,6 +23,7 @@ DEF_HELPER_1(debug_start_packet, void, env)
 DEF_HELPER_FLAGS_3(debug_check_store_width, TCG_CALL_NO_WG, void, env, int, int)
 DEF_HELPER_FLAGS_3(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int)
 DEF_HELPER_2(commit_store, void, env, int)
+DEF_HELPER_3(gather_store, void, env, i32, int)
 DEF_HELPER_1(commit_hvx_stores, void, env)
 DEF_HELPER_FLAGS_4(fcircadd, TCG_CALL_NO_RWG_SE, s32, s32, s32, s32, s32)
 DEF_HELPER_FLAGS_1(fbrev, TCG_CALL_NO_RWG_SE, i32, i32)
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index a0c50a3..c3fb43a 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -166,6 +166,11 @@ void HELPER(commit_store)(CPUHexagonState *env, int slot_num)
     }
 }
 
+void HELPER(gather_store)(CPUHexagonState *env, uint32_t addr, int slot)
+{
+    mem_gather_store(env, addr, slot);
+}
+
 void HELPER(commit_hvx_stores)(CPUHexagonState *env)
 {
     uintptr_t ra = GETPC();
-- 
2.7.4


* [PATCH v3 24/30] Hexagon HVX (target/hexagon) import semantics
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (22 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 23/30] Hexagon HVX (target/hexagon) helper overrides - vector stores Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:27   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 25/30] Hexagon HVX (target/hexagon) instruction decoding Taylor Simpson
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Imported from the Hexagon architecture library
    imported/allext.idef           Top level file for all extensions
    imported/mmvec/ext.idef        HVX instruction definitions

Support functions added to target/hexagon/genptr.c

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
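A note on vec_to_qvec() in the genptr.c hunk below: it builds a Q register
mask with one bit per vector byte, setting all bits of an element when that
element is non-zero. The following is a plain C reference model of that
behaviour, offered as a sketch rather than the TCG implementation.
model_vec_to_qvec() and demo_q0() are hypothetical names, the 128-byte vector
size comes from the cover letter, and byte j of each 8-byte chunk is assumed
to map to bit j of the mask byte (little-endian layout).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 128   /* assumed vector size in bytes */

/*
 * Reference model: 'size' is the element width in bytes (1, 2, 4 or 8).
 * Each group of 8 vector bytes produces one mask byte; a non-zero element
 * sets 'size' consecutive bits starting at the element's byte offset.
 */
static void model_vec_to_qvec(size_t size, uint8_t *q, const uint8_t *v)
{
    for (size_t i = 0; i < VEC_BYTES / 8; i++) {
        uint8_t mask = 0;
        for (size_t j = 0; j < 8; j += size) {
            int nonzero = 0;
            for (size_t k = 0; k < size; k++) {
                nonzero |= (v[i * 8 + j + k] != 0);
            }
            if (nonzero) {
                mask |= (uint8_t)(((1u << size) - 1u) << j);
            }
        }
        q[i] = mask;
    }
}

/* Demo: only vector byte 2 is non-zero; return the first mask byte. */
static unsigned demo_q0(size_t size)
{
    uint8_t v[VEC_BYTES] = { 0 };
    uint8_t q[VEC_BYTES / 8];

    v[2] = 1;
    model_vec_to_qvec(size, q, v);
    return q[0];
}
```

For the input above, byte elements give 0x04, halfword elements give 0x0c and
word elements give 0x0f, because the single non-zero byte marks its whole
enclosing element.
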
 target/hexagon/genptr.c                |  175 +++
 target/hexagon/imported/allext.idef    |   25 +
 target/hexagon/imported/allidefs.def   |    1 +
 target/hexagon/imported/mmvec/ext.idef | 2606 ++++++++++++++++++++++++++++++++
 4 files changed, 2807 insertions(+)
 create mode 100644 target/hexagon/imported/allext.idef
 create mode 100644 target/hexagon/imported/mmvec/ext.idef
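
On gen_log_vreg_write() and gen_log_vreg_write_pair() in the genptr.c hunk:
a non-tmp vector write goes to a "future" copy of the register and sets a bit
in the updated mask, unless the instruction is predicated and its slot was
cancelled. The pair variant writes the first half to num ^ 0 and the second
half to num ^ 1. A minimal C model of that bookkeeping follows. It is a sketch
only: WriteLogModel, VecModel and the model_*/demo_* functions are
hypothetical, 32 vector registers and 128-byte vectors are assumed, and the
EXT_TMP path is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 128   /* assumed vector size in bytes */
#define NUM_VREGS 32    /* assumed number of vector registers */

typedef struct { uint8_t b[VEC_BYTES]; } VecModel;

/* Hypothetical stand-in for the parts of the CPU state used here. */
typedef struct {
    VecModel future[NUM_VREGS];
    uint32_t updated;          /* bit n set: future[n] holds a new value */
    uint32_t slot_cancelled;   /* bit s set: slot s was cancelled */
} WriteLogModel;

/* Models the non-tmp path of gen_log_vreg_write. */
static void model_log_vreg_write(WriteLogModel *m, const VecModel *src,
                                 int num, int slot, bool predicated)
{
    if (predicated && ((m->slot_cancelled >> slot) & 1)) {
        return;   /* cancelled slot: leave the log untouched */
    }
    m->future[num] = *src;
    m->updated |= 1u << num;
}

/* Models gen_log_vreg_write_pair: two consecutive source vectors. */
static void model_log_vreg_write_pair(WriteLogModel *m, const VecModel src[2],
                                      int num, int slot, bool predicated)
{
    model_log_vreg_write(m, &src[0], num ^ 0, slot, predicated);
    model_log_vreg_write(m, &src[1], num ^ 1, slot, predicated);
}

/* Demo: write a pair from slot 1, then report the first byte of future[reg]. */
static int demo_pair(int num, uint32_t slot_cancelled, int reg)
{
    WriteLogModel m;
    VecModel src[2];

    memset(&m, 0, sizeof(m));
    memset(&src[0], 1, sizeof(src[0]));
    memset(&src[1], 2, sizeof(src[1]));
    m.slot_cancelled = slot_cancelled;
    model_log_vreg_write_pair(&m, src, num, 1, true);
    return m.future[reg].b[0];
}
```

With num = 4 the halves land in registers 4 and 5. With num = 5 they land in 5
and 4, since num ^ 1 clears the low bit. A cancelled slot leaves both future
registers unchanged.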

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 5a9a7df..9e5637b 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -19,11 +19,13 @@
 #include "cpu.h"
 #include "internal.h"
 #include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
 #include "insn.h"
 #include "opcodes.h"
 #include "translate.h"
 #define QEMU_GENERATE       /* Used internally by macros.h */
 #include "macros.h"
+#include "mmvec/macros.h"
 #undef QEMU_GENERATE
 #include "gen_tcg.h"
 #include "gen_tcg_hvx.h"
@@ -475,5 +477,178 @@ static TCGv gen_8bitsof(TCGv result, TCGv value)
     return result;
 }
 
+static intptr_t vreg_src_off(DisasContext *ctx, int num)
+{
+    intptr_t offset = offsetof(CPUHexagonState, VRegs[num]);
+
+    if (test_bit(num, ctx->vregs_select)) {
+        offset = ctx_future_vreg_off(ctx, num, 1, false);
+    }
+    if (test_bit(num, ctx->vregs_updated_tmp)) {
+        offset = ctx_tmp_vreg_off(ctx, num, 1, false);
+    }
+    return offset;
+}
+
+static void gen_log_vreg_write(DisasContext *ctx, intptr_t srcoff, int num,
+                               VRegWriteType type, int slot_num,
+                               bool is_predicated)
+{
+    TCGLabel *label_end = NULL;
+    intptr_t dstoff;
+
+    if (is_predicated) {
+        TCGv cancelled = tcg_temp_local_new();
+        label_end = gen_new_label();
+
+        /* Don't do anything if the slot was cancelled */
+        tcg_gen_extract_tl(cancelled, hex_slot_cancelled, slot_num, 1);
+        tcg_gen_brcondi_tl(TCG_COND_NE, cancelled, 0, label_end);
+        tcg_temp_free(cancelled);
+    }
+
+    if (type != EXT_TMP) {
+        dstoff = ctx_future_vreg_off(ctx, num, 1, true);
+        tcg_gen_gvec_mov(MO_64, dstoff, srcoff,
+                         sizeof(MMVector), sizeof(MMVector));
+        tcg_gen_ori_tl(hex_VRegs_updated, hex_VRegs_updated, 1 << num);
+    } else {
+        dstoff = ctx_tmp_vreg_off(ctx, num, 1, false);
+        tcg_gen_gvec_mov(MO_64, dstoff, srcoff,
+                         sizeof(MMVector), sizeof(MMVector));
+    }
+
+    if (is_predicated) {
+        gen_set_label(label_end);
+    }
+}
+
+static void gen_log_vreg_write_pair(DisasContext *ctx, intptr_t srcoff, int num,
+                                    VRegWriteType type, int slot_num,
+                                    bool is_predicated)
+{
+    gen_log_vreg_write(ctx, srcoff, num ^ 0, type, slot_num, is_predicated);
+    srcoff += sizeof(MMVector);
+    gen_log_vreg_write(ctx, srcoff, num ^ 1, type, slot_num, is_predicated);
+}
+
+static void gen_log_qreg_write(intptr_t srcoff, int num, int vnew,
+                               int slot_num, bool is_predicated)
+{
+    TCGLabel *label_end = NULL;
+    intptr_t dstoff;
+
+    if (is_predicated) {
+        TCGv cancelled = tcg_temp_local_new();
+        label_end = gen_new_label();
+
+        /* Don't do anything if the slot was cancelled */
+        tcg_gen_extract_tl(cancelled, hex_slot_cancelled, slot_num, 1);
+        tcg_gen_brcondi_tl(TCG_COND_NE, cancelled, 0, label_end);
+        tcg_temp_free(cancelled);
+    }
+
+    dstoff = offsetof(CPUHexagonState, future_QRegs[num]);
+    tcg_gen_gvec_mov(MO_64, dstoff, srcoff, sizeof(MMQReg), sizeof(MMQReg));
+
+    if (is_predicated) {
+        tcg_gen_ori_tl(hex_QRegs_updated, hex_QRegs_updated, 1 << num);
+        gen_set_label(label_end);
+    }
+}
+
+static void gen_vreg_load(DisasContext *ctx, intptr_t dstoff, TCGv src,
+                          bool aligned)
+{
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    if (aligned) {
+        tcg_gen_andi_tl(src, src, ~((int32_t)sizeof(MMVector) - 1));
+    }
+    for (int i = 0; i < sizeof(MMVector) / 8; i++) {
+        tcg_gen_qemu_ld64(tmp, src, ctx->mem_idx);
+        tcg_gen_addi_tl(src, src, 8);
+        tcg_gen_st_i64(tmp, cpu_env, dstoff + i * 8);
+    }
+    tcg_temp_free_i64(tmp);
+}
+
+static void gen_vreg_store(DisasContext *ctx, Insn *insn, Packet *pkt,
+                           TCGv EA, intptr_t srcoff, int slot, bool aligned)
+{
+    intptr_t dstoff = offsetof(CPUHexagonState, vstore[slot].data);
+    intptr_t maskoff = offsetof(CPUHexagonState, vstore[slot].mask);
+
+    if (is_gather_store_insn(insn, pkt)) {
+        TCGv sl = tcg_const_tl(slot);
+        gen_helper_gather_store(cpu_env, EA, sl);
+        tcg_temp_free(sl);
+        return;
+    }
+
+    tcg_gen_movi_tl(hex_vstore_pending[slot], 1);
+    if (aligned) {
+        tcg_gen_andi_tl(hex_vstore_addr[slot], EA,
+                        ~((int32_t)sizeof(MMVector) - 1));
+    } else {
+        tcg_gen_mov_tl(hex_vstore_addr[slot], EA);
+    }
+    tcg_gen_movi_tl(hex_vstore_size[slot], sizeof(MMVector));
+
+    /* Copy the data to the vstore buffer */
+    tcg_gen_gvec_mov(MO_64, dstoff, srcoff, sizeof(MMVector), sizeof(MMVector));
+    /* Set the mask to all 1's */
+    tcg_gen_gvec_dup_imm(MO_64, maskoff, sizeof(MMQReg), sizeof(MMQReg), ~0LL);
+}
+
+static void gen_vreg_masked_store(DisasContext *ctx, TCGv EA, intptr_t srcoff,
+                                  intptr_t bitsoff, int slot, bool invert)
+{
+    intptr_t dstoff = offsetof(CPUHexagonState, vstore[slot].data);
+    intptr_t maskoff = offsetof(CPUHexagonState, vstore[slot].mask);
+
+    tcg_gen_movi_tl(hex_vstore_pending[slot], 1);
+    tcg_gen_andi_tl(hex_vstore_addr[slot], EA,
+                    ~((int32_t)sizeof(MMVector) - 1));
+    tcg_gen_movi_tl(hex_vstore_size[slot], sizeof(MMVector));
+
+    /* Copy the data to the vstore buffer */
+    tcg_gen_gvec_mov(MO_64, dstoff, srcoff, sizeof(MMVector), sizeof(MMVector));
+    /* Copy the mask */
+    tcg_gen_gvec_mov(MO_64, maskoff, bitsoff, sizeof(MMQReg), sizeof(MMQReg));
+    if (invert) {
+        tcg_gen_gvec_not(MO_64, maskoff, maskoff,
+                         sizeof(MMQReg), sizeof(MMQReg));
+    }
+}
+
+static void vec_to_qvec(size_t size, intptr_t dstoff, intptr_t srcoff)
+{
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 word = tcg_temp_new_i64();
+    TCGv_i64 bits = tcg_temp_new_i64();
+    TCGv_i64 mask = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 ones = tcg_const_i64(~0);
+
+    for (int i = 0; i < sizeof(MMVector) / 8; i++) {
+        tcg_gen_ld_i64(tmp, cpu_env, srcoff + i * 8);
+        tcg_gen_movi_i64(mask, 0);
+
+        for (int j = 0; j < 8; j += size) {
+            tcg_gen_extract_i64(word, tmp, j * 8, size * 8);
+            tcg_gen_movcond_i64(TCG_COND_NE, bits, word, zero, ones, zero);
+            tcg_gen_deposit_i64(mask, mask, bits, j, size);
+        }
+
+        tcg_gen_st8_i64(mask, cpu_env, dstoff + i);
+    }
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(word);
+    tcg_temp_free_i64(bits);
+    tcg_temp_free_i64(mask);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(ones);
+}
+
 #include "tcg_funcs_generated.c.inc"
 #include "tcg_func_table_generated.c.inc"
diff --git a/target/hexagon/imported/allext.idef b/target/hexagon/imported/allext.idef
new file mode 100644
index 0000000..9d4b23e
--- /dev/null
+++ b/target/hexagon/imported/allext.idef
@@ -0,0 +1,25 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Top level file for all instruction set extensions
+ */
+#define EXTNAME mmvec
+#define EXTSTR "mmvec"
+#include "mmvec/ext.idef"
+#undef EXTNAME
+#undef EXTSTR
diff --git a/target/hexagon/imported/allidefs.def b/target/hexagon/imported/allidefs.def
index 2aace29..ee253b8 100644
--- a/target/hexagon/imported/allidefs.def
+++ b/target/hexagon/imported/allidefs.def
@@ -28,3 +28,4 @@
 #include "shift.idef"
 #include "system.idef"
 #include "subinsns.idef"
+#include "allext.idef"
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
new file mode 100644
index 0000000..8ca5a60
--- /dev/null
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -0,0 +1,2606 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/******************************************************************************
+ *
+ *     HOYA: MULTI MEDIA INSTRUCTIONS
+ *
+ ******************************************************************************/
+
+#ifndef EXTINSN
+#define EXTINSN Q6INSN
+#define __SELF_DEF_EXTINSN 1
+#endif
+
+#ifndef NO_MMVEC
+
+#define DO_FOR_EACH_CODE(WIDTH, CODE) \
+{ \
+    fHIDE(int i;) \
+    fVFOREACH(WIDTH, i) {\
+        CODE ;\
+    } \
+}
+
+
+
+
+#define ITERATOR_INSN_ANY_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+
+#define ITERATOR_INSN2_ANY_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_ANY_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA_DV),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+#define ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+#define ITERATOR_INSN_SHIFT_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+
+#define ITERATOR_INSN_SHIFT_SLOT_VV_LATE(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_SHIFT_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_SHIFT_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DEP(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DEP(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DEP(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_DEP(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_MPY_SLOT(WIDTH,TAG, SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_MPY_SLOT(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN2_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX2,DESCR,CODE)
+
+
+#define ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+
+
+#define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC2(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV,A_CVI_VX_VSRC0_IS_DST), DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_SLOT2_DOUBLE_VEC(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV,A_RESTRICT_SLOT2ONLY), DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_VHISTLIKE(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_4SLOT),  \
+DESCR, fHIDE(mmvector_t input;) input = fTMPVDATA(); DO_FOR_EACH_CODE(WIDTH, CODE))
+
+
+
+
+
+/******************************************************************************************
+*
+* MMVECTOR MEMORY OPERATIONS - NO NAPALI V1
+*
+*******************************************************************************************/
+
+
+
+#define ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+
+#define ITERATOR_INSN_SHIFT_SLOT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_SHIFT_SLOT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_SHIFT_SLOT_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+#define ITERATOR_INSN_ANY_SLOT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_ANY_SLOT_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+
+#define ITERATOR_INSN_MPY_SLOT_NOV1(WIDTH,TAG, SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, \
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_PERMUTE_SLOT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOTT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DEPT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DEPT_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DEP_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_DEPT_NOV1(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
+#define ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC_NOV1(WIDTH,TAG,SYNTAX2,DESCR,CODE)
+
+#define NARROWING_SHIFT_NOV1(ITERSIZE,TAG,DSTM,DSTTYPE,SRCTYPE,SYNOPTS,SATFUNC,RNDFUNC,SHAMTMASK) \
+ITERATOR_INSN_SHIFT_SLOT_NOV1(ITERSIZE,TAG, \
+"Vd32." #DSTTYPE "=vasr(Vu32." #SRCTYPE ",Vv32." #SRCTYPE ",Rt8)" #SYNOPTS, \
+"Vector shift right and shuffle", \
+    fHIDE(int )shamt = RtV & SHAMTMASK; \
+    DSTM(0,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VvV.SRCTYPE[i],shamt) >> shamt)); \
+    DSTM(1,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VuV.SRCTYPE[i],shamt) >> shamt)))
+
+#define MMVEC_AVGS_NOV1(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",          "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",          "Vector Average "DESCR,                                      VdV.DEST[i]  = fVAVGS(       WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",      "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",      "Vector Average % Round"DESCR,                               VdV.DEST[i]  = fVAVGSRND(    WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vnavg##TYPE,                       "Vd32=vnavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vnavg(Vu32."#SRC",Vv32."#SRC")",         "Vector Negative Average "DESCR,                             VdV.DEST[i]  = fVNAVGS(      WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+  #define MMVEC_AVGU_NOV1(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",        "Vector Average "DESCR,                                      VdV.DEST[i] = fVAVGU(   WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT_NOV1(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",     "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",    "Vector Average % Round"DESCR,                               VdV.DEST[i] = fVAVGURND(WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+
+
+/******************************************************************************************
+*
+* MMVECTOR MEMORY OPERATIONS
+*
+*******************************************************************************************/
+
+#define MMVEC_EACH_EA(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,BEH) \
+EXTINSN(V6_##TAG##_pi,      SYNTAXA "(Rx32++#s3)" NT SYNTAXB,ATTRIB,DESCR,{ fEA_REG(RxV); BEH; fPM_I(RxV,VEC_SCALE(siV)); }) \
+EXTINSN(V6_##TAG##_ai,      SYNTAXA "(Rt32+#s4)" NT SYNTAXB,ATTRIB,DESCR,{ fEA_RI(RtV,VEC_SCALE(siV)); BEH;}) \
+EXTINSN(V6_##TAG##_ppu,      SYNTAXA "(Rx32++Mu2)" NT SYNTAXB,ATTRIB,DESCR,{ fEA_REG(RxV); BEH; fPM_M(RxV,MuV); }) \
+
+
+#define MMVEC_COND_EACH_EA_TRUE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+EXTINSN(V6_##TAG##_pred_pi,      "if (" #SYNTAXP "4) " SYNTAXA "(Rx32++#s3)" NT SYNTAXB, ATTRIB,DESCR, { if (fLSBOLD(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_I(RxV,siV*fVECSIZE()); } else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_pred_ai,      "if (" #SYNTAXP "4) " SYNTAXA "(Rt32+#s4)" NT SYNTAXB, ATTRIB,DESCR,  { if (fLSBOLD(SYNTAXP##V)) { fEA_RI(RtV,siV*fVECSIZE()); BEH;} else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_pred_ppu,     "if (" #SYNTAXP "4) " SYNTAXA "(Rx32++Mu2)" NT SYNTAXB,ATTRIB,DESCR,  { if (fLSBOLD(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_M(RxV,MuV); } else {CANCEL;}}) \
+
+#define MMVEC_COND_EACH_EA_FALSE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+EXTINSN(V6_##TAG##_npred_pi,     "if (!" #SYNTAXP "4) " SYNTAXA "(Rx32++#s3)" NT SYNTAXB,ATTRIB,DESCR,{ if (fLSBOLDNOT(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_I(RxV,siV*fVECSIZE()); } else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_npred_ai,     "if (!" #SYNTAXP "4) " SYNTAXA "(Rt32+#s4)" NT SYNTAXB,ATTRIB,DESCR, { if (fLSBOLDNOT(SYNTAXP##V)) { fEA_RI(RtV,siV*fVECSIZE()); BEH;} else {CANCEL;}}) \
+EXTINSN(V6_##TAG##_npred_ppu,    "if (!" #SYNTAXP "4) " SYNTAXA "(Rx32++Mu2)" NT SYNTAXB,ATTRIB,DESCR,{ if (fLSBOLDNOT(SYNTAXP##V)) { fEA_REG(RxV); BEH; fPM_M(RxV,MuV); } else {CANCEL;}})
+
+#define MMVEC_COND_EACH_EA(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+MMVEC_COND_EACH_EA_TRUE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH) \
+MMVEC_COND_EACH_EA_FALSE(TAG,DESCR,ATTRIB,NT,SYNTAXA,SYNTAXB,SYNTAXP,BEH)
+
+
+#define VEC_SCALE(X) X*fVECSIZE()
+
+
+#define MMVEC_LD(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG,DESCR,ATTRIB,NT,"Vd32=vmem","",fLOADMMV(EA,VdV))
+#define MMVEC_LDC(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG##_cur,DESCR,ATTRIB,NT,"Vd32.cur=vmem","",fLOADMMV(EA,VdV))
+#define MMVEC_LDT(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG##_tmp,DESCR,ATTRIB,NT,"Vd32.tmp=vmem","",fLOADMMV(EA,VdV))
+#define MMVEC_LDU(TAG,DESCR,ATTRIB,NT) MMVEC_EACH_EA(TAG,DESCR,ATTRIB,NT,"Vd32=vmemu","",fLOADMMVU(EA,VdV))
+
+
+#define MMVEC_STQ(TAG,DESCR,ATTRIB,NT) \
+MMVEC_EACH_EA(TAG##_qpred,DESCR,ATTRIB,NT,"if (Qv4) vmem","=Vs32",fSTOREMMVQ(EA,VsV,QvV)) \
+MMVEC_EACH_EA(TAG##_nqpred,DESCR,ATTRIB,NT,"if (!Qv4) vmem","=Vs32",fSTOREMMVNQ(EA,VsV,QvV))
+
+/****************************************************************
+* MAPPING FOR VMEMs
+****************************************************************/
+
+#define ATTR_VMEM A_EXTENSION,A_CVI,A_CVI_VM
+#define ATTR_VMEMU A_EXTENSION,A_CVI,A_CVI_VM,A_CVI_VP
+
+
+MMVEC_LD(vL32b,  "Aligned Vector Load",        ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_VA),)
+MMVEC_LDC(vL32b,  "Aligned Vector Load Cur",	ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_NEW,A_CVI_VA),)
+MMVEC_LDT(vL32b,  "Aligned Vector Load Tmp",	ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_TMP),)
+
+MMVEC_COND_EACH_EA(vL32b,"Conditional Aligned Vector Load",ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_VA),,"Vd32=vmem",,Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_cur,"Conditional Aligned Vector Load Cur",ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_VA,A_CVI_NEW),,"Vd32.cur=vmem",,Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_tmp,"Conditional Aligned Vector Load Tmp",ATTRIBS(ATTR_VMEM,A_LOAD,A_CVI_TMP),,"Vd32.tmp=vmem",,Pv,fLOADMMV(EA,VdV);)
+
+MMVEC_EACH_EA(vS32b,"Aligned Vector Store",ATTRIBS(ATTR_VMEM,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),,"vmem","=Vs32",fSTOREMMV(EA,VsV))
+MMVEC_COND_EACH_EA(vS32b,"Aligned Vector Store",ATTRIBS(ATTR_VMEM,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),,"vmem","=Vs32",Pv,fSTOREMMV(EA,VsV))
+
+
+MMVEC_STQ(vS32b,  "Aligned Vector Store",      ATTRIBS(ATTR_VMEM,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),)
+
+MMVEC_LDU(vL32Ub, "Unaligned Vector Load",     ATTRIBS(ATTR_VMEMU,A_LOAD,A_RESTRICT_NOSLOT1),)
+
+MMVEC_EACH_EA(vS32Ub,"Unaligned Vector Store",ATTRIBS(ATTR_VMEMU,A_STORE,A_RESTRICT_NOSLOT1),,"vmemu","=Vs32",fSTOREMMVU(EA,VsV))
+
+MMVEC_COND_EACH_EA(vS32Ub,"Unaligned Vector Store",ATTRIBS(ATTR_VMEMU,A_STORE,A_RESTRICT_NOSLOT1),,"vmemu","=Vs32",Pv,fSTOREMMVU(EA,VsV))
+
+MMVEC_EACH_EA(vS32b_new,"Aligned Vector Store New",ATTRIBS(ATTR_VMEM,A_STORE,A_CVI_NEW,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),,"vmem","=Os8.new",fSTOREMMV(EA,fNEWVREG(OsN)))
+
+// V65 store release, zero byte store
+MMVEC_EACH_EA(vS32b_srls,"Aligned Vector Scatter Release",ATTRIBS(ATTR_VMEM,A_STORE,A_CVI_SCATTER_RELEASE,A_CVI_NEW,A_RESTRICT_SLOT0ONLY),,"vmem",":scatter_release",fSTORERELEASE(EA,0))
+
+
+
+MMVEC_COND_EACH_EA(vS32b_new,"Aligned Vector Store New",ATTRIBS(ATTR_VMEM,A_STORE,A_CVI_NEW,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),,"vmem","=Os8.new",Pv,fSTOREMMV(EA,fNEWVREG(OsN)))
+
+
+/******************************************************************************************
+*
+* MMVECTOR MEMORY OPERATIONS - NON TEMPORAL
+*
+*******************************************************************************************/
+
+#define ATTR_VMEM_NT A_EXTENSION,A_CVI,A_CVI_VM
+
+MMVEC_EACH_EA(vS32b_nt,"Aligned Vector Store - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),":nt","vmem","=Vs32",fSTOREMMV(EA,VsV))
+MMVEC_COND_EACH_EA(vS32b_nt,"Aligned Vector Store - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),":nt","vmem","=Vs32",Pv,fSTOREMMV(EA,VsV))
+
+MMVEC_EACH_EA(vS32b_nt_new,"Aligned Vector Store New - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_CVI_NEW,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),":nt","vmem","=Os8.new",fSTOREMMV(EA,fNEWVREG(OsN)))
+MMVEC_COND_EACH_EA(vS32b_nt_new,"Aligned Vector Store New - Non temporal",ATTRIBS(ATTR_VMEM_NT,A_STORE,A_CVI_NEW,A_DOTNEWVALUE,A_RESTRICT_SLOT0ONLY),":nt","vmem","=Os8.new",Pv,fSTOREMMV(EA,fNEWVREG(OsN)))
+
+
+MMVEC_STQ(vS32b_nt,  "Aligned Vector Store - Non temporal",      ATTRIBS(ATTR_VMEM_NT,A_STORE,A_RESTRICT_SLOT0ONLY,A_CVI_VA),":nt")
+
+MMVEC_LD(vL32b_nt,  "Aligned Vector Load - Non temporal",       ATTRIBS(ATTR_VMEM_NT,A_LOAD,A_CVI_VA),":nt")
+MMVEC_LDC(vL32b_nt,  "Aligned Vector Load Cur - Non temporal",	ATTRIBS(ATTR_VMEM_NT,A_LOAD,A_CVI_NEW,A_CVI_VA),":nt")
+MMVEC_LDT(vL32b_nt,  "Aligned Vector Load Tmp - Non temporal",	ATTRIBS(ATTR_VMEM_NT,A_LOAD,A_CVI_TMP),":nt")
+
+MMVEC_COND_EACH_EA(vL32b_nt,"Conditional Aligned Vector Load",ATTRIBS(ATTR_VMEM_NT,A_CVI_VA),,"Vd32=vmem",":nt",Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_nt_cur,"Conditional Aligned Vector Load Cur",ATTRIBS(ATTR_VMEM_NT,A_CVI_VA,A_CVI_NEW),,"Vd32.cur=vmem",":nt",Pv,fLOADMMV(EA,VdV);)
+MMVEC_COND_EACH_EA(vL32b_nt_tmp,"Conditional Aligned Vector Load Tmp",ATTRIBS(ATTR_VMEM_NT,A_CVI_TMP),,"Vd32.tmp=vmem",":nt",Pv,fLOADMMV(EA,VdV);)
+
+
+#undef VEC_SCALE
+
+
+/***************************************************
+ * Vector Alignment
+ ************************************************/
+
+#define VALIGNB(SHIFT)  \
+    fHIDE(int i;) \
+    for(i = 0; i < fVBYTES(); i++) {\
+        VdV.ub[i] = (i+SHIFT>=fVBYTES()) ? VuV.ub[i+SHIFT-fVBYTES()] : VvV.ub[i+SHIFT];\
+	}
+
+EXTINSN(V6_valignb,  "Vd32=valign(Vu32,Vv32,Rt8)",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),"Align Two vectors by Rt8 as control",
+{
+	unsigned shift = RtV & (fVBYTES()-1);
+	VALIGNB(shift)
+})
+EXTINSN(V6_vlalignb, "Vd32=vlalign(Vu32,Vv32,Rt8)", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),"Align Two vectors by Rt8 as control",
+{
+	unsigned shift = fVBYTES() - (RtV & (fVBYTES()-1));
+	VALIGNB(shift)
+})
+EXTINSN(V6_valignbi, "Vd32=valign(Vu32,Vv32,#u3)", 	ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),"Align Two vectors by #u3 as control",
+{
+	VALIGNB(uiV)
+})
+EXTINSN(V6_vlalignbi,"Vd32=vlalign(Vu32,Vv32,#u3)", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),"Align Two vectors by #u3 as control",
+{
+	unsigned shift = fVBYTES() - uiV;
+	VALIGNB(shift)
+})
+
+EXTINSN(V6_vror, "Vd32=vror(Vu32,Rt32)", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+"Rotate vector by Rt32 bytes",
+{
+	fHIDE(int k;)
+	for (k=0;k<fVBYTES();k++) {
+		VdV.ub[k] = VuV.ub[(k+RtV)&(fVBYTES()-1)];
+	}
+	})
+
+
+
+
+
+
+
+/**************************************************************
+* Unpack elements with zero/sign extend and cross lane permute
+***************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(8,vunpackub,  "Vdd32=vunpackub(Vu32)", "Vdd32.uh=vunpack(Vu32.ub)", "Unpack byte with zero-extend",     fVARRAY_ELEMENT_ACCESS(VddV, uh, i)  = fZE8_16( VuV.ub[i]))
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(8,vunpackb,   "Vdd32=vunpackb(Vu32)",  "Vdd32.h=vunpack(Vu32.b)",   "Unpack bytes with sign-extend",    fVARRAY_ELEMENT_ACCESS(VddV, h,  i)  = fSE8_16( VuV.b[i] ))
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(16,vunpackuh, "Vdd32=vunpackuh(Vu32)", "Vdd32.uw=vunpack(Vu32.uh)", "Unpack halves with zero-extend",   fVARRAY_ELEMENT_ACCESS(VddV, uw, i)  = fZE16_32(VuV.uh[i]))
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(16,vunpackh,  "Vdd32=vunpackh(Vu32)",  "Vdd32.w=vunpack(Vu32.h)",   "Unpack halves with sign-extend",   fVARRAY_ELEMENT_ACCESS(VddV, w,  i)  = fSE16_32(VuV.h[i] ))
+
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(8, vunpackob, "Vxx32|=vunpackob(Vu32)", "Vxx32.h|=vunpacko(Vu32.b)", "Unpack byte to odd bytes ",       fVARRAY_ELEMENT_ACCESS(VxxV, uh, i) |= fZE8_16( VuV.ub[i])<<8)
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(16,vunpackoh, "Vxx32|=vunpackoh(Vu32)", "Vxx32.w|=vunpacko(Vu32.h)", "Unpack halves to odd halves",     fVARRAY_ELEMENT_ACCESS(VxxV, uw, i) |= fZE16_32(VuV.uh[i])<<16)
+
+
+/**************************************************************
+* Pack elements and cross lane permute
+***************************************************************/
+
+ ITERATOR_INSN2_PERMUTE_SLOT(16, vpackeb,  "Vd32=vpackeb(Vu32,Vv32)", "Vd32.b=vpacke(Vu32.h,Vv32.h)",
+ "Pack even bytes",
+    VdV.ub[i]               = fGETUBYTE(0, VvV.uh[i]);
+    VdV.ub[i+fVELEM(16)]    = fGETUBYTE(0, VuV.uh[i]))
+
+ ITERATOR_INSN2_PERMUTE_SLOT(32, vpackeh,  "Vd32=vpackeh(Vu32,Vv32)", "Vd32.h=vpacke(Vu32.w,Vv32.w)",
+ "Pack even halfwords",
+    VdV.uh[i]               = fGETUHALF(0, VvV.uw[i]);
+    VdV.uh[i+fVELEM(32)]    = fGETUHALF(0, VuV.uw[i]))
+
+  ITERATOR_INSN2_PERMUTE_SLOT(16, vpackob,  "Vd32=vpackob(Vu32,Vv32)", "Vd32.b=vpacko(Vu32.h,Vv32.h)",
+ "Pack odd bytes",
+    VdV.ub[i]               = fGETUBYTE(1, VvV.uh[i]);
+    VdV.ub[i+fVELEM(16)]    = fGETUBYTE(1, VuV.uh[i]))
+
+ ITERATOR_INSN2_PERMUTE_SLOT(32, vpackoh,  "Vd32=vpackoh(Vu32,Vv32)", "Vd32.h=vpacko(Vu32.w,Vv32.w)",
+ "Pack odd halfwords",
+    VdV.uh[i]               = fGETUHALF(1, VvV.uw[i]);
+    VdV.uh[i+fVELEM(32)]    = fGETUHALF(1, VuV.uw[i]))
+
+
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vpackhub_sat,  "Vd32=vpackhub(Vu32,Vv32):sat", "Vd32.ub=vpack(Vu32.h,Vv32.h):sat",
+ "Pack ubytes with saturation",
+    VdV.ub[i]               = fVSATUB(VvV.h[i]);
+    VdV.ub[i+fVELEM(16)]    = fVSATUB(VuV.h[i]))
+
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vpackhb_sat,  "Vd32=vpackhb(Vu32,Vv32):sat", "Vd32.b=vpack(Vu32.h,Vv32.h):sat",
+ "Pack bytes with saturation",
+    VdV.b[i]               = fVSATB(VvV.h[i]);
+    VdV.b[i+fVELEM(16)]    = fVSATB(VuV.h[i]))
+
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vpackwuh_sat,  "Vd32=vpackwuh(Vu32,Vv32):sat", "Vd32.uh=vpack(Vu32.w,Vv32.w):sat",
+ "Pack uhalfwords with saturation",
+    VdV.uh[i]               = fVSATUH(VvV.w[i]);
+    VdV.uh[i+fVELEM(32)]    = fVSATUH(VuV.w[i]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vpackwh_sat,  "Vd32=vpackwh(Vu32,Vv32):sat", "Vd32.h=vpack(Vu32.w,Vv32.w):sat",
+ "Pack halfwords with saturation",
+    VdV.h[i]               = fVSATH(VvV.w[i]);
+    VdV.h[i+fVELEM(32)]    = fVSATH(VuV.w[i]))
+
+
+
+
+
+/**************************************************************
+* Zero/Sign Extend with in-lane permute
+***************************************************************/
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(16,vzb,"Vdd32=vzxtb(Vu32)","Vdd32.uh=vzxt(Vu32.ub)",
+"Vector Zero Extend Bytes",
+    VddV.v[0].uh[i] = fZE8_16(fGETUBYTE(0, VuV.uh[i]));
+    VddV.v[1].uh[i] = fZE8_16(fGETUBYTE(1, VuV.uh[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(16,vsb,"Vdd32=vsxtb(Vu32)","Vdd32.h=vsxt(Vu32.b)",
+"Vector Sign Extend Bytes",
+    VddV.v[0].h[i] = fSE8_16(fGETBYTE(0, VuV.h[i]));
+    VddV.v[1].h[i] = fSE8_16(fGETBYTE(1, VuV.h[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(32,vzh,"Vdd32=vzxth(Vu32)","Vdd32.uw=vzxt(Vu32.uh)",
+"Vector Zero Extend halfwords",
+    VddV.v[0].uw[i] = fZE16_32(fGETUHALF(0, VuV.uw[i]));
+    VddV.v[1].uw[i] = fZE16_32(fGETUHALF(1, VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(32,vsh,"Vdd32=vsxth(Vu32)","Vdd32.w=vsxt(Vu32.h)",
+"Vector Sign Extend halfwords",
+    VddV.v[0].w[i] = fSE16_32(fGETHALF(0, VuV.w[i]));
+    VddV.v[1].w[i] = fSE16_32(fGETHALF(1, VuV.w[i])))
+
+
+/**********************************************************************
+*
+*
+*
+*               MMVECTOR REDUCTION
+*
+*
+*
+**********************************************************************/
+
+/********************************************
+*  2-WAY REDUCTION - UNSIGNED BYTE BY BYTE
+********************************************/
+
+
+ITERATOR_INSN2_MPY_SLOT(16,vdmpybus,"Vd32=vdmpybus(Vu32,Rt32)","Vd32.h=vdmpy(Vu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by bytes",
+    VdV.h[i]   = fMPY8US( fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i) % 4, RtV));
+    VdV.h[i]  += fMPY8US( fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT(16,vdmpybus_acc,"Vx32+=vdmpybus(Vu32,Rt32)","Vx32.h+=vdmpy(Vu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by bytes, and accumulate",
+    VxV.h[i] += fMPY8US( fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i) % 4, RtV));
+    VxV.h[i] += fMPY8US( fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vdmpybus_dv,"Vdd32=vdmpybus(Vuu32,Rt32)","Vdd32.h=vdmpy(Vuu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by bytes, and accumulate Sliding Window Reduction",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VddV.v[0].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VddV.v[1].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[1].uh[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vdmpybus_dv_acc,"Vxx32+=vdmpybus(Vuu32,Rt32)","Vxx32.h+=vdmpy(Vuu32.ub,Rt32.b)",
+"Vector Dual Multiply-Accumulates unsigned bytes by bytes, and accumulate Sliding Window Reduction",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]),fGETBYTE((2*i) % 4, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[1].uh[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+/********************************************
+*  2-WAY REDUCTION - HALF BY BYTE
+********************************************/
+ITERATOR_INSN2_MPY_SLOT(32,vdmpyhb,"Vd32=vdmpyhb(Vu32,Rt32)","Vd32.w=vdmpy(Vu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VdV.w[i]  = fMPY16SS(fGETHALF(0, VuV.w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VdV.w[i] += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT(32,vdmpyhb_acc,"Vx32+=vdmpyhb(Vu32,Rt32)","Vx32.w+=vdmpy(Vu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VxV.w[i] += fMPY16SS(fGETHALF(0, VuV.w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VxV.w[i] += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhb_dv,"Vdd32=vdmpyhb(Vuu32,Rt32)","Vdd32.w=vdmpy(Vuu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VddV.v[0].w[i]  = fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VddV.v[0].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VddV.v[1].w[i]  = fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VddV.v[1].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhb_dv_acc,"Vxx32+=vdmpyhb(Vuu32,Rt32)","Vxx32.w+=vdmpy(Vuu32.h,Rt32.b)",
+"Dual-Vector 2-Element Half x Byte Reduction with Sliding Window Overlap",
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+1)%4, RtV));
+
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]),fGETBYTE((2*i+0)%4, RtV));
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]),fGETBYTE((2*i+1)%4, RtV)))
+
+
+
+
+
+/********************************************
+*  2-WAY REDUCTION - HALF BY HALF
+********************************************/
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhvsat,"Vd32=vdmpyh(Vu32,Vv32):sat","Vd32.w=vdmpy(Vu32.h,Vv32.h):sat",
+"Vector halfword multiply, accumulate pairs, sat to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
+    accum   += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhvsat_acc,"Vx32+=vdmpyh(Vu32,Vv32):sat","Vx32.w+=vdmpy(Vu32.h,Vv32.h):sat",
+"Vector halfword multiply, accumulate pairs, sat to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
+    accum   += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
+    VxV.w[i] = fVSATW(VxV.w[i]+accum))
+
+
+/* VDMPYH */
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsat,"Vd32=vdmpyh(Vu32,Rt32):sat","Vd32.w=vdmpy(Vu32.h,Rt32.h):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(0, VuV.w[i]),fGETHALF(0, RtV));
+    accum   += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETHALF(1, RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsat_acc,"Vx32+=vdmpyh(Vu32,Rt32):sat","Vx32.w+=vdmpy(Vu32.h,Rt32.h):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t) accum = VxV.w[i];
+    accum   += fMPY16SS(fGETHALF(0, VuV.w[i]),fGETHALF(0, RtV));
+    accum   += fMPY16SS(fGETHALF(1, VuV.w[i]),fGETHALF(1, RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhisat,"Vd32=vdmpyh(Vuu32,Rt32):sat","Vd32.w=vdmpy(Vuu32.h,Rt32.h):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Word with saturation",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]),fGETHALF(0,RtV));
+    accum   += fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]),fGETHALF(1,RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhisat_acc,"Vx32+=vdmpyh(Vuu32,Rt32):sat","Vx32.w+=vdmpy(Vuu32.h,Rt32.h):sat",
+"Dual Vector Signed Halfword by Signed Halfword 2-Way Reduction to Word with accumulation and saturation",
+    fHIDE(size8s_t) accum = VxV.w[i];
+    accum   += fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]),fGETHALF(0,RtV));
+    accum   += fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]),fGETHALF(1,RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+
+
+
+
+/* VDMPYHSU */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsusat,"Vd32=vdmpyhsu(Vu32,Rt32):sat","Vd32.w=vdmpy(Vu32.h,Rt32.uh):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SU(fGETHALF(0, VuV.w[i]),fGETUHALF(0, RtV));
+    accum   += fMPY16SU(fGETHALF(1, VuV.w[i]),fGETUHALF(1, RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsusat_acc,"Vx32+=vdmpyhsu(Vu32,Rt32):sat","Vx32.w+=vdmpy(Vu32.h,Rt32.uh):sat",
+"Vector halfword multiply, accumulate pairs, saturate to word",
+    fHIDE(size8s_t) accum=VxV.w[i];
+    accum   += fMPY16SU(fGETHALF(0, VuV.w[i]),fGETUHALF(0, RtV));
+    accum   += fMPY16SU(fGETHALF(1, VuV.w[i]),fGETUHALF(1, RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsuisat,"Vd32=vdmpyhsu(Vuu32,Rt32,#1):sat","Vd32.w=vdmpy(Vuu32.h,Rt32.uh,#1):sat",
+"Dual Vector Signed Halfword by Unsigned Halfword 2-Way Reduction to Word with saturation",
+    fHIDE(size8s_t accum;)
+    accum    = fMPY16SU(fGETHALF(1,VuuV.v[0].w[i]),fGETUHALF(0,RtV));
+    accum   += fMPY16SU(fGETHALF(0,VuuV.v[1].w[i]),fGETUHALF(1,RtV));
+    VdV.w[i] = fVSATW(accum))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdmpyhsuisat_acc,"Vx32+=vdmpyhsu(Vuu32,Rt32,#1):sat","Vx32.w+=vdmpy(Vuu32.h,Rt32.uh,#1):sat",
+"Dual Vector Signed Halfword by Unsigned Halfword 2-Way Reduction to Word with accumulation and saturation",
+    fHIDE(size8s_t) accum=VxV.w[i];
+    accum   += fMPY16SU(fGETHALF(1, VuuV.v[0].w[i]),fGETUHALF(0,RtV));
+    accum   += fMPY16SU(fGETHALF(0, VuuV.v[1].w[i]),fGETUHALF(1,RtV));
+    VxV.w[i] = fVSATW(accum))
+
+
+
+/********************************************
+*  3-WAY REDUCTION - UNSIGNED BYTE BY BYTE
+********************************************/
+
+ ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpyb, "Vdd32=vtmpyb(Vuu32,Rt32)", "Vdd32.h=vtmpy(Vuu32.b,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VddV.v[0].h[i]  = fMPY8SS(fGETBYTE(0,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[0].h[i] += fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[0].h[i] += fGETBYTE(0,VuuV.v[1].h[i]);
+
+    VddV.v[1].h[i]  = fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[1].h[i] += fMPY8SS(fGETBYTE(0,VuuV.v[1].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[1].h[i] += fGETBYTE(1,VuuV.v[1].h[i]))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpyb_acc, "Vxx32+=vtmpyb(Vuu32,Rt32)", "Vxx32.h+=vtmpy(Vuu32.b,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VxxV.v[0].h[i] += fMPY8SS(fGETBYTE(0,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[0].h[i] += fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[0].h[i] += fGETBYTE(0,VuuV.v[1].h[i]);
+
+    VxxV.v[1].h[i] += fMPY8SS(fGETBYTE(1,VuuV.v[0].h[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[1].h[i] += fMPY8SS(fGETBYTE(0,VuuV.v[1].h[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[1].h[i] += fGETBYTE(1,VuuV.v[1].h[i]))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpybus, "Vdd32=vtmpybus(Vuu32,Rt32)", "Vdd32.h=vtmpy(Vuu32.ub,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[0].h[i] += fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[0].h[i] += fGETUBYTE(0,VuuV.v[1].uh[i]);
+
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VddV.v[1].h[i] += fMPY8US(fGETUBYTE(0,VuuV.v[1].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VddV.v[1].h[i] += fGETUBYTE(1,VuuV.v[1].uh[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vtmpybus_acc, "Vxx32+=vtmpybus(Vuu32,Rt32)", "Vxx32.h+=vtmpy(Vuu32.ub,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[0].h[i] += fGETUBYTE(0,VuuV.v[1].uh[i]);
+
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1,VuuV.v[0].uh[i]), fGETBYTE((2*i  )%4, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(0,VuuV.v[1].uh[i]), fGETBYTE((2*i+1)%4, RtV));
+    VxxV.v[1].h[i] += fGETUBYTE(1,VuuV.v[1].uh[i]))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vtmpyhb, "Vdd32=vtmpyhb(Vuu32,Rt32)", "Vdd32.w=vtmpy(Vuu32.h,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VddV.v[0].w[i] = fMPY16SS(fGETHALF(0,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VddV.v[0].w[i]+= fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VddV.v[0].w[i]+= fGETHALF(0,VuuV.v[1].w[i]);
+
+    VddV.v[1].w[i] = fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VddV.v[1].w[i]+= fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VddV.v[1].w[i]+= fGETHALF(1,VuuV.v[1].w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vtmpyhb_acc, "Vxx32+=vtmpyhb(Vuu32,Rt32)", "Vxx32.w+=vtmpy(Vuu32.h,Rt32.b)",
+"Dual Vector 3x1 Reduction",
+    VxxV.v[0].w[i]+= fMPY16SS(fGETHALF(0,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VxxV.v[0].w[i]+= fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VxxV.v[0].w[i]+= fGETHALF(0,VuuV.v[1].w[i]);
+
+    VxxV.v[1].w[i]+= fMPY16SS(fGETHALF(1,VuuV.v[0].w[i]), fSE8_16(fGETBYTE((2*i+0)%4, RtV)));
+    VxxV.v[1].w[i]+= fMPY16SS(fGETHALF(0,VuuV.v[1].w[i]), fSE8_16(fGETBYTE((2*i+1)%4, RtV)));
+    VxxV.v[1].w[i]+= fGETHALF(1,VuuV.v[1].w[i]))
+
+
+/********************************************
+*  4-WAY REDUCTION - UNSIGNED BYTE BY UNSIGNED BYTE
+********************************************/
+
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpyub,"Vd32=vrmpyub(Vu32,Rt32)","Vd32.uw=vrmpy(Vu32.ub,Rt32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.uw[i]  = fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,RtV));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,RtV));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,RtV));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,RtV)))
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpyub_acc,"Vx32+=vrmpyub(Vu32,Rt32)","Vx32.uw+=vrmpy(Vu32.ub,Rt32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients Accumulate",
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,RtV));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,RtV));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,RtV));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpyubv,"Vd32=vrmpyub(Vu32,Vv32)","Vd32.uw=vrmpy(Vu32.ub,Vv32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.uw[i]  = fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,VvV.uw[i]));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,VvV.uw[i]));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,VvV.uw[i]));
+    VdV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpyubv_acc,"Vx32+=vrmpyub(Vu32,Vv32)","Vx32.uw+=vrmpy(Vu32.ub,Vv32.ub)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients Accumulate",
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(0,VuV.uw[i]), fGETUBYTE(0,VvV.uw[i]));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(1,VuV.uw[i]), fGETUBYTE(1,VvV.uw[i]));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(2,VuV.uw[i]), fGETUBYTE(2,VvV.uw[i]));
+    VxV.uw[i] += fMPY8UU(fGETUBYTE(3,VuV.uw[i]), fGETUBYTE(3,VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybv,"Vd32=vrmpyb(Vu32,Vv32)","Vd32.w=vrmpy(Vu32.b,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.w[i]  = fMPY8SS(fGETBYTE(0, VuV.w[i]), fGETBYTE(0, VvV.w[i]));
+    VdV.w[i] += fMPY8SS(fGETBYTE(1, VuV.w[i]), fGETBYTE(1, VvV.w[i]));
+    VdV.w[i] += fMPY8SS(fGETBYTE(2, VuV.w[i]), fGETBYTE(2, VvV.w[i]));
+    VdV.w[i] += fMPY8SS(fGETBYTE(3, VuV.w[i]), fGETBYTE(3, VvV.w[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybv_acc,"Vx32+=vrmpyb(Vu32,Vv32)","Vx32.w+=vrmpy(Vu32.b,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VxV.w[i] += fMPY8SS(fGETBYTE(0, VuV.w[i]), fGETBYTE(0, VvV.w[i]));
+    VxV.w[i] += fMPY8SS(fGETBYTE(1, VuV.w[i]), fGETBYTE(1, VvV.w[i]));
+    VxV.w[i] += fMPY8SS(fGETBYTE(2, VuV.w[i]), fGETBYTE(2, VvV.w[i]));
+    VxV.w[i] += fMPY8SS(fGETBYTE(3, VuV.w[i]), fGETBYTE(3, VvV.w[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpyubi,"Vdd32=vrmpyub(Vuu32,Rt32,#u1)","Vdd32.uw=vrmpy(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Unsigned Byte By Unsigned Byte 4-way Reduction to Word",
+    VddV.v[0].uw[i]  = fMPY8UU(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[0].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV));
+    VddV.v[0].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[0].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+
+    VddV.v[1].uw[i]  = fMPY8UU(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[1].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+    VddV.v[1].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[1].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpyubi_acc,"Vxx32+=vrmpyub(Vuu32,Rt32,#u1)","Vxx32.uw+=vrmpy(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Unsigned Byte By Unsigned Byte 4-way Reduction with accumulate to Word",
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV));
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[0].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETUBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETUBYTE((3-uiV) & 0x3,RtV));
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETUBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[1].uw[i] += fMPY8UU(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETUBYTE((1-uiV) & 0x3,RtV)))
+
+
+
+
+/********************************************
+*  4-WAY REDUCTION - UNSIGNED BYTE BY BYTE
+********************************************/
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybus,"Vd32=vrmpybus(Vu32,Rt32)","Vd32.w=vrmpy(Vu32.ub,Rt32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.w[i]  = fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,RtV));
+    VdV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,RtV));
+    VdV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,RtV));
+    VdV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybus_acc,"Vx32+=vrmpybus(Vu32,Rt32)","Vx32.w+=vrmpy(Vu32.ub,Rt32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VxV.w[i] += fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,RtV));
+    VxV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,RtV));
+    VxV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,RtV));
+    VxV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybusi,"Vdd32=vrmpybus(Vuu32,Rt32,#u1)","Vdd32.w=vrmpy(Vuu32.ub,Rt32.b,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction to Word",
+    VddV.v[0].w[i]  = fMPY8US(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[0].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV));
+    VddV.v[0].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[0].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+
+    VddV.v[1].w[i]  = fMPY8US(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VddV.v[1].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+    VddV.v[1].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VddV.v[1].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybusi_acc,"Vxx32+=vrmpybus(Vuu32,Rt32,#u1)","Vxx32.w+=vrmpy(Vuu32.ub,Rt32.b,#u1)",
+"Dual Vector Unsigned Byte By Signed Byte 4-way Reduction with accumulate to Word",
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(0, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV));
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[0        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[0].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(0, VuuV.v[1        ].uw[i]),fGETBYTE((2-uiV) & 0x3,RtV));
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(1, VuuV.v[1        ].uw[i]),fGETBYTE((3-uiV) & 0x3,RtV));
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(2, VuuV.v[uiV ? 1:0].uw[i]),fGETBYTE((0-uiV) & 0x3,RtV));
+    VxxV.v[1].w[i] += fMPY8US(fGETUBYTE(3, VuuV.v[0        ].uw[i]),fGETBYTE((1-uiV) & 0x3,RtV)))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT(32,vrmpybusv,"Vd32=vrmpybus(Vu32,Vv32)","Vd32.w=vrmpy(Vu32.ub,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VdV.w[i]  = fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,VvV.w[i]));
+    VdV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,VvV.w[i]));
+    VdV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,VvV.w[i]));
+    VdV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,VvV.w[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrmpybusv_acc,"Vx32+=vrmpybus(Vu32,Vv32)","Vx32.w+=vrmpy(Vu32.ub,Vv32.b)",
+"Vector Multiply-Accumulate Reduce with 4 byte coefficients",
+    VxV.w[i] += fMPY8US(fGETUBYTE(0,VuV.uw[i]), fGETBYTE(0,VvV.w[i]));
+    VxV.w[i] += fMPY8US(fGETUBYTE(1,VuV.uw[i]), fGETBYTE(1,VvV.w[i]));
+    VxV.w[i] += fMPY8US(fGETUBYTE(2,VuV.uw[i]), fGETBYTE(2,VvV.w[i]));
+    VxV.w[i] += fMPY8US(fGETUBYTE(3,VuV.uw[i]), fGETBYTE(3,VvV.w[i])))
+
+
+
+
+
+
+
+
+
+
+
+/********************************************
+*  2-WAY REDUCTION - SAD
+********************************************/
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdsaduh,"Vdd32=vdsaduh(Vuu32,Rt32)","Vdd32.uw=vdsad(Vuu32.uh,Rt32.uh)",
+"Dual Vector Unsigned Halfword 2-Way Sum of Absolute Differences to Word",
+    VddV.v[0].uw[i]  = fABS(fGETUHALF(0, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VddV.v[0].uw[i] += fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(1,RtV));
+    VddV.v[1].uw[i]  = fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VddV.v[1].uw[i] += fABS(fGETUHALF(0, VuuV.v[1].uw[i]) - fGETUHALF(1,RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vdsaduh_acc,"Vxx32+=vdsaduh(Vuu32,Rt32)","Vxx32.uw+=vdsad(Vuu32.uh,Rt32.uh)",
+"Dual Vector Unsigned Halfword 2-Way Sum of Absolute Differences with accumulate to Word",
+    VxxV.v[0].uw[i] += fABS(fGETUHALF(0, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VxxV.v[0].uw[i] += fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(1,RtV));
+    VxxV.v[1].uw[i] += fABS(fGETUHALF(1, VuuV.v[0].uw[i]) - fGETUHALF(0,RtV));
+    VxxV.v[1].uw[i] += fABS(fGETUHALF(0, VuuV.v[1].uw[i]) - fGETUHALF(1,RtV)))
+
+
+
+
+/********************************************
+*  4-WAY REDUCTION - SAD
+********************************************/
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrsadubi,"Vdd32=vrsadub(Vuu32,Rt32,#u1)","Vdd32.uw=vrsad(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Unsigned Byte 4-Way Sum of Absolute Differences to Word",
+    VddV.v[0].uw[i]  = fABS(fZE8_16(fGETUBYTE(0, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VddV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV)));
+    VddV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VddV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+
+    VddV.v[1].uw[i]  = fABS(fZE8_16(fGETUBYTE(0, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VddV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+    VddV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VddV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV))))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vrsadubi_acc,"Vxx32+=vrsadub(Vuu32,Rt32,#u1)","Vxx32.uw+=vrsad(Vuu32.ub,Rt32.ub,#u1)",
+"Dual Vector Unsigned Byte 4-Way Sum of Absolute Differences with accumulate to Word",
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(0, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV)));
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VxxV.v[0].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(0, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((2-uiV)&3,RtV)));
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(1, VuuV.v[1      ].uw[i])) - fZE8_16(fGETUBYTE((3-uiV)&3,RtV)));
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(2, VuuV.v[uiV?1:0].uw[i])) - fZE8_16(fGETUBYTE((0-uiV)&3,RtV)));
+    VxxV.v[1].uw[i] += fABS(fZE8_16(fGETUBYTE(3, VuuV.v[0      ].uw[i])) - fZE8_16(fGETUBYTE((1-uiV)&3,RtV))))
+
+
+
+
+
+
+
+
+
+
+/*********************************************************************
+ * MMVECTOR SHIFTING
+ * ******************************************************************/
+/* Macro to generate arithmetic and logical shifts, by either Rt or Vv */
+
+#define V_SHIFT(TYPE, DESC, SIZE, LOGSIZE, CASTTYPE)   \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasr##TYPE,   "Vd32=vasr" #TYPE "(Vu32,Rt32)","Vd32."#TYPE"=vasr(Vu32."#TYPE",Rt32)",         "Vector arithmetic shift right " DESC,    VdV.TYPE[i]     = (VuV.TYPE[i]    >> (RtV & (SIZE-1)))) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasl##TYPE,   "Vd32=vasl" #TYPE "(Vu32,Rt32)","Vd32."#TYPE"=vasl(Vu32."#TYPE",Rt32)",         "Vector arithmetic shift left  " DESC,    VdV.TYPE[i]     = (VuV.TYPE[i]    << (RtV & (SIZE-1)))) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vlsr##TYPE,   "Vd32=vlsr" #TYPE "(Vu32,Rt32)","Vd32.u"#TYPE"=vlsr(Vu32.u"#TYPE",Rt32)",       "Vector logical shift right "    DESC,    VdV.u##TYPE[i]  = (VuV.u##TYPE[i] >> (RtV & (SIZE-1)))) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasr##TYPE##v,"Vd32=vasr" #TYPE "(Vu32,Vv32)","Vd32."#TYPE"=vasr(Vu32."#TYPE",Vv32."#TYPE")", "Vector arithmetic shift right " DESC,    VdV.TYPE[i]     = fBIDIR_ASHIFTR(VuV.TYPE[i], fSXTN((LOGSIZE+1),SIZE,VvV.TYPE[i]),CASTTYPE)) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vasl##TYPE##v,"Vd32=vasl" #TYPE "(Vu32,Vv32)","Vd32."#TYPE"=vasl(Vu32."#TYPE",Vv32."#TYPE")", "Vector arithmetic shift left  " DESC,    VdV.TYPE[i]     = fBIDIR_ASHIFTL(VuV.TYPE[i],  fSXTN((LOGSIZE+1),SIZE,VvV.TYPE[i]),CASTTYPE)) \
+ITERATOR_INSN2_SHIFT_SLOT(SIZE,vlsr##TYPE##v,"Vd32=vlsr" #TYPE "(Vu32,Vv32)","Vd32."#TYPE"=vlsr(Vu32."#TYPE",Vv32."#TYPE")", "Vector logical shift right "    DESC,    VdV.u##TYPE[i]  = fBIDIR_LSHIFTR(VuV.u##TYPE[i], fSXTN((LOGSIZE+1),SIZE,VvV.TYPE[i]),CASTTYPE)) \
+
+V_SHIFT(w, "word",   32,5,4_4)
+V_SHIFT(h, "halfword", 16,4,2_2)
+
+ITERATOR_INSN_SHIFT_SLOT(8,vlsrb,"Vd32.ub=vlsr(Vu32.ub,Rt32)","Vector logical shift right byte", VdV.b[i] = VuV.ub[i] >> (RtV & 0x7))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vrotr,"Vd32=vrotr(Vu32,Vv32)","Vd32.uw=vrotr(Vu32.uw,Vv32.uw)","Vector word rotate right", VdV.uw[i] = ((VuV.uw[i] >> (VvV.uw[i] & 0x1f)) | (VuV.uw[i] << (32 - (VvV.uw[i] & 0x1f)))))
+
+/*********************************************************************
+ * MMVECTOR SHIFT AND PERMUTE
+ * ******************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT_DOUBLE_VEC(32,vasr_into,"Vxx32=vasrinto(Vu32,Vv32)","Vxx32.w=vasrinto(Vu32.w,Vv32.w)","ASR vector 1 elements and overlay dropping bits to MSB of vector 2 elements",
+    fHIDE(int64_t ) shift = (fSE32_64(VuV.w[i]) << 32);
+    fHIDE(int64_t ) mask  = (((fSE32_64(VxxV.v[0].w[i])) << 32) | fZE32_64(VxxV.v[0].w[i]));
+    fHIDE(int64_t) lomask = (((fSE32_64(1)) << 32) - 1);
+    fHIDE(int ) count = -(0x40 & VvV.w[i]) + (VvV.w[i] & 0x3f);
+    fHIDE(int64_t ) result = (count == -0x40) ? 0 : (((count < 0) ? ((shift << -(count)) | (mask & (lomask << -(count)))) : ((shift >> count) | (mask & (lomask >> count)))));
+    VxxV.v[1].w[i] = ((result >> 32) & 0xffffffff);
+    VxxV.v[0].w[i] = (result & 0xffffffff))
+
+#define NEW_NARROWING_SHIFT 1
+
+#if NEW_NARROWING_SHIFT
+#define NARROWING_SHIFT(ITERSIZE,TAG,DSTM,DSTTYPE,SRCTYPE,SYNOPTS,SATFUNC,RNDFUNC,SHAMTMASK) \
+ITERATOR_INSN_SHIFT_SLOT(ITERSIZE,TAG, \
+"Vd32." #DSTTYPE "=vasr(Vu32." #SRCTYPE ",Vv32." #SRCTYPE ",Rt8)" #SYNOPTS, \
+"Vector shift right and shuffle", \
+    fHIDE(int )shamt = RtV & SHAMTMASK; \
+    DSTM(0,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VvV.SRCTYPE[i],shamt) >> shamt)); \
+    DSTM(1,VdV.SRCTYPE[i],SATFUNC(RNDFUNC(VuV.SRCTYPE[i],shamt) >> shamt)))
+
+
+
+
+
+/* WORD TO HALF*/
+
+NARROWING_SHIFT(32,vasrwh,fSETHALF,h,w,,fECHO,fVNOROUND,0xF)
+NARROWING_SHIFT(32,vasrwhsat,fSETHALF,h,w,:sat,fVSATH,fVNOROUND,0xF)
+NARROWING_SHIFT(32,vasrwhrndsat,fSETHALF,h,w,:rnd:sat,fVSATH,fVROUND,0xF)
+NARROWING_SHIFT(32,vasrwuhrndsat,fSETHALF,uh,w,:rnd:sat,fVSATUH,fVROUND,0xF)
+NARROWING_SHIFT(32,vasrwuhsat,fSETHALF,uh,w,:sat,fVSATUH,fVNOROUND,0xF)
+NARROWING_SHIFT(32,vasruwuhrndsat,fSETHALF,uh,uw,:rnd:sat,fVSATUH,fVROUND,0xF)
+
+NARROWING_SHIFT_NOV1(32,vasruwuhsat,fSETHALF,uh,uw,:sat,fVSATUH,fVNOROUND,0xF)
+NARROWING_SHIFT(16,vasrhubsat,fSETBYTE,ub,h,:sat,fVSATUB,fVNOROUND,0x7)
+NARROWING_SHIFT(16,vasrhubrndsat,fSETBYTE,ub,h,:rnd:sat,fVSATUB,fVROUND,0x7)
+NARROWING_SHIFT(16,vasrhbsat,fSETBYTE,b,h,:sat,fVSATB,fVNOROUND,0x7)
+NARROWING_SHIFT(16,vasrhbrndsat,fSETBYTE,b,h,:rnd:sat,fVSATB,fVROUND,0x7)
+
+NARROWING_SHIFT_NOV1(16,vasruhubsat,fSETBYTE,ub,uh,:sat,fVSATUB,fVNOROUND,0x7)
+NARROWING_SHIFT_NOV1(16,vasruhubrndsat,fSETBYTE,ub,uh,:rnd:sat,fVSATUB,fVROUND,0x7)
+
+#else
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwh,"Vd32=vasrwh(Vu32,Vv32,Rt8)","Vd32.h=vasr(Vu32.w,Vv32.w,Rt8)",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fSETHALF(0,VdV.w[i], (VvV.w[i] >> (RtV & 0xF)));
+    fSETHALF(1,VdV.w[i], (VuV.w[i] >> (RtV & 0xF))))
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwhsat,"Vd32=vasrwh(Vu32,Vv32,Rt8):sat","Vd32.h=vasr(Vu32.w,Vv32.w,Rt8):sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fSETHALF(0,VdV.w[i], fVSATH(VvV.w[i] >> (RtV & 0xF)));
+    fSETHALF(1,VdV.w[i], fVSATH(VuV.w[i] >> (RtV & 0xF))))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwhrndsat,"Vd32=vasrwh(Vu32,Vv32,Rt8):rnd:sat","Vd32.h=vasr(Vu32.w,Vv32.w,Rt8):rnd:sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fHIDE(int ) shamt = RtV & 0xF;
+    fSETHALF(0,VdV.w[i], fVSATH(  (VvV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt));
+    fSETHALF(1,VdV.w[i], fVSATH(  (VuV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwuhrndsat,"Vd32=vasrwuh(Vu32,Vv32,Rt8):rnd:sat","Vd32.uh=vasr(Vu32.w,Vv32.w,Rt8):rnd:sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fHIDE(int ) shamt = RtV & 0xF;
+    fSETHALF(0,VdV.w[i], fVSATUH(  (VvV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt));
+    fSETHALF(1,VdV.w[i], fVSATUH(  (VuV.w[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrwuhsat,"Vd32=vasrwuh(Vu32,Vv32,Rt8):sat","Vd32.uh=vasr(Vu32.w,Vv32.w,Rt8):sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATUH(VvV.w[i] >> (RtV & 0xF)));
+    fSETHALF(1, VdV.uw[i], fVSATUH(VuV.w[i] >> (RtV & 0xF))))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasruwuhrndsat,"Vd32=vasruwuh(Vu32,Vv32,Rt8):rnd:sat","Vd32.uh=vasr(Vu32.uw,Vv32.uw,Rt8):rnd:sat",
+"Vector arithmetic shift right words, shuffle even halfwords",
+    fHIDE(int ) shamt = RtV & 0xF;
+    fSETHALF(0,VdV.w[i], fVSATUH(  (VvV.uw[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt));
+    fSETHALF(1,VdV.w[i], fVSATUH(  (VuV.uw[i] + fBIDIR_ASHIFTL(1,(shamt-1),4_8) ) >> shamt)))
+#endif
+
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vroundwh,"Vd32=vroundwh(Vu32,Vv32):sat","Vd32.h=vround(Vu32.w,Vv32.w):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATH((VvV.w[i] + fCONSTLL(0x8000)) >> 16));
+    fSETHALF(1, VdV.uw[i], fVSATH((VuV.w[i] + fCONSTLL(0x8000)) >> 16)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vroundwuh,"Vd32=vroundwuh(Vu32,Vv32):sat","Vd32.uh=vround(Vu32.w,Vv32.w):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATUH((VvV.w[i] + fCONSTLL(0x8000)) >> 16));
+    fSETHALF(1, VdV.uw[i], fVSATUH((VuV.w[i] + fCONSTLL(0x8000)) >> 16)))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vrounduwuh,"Vd32=vrounduwuh(Vu32,Vv32):sat","Vd32.uh=vround(Vu32.uw,Vv32.uw):sat",
+"Vector round words to halves, shuffle resultant halfwords",
+    fSETHALF(0, VdV.uw[i], fVSATUH((VvV.uw[i] + fCONSTLL(0x8000)) >> 16));
+    fSETHALF(1, VdV.uw[i], fVSATUH((VuV.uw[i] + fCONSTLL(0x8000)) >> 16)))
+
+
+
+
+
+/* HALF TO BYTE*/
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vroundhb,"Vd32=vroundhb(Vu32,Vv32):sat","Vd32.b=vround(Vu32.h,Vv32.h):sat",
+"Vector round halfwords to bytes, shuffle resultant bytes",
+    fSETBYTE(0, VdV.uh[i], fVSATB((VvV.h[i] + 0x80) >> 8));
+    fSETBYTE(1, VdV.uh[i], fVSATB((VuV.h[i] + 0x80) >> 8)))
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vroundhub,"Vd32=vroundhub(Vu32,Vv32):sat","Vd32.ub=vround(Vu32.h,Vv32.h):sat",
+"Vector round halfwords to unsigned bytes, shuffle resultant bytes",
+    fSETBYTE(0, VdV.uh[i], fVSATUB((VvV.h[i] + 0x80) >> 8));
+    fSETBYTE(1, VdV.uh[i], fVSATUB((VuV.h[i] + 0x80) >> 8)))
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vrounduhub,"Vd32=vrounduhub(Vu32,Vv32):sat","Vd32.ub=vround(Vu32.uh,Vv32.uh):sat",
+"Vector round unsigned halfwords to unsigned bytes, shuffle resultant bytes",
+    fSETBYTE(0, VdV.uh[i], fVSATUB((VvV.uh[i] + 0x80) >> 8));
+    fSETBYTE(1, VdV.uh[i], fVSATUB((VuV.uh[i] + 0x80) >> 8)))
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vaslw_acc,"Vx32+=vaslw(Vu32,Rt32)","Vx32.w+=vasl(Vu32.w,Rt32)",
+"Vector shift add word",
+    VxV.w[i]  +=  (VuV.w[i] << (RtV & (32-1))))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vasrw_acc,"Vx32+=vasrw(Vu32,Rt32)","Vx32.w+=vasr(Vu32.w,Rt32)",
+"Vector shift add word",
+    VxV.w[i]  +=  (VuV.w[i] >> (RtV & (32-1))))
+
+ITERATOR_INSN2_SHIFT_SLOT_NOV1(16,vaslh_acc,"Vx32+=vaslh(Vu32,Rt32)","Vx32.h+=vasl(Vu32.h,Rt32)",
+"Vector shift add halfword",
+    VxV.h[i]  +=  (VuV.h[i] << (RtV & (16-1))))
+
+ITERATOR_INSN2_SHIFT_SLOT_NOV1(16,vasrh_acc,"Vx32+=vasrh(Vu32,Rt32)","Vx32.h+=vasr(Vu32.h,Rt32)",
+"Vector shift add halfword",
+    VxV.h[i]  +=  (VuV.h[i] >> (RtV & (16-1))))
+
+/**************************************************************************
+*
+* MMVECTOR ELEMENT-WISE ARITHMETIC
+*
+**************************************************************************/
+
+/**************************************************************************
+* MACROS GO IN MACROS.DEF NOT HERE!!!
+**************************************************************************/
+
+
+#define MMVEC_ABSDIFF(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_MPY_SLOT(WIDTH, vabsdiff##TYPE,                   "Vd32=vabsdiff"TYPE2"(Vu32,Vv32)" ,"Vd32."#DEST"=vabsdiff(Vu32."#SRC",Vv32."#SRC")" ,     "Vector Absolute of Difference "DESCR,   VdV.DEST[i] = (VuV.SRC[i] > VvV.SRC[i]) ? (VuV.SRC[i] - VvV.SRC[i]) : (VvV.SRC[i] - VuV.SRC[i]))
+
+#define MMVEC_ADDU_SAT(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vadd##TYPE##sat,                  "Vd32=vadd"TYPE2"(Vu32,Vv32):sat" ,    "Vd32."#DEST"=vadd(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVUADDSAT(WIDTH,  VuV.SRC[i], VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vadd##TYPE##sat_dv,    "Vdd32=vadd"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vadd(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVUADDSAT(WIDTH, VuuV.v[0].SRC[i],VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVUADDSAT(WIDTH, VuuV.v[1].SRC[i],VvvV.v[1].SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vsub##TYPE##sat,                  "Vd32=vsub"TYPE2"(Vu32,Vv32):sat",     "Vd32."#DEST"=vsub(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Sub & Saturate "DESCR,            VdV.DEST[i] = fVUSUBSAT(WIDTH,  VuV.SRC[i], VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vsub##TYPE##sat_dv,    "Vdd32=vsub"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vsub(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Sub & Saturate "DESCR,    VddV.v[0].DEST[i] = fVUSUBSAT(WIDTH, VuuV.v[0].SRC[i],VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVUSUBSAT(WIDTH, VuuV.v[1].SRC[i],VvvV.v[1].SRC[i]))\
+
+#define MMVEC_ADDS_SAT(TYPE,TYPE2,DESCR, WIDTH,DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vadd##TYPE##sat,                  "Vd32=vadd"TYPE2"(Vu32,Vv32):sat" ,    "Vd32."#DEST"=vadd(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Add & Saturate "DESCR,            VdV.DEST[i] = fVSADDSAT(WIDTH,  VuV.SRC[i],  VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vadd##TYPE##sat_dv,    "Vdd32=vadd"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vadd(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Add & Saturate "DESCR,    VddV.v[0].DEST[i] = fVSADDSAT(WIDTH, VuuV.v[0].SRC[i], VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVSADDSAT(WIDTH, VuuV.v[1].SRC[i], VvvV.v[1].SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vsub##TYPE##sat,                  "Vd32=vsub"TYPE2"(Vu32,Vv32):sat",     "Vd32."#DEST"=vsub(Vu32."#SRC",Vv32."#SRC"):sat",    "Vector Sub & Saturate "DESCR,            VdV.DEST[i] = fVSSUBSAT(WIDTH,  VuV.SRC[i],  VvV.SRC[i]))\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vsub##TYPE##sat_dv,    "Vdd32=vsub"TYPE2"(Vuu32,Vvv32):sat",  "Vdd32."#DEST"=vsub(Vuu32."#SRC",Vvv32."#SRC"):sat", "Double Vector Sub & Saturate "DESCR,    VddV.v[0].DEST[i] = fVSSUBSAT(WIDTH, VuuV.v[0].SRC[i], VvvV.v[0].SRC[i]); VddV.v[1].DEST[i] = fVSSUBSAT(WIDTH, VuuV.v[1].SRC[i], VvvV.v[1].SRC[i]))\
+
+#define MMVEC_AVGU(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",        "Vector Average "DESCR,                                      VdV.DEST[i] = fVAVGU(   WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",     "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",    "Vector Average & Round "DESCR,                              VdV.DEST[i] = fVAVGURND(WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+
+
+#define MMVEC_AVGS(TYPE,TYPE2,DESCR, WIDTH, DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE,                        "Vd32=vavg"TYPE2"(Vu32,Vv32)",          "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC")",          "Vector Average "DESCR,                                      VdV.DEST[i]  = fVAVGS(       WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vavg##TYPE##rnd,                   "Vd32=vavg"TYPE2"(Vu32,Vv32):rnd",      "Vd32."#DEST"=vavg(Vu32."#SRC",Vv32."#SRC"):rnd",      "Vector Average & Round "DESCR,                              VdV.DEST[i]  = fVAVGSRND(    WIDTH,  VuV.SRC[i], VvV.SRC[i])) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vnavg##TYPE,                       "Vd32=vnavg"TYPE2"(Vu32,Vv32)",         "Vd32."#DEST"=vnavg(Vu32."#SRC",Vv32."#SRC")",         "Vector Negative Average "DESCR,                             VdV.DEST[i]  = fVNAVGS(      WIDTH,  VuV.SRC[i], VvV.SRC[i]))
+
+
+
+
+
+
+
+#define MMVEC_ADDWRAP(TYPE,TYPE2, DESCR, WIDTH , DEST,SRC)\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vadd##TYPE,                  "Vd32=vadd"TYPE2"(Vu32,Vv32)" ,     "Vd32."#DEST"=vadd(Vu32."#SRC",Vv32."#SRC")",    "Vector Add "DESCR,          VdV.DEST[i] =  VuV.SRC[i] +  VvV.SRC[i])\
+ITERATOR_INSN2_ANY_SLOT(WIDTH, vsub##TYPE,                  "Vd32=vsub"TYPE2"(Vu32,Vv32)" ,     "Vd32."#DEST"=vsub(Vu32."#SRC",Vv32."#SRC")",    "Vector Sub "DESCR,          VdV.DEST[i] =  VuV.SRC[i] -  VvV.SRC[i])\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vadd##TYPE##_dv,  "Vdd32=vadd"TYPE2"(Vuu32,Vvv32)" ,  "Vdd32."#DEST"=vadd(Vuu32."#SRC",Vvv32."#SRC")", "Double Vector Add "DESCR,   VddV.v[0].DEST[i] = VuuV.v[0].SRC[i] + VvvV.v[0].SRC[i]; VddV.v[1].DEST[i] = VuuV.v[1].SRC[i] + VvvV.v[1].SRC[i])\
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(WIDTH, vsub##TYPE##_dv,  "Vdd32=vsub"TYPE2"(Vuu32,Vvv32)" ,  "Vdd32."#DEST"=vsub(Vuu32."#SRC",Vvv32."#SRC")", "Double Vector Sub "DESCR,   VddV.v[0].DEST[i] = VuuV.v[0].SRC[i] - VvvV.v[0].SRC[i]; VddV.v[1].DEST[i] = VuuV.v[1].SRC[i] - VvvV.v[1].SRC[i]) \
+
+
+
+
+
+/* Wrapping Adds */
+MMVEC_ADDWRAP(b,    "b",    "Byte",         8,   b, b)
+MMVEC_ADDWRAP(h,    "h",    "Halfword",     16,  h, h)
+MMVEC_ADDWRAP(w,    "w",    "Word",         32,   w,    w)
+
+/* Saturating Adds */
+MMVEC_ADDU_SAT(ub, "ub",    "Unsigned Byte",        8,   ub,    ub)
+MMVEC_ADDU_SAT(uh, "uh",    "Unsigned Halfword",    16,  uh,    uh)
+MMVEC_ADDU_SAT(uw, "uw",    "Unsigned Word",    32,  uw,    uw)
+MMVEC_ADDS_SAT(b,  "b",     "Byte",             8,  b,     b)
+MMVEC_ADDS_SAT(h,  "h",     "Halfword",             16,  h,     h)
+MMVEC_ADDS_SAT(w,  "w",     "Word",                 32,  w,     w)
+
+
+/* Averaging Instructions */
+MMVEC_AVGU(ub,"ub",     "Unsigned Byte",     8,   ub,   ub)
+MMVEC_AVGU(uh,"uh",     "Unsigned Halfword", 16,  uh,   uh)
+MMVEC_AVGU_NOV1(uw,"uw",     "Unsigned Word",     32,  uw,   uw)
+MMVEC_AVGS_NOV1(b,   "b",    "Byte",               8,   b,   b)
+MMVEC_AVGS(h,   "h",    "Halfword",          16,   h,   h)
+MMVEC_AVGS(w,   "w",    "Word",              32,   w,   w)
+
+
+/* Absolute Difference */
+MMVEC_ABSDIFF(ub,"ub",  "Unsigned Byte",        8,   ub,    ub)
+MMVEC_ABSDIFF(uh,"uh",  "Unsigned Halfword",    16,  uh,    uh)
+MMVEC_ABSDIFF(h,"h",        "Halfword",             16,  uh,    h)
+MMVEC_ABSDIFF(w,"w",        "Word",                 32,  uw,    w)
+
+ITERATOR_INSN2_ANY_SLOT(8,vnavgub, "Vd32=vnavgub(Vu32,Vv32)", "Vd32.b=vnavg(Vu32.ub,Vv32.ub)",
+"Vector Negative Average Unsigned Byte", VdV.b[i]   = fVNAVGU(8, VuV.ub[i], VvV.ub[i]))
+
+ITERATOR_INSN_ANY_SLOT(32,vaddcarrysat,"Vd32.w=vadd(Vu32.w,Vv32.w,Qs4):carry:sat","add w/carry and saturate",
+VdV.w[i] = fVSATW(VuV.w[i]+VvV.w[i]+fGETQBIT(QsV,i*4)))
+
+ITERATOR_INSN_ANY_SLOT(32,vaddcarry,"Vd32.w=vadd(Vu32.w,Vv32.w,Qx4):carry","add w/carry",
+VdV.w[i] = VuV.w[i]+VvV.w[i]+fGETQBIT(QxV,i*4);
+fSETQBITS(QxV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],VvV.w[i],fGETQBIT(QxV,i*4))))
+
+ITERATOR_INSN_ANY_SLOT(32,vsubcarry,"Vd32.w=vsub(Vu32.w,Vv32.w,Qx4):carry","subtract w/carry",
+VdV.w[i] = VuV.w[i]+~VvV.w[i]+fGETQBIT(QxV,i*4);
+fSETQBITS(QxV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],~VvV.w[i],fGETQBIT(QxV,i*4))))
+
+ITERATOR_INSN_ANY_SLOT(32,vaddcarryo,"Vd32.w,Qe4=vadd(Vu32.w,Vv32.w):carry","add w/carry out-only",
+VdV.w[i] = VuV.w[i]+VvV.w[i];
+fSETQBITS(QeV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],VvV.w[i],0)))
+
+ITERATOR_INSN_ANY_SLOT(32,vsubcarryo,"Vd32.w,Qe4=vsub(Vu32.w,Vv32.w):carry","subtract w/carry out-only",
+VdV.w[i] = VuV.w[i]+~VvV.w[i]+1;
+fSETQBITS(QeV,4,0xF,4*i,-fCARRY_FROM_ADD32(VuV.w[i],~VvV.w[i],1)))
+
+
+ITERATOR_INSN_ANY_SLOT(32,vsatdw,"Vd32.w=vsatdw(Vu32.w,Vv32.w)","Saturate from 64-bits (higher 32-bits come from first vector) to 32-bits",VdV.w[i] = fVSATDW(VuV.w[i],VvV.w[i]))
+
+
+#define MMVEC_ADDSAT_MIX(TAGEND,SATF,WIDTH,DEST,SRC1,SRC2)\
+ITERATOR_INSN_ANY_SLOT(WIDTH, vadd##TAGEND,"Vd32."#DEST"=vadd(Vu32."#SRC1",Vv32."#SRC2"):sat",    "Vector Add mixed", VdV.DEST[i] =  SATF(VuV.SRC1[i] +  VvV.SRC2[i]))\
+ITERATOR_INSN_ANY_SLOT(WIDTH, vsub##TAGEND,"Vd32."#DEST"=vsub(Vu32."#SRC1",Vv32."#SRC2"):sat",    "Vector Sub mixed", VdV.DEST[i] =  SATF(VuV.SRC1[i] -  VvV.SRC2[i]))\
+
+MMVEC_ADDSAT_MIX(ububb_sat,fVSATUB,8,ub,ub,b)
+
+/****************************
+*   WIDENING
+****************************/
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vaddubh,"Vdd32=vaddub(Vu32,Vv32)","Vdd32.h=vadd(Vu32.ub,Vv32.ub)",
+"Vector addition with widen into two vectors",
+    VddV.v[0].h[i] = fZE8_16(fGETUBYTE(0, VuV.uh[i])) + fZE8_16(fGETUBYTE(0, VvV.uh[i]));
+    VddV.v[1].h[i] = fZE8_16(fGETUBYTE(1, VuV.uh[i])) + fZE8_16(fGETUBYTE(1, VvV.uh[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vsububh,"Vdd32=vsubub(Vu32,Vv32)","Vdd32.h=vsub(Vu32.ub,Vv32.ub)",
+"Vector subtraction with widen into two vectors",
+    VddV.v[0].h[i] = fZE8_16(fGETUBYTE(0, VuV.uh[i])) - fZE8_16(fGETUBYTE(0, VvV.uh[i]));
+    VddV.v[1].h[i] = fZE8_16(fGETUBYTE(1, VuV.uh[i])) - fZE8_16(fGETUBYTE(1, VvV.uh[i])))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vaddhw,"Vdd32=vaddh(Vu32,Vv32)","Vdd32.w=vadd(Vu32.h,Vv32.h)",
+"Vector addition with widen into two vectors",
+    VddV.v[0].w[i] = fGETHALF(0, VuV.w[i]) + fGETHALF(0, VvV.w[i]);
+    VddV.v[1].w[i] = fGETHALF(1, VuV.w[i]) + fGETHALF(1, VvV.w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vsubhw,"Vdd32=vsubh(Vu32,Vv32)","Vdd32.w=vsub(Vu32.h,Vv32.h)",
+"Vector subtraction with widen into two vectors",
+    VddV.v[0].w[i] = fGETHALF(0, VuV.w[i]) - fGETHALF(0, VvV.w[i]);
+    VddV.v[1].w[i] = fGETHALF(1, VuV.w[i]) - fGETHALF(1, VvV.w[i]))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vadduhw,"Vdd32=vadduh(Vu32,Vv32)","Vdd32.w=vadd(Vu32.uh,Vv32.uh)",
+"Vector addition with widen into two vectors",
+    VddV.v[0].w[i] = fZE16_32(fGETUHALF(0, VuV.uw[i])) + fZE16_32(fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].w[i] = fZE16_32(fGETUHALF(1, VuV.uw[i])) + fZE16_32(fGETUHALF(1, VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vsubuhw,"Vdd32=vsubuh(Vu32,Vv32)","Vdd32.w=vsub(Vu32.uh,Vv32.uh)",
+"Vector subtraction with widen into two vectors",
+    VddV.v[0].w[i] = fZE16_32(fGETUHALF(0, VuV.uw[i])) - fZE16_32(fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].w[i] = fZE16_32(fGETUHALF(1, VuV.uw[i])) - fZE16_32(fGETUHALF(1, VvV.uw[i])))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vaddhw_acc,"Vxx32+=vaddh(Vu32,Vv32)","Vxx32.w+=vadd(Vu32.h,Vv32.h)",
+"Vector addition with widen into two vectors",
+    VxxV.v[0].w[i] += fGETHALF(0, VuV.w[i]) + fGETHALF(0, VvV.w[i]);
+    VxxV.v[1].w[i] += fGETHALF(1, VuV.w[i]) + fGETHALF(1, VvV.w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vadduhw_acc,"Vxx32+=vadduh(Vu32,Vv32)","Vxx32.w+=vadd(Vu32.uh,Vv32.uh)",
+"Vector addition with widen into two vectors",
+    VxxV.v[0].w[i] += fGETUHALF(0, VuV.w[i]) + fGETUHALF(0, VvV.w[i]);
+    VxxV.v[1].w[i] += fGETUHALF(1, VuV.w[i]) + fGETUHALF(1, VvV.w[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vaddubh_acc,"Vxx32+=vaddub(Vu32,Vv32)","Vxx32.h+=vadd(Vu32.ub,Vv32.ub)",
+"Vector addition with widen into two vectors",
+    VxxV.v[0].h[i] += fGETUBYTE(0, VuV.h[i]) + fGETUBYTE(0, VvV.h[i]);
+    VxxV.v[1].h[i] += fGETUBYTE(1, VuV.h[i]) + fGETUBYTE(1, VvV.h[i]))
+
+
+/****************************
+*   Conditional
+****************************/
+
+#define CONDADDSUB(WIDTH,TAGEND,LHSYN,RHSYN,DESCR,LHBEH,RHBEH) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vadd##TAGEND##q,"if (Qv4."#TAGEND") "LHSYN"+="RHSYN,"if (Qv4) "LHSYN"+="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH+RHBEH,LHBEH)) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vsub##TAGEND##q,"if (Qv4."#TAGEND") "LHSYN"-="RHSYN,"if (Qv4) "LHSYN"-="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH-RHBEH,LHBEH)) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vadd##TAGEND##nq,"if (!Qv4."#TAGEND") "LHSYN"+="RHSYN,"if (!Qv4) "LHSYN"+="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH,LHBEH+RHBEH)) \
+ITERATOR_INSN2_ANY_SLOT(WIDTH,vsub##TAGEND##nq,"if (!Qv4."#TAGEND") "LHSYN"-="RHSYN,"if (!Qv4) "LHSYN"-="RHSYN,DESCR,LHBEH=fCONDMASK##WIDTH(QvV,i,LHBEH,LHBEH-RHBEH)) \
+
+CONDADDSUB(8,b,"Vx32.b","Vu32.b","Conditional add/sub Byte",VxV.ub[i],VuV.ub[i])
+CONDADDSUB(16,h,"Vx32.h","Vu32.h","Conditional add/sub Half",VxV.h[i],VuV.h[i])
+CONDADDSUB(32,w,"Vx32.w","Vu32.w","Conditional add/sub Word",VxV.w[i],VuV.w[i])
+
+/*****************************************************
+ ABSOLUTE VALUES
+*****************************************************/
+// V65
+ITERATOR_INSN2_ANY_SLOT_NOV1(8,vabsb,        "Vd32=vabsb(Vu32)",     "Vd32.b=vabs(Vu32.b)",     "Vector absolute value of bytes",    VdV.b[i]  =  fABS(VuV.b[i]))
+ITERATOR_INSN2_ANY_SLOT_NOV1(8,vabsb_sat,    "Vd32=vabsb(Vu32):sat", "Vd32.b=vabs(Vu32.b):sat", "Vector absolute value of bytes",    VdV.b[i]  =  fVSATB(fABS(fSE8_16(VuV.b[i]))))
+
+
+ITERATOR_INSN2_ANY_SLOT(16,vabsh,        "Vd32=vabsh(Vu32)",     "Vd32.h=vabs(Vu32.h)",     "Vector absolute value of halfwords",    VdV.h[i]  =  fABS(VuV.h[i]))
+ITERATOR_INSN2_ANY_SLOT(16,vabsh_sat,    "Vd32=vabsh(Vu32):sat", "Vd32.h=vabs(Vu32.h):sat", "Vector absolute value of halfwords",    VdV.h[i]  =  fVSATH(fABS(fSE16_32(VuV.h[i]))))
+ITERATOR_INSN2_ANY_SLOT(32,vabsw,        "Vd32=vabsw(Vu32)",     "Vd32.w=vabs(Vu32.w)",     "Vector absolute value of words",        VdV.w[i]  =  fABS(VuV.w[i]))
+ITERATOR_INSN2_ANY_SLOT(32,vabsw_sat,    "Vd32=vabsw(Vu32):sat", "Vd32.w=vabs(Vu32.w):sat", "Vector absolute value of words",        VdV.w[i]  =  fVSATW(fABS(fSE32_64(VuV.w[i]))))
+
+
+/**************************************************************************
+ * MMVECTOR MULTIPLICATIONS
+ * ************************************************************************/
+
+
+/* Byte by Byte */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybv,"Vdd32=vmpyb(Vu32,Vv32)","Vdd32.h=vmpy(Vu32.b,Vv32.b)",
+"Vector by Vector Byte Multiply",
+    VddV.v[0].h[i] =  fMPY8SS(fGETBYTE(0, VuV.h[i]), fGETBYTE(0, VvV.h[i]));
+    VddV.v[1].h[i] =  fMPY8SS(fGETBYTE(1, VuV.h[i]), fGETBYTE(1, VvV.h[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybv_acc,"Vxx32+=vmpyb(Vu32,Vv32)","Vxx32.h+=vmpy(Vu32.b,Vv32.b)",
+"Vector by Vector Byte Multiply with accumulate",
+    VxxV.v[0].h[i] +=  fMPY8SS(fGETBYTE(0, VuV.h[i]), fGETBYTE(0, VvV.h[i]));
+    VxxV.v[1].h[i] +=  fMPY8SS(fGETBYTE(1, VuV.h[i]), fGETBYTE(1, VvV.h[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyubv,"Vdd32=vmpyub(Vu32,Vv32)","Vdd32.uh=vmpy(Vu32.ub,Vv32.ub)",
+"Vector by Vector Unsigned Byte Multiply",
+    VddV.v[0].uh[i] =  fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE(0, VvV.uh[i]) );
+    VddV.v[1].uh[i] =  fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE(1, VvV.uh[i]) ))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyubv_acc,"Vxx32+=vmpyub(Vu32,Vv32)","Vxx32.uh+=vmpy(Vu32.ub,Vv32.ub)",
+"Vector by Vector Unsigned Byte Multiply with accumulate",
+    VxxV.v[0].uh[i] +=  fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE(0, VvV.uh[i]) );
+    VxxV.v[1].uh[i] +=  fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE(1, VvV.uh[i]) ))
+
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybusv,"Vdd32=vmpybus(Vu32,Vv32)","Vdd32.h=vmpy(Vu32.ub,Vv32.b)",
+"Vector Unsigned Byte by Signed Byte Multiply",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE(0, VvV.h[i]));
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE(1, VvV.h[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybusv_acc,"Vxx32+=vmpybus(Vu32,Vv32)","Vxx32.h+=vmpy(Vu32.ub,Vv32.b)",
+"Vector Unsigned Byte by Signed Byte Multiply with accumulate",
+    VxxV.v[0].h[i]  += fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE(0, VvV.h[i]));
+    VxxV.v[1].h[i]  += fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE(1, VvV.h[i])))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabusv,"Vdd32=vmpabus(Vuu32,Vvv32)","Vdd32.h=vmpa(Vuu32.ub,Vvv32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].h[i] = fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETBYTE(0, VvvV.v[0].uh[i])) + fMPY8US(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETBYTE(0, VvvV.v[1].uh[i]));
+    VddV.v[1].h[i] = fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETBYTE(1, VvvV.v[0].uh[i])) + fMPY8US(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETBYTE(1, VvvV.v[1].uh[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabuuv,"Vdd32=vmpabuu(Vuu32,Vvv32)","Vdd32.h=vmpa(Vuu32.ub,Vvv32.ub)",
+"Vertical Byte Multiply",
+    VddV.v[0].h[i] = fMPY8UU(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETUBYTE(0, VvvV.v[0].uh[i])) + fMPY8UU(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETUBYTE(0, VvvV.v[1].uh[i]));
+    VddV.v[1].h[i] = fMPY8UU(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETUBYTE(1, VvvV.v[0].uh[i])) + fMPY8UU(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETUBYTE(1, VvvV.v[1].uh[i])))
+
+
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhv,"Vdd32=vmpyh(Vu32,Vv32)","Vdd32.w=vmpy(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VddV.v[0].w[i] = fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, VvV.w[i]));
+    VddV.v[1].w[i] = fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, VvV.w[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhv_acc,"Vxx32+=vmpyh(Vu32,Vv32)","Vxx32.w+=vmpy(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, VvV.w[i]));
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, VvV.w[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuhv,"Vdd32=vmpyuh(Vu32,Vv32)","Vdd32.uw=vmpy(Vu32.uh,Vv32.uh)",
+"Vector by Vector Unsigned Halfword Multiply",
+    VddV.v[0].uw[i] = fMPY16UU(fGETUHALF(0, VuV.uw[i]), fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].uw[i] = fMPY16UU(fGETUHALF(1, VuV.uw[i]), fGETUHALF(1, VvV.uw[i])))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuhv_acc,"Vxx32+=vmpyuh(Vu32,Vv32)","Vxx32.uw+=vmpy(Vu32.uh,Vv32.uh)",
+"Vector by Vector Unsigned Halfword Multiply",
+    VxxV.v[0].uw[i] += fMPY16UU(fGETUHALF(0, VuV.uw[i]), fGETUHALF(0, VvV.uw[i]));
+    VxxV.v[1].uw[i] += fMPY16UU(fGETUHALF(1, VuV.uw[i]), fGETUHALF(1, VvV.uw[i])))
+
+
+
+/* Vector by Vector */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyhvsrs,"Vd32=vmpyh(Vu32,Vv32):<<1:rnd:sat","Vd32.h=vmpy(Vu32.h,Vv32.h):<<1:rnd:sat",
+"Vector halfword multiply with round, shift, and sat16",
+    VdV.h[i] = fVSATH(fGETHALF(1,fVSAT(fROUND((fMPY16SS(VuV.h[i],VvV.h[i]    )<<1))))))
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhus, "Vdd32=vmpyhus(Vu32,Vv32)","Vdd32.w=vmpy(Vu32.h,Vv32.uh)",
+"Vector by Vector Halfword Multiply",
+    VddV.v[0].w[i] = fMPY16SU(fGETHALF(0, VuV.w[i]), fGETUHALF(0, VvV.uw[i]));
+    VddV.v[1].w[i] = fMPY16SU(fGETHALF(1, VuV.w[i]), fGETUHALF(1, VvV.uw[i])))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhus_acc, "Vxx32+=vmpyhus(Vu32,Vv32)","Vxx32.w+=vmpy(Vu32.h,Vv32.uh)",
+"Vector by Vector Halfword Multiply",
+    VxxV.v[0].w[i] += fMPY16SU(fGETHALF(0, VuV.w[i]), fGETUHALF(0, VvV.uw[i]));
+    VxxV.v[1].w[i] += fMPY16SU(fGETHALF(1, VuV.w[i]), fGETUHALF(1, VvV.uw[i])))
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyih,"Vd32=vmpyih(Vu32,Vv32)","Vd32.h=vmpyi(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VdV.h[i] = fMPY16SS(VuV.h[i], VvV.h[i]))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyih_acc,"Vx32+=vmpyih(Vu32,Vv32)","Vx32.h+=vmpyi(Vu32.h,Vv32.h)",
+"Vector by Vector Halfword Multiply",
+    VxV.h[i] += fMPY16SS(VuV.h[i], VvV.h[i]))
+
+
+
+/* 32x32 high half / frac */
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyewuh,"Vd32=vmpyewuh(Vu32,Vv32)","Vd32.w=vmpye(Vu32.w,Vv32.uh)",
+"Vector by Vector Halfword Multiply",
+VdV.w[i] = fMPY3216SU(VuV.w[i], fGETUHALF(0, VvV.w[i])) >> 16)
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh,"Vd32=vmpyowh(Vu32,Vv32):<<1:sat","Vd32.w=vmpyo(Vu32.w,Vv32.h):<<1:sat",
+"Vector by Vector Halfword Multiply",
+VdV.w[i] = fVSATW((((fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i])) >> 14) + 0) >> 1)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_rnd,"Vd32=vmpyowh(Vu32,Vv32):<<1:rnd:sat","Vd32.w=vmpyo(Vu32.w,Vv32.h):<<1:rnd:sat",
+"Vector by Vector Halfword Multiply",
+VdV.w[i] = fVSATW((((fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i])) >> 14) + 1) >> 1)))
+
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(32,vmpyewuh_64,"Vdd32=vmpye(Vu32.w,Vv32.uh)",
+"Word times Halfword Multiply, 64-bit result",
+	fHIDE(size8s_t prod;)
+	prod = fMPY32SU(VuV.w[i],fGETUHALF(0,VvV.w[i]));
+	VddV.v[1].w[i] = prod >> 16;
+	VddV.v[0].w[i] = prod << 16)
+
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_64_acc,"Vxx32+=vmpyo(Vu32.w,Vv32.h)",
+"Word times Halfword Multiply, 64-bit result",
+	fHIDE(size8s_t prod;)
+	prod = fMPY32SS(VuV.w[i],fGETHALF(1,VvV.w[i]))  + fSE32_64(VxxV.v[1].w[i]);
+	VxxV.v[1].w[i] = prod >> 16;
+	fSETHALF(0, VxxV.v[0].w[i], VxxV.v[0].w[i] >> 16);
+	fSETHALF(1, VxxV.v[0].w[i], prod & 0x0000ffff))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_sacc,"Vx32+=vmpyowh(Vu32,Vv32):<<1:sat:shift","Vx32.w+=vmpyo(Vu32.w,Vv32.h):<<1:sat:shift",
+"Vector by Vector Halfword Multiply",
+IV1DEAD() VxV.w[i] = fVSATW(((((VxV.w[i] + fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i]))) >> 14) + 0) >> 1)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyowh_rnd_sacc,"Vx32+=vmpyowh(Vu32,Vv32):<<1:rnd:sat:shift","Vx32.w+=vmpyo(Vu32.w,Vv32.h):<<1:rnd:sat:shift",
+"Vector by Vector Halfword Multiply",
+IV1DEAD() VxV.w[i] = fVSATW(((((VxV.w[i] + fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i]))) >> 14) + 1) >> 1)))
+
+/* For 32x32 integer / low half */
+
+ITERATOR_INSN_MPY_SLOT(32,vmpyieoh,"Vd32.w=vmpyieo(Vu32.h,Vv32.h)","Odd/Even multiply for 32x32 low half",
+	VdV.w[i] = (fGETHALF(0,VuV.w[i])*fGETHALF(1,VvV.w[i])) << 16)
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiewuh,"Vd32=vmpyiewuh(Vu32,Vv32)","Vd32.w=vmpyie(Vu32.w,Vv32.uh)",
+"Vector by Vector Word by Halfword Multiply",
+IV1DEAD()    VdV.w[i] = fMPY3216SU(VuV.w[i], fGETUHALF(0, VvV.w[i])) )
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiowh,"Vd32=vmpyiowh(Vu32,Vv32)","Vd32.w=vmpyio(Vu32.w,Vv32.h)",
+"Vector by Vector Word by Halfword Multiply",
+IV1DEAD()    VdV.w[i] = fMPY3216SS(VuV.w[i], fGETHALF(1, VvV.w[i])) )
+
+/* Add back these... */
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiewh_acc,"Vx32+=vmpyiewh(Vu32,Vv32)","Vx32.w+=vmpyie(Vu32.w,Vv32.h)",
+"Vector by Vector Word by Halfword Multiply",
+VxV.w[i] = VxV.w[i] + fMPY3216SS(VuV.w[i], fGETHALF(0, VvV.w[i])) )
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiewuh_acc,"Vx32+=vmpyiewuh(Vu32,Vv32)","Vx32.w+=vmpyie(Vu32.w,Vv32.uh)",
+"Vector by Vector Word by Halfword Multiply",
+VxV.w[i] = VxV.w[i] + fMPY3216SU(VuV.w[i], fGETUHALF(0, VvV.w[i])) )
+
+
+
+
+
+
+
+/* Vector by Scalar */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyub,"Vdd32=vmpyub(Vu32,Rt32)","Vdd32.uh=vmpy(Vu32.ub,Rt32.ub)",
+"Vector unsigned byte by scalar unsigned byte multiply",
+    VddV.v[0].uh[i]  = fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE((2*i+0)%4, RtV));
+    VddV.v[1].uh[i]  = fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpyub_acc,"Vxx32+=vmpyub(Vu32,Rt32)","Vxx32.uh+=vmpy(Vu32.ub,Rt32.ub)",
+"Vector unsigned byte by scalar unsigned byte multiply-accumulate",
+    VxxV.v[0].uh[i] += fMPY8UU(fGETUBYTE(0, VuV.uh[i]), fGETUBYTE((2*i+0)%4, RtV));
+    VxxV.v[1].uh[i] += fMPY8UU(fGETUBYTE(1, VuV.uh[i]), fGETUBYTE((2*i+1)%4, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybus,"Vdd32=vmpybus(Vu32,Rt32)","Vdd32.h=vmpy(Vu32.ub,Rt32.b)",
+"Vector unsigned byte by scalar signed byte multiply",
+    VddV.v[0].h[i]  = fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i+0)%4, RtV));
+    VddV.v[1].h[i]  = fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpybus_acc,"Vxx32+=vmpybus(Vu32,Rt32)","Vxx32.h+=vmpy(Vu32.ub,Rt32.b)",
+"Vector unsigned byte by scalar signed byte multiply-accumulate",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0, VuV.uh[i]), fGETBYTE((2*i+0)%4, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1, VuV.uh[i]), fGETBYTE((2*i+1)%4, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabus,"Vdd32=vmpabus(Vuu32,Rt32)","Vdd32.h=vmpa(Vuu32.ub,Rt32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].h[i] = fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETBYTE(0, RtV)) + fMPY16SS(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETBYTE(1, RtV));
+    VddV.v[1].h[i] = fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETBYTE(2, RtV)) + fMPY16SS(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETBYTE(3, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(16,vmpabus_acc,"Vxx32+=vmpabus(Vuu32,Rt32)","Vxx32.h+=vmpa(Vuu32.ub,Rt32.b)",
+"Vertical Byte Multiply",
+    VxxV.v[0].h[i] += fMPY8US(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETBYTE(0, RtV)) + fMPY16SS(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETBYTE(1, RtV));
+    VxxV.v[1].h[i] += fMPY8US(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETBYTE(2, RtV)) + fMPY16SS(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETBYTE(3, RtV)))
+
+// V65
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(16,vmpabuu,"Vdd32=vmpabuu(Vuu32,Rt32)","Vdd32.h=vmpa(Vuu32.ub,Rt32.ub)",
+"Vertical Byte Multiply",
+    VddV.v[0].uh[i] = fMPY8UU(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETUBYTE(0, RtV)) + fMPY8UU(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETUBYTE(1, RtV));
+    VddV.v[1].uh[i] = fMPY8UU(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETUBYTE(2, RtV)) + fMPY8UU(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETUBYTE(3, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(16,vmpabuu_acc,"Vxx32+=vmpabuu(Vuu32,Rt32)","Vxx32.h+=vmpa(Vuu32.ub,Rt32.ub)",
+"Vertical Byte Multiply",
+    VxxV.v[0].uh[i] += fMPY8UU(fGETUBYTE(0, VuuV.v[0].uh[i]), fGETUBYTE(0, RtV)) + fMPY8UU(fGETUBYTE(0, VuuV.v[1].uh[i]), fGETUBYTE(1, RtV));
+    VxxV.v[1].uh[i] += fMPY8UU(fGETUBYTE(1, VuuV.v[0].uh[i]), fGETUBYTE(2, RtV)) + fMPY8UU(fGETUBYTE(1, VuuV.v[1].uh[i]), fGETUBYTE(3, RtV)))
+
+
+
+
+/* Half by Byte */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpahb,"Vdd32=vmpahb(Vuu32,Rt32)","Vdd32.w=vmpa(Vuu32.h,Rt32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].w[i] = fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VddV.v[1].w[i] = fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16SS(fGETHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpahb_acc,"Vxx32+=vmpahb(Vuu32,Rt32)","Vxx32.w+=vmpa(Vuu32.h,Rt32.b)",
+"Vertical Byte Multiply",
+    VxxV.v[0].w[i] += fMPY16SS(fGETHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16SS(fGETHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VxxV.v[1].w[i] += fMPY16SS(fGETHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16SS(fGETHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+/* Half by Byte */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpauhb,"Vdd32=vmpauhb(Vuu32,Rt32)","Vdd32.w=vmpa(Vuu32.uh,Rt32.b)",
+"Vertical Byte Multiply",
+    VddV.v[0].w[i] = fMPY16US(fGETUHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16US(fGETUHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VddV.v[1].w[i] = fMPY16US(fGETUHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16US(fGETUHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpauhb_acc,"Vxx32+=vmpauhb(Vuu32,Rt32)","Vxx32.w+=vmpa(Vuu32.uh,Rt32.b)",
+"Vertical Byte Multiply",
+    VxxV.v[0].w[i] += fMPY16US(fGETUHALF(0, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(0, RtV))) + fMPY16US(fGETUHALF(0, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(1, RtV)));
+    VxxV.v[1].w[i] += fMPY16US(fGETUHALF(1, VuuV.v[0].w[i]), fSE8_16(fGETBYTE(2, RtV))) + fMPY16US(fGETUHALF(1, VuuV.v[1].w[i]), fSE8_16(fGETBYTE(3, RtV))))
+
+
+
+
+
+
+
+/* Half by Half */
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyh,"Vdd32=vmpyh(Vu32,Rt32)","Vdd32.w=vmpy(Vu32.h,Rt32.h)",
+"Vector halfword by scalar halfword multiply",
+    VddV.v[0].w[i] =  fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, RtV));
+    VddV.v[1].w[i] =  fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC_NOV1(32,vmpyh_acc,"Vxx32+=vmpyh(Vu32,Rt32)","Vxx32.w+=vmpy(Vu32.h,Rt32.h)",
+"Vector even halfwords with scalar lower halfword multiply with shift and sat32",
+    VxxV.v[0].w[i] =  fCAST8s(VxxV.v[0].w[i]) + fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, RtV));
+    VxxV.v[1].w[i] =  fCAST8s(VxxV.v[1].w[i]) + fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhsat_acc,"Vxx32+=vmpyh(Vu32,Rt32):sat","Vxx32.w+=vmpy(Vu32.h,Rt32.h):sat",
+"Vector even halfwords with scalar lower halfword multiply with shift and sat32",
+    VxxV.v[0].w[i] =  fVSATW(fCAST8s(VxxV.v[0].w[i]) + fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, RtV)));
+    VxxV.v[1].w[i] =  fVSATW(fCAST8s(VxxV.v[1].w[i]) + fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, RtV))))
+
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhss,"Vd32=vmpyh(Vu32,Rt32):<<1:sat","Vd32.h=vmpy(Vu32.h,Rt32.h):<<1:sat",
+"Vector halfword by halfword multiply, shift by 1, and take upper 16 msb",
+          fSETHALF(0,VdV.w[i],fVSATH(fGETHALF(1,fVSAT((fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0,RtV))<<1)))));
+          fSETHALF(1,VdV.w[i],fVSATH(fGETHALF(1,fVSAT((fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1,RtV))<<1)))));
+)
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyhsrs,"Vd32=vmpyh(Vu32,Rt32):<<1:rnd:sat","Vd32.h=vmpy(Vu32.h,Rt32.h):<<1:rnd:sat",
+"Vector halfword with scalar halfword multiply with round, shift, and sat16",
+       fSETHALF(0,VdV.w[i],fVSATH(fGETHALF(1,fVSAT(fROUND((fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0,RtV))<<1))))));
+       fSETHALF(1,VdV.w[i],fVSATH(fGETHALF(1,fVSAT(fROUND((fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1,RtV))<<1))))));
+)
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuh,"Vdd32=vmpyuh(Vu32,Rt32)","Vdd32.uw=vmpy(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VddV.v[0].uw[i] = fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV));
+    VddV.v[1].uw[i] = fMPY16UU(fGETUHALF(1, VuV.uw[i]),fGETUHALF(1,RtV)))
+
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyuh_acc,"Vxx32+=vmpyuh(Vu32,Rt32)","Vxx32.uw+=vmpy(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VxxV.v[0].uw[i] += fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV));
+    VxxV.v[1].uw[i] += fMPY16UU(fGETUHALF(1, VuV.uw[i]),fGETUHALF(1,RtV)))
+
+
+
+
+/********************************************
+*  HALF BY BYTE
+********************************************/
+ITERATOR_INSN2_MPY_SLOT(16,vmpyihb,"Vd32=vmpyihb(Vu32,Rt32)","Vd32.h=vmpyi(Vu32.h,Rt32.b)",
+"Vector halfword by byte multiply, keep lower result",
+VdV.h[i]  = fMPY16SS(VuV.h[i], fGETBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(16,vmpyihb_acc,"Vx32+=vmpyihb(Vu32,Rt32)","Vx32.h+=vmpyi(Vu32.h,Rt32.b)",
+"Vector halfword by byte multiply, keep lower result",
+VxV.h[i] += fMPY16SS(VuV.h[i], fGETBYTE(i % 4, RtV) ))
+
+
+/********************************************
+*  WORD BY BYTE
+********************************************/
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwb,"Vd32=vmpyiwb(Vu32,Rt32)","Vd32.w=vmpyi(Vu32.w,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VdV.w[i]  = fMPY32SS(VuV.w[i], fGETBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwb_acc,"Vx32+=vmpyiwb(Vu32,Rt32)","Vx32.w+=vmpyi(Vu32.w,Rt32.b)",
+"Vector word by byte multiply, keep lower result",
+VxV.w[i] += fMPY32SS(VuV.w[i], fGETBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwub,"Vd32=vmpyiwub(Vu32,Rt32)","Vd32.w=vmpyi(Vu32.w,Rt32.ub)",
+"Vector word by byte multiply, keep lower result",
+VdV.w[i]  = fMPY32SS(VuV.w[i], fGETUBYTE(i % 4, RtV) ))
+
+ITERATOR_INSN2_MPY_SLOT(32,vmpyiwub_acc,"Vx32+=vmpyiwub(Vu32,Rt32)","Vx32.w+=vmpyi(Vu32.w,Rt32.ub)",
+"Vector word by byte multiply, keep lower result",
+VxV.w[i] += fMPY32SS(VuV.w[i], fGETUBYTE(i % 4, RtV) ))
+
+
+/********************************************
+*  WORD BY HALF
+********************************************/
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiwh,"Vd32=vmpyiwh(Vu32,Rt32)","Vd32.w=vmpyi(Vu32.w,Rt32.h)",
+"Vector word by halfword multiply, keep lower result",
+VdV.w[i]  = fMPY32SS(VuV.w[i], fGETHALF(i % 2, RtV)))
+
+ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(32,vmpyiwh_acc,"Vx32+=vmpyiwh(Vu32,Rt32)","Vx32.w+=vmpyi(Vu32.w,Rt32.h)",
+"Vector word by halfword multiply, keep lower result",
+VxV.w[i] += fMPY32SS(VuV.w[i], fGETHALF(i % 2, RtV)))
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+/**************************************************************************
+ * MMVECTOR LOGICAL OPERATIONS
+ * ************************************************************************/
+ITERATOR_INSN_ANY_SLOT(16,vand,"Vd32=vand(Vu32,Vv32)", "Vector Logical And", VdV.uh[i] = VuV.uh[i] & VvV.h[i])
+ITERATOR_INSN_ANY_SLOT(16,vor, "Vd32=vor(Vu32,Vv32)",  "Vector Logical Or", VdV.uh[i] = VuV.uh[i] | VvV.h[i])
+ITERATOR_INSN_ANY_SLOT(16,vxor,"Vd32=vxor(Vu32,Vv32)", "Vector Logical XOR",    VdV.uh[i] = VuV.uh[i] ^ VvV.h[i])
+ITERATOR_INSN_ANY_SLOT(16,vnot,"Vd32=vnot(Vu32)",     "Vector Logical NOT", VdV.uh[i] = ~VuV.uh[i])
+
+
+
+
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandqrt,
+"Vd32.ub=vand(Qu4.ub,Rt32.ub)", "Vd32=vand(Qu4,Rt32)", "Insert Predicate into Vector",
+    VdV.ub[i] = fGETQBIT(QuV,i) ? fGETUBYTE(i % 4, RtV) : 0)
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandqrt_acc,
+"Vx32.ub|=vand(Qu4.ub,Rt32.ub)", "Vx32|=vand(Qu4,Rt32)",  "Insert Predicate into Vector",
+    VxV.ub[i] |= (fGETQBIT(QuV,i)) ? fGETUBYTE(i % 4, RtV) : 0)
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandnqrt,
+"Vd32.ub=vand(!Qu4.ub,Rt32.ub)", "Vd32=vand(!Qu4,Rt32)", "Insert Predicate into Vector",
+    VdV.ub[i] = !fGETQBIT(QuV,i) ? fGETUBYTE(i % 4, RtV) : 0)
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandnqrt_acc,
+"Vx32.ub|=vand(!Qu4.ub,Rt32.ub)", "Vx32|=vand(!Qu4,Rt32)",  "Insert Predicate into Vector",
+    VxV.ub[i] |= !(fGETQBIT(QuV,i)) ? fGETUBYTE(i % 4, RtV) : 0)
+
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandvrt,
+"Qd4.ub=vand(Vu32.ub,Rt32.ub)", "Qd4=vand(Vu32,Rt32)", "Insert into Predicate",
+    fSETQBIT(QdV,i,((VuV.ub[i] & fGETUBYTE(i % 4, RtV)) != 0) ? 1 : 0))
+
+ITERATOR_INSN2_MPY_SLOT_LATE(8, vandvrt_acc,
+"Qx4.ub|=vand(Vu32.ub,Rt32.ub)", "Qx4|=vand(Vu32,Rt32)", "Insert into Predicate ",
+    fSETQBIT(QxV,i,fGETQBIT(QxV,i)|(((VuV.ub[i] & fGETUBYTE(i % 4, RtV)) != 0) ? 1 : 0)))
+
+ITERATOR_INSN_ANY_SLOT(8,vandvqv,"Vd32=vand(Qv4,Vu32)","Mask off bytes",
+VdV.b[i] = fGETQBIT(QvV,i) ? VuV.b[i] : 0)
+ITERATOR_INSN_ANY_SLOT(8,vandvnqv,"Vd32=vand(!Qv4,Vu32)","Mask off bytes",
+VdV.b[i] = !fGETQBIT(QvV,i) ? VuV.b[i] : 0)
+
+
+ /***************************************************
+ * Compare Vector with Vector
+ ***************************************************/
+#define VCMP(DEST, ASRC, ASRCOP, CMP, N, SRC, MASK, WIDTH)        \
+{ \
+       for(fHIDE(int) i = 0; i < fVBYTES(); i += WIDTH) { \
+		fSETQBITS(DEST,WIDTH,MASK,i,ASRC ASRCOP ((VuV.SRC[i/WIDTH] CMP VvV.SRC[i/WIDTH]) ? MASK : 0)); \
+    } \
+       }
+
+
+#define MMVEC_CMPGT(TYPE,TYPE2,TYPE3,DESCR,N,MASK,WIDTH,SRC) \
+EXTINSN(V6_vgt##TYPE,       "Qd4=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" greater than", \
+	VCMP(QdV, , , >, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_vgt##TYPE##_and, "Qx4&=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" greater than with predicate-and", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), &, >, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_vgt##TYPE##_or,  "Qx4|=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" greater than with predicate-or", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), |, >, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_vgt##TYPE##_xor, "Qx4^=vcmp.gt(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" greater than with predicate-xor", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), ^, >, N, SRC, MASK, WIDTH))
+
+#define MMVEC_CMP(TYPE,TYPE2,TYPE3,DESCR,N,MASK, WIDTH, SRC)\
+MMVEC_CMPGT(TYPE,TYPE2,TYPE3,DESCR,N,MASK,WIDTH,SRC) \
+EXTINSN(V6_veq##TYPE,       "Qd4=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" equal to", \
+	VCMP(QdV, , , ==, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_veq##TYPE##_and, "Qx4&=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" equal to with predicate-and", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), &, ==, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_veq##TYPE##_or,  "Qx4|=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" equal to with predicate-or", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), |, ==, N, SRC, MASK, WIDTH)) \
+EXTINSN(V6_veq##TYPE##_xor, "Qx4^=vcmp.eq(Vu32." TYPE2 ",Vv32." TYPE2 ")", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA), DESCR" equal to with predicate-xor", \
+	VCMP(QxV, fGETQBITS(QxV,WIDTH,MASK,i), ^, ==, N, SRC, MASK, WIDTH))
+
+
+MMVEC_CMP(w,"w","","Vector Word Compare ", fVELEM(32), 0xF, 4, w)
+MMVEC_CMP(h,"h","","Vector Half Compare ", fVELEM(16), 0x3, 2, h)
+MMVEC_CMP(b,"b","","Vector Byte Compare ", fVELEM(8),  0x1, 1, b)
+MMVEC_CMPGT(uw,"uw","","Vector Unsigned Word Compare ", fVELEM(32), 0xF, 4,uw)
+MMVEC_CMPGT(uh,"uh","","Vector Unsigned Half Compare ", fVELEM(16), 0x3, 2,uh)
+MMVEC_CMPGT(ub,"ub","","Vector Unsigned Byte Compare ", fVELEM(8),  0x1, 1,ub)
+
+/***************************************************
+* Predicate Operations
+***************************************************/
+
+EXTINSN(V6_pred_scalar2, "Qd4=vsetq(Rt32)",         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),   "Set Vector Predicate ",
+{
+    fHIDE(int i;)
+    for(i = 0; i < fVBYTES(); i++) fSETQBIT(QdV,i,(i < (RtV & (fVBYTES()-1))) ? 1 : 0);
+})
+
+EXTINSN(V6_pred_scalar2v2, "Qd4=vsetq2(Rt32)",         ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),   "Set Vector Predicate ",
+{
+    fHIDE(int i;)
+    for(i = 0; i < fVBYTES(); i++) fSETQBIT(QdV,i,(i <= ((RtV-1) & (fVBYTES()-1))) ? 1 : 0);
+})
+
+
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, shuffeqw, "Qd4.h=vshuffe(Qs4.w,Qt4.w)","Shrink Predicate", fSETQBIT(QdV,i, (i & 2) ? fGETQBIT(QsV,i-2) : fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, shuffeqh, "Qd4.b=vshuffe(Qs4.h,Qt4.h)","Shrink Predicate", fSETQBIT(QdV,i, (i & 1) ? fGETQBIT(QsV,i-1) : fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_or, "Qd4=or(Qs4,Qt4)","Vector Predicate Or", fSETQBIT(QdV,i,fGETQBIT(QsV,i) || fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_and, "Qd4=and(Qs4,Qt4)","Vector Predicate And", fSETQBIT(QdV,i,fGETQBIT(QsV,i) && fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_xor, "Qd4=xor(Qs4,Qt4)","Vector Predicate Xor", fSETQBIT(QdV,i,fGETQBIT(QsV,i) ^ fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_or_n, "Qd4=or(Qs4,!Qt4)","Vector Predicate Or with not", fSETQBIT(QdV,i,fGETQBIT(QsV,i) || !fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8, pred_and_n, "Qd4=and(Qs4,!Qt4)","Vector Predicate And with not", fSETQBIT(QdV,i,fGETQBIT(QsV,i) && !fGETQBIT(QtV,i) ) )
+ITERATOR_INSN_ANY_SLOT(8, pred_not, "Qd4=not(Qs4)","Vector Predicate Not", fSETQBIT(QdV,i,!fGETQBIT(QsV,i) ) )
+
+
+
+EXTINSN(V6_vcmov,  "if (Ps4) Vd32=Vu32",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA),   "Conditional Mov",
+{
+if (fLSBOLD(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VdV.ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+EXTINSN(V6_vncmov,  "if (!Ps4) Vd32=Vu32",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA),   "Conditional Mov",
+{
+if (fLSBOLDNOT(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VdV.ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+EXTINSN(V6_vccombine,  "if (Ps4) Vdd32=vcombine(Vu32,Vv32)",	ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA_DV),   "Conditional Combine",
+{
+if (fLSBOLD(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VddV.v[0].ub[i] = VvV.ub[i];
+		VddV.v[1].ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+EXTINSN(V6_vnccombine,  "if (!Ps4) Vdd32=vcombine(Vu32,Vv32)",	ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA_DV),   "Conditional Combine",
+{
+if (fLSBOLDNOT(PsV))	{
+	fHIDE(int i;)
+	fVFOREACH(8, i) {
+		VddV.v[0].ub[i] = VvV.ub[i];
+		VddV.v[1].ub[i] = VuV.ub[i];
+	}
+	} else {CANCEL;}
+})
+
+
+
+ITERATOR_INSN_ANY_SLOT(8,vmux,"Vd32=vmux(Qt4,Vu32,Vv32)",
+"Vector Select Element 8-bit",
+    VdV.ub[i] = fGETQBIT(QtV,i) ? VuV.ub[i] : VvV.ub[i])
+
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8,vswap,"Vdd32=vswap(Qt4,Vu32,Vv32)",
+"Vector Swap Element 8-bit",
+    VddV.v[0].ub[i] =  fGETQBIT(QtV,i) ? VuV.ub[i] : VvV.ub[i];
+	VddV.v[1].ub[i] = !fGETQBIT(QtV,i) ? VuV.ub[i] : VvV.ub[i])
+
+
+/***************************************************************************
+*
+*   MMVECTOR SORTING
+*
+****************************************************************************/
+
+#define MMVEC_SORT(TYPE,TYPE2,DESCR,ELEMENTSIZE,SRC)\
+ITERATOR_INSN2_ANY_SLOT(ELEMENTSIZE,vmax##TYPE, "Vd32=vmax" TYPE2 "(Vu32,Vv32)", "Vd32."#SRC"=vmax(Vu32."#SRC",Vv32."#SRC")", "Vector " DESCR " max", VdV.SRC[i] = (VuV.SRC[i] > VvV.SRC[i]) ? VuV.SRC[i] :  VvV.SRC[i])  \
+ITERATOR_INSN2_ANY_SLOT(ELEMENTSIZE,vmin##TYPE, "Vd32=vmin" TYPE2 "(Vu32,Vv32)", "Vd32."#SRC"=vmin(Vu32."#SRC",Vv32."#SRC")", "Vector " DESCR " min", VdV.SRC[i] = (VuV.SRC[i] < VvV.SRC[i]) ? VuV.SRC[i] :  VvV.SRC[i])
+
+MMVEC_SORT(b,"b", "signed byte",    8,  b)
+MMVEC_SORT(ub,"ub", "unsigned byte",    8,  ub)
+MMVEC_SORT(uh,"uh", "unsigned halfword",16, uh)
+MMVEC_SORT(h,   "h",    "halfword",         16, h)
+MMVEC_SORT(w,   "w",    "word",             32, w)
+
+
+
+
+
+
+
+
+
+/*************************************************************
+* SHUFFLES
+****************************************************************/
+
+ITERATOR_INSN2_ANY_SLOT(16,vsathub,"Vd32=vsathub(Vu32,Vv32)","Vd32.ub=vsat(Vu32.h,Vv32.h)",
+"Saturate and pack 32 halfwords to 32 unsigned bytes, and interleave them",
+    fSETBYTE(0, VdV.uh[i], fVSATUB(VvV.h[i]));
+    fSETBYTE(1, VdV.uh[i], fVSATUB(VuV.h[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vsatwh,"Vd32=vsatwh(Vu32,Vv32)","Vd32.h=vsat(Vu32.w,Vv32.w)",
+"Saturate and pack 16 words to 16 halfwords, and interleave them",
+    fSETHALF(0, VdV.w[i], fVSATH(VvV.w[i]));
+    fSETHALF(1, VdV.w[i], fVSATH(VuV.w[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vsatuwuh,"Vd32=vsatuwuh(Vu32,Vv32)","Vd32.uh=vsat(Vu32.uw,Vv32.uw)",
+"Saturate and pack 16 words to 16 halfwords, and interleave them",
+    fSETHALF(0, VdV.w[i], fVSATUH(VvV.uw[i]));
+    fSETHALF(1, VdV.w[i], fVSATUH(VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT(16,vshuffeb,"Vd32=vshuffeb(Vu32,Vv32)","Vd32.b=vshuffe(Vu32.b,Vv32.b)",
+"Shuffle bytes within a lane",
+    fSETBYTE(0, VdV.uh[i], fGETUBYTE(0, VvV.uh[i]));
+    fSETBYTE(1, VdV.uh[i], fGETUBYTE(0, VuV.uh[i])))
+
+ITERATOR_INSN2_ANY_SLOT(16,vshuffob,"Vd32=vshuffob(Vu32,Vv32)","Vd32.b=vshuffo(Vu32.b,Vv32.b)",
+"Shuffle bytes within a lane",
+    fSETBYTE(0, VdV.uh[i], fGETUBYTE(1, VvV.uh[i]));
+    fSETBYTE(1, VdV.uh[i], fGETUBYTE(1, VuV.uh[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vshufeh,"Vd32=vshuffeh(Vu32,Vv32)","Vd32.h=vshuffe(Vu32.h,Vv32.h)",
+"Shuffle halfwords within a lane",
+    fSETHALF(0, VdV.uw[i], fGETUHALF(0, VvV.uw[i]));
+    fSETHALF(1, VdV.uw[i], fGETUHALF(0, VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT(32,vshufoh,"Vd32=vshuffoh(Vu32,Vv32)","Vd32.h=vshuffo(Vu32.h,Vv32.h)",
+"Shuffle halfwords within a lane",
+    fSETHALF(0, VdV.uw[i], fGETUHALF(1, VvV.uw[i]));
+    fSETHALF(1, VdV.uw[i], fGETUHALF(1, VuV.uw[i])))
+
+
+
+
+/**************************************************************************
+* Double Vector Shuffles
+**************************************************************************/
+
+EXTINSN(V6_vshuff, "vshuff(Vy32,Vx32,Rt32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),
+"2x2->2x2 transpose, for multiple data sizes, inplace",
+{
+	fHIDE(int offset;)
+	for (offset=1; offset<fVBYTES(); offset<<=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VyV.ub[k], VxV.ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+EXTINSN(V6_vshuffvdd, "Vdd32=vshuff(Vu32,Vv32,Rt8)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),
+"2x2->2x2 transpose for multiple data sizes",
+{
+	fHIDE(int offset;)
+	VddV.v[0] = VvV;
+	VddV.v[1] = VuV;
+	for (offset=1; offset<fVBYTES(); offset<<=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VddV.v[1].ub[k], VddV.v[0].ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+EXTINSN(V6_vdeal, "vdeal(Vy32,Vx32,Rt32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),
+"Vector-vector deal (deinterleave) for multiple data sizes, in place",
+{
+	fHIDE(int offset;)
+	for (offset=fVBYTES()>>1; offset>0; offset>>=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VyV.ub[k], VxV.ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+EXTINSN(V6_vdealvdd, "Vdd32=vdeal(Vu32,Vv32,Rt8)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP_VS),
+"Vector-vector deal (deinterleave) for multiple data sizes",
+{
+	fHIDE(int offset;)
+	VddV.v[0] = VvV;
+	VddV.v[1] = VuV;
+	for (offset=fVBYTES()>>1; offset>0; offset>>=1) {
+		if ( RtV & offset) {
+			    fHIDE(int k;) \
+				fVFOREACH(8, k) {\
+				if (!( k & offset)) {
+					fSWAPB(VddV.v[1].ub[k], VddV.v[0].ub[k+offset]);
+				}
+			}
+		}
+	}
+	})
+
+/**************************************************************************/
+
+
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(32,vshufoeh,"Vdd32=vshuffoeh(Vu32,Vv32)","Vdd32.h=vshuffoe(Vu32.h,Vv32.h)",
+"Vector Shuffle half words",
+    fSETHALF(0, VddV.v[0].uw[i], fGETUHALF(0, VvV.uw[i]));
+    fSETHALF(1, VddV.v[0].uw[i], fGETUHALF(0, VuV.uw[i]));
+    fSETHALF(0, VddV.v[1].uw[i], fGETUHALF(1, VvV.uw[i]));
+    fSETHALF(1, VddV.v[1].uw[i], fGETUHALF(1, VuV.uw[i])))
+
+ITERATOR_INSN2_ANY_SLOT_DOUBLE_VEC(16,vshufoeb,"Vdd32=vshuffoeb(Vu32,Vv32)","Vdd32.b=vshuffoe(Vu32.b,Vv32.b)",
+"Vector Shuffle bytes",
+    fSETBYTE(0, VddV.v[0].uh[i], fGETUBYTE(0, VvV.uh[i]));
+    fSETBYTE(1, VddV.v[0].uh[i], fGETUBYTE(0, VuV.uh[i]));
+    fSETBYTE(0, VddV.v[1].uh[i], fGETUBYTE(1, VvV.uh[i]));
+    fSETBYTE(1, VddV.v[1].uh[i], fGETUBYTE(1, VuV.uh[i])))
+
+
+/***************************************************************
+* Deal
+***************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vdealh, "Vd32=vdealh(Vu32)", "Vd32.h=vdeal(Vu32.h)",
+"Deal Halfwords",
+    VdV.uh[i  ] = fGETUHALF(0, VuV.uw[i]);
+    VdV.uh[i+fVELEM(32)] = fGETUHALF(1, VuV.uw[i]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vdealb, "Vd32=vdealb(Vu32)", "Vd32.b=vdeal(Vu32.b)",
+"Deal Bytes",
+    VdV.ub[i   ] = fGETUBYTE(0, VuV.uh[i]);
+    VdV.ub[i+fVELEM(16)] = fGETUBYTE(1, VuV.uh[i]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vdealb4w,  "Vd32=vdealb4w(Vu32,Vv32)", "Vd32.b=vdeale(Vu32.b,Vv32.b)",
+"Deal Two Vectors Bytes",
+    VdV.ub[0+i ] = fGETUBYTE(0, VvV.uw[i]);
+    VdV.ub[fVELEM(32)+i ] = fGETUBYTE(2, VvV.uw[i]);
+    VdV.ub[2*fVELEM(32)+i] = fGETUBYTE(0, VuV.uw[i]);
+    VdV.ub[3*fVELEM(32)+i] = fGETUBYTE(2, VuV.uw[i]))
+
+/***************************************************************
+* shuffle
+***************************************************************/
+
+ITERATOR_INSN2_PERMUTE_SLOT(32, vshuffh, "Vd32=vshuffh(Vu32)", "Vd32.h=vshuff(Vu32.h)",
+"Shuffle Halfwords",
+    fSETHALF(0, VdV.uw[i], VuV.uh[i]);
+    fSETHALF(1, VdV.uw[i], VuV.uh[i+fVELEM(32)]))
+
+ITERATOR_INSN2_PERMUTE_SLOT(16, vshuffb, "Vd32=vshuffb(Vu32)", "Vd32.b=vshuff(Vu32.b)",
+"Shuffle Bytes",
+    fSETBYTE(0, VdV.uh[i], VuV.ub[i]);
+    fSETBYTE(1, VdV.uh[i], VuV.ub[i+fVELEM(16)]))
+
+
+
+
+
+/***********************************************************
+* INSERT AND EXTRACT
+*********************************************************/
+EXTINSN(V6_extractw, "Rd32=vextract(Vu32,Rs32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VA,A_MEMLIKE,A_RESTRICT_SLOT0ONLY),
+"Extract an element from a vector to scalar",
+fHIDE(warn("RdN=%d VuN=%d RsN=%d RsV=0x%08x widx=%d",RdN,VuN,RsN,RsV,((RsV & (fVBYTES()-1)) >> 2));)
+RdV = VuV.uw[ (RsV & (fVBYTES()-1)) >> 2];
+fHIDE(warn("RdV=0x%08x",RdV);))
+
+EXTINSN(V6_vinsertwr, "Vx32.w=vinsert(Rt32)",
+ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX),
+"Insert Word Scalar into Vector",
+VxV.uw[0] = RtV;)
+
+
+
+
+ITERATOR_INSN_MPY_SLOT_LATE(32,lvsplatw, "Vd32=vsplat(Rt32)", "Replicates scalar across words in vector", VdV.uw[i] = RtV)
+
+ITERATOR_INSN_MPY_SLOT_LATE(16,lvsplath, "Vd32.h=vsplat(Rt32)", "Replicates scalar across halves in vector", VdV.uh[i] = RtV)
+
+ITERATOR_INSN_MPY_SLOT_LATE(8,lvsplatb, "Vd32.b=vsplat(Rt32)", "Replicates scalar across bytes in vector", VdV.ub[i] = RtV)
+
+
+ITERATOR_INSN_ANY_SLOT(32,vassign,"Vd32=Vu32","Copy a vector",VdV.w[i]=VuV.w[i])
+
+
+ITERATOR_INSN_ANY_SLOT_DOUBLE_VEC(8,vcombine,"Vdd32=vcombine(Vu32,Vv32)",
+"Vector assign, Any two to Vector Pair",
+    VddV.v[0].ub[i] = VvV.ub[i];
+    VddV.v[1].ub[i] = VuV.ub[i])
+
+
+
+///////////////////////////////////////////////////////////////////////////
+
+
+/*********************************************************
+* GENERAL PERMUTE NETWORKS
+*********************************************************/
+
+
+EXTINSN(V6_vdelta, "Vd32=vdelta(Vu32,Vv32)",    ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+"Reverse Benes Butterfly network ",
+{
+    fHIDE(int offset;)
+    fHIDE(int k;)
+    fHIDE(mmvector_t tmp;)
+    tmp = VuV;
+    for (offset=fVBYTES(); (offset>>=1)>0; ) {
+        for (k = 0; k<fVBYTES(); k++) {
+            VdV.ub[k] = (VvV.ub[k]&offset) ? tmp.ub[k^offset] : tmp.ub[k];
+        }
+        for (k = 0; k<fVBYTES(); k++) {
+            tmp.ub[k] = VdV.ub[k];
+        }
+    }
+})
+
+
+EXTINSN(V6_vrdelta, "Vd32=vrdelta(Vu32,Vv32)",  ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VP),
+"Forward Benes Butterfly network ",
+{
+	fHIDE(int offset;)
+    fHIDE(int k;)
+    fHIDE(mmvector_t tmp;)
+    tmp = VuV;
+    for (offset=1; offset<fVBYTES(); offset<<=1){
+        for (k = 0; k<fVBYTES(); k++) {
+            VdV.ub[k] = (VvV.ub[k]&offset) ? tmp.ub[k^offset] : tmp.ub[k];
+        }
+        for (k = 0; k<fVBYTES(); k++) {
+            tmp.ub[k] = VdV.ub[k];
+        }
+    }
+})
+
+
+
+
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vcl0w,"Vd32=vcl0w(Vu32)","Vd32.uw=vcl0(Vu32.uw)",         "Count Leading Zeros in Word",     VdV.uw[i]=fCL1_4(~VuV.uw[i]))
+ITERATOR_INSN2_SHIFT_SLOT(16,vcl0h,"Vd32=vcl0h(Vu32)","Vd32.uh=vcl0(Vu32.uh)",         "Count Leading Zeros in Halfword",    VdV.uh[i]=fCL1_2(~VuV.uh[i]))
+
+ITERATOR_INSN2_SHIFT_SLOT(32,vnormamtw,"Vd32=vnormamtw(Vu32)","Vd32.w=vnormamt(Vu32.w)","Norm Amount Word",
+VdV.w[i]=fMAX(fCL1_4(~VuV.w[i]),fCL1_4(VuV.w[i]))-1; fHIDE(IV1DEAD();))
+ITERATOR_INSN2_SHIFT_SLOT(16,vnormamth,"Vd32=vnormamth(Vu32)","Vd32.h=vnormamt(Vu32.h)","Norm Amount Halfword",
+VdV.h[i]=fMAX(fCL1_2(~VuV.h[i]),fCL1_2(VuV.h[i]))-1; fHIDE(IV1DEAD();))
+
+ITERATOR_INSN_SHIFT_SLOT_VV_LATE(32,vaddclbw,"Vd32.w=vadd(vclb(Vu32.w),Vv32.w)",
+"Count leading bits and add",
+VdV.w[i] = fMAX(fCL1_4(~VuV.w[i]),fCL1_4(VuV.w[i])) + VvV.w[i])
+
+ITERATOR_INSN_SHIFT_SLOT_VV_LATE(16,vaddclbh,"Vd32.h=vadd(vclb(Vu32.h),Vv32.h)",
+"Count leading bits and add",
+VdV.h[i] = fMAX(fCL1_2(~VuV.h[i]),fCL1_2(VuV.h[i])) + VvV.h[i])
+
+
+ITERATOR_INSN2_SHIFT_SLOT(16,vpopcounth,"Vd32=vpopcounth(Vu32)","Vd32.h=vpopcount(Vu32.h)",   "Population Count in Halfword",  VdV.uh[i]=fCOUNTONES_2(VuV.uh[i]))
+
+
+#define fHIST(INPUTVEC) \
+	fUARCH_NOTE_PUMP_4X(); \
+	fHIDE(int lane;) \
+	fHIDE(mmvector_t tmp;) \
+	fVFOREACH(128, lane) { \
+		for (fHIDE(int )i=0; i<128/8; ++i) { \
+			unsigned char value = INPUTVEC.ub[(128/8)*lane+i]; \
+			unsigned char regno = value>>3; \
+			unsigned char element = value & 7; \
+			READ_EXT_VREG(regno,tmp,0); \
+			tmp.uh[(128/16)*lane+(element)]++; \
+			WRITE_EXT_VREG(regno,tmp,EXT_NEW); \
+		} \
+	}
+
+#define fHISTQ(INPUTVEC,QVAL) \
+	fUARCH_NOTE_PUMP_4X(); \
+	fHIDE(int lane;) \
+	fHIDE(mmvector_t tmp;) \
+	fVFOREACH(128, lane) { \
+		for (fHIDE(int )i=0; i<128/8; ++i) { \
+			unsigned char value = INPUTVEC.ub[(128/8)*lane+i]; \
+			unsigned char regno = value>>3; \
+			unsigned char element = value & 7; \
+			READ_EXT_VREG(regno,tmp,0); \
+			if (fGETQBIT(QVAL,128/8*lane+i)) tmp.uh[(128/16)*lane+(element)]++; \
+			WRITE_EXT_VREG(regno,tmp,EXT_NEW); \
+		} \
+	}
+
+
+
+EXTINSN(V6_vhist, "vhist",ATTRIBS(A_EXTENSION,A_CVI,A_CVI_4SLOT), "vhist instruction",{ fHIDE(mmvector_t inputVec;) inputVec=fTMPVDATA(); fHIST(inputVec); })
+EXTINSN(V6_vhistq, "vhist(Qv4)",ATTRIBS(A_EXTENSION,A_CVI,A_CVI_4SLOT), "vhist instruction",{ fHIDE(mmvector_t inputVec;) inputVec=fTMPVDATA(); fHISTQ(inputVec,QvV); })
+
+#undef fHIST
+#undef fHISTQ
+
+
+/* **** WEIGHTED HISTOGRAM **** */
+
+
+#if 1
+#define WHIST(EL,MASK,BSHIFT,COND,SATF) \
+	fHIDE(unsigned int) bucket = fGETUBYTE(0,input.h[i]); \
+	fHIDE(unsigned int) weight = fGETUBYTE(1,input.h[i]); \
+	fHIDE(unsigned int) vindex = (bucket >> 3) & 0x1F; \
+	fHIDE(unsigned int) elindex = ((i>>BSHIFT) & (~MASK)) | ((bucket>>BSHIFT) & MASK); \
+	fHIDE(mmvector_t tmp;) \
+	READ_EXT_VREG(vindex,tmp,0); \
+	COND tmp.EL[elindex] = SATF(tmp.EL[elindex] + weight); \
+	WRITE_EXT_VREG(vindex,tmp,EXT_NEW); \
+	fUARCH_NOTE_PUMP_2X();
+
+ITERATOR_INSN_VHISTLIKE(16,vwhist256,"vwhist256","vector weighted histogram halfword counters", WHIST(uh,7,0,,))
+ITERATOR_INSN_VHISTLIKE(16,vwhist256q,"vwhist256(Qv4)","vector weighted histogram halfword counters", WHIST(uh,7,0,if (fGETQBIT(QvV,2*i)),))
+ITERATOR_INSN_VHISTLIKE(16,vwhist256_sat,"vwhist256:sat","vector weighted histogram halfword counters", WHIST(uh,7,0,,fVSATUH))
+ITERATOR_INSN_VHISTLIKE(16,vwhist256q_sat,"vwhist256(Qv4):sat","vector weighted histogram halfword counters", WHIST(uh,7,0,if (fGETQBIT(QvV,2*i)),fVSATUH))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128,"vwhist128","vector weighted histogram word counters", WHIST(uw,3,1,,))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128q,"vwhist128(Qv4)","vector weighted histogram word counters", WHIST(uw,3,1,if (fGETQBIT(QvV,2*i)),))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128m,"vwhist128(#u1)","vector weighted histogram word counters", WHIST(uw,3,1,if ((bucket & 1) == uiV),))
+ITERATOR_INSN_VHISTLIKE(16,vwhist128qm,"vwhist128(Qv4,#u1)","vector weighted histogram word counters", WHIST(uw,3,1,if (((bucket & 1) == uiV) && fGETQBIT(QvV,2*i)),))
+
+
+#endif
+
+
+
+/* ******   lookup table instructions                          ***********  */
+
+/* Use low bits from idx to choose next-bigger elements from vector, then use LSB from idx to choose odd or even element */
+
+ITERATOR_INSN_PERMUTE_SLOT(8,vlutvvb,"Vd32.b=vlut32(Vu32.b,Vv32.b,Rt8)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = RtV & 0x7;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VdV.b[i] = ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(8,vlutvvb_oracc,"Vx32.b|=vlut32(Vu32.b,Vv32.b,Rt8)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = RtV & 0x7;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VxV.b[i] |= ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh,"Vdd32.h=vlut16(Vu32.b,Vv32.h,Rt8)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = RtV & 0xF;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VddV.v[0].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VddV.v[1].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh_oracc,"Vxx32.h|=vlut16(Vu32.b,Vv32.h,Rt8)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = fGETUBYTE(0,RtV) & 0xF;
+oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VxxV.v[0].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VxxV.v[1].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT(8,vlutvvbi,"Vd32.b=vlut32(Vu32.b,Vv32.b,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0x7;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VdV.b[i] = ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(8,vlutvvb_oracci,"Vx32.b|=vlut32(Vu32.b,Vv32.b,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0x7;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = VuV.ub[i];
+VxV.b[i] |= ((idx & 0xE0) == (matchval << 5)) ? fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwhi,"Vdd32.h=vlut16(Vu32.b,Vv32.h,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0xF;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VddV.v[0].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VddV.v[1].h[i] = ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh_oracci,"Vxx32.h|=vlut16(Vu32.b,Vv32.h,#u3)","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int matchval;) fHIDE(int oddhalf;)
+matchval = uiV & 0xF;
+oddhalf = (uiV >> (fVECLOGSIZE()-6)) & 0x1;
+idx = fGETUBYTE(0,VuV.uh[i]);
+VxxV.v[0].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0;
+idx = fGETUBYTE(1,VuV.uh[i]);
+VxxV.v[1].h[i] |= ((idx & 0xF0) == (matchval << 4)) ? fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]) : 0)
+
+ITERATOR_INSN_PERMUTE_SLOT(8,vlutvvb_nm,"Vd32.b=vlut32(Vu32.b,Vv32.b,Rt8):nomatch","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int oddhalf;) fHIDE(int matchval;)
+    matchval = RtV & 0x7;
+    oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+    idx = VuV.ub[i];
+    idx = (idx&0x1F) | (matchval<<5);
+    VdV.b[i] = fGETBYTE(oddhalf,VvV.h[idx % fVELEM(16)]))
+
+ITERATOR_INSN_PERMUTE_SLOT_DOUBLE_VEC(16,vlutvwh_nm,"Vdd32.h=vlut16(Vu32.b,Vv32.h,Rt8):nomatch","vector-vector table lookup",
+fHIDE(unsigned int idx;) fHIDE(int oddhalf;) fHIDE(int matchval;)
+    matchval = RtV & 0xF;
+    oddhalf = (RtV >> (fVECLOGSIZE()-6)) & 0x1;
+    idx = fGETUBYTE(0,VuV.uh[i]);
+    idx = (idx&0x0F) | (matchval<<4);
+    VddV.v[0].h[i] = fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]);
+    idx = fGETUBYTE(1,VuV.uh[i]);
+    idx = (idx&0x0F) | (matchval<<4);
+    VddV.v[1].h[i] = fGETHALF(oddhalf,VvV.w[idx % fVELEM(32)]))
+
+
+
+
+/******************************************************************************
+NON LINEAR - V65
+ ******************************************************************************/
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vmpahhsat,"Vx32.h=vmpa(Vx32.h,Vu32.h,Rtt32.h):sat","piecewise linear approximation",
+    VxV.h[i]= fVSATH( ( ( fMPY16SS(VxV.h[i],VuV.h[i])<<1) + (fGETHALF(( (VuV.h[i]>>14)&0x3), RttV )<<15))>>16))
+
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vmpauhuhsat,"Vx32.h=vmpa(Vx32.h,Vu32.uh,Rtt32.uh):sat","piecewise linear approximation",
+    VxV.h[i]= fVSATH( (  fMPY16SU(VxV.h[i],VuV.uh[i]) + (fGETUHALF(((VuV.uh[i]>>14)&0x3), RttV )<<15))>>16))
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vmpsuhuhsat,"Vx32.h=vmps(Vx32.h,Vu32.uh,Rtt32.uh):sat","piecewise linear approximation",
+    VxV.h[i]= fVSATH( (  fMPY16SU(VxV.h[i],VuV.uh[i]) - (fGETUHALF(((VuV.uh[i]>>14)&0x3), RttV )<<15))>>16))
+
+
+ITERATOR_INSN_SLOT2_DOUBLE_VEC(16,vlut4,"Vd32.h=vlut4(Vu32.uh,Rtt32.h)","4 entry lookup table",
+    VdV.h[i]= fGETHALF(  ((VuV.h[i]>>14)&0x3), RttV ))
+
+
+
+/******************************************************************************
+V65
+ ******************************************************************************/
+
+ITERATOR_INSN_MPY_SLOT_NOV1(32,vmpyuhe,"Vd32.uw=vmpye(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VdV.uw[i] = fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV)))
+
+
+ITERATOR_INSN_MPY_SLOT_NOV1(32,vmpyuhe_acc,"Vx32.uw+=vmpye(Vu32.uh,Rt32.uh)",
+"Vector even halfword unsigned multiply by scalar",
+    VxV.uw[i] += fMPY16UU(fGETUHALF(0, VuV.uw[i]),fGETUHALF(0,RtV)))
+
+
+
+
+EXTINSN(V6_vgathermw,  "vtmp.w=vgather(Rt32,Mu2,Vv32.w).w", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather Words",
+{
+    fHIDE(int i;)
+    fHIDE(int element_size = 4;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_GATHER_WORD(EA, VvV.uw[i], i,MuV);
+    }
+    fGATHER_FINISH()
+})
+EXTINSN(V6_vgathermh,  "vtmp.h=vgather(Rt32,Mu2,Vv32.h).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_GATHER_HALFWORD(EA, VvV.uh[i], i,MuV);
+    }
+    fGATHER_FINISH()
+})
+
+
+
+EXTINSN(V6_vgathermhw,  "vtmp.h=vgather(Rt32,Mu2,Vvv32.w).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA_DV,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for (j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_GATHER_HALFWORD_DV(EA, VvvV.v[j].uw[i], (2*i+j),i,j,MuV);
+        }
+    }
+    fGATHER_FINISH()
+})
+
+
+EXTINSN(V6_vgathermwq,  "if (Qs4) vtmp.w=vgather(Rt32,Mu2,Vv32.w).w", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather Words",
+{
+    fHIDE(int i;)
+    fHIDE(int element_size = 4;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_GATHER_WORDQ(EA, VvV.uw[i], i,QsV,MuV);
+    }
+    fGATHER_FINISH()
+})
+EXTINSN(V6_vgathermhq,  "if (Qs4) vtmp.h=vgather(Rt32,Mu2,Vv32.h).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_GATHER_HALFWORDQ(EA, VvV.uh[i], i,QsV,MuV);
+    }
+    fGATHER_FINISH()
+})
+
+
+
+EXTINSN(V6_vgathermhwq,  "if (Qs4) vtmp.h=vgather(Rt32,Mu2,Vvv32.w).h", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_GATHER,A_CVI_VA_DV,A_CVI_VM,A_CVI_TMP_DST,A_MEMLIKE), "Gather halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fGATHER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for (j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_GATHER_HALFWORDQ_DV(EA, VvvV.v[j].uw[i], (2*i+j),i,j,QsV,MuV);
+        }
+    }
+    fGATHER_FINISH()
+})
+
+
+
+EXTINSN(V6_vscattermw , "vscatter(Rt32,Mu2,Vv32.w).w=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter Words",
+{
+    fHIDE(int i;)
+    fHIDE(int element_size = 4;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_WORD(EA, VvV.uw[i], VwV,i,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+
+EXTINSN(V6_vscattermh , "vscatter(Rt32,Mu2,Vv32.h).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter halfWords",
+{
+    fHIDE(int i;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_HALFWORD(EA,VvV.uh[i],VwV,i,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+EXTINSN(V6_vscattermw_add,  "vscatter(Rt32,Mu2,Vv32.w).w+=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter Words-Add",
+{
+    fHIDE(int i;)
+    fHIDE(int ALIGNMENT=4;)
+    fHIDE(int element_size = 4;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = (RtV+fVALIGN(VvV.uw[i],ALIGNMENT));
+        fVLOG_VTCM_WORD_INCREMENT(EA,VvV.uw[i],VwV,i,ALIGNMENT,MuV);
+    }
+    fHIDE(fLOG_SCATTER_OP(4);)
+    fSCATTER_FINISH(1)
+})
+
+EXTINSN(V6_vscattermh_add,  "vscatter(Rt32,Mu2,Vv32.h).h+=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter halfword-Add",
+{
+    fHIDE(int i;)
+    fHIDE(int ALIGNMENT=2;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = (RtV+fVALIGN(VvV.uh[i],ALIGNMENT));
+        fVLOG_VTCM_HALFWORD_INCREMENT(EA,VvV.uh[i],VwV,i,ALIGNMENT,MuV);
+    }
+    fHIDE(fLOG_SCATTER_OP(2);)
+    fSCATTER_FINISH(1)
+})
+
+
+EXTINSN(V6_vscattermwq,  "if (Qs4) vscatter(Rt32,Mu2,Vv32.w).w=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter Words conditional",
+{
+    fHIDE(int i;)
+    fHIDE(int element_size = 4;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        EA = RtV+VvV.uw[i];
+        fVLOG_VTCM_WORDQ(EA,VvV.uw[i], VwV,i,QsV,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+EXTINSN(V6_vscattermhq,  "if (Qs4) vscatter(Rt32,Mu2,Vv32.h).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA,A_CVI_VM,A_MEMLIKE), "Scatter HalfWords conditional",
+{
+    fHIDE(int i;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(16, i) {
+        EA = RtV+VvV.uh[i];
+        fVLOG_VTCM_HALFWORDQ(EA,VvV.uh[i],VwV,i,QsV,MuV);
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+
+
+EXTINSN(V6_vscattermhw , "vscatter(Rt32,Mu2,Vvv32.w).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA_DV,A_CVI_VM,A_MEMLIKE), "Scatter halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_HALFWORD_DV(EA,VvvV.v[j].uw[i],VwV,(2*i+j),i,j,MuV);
+        }
+    }
+    fSCATTER_FINISH(0)
+})
+
+
+
+EXTINSN(V6_vscattermhwq,  "if (Qs4) vscatter(Rt32,Mu2,Vvv32.w).h=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA_DV,A_CVI_VM,A_MEMLIKE), "Scatter halfwords conditional",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for(j = 0; j < 2; j++) {
+            EA = RtV+VvvV.v[j].uw[i];
+            fVLOG_VTCM_HALFWORDQ_DV(EA,VvvV.v[j].uw[i],VwV,(2*i+j),QsV,i,j,MuV);
+        }
+    }
+    fSCATTER_FINISH(0)
+})
+
+EXTINSN(V6_vscattermhw_add,  "vscatter(Rt32,Mu2,Vvv32.w).h+=Vw32", ATTRIBS(A_EXTENSION,A_CVI,A_CVI_SCATTER,A_CVI_VA_DV,A_CVI_VM,A_MEMLIKE), "Scatter halfwords-add",
+{
+    fHIDE(int i;)
+    fHIDE(int j;)
+    fHIDE(int ALIGNMENT=2;)
+    fHIDE(int element_size = 2;)
+    fHIDE(fSCATTER_INIT( RtV, MuV, element_size);)
+    fVLASTBYTE(MuV, element_size);
+    fVALIGN(RtV, element_size);
+    fVFOREACH(32, i) {
+        for(j = 0; j < 2; j++) {
+            EA = RtV + fVALIGN(VvvV.v[j].uw[i],ALIGNMENT);
+            fVLOG_VTCM_HALFWORD_INCREMENT_DV(EA,VvvV.v[j].uw[i],VwV,(2*i+j),i,j,ALIGNMENT,MuV);
+        }
+    }
+    fHIDE(fLOG_SCATTER_OP(2);)
+    fSCATTER_FINISH(1)
+})
+
+EXTINSN(V6_vprefixqb,"Vd32.b=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  "parallel prefix sum of Q into byte",
+{
+    fHIDE(int i;)
+    fHIDE(size1u_t acc = 0;)
+    fVFOREACH(8, i) {
+        acc += fGETQBIT(QvV,i);
+        VdV.ub[i] = acc;
+    }
+    } )
+EXTINSN(V6_vprefixqh,"Vd32.h=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  "parallel prefix sum of Q into halfwords",
+{
+    fHIDE(int i;)
+    fHIDE(size2u_t acc = 0;)
+    fVFOREACH(16, i) {
+        acc += fGETQBIT(QvV,i*2+0);
+        acc += fGETQBIT(QvV,i*2+1);
+        VdV.uh[i] = acc;
+    }
+    } )
+EXTINSN(V6_vprefixqw,"Vd32.w=prefixsum(Qv4)",   ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VS),  "parallel prefix sum of Q into words",
+{
+    fHIDE(int i;)
+    fHIDE(size4u_t acc = 0;)
+    fVFOREACH(32, i) {
+        acc += fGETQBIT(QvV,i*4+0);
+        acc += fGETQBIT(QvV,i*4+1);
+        acc += fGETQBIT(QvV,i*4+2);
+        acc += fGETQBIT(QvV,i*4+3);
+        VdV.uw[i] = acc;
+    }
+    } )
+
+
+
+
+
+/******************************************************************************
+ DEBUG Vector/Register Printing
+ ******************************************************************************/
+
+#define PRINT_VU(TYPE, TYPE2, COUNT)\
+    int i;  \
+    size4u_t vec_len = fVBYTES();\
+    fprintf(stdout,"V%2d: ",VuN);  \
+    for (i=0;i<vec_len>>COUNT;i++) {         \
+        fprintf(stdout,TYPE2 " ", VuV.TYPE[i]); \
+    };  \
+    fprintf(stdout,"\\n");  \
+	fflush(stdout);\
+
+#undef ATTR_VMEM
+#undef ATTR_VMEMU
+#undef ATTR_VMEM_NT
+
+#endif /* NO_MMVEC */
+
+#ifdef __SELF_DEF_EXTINSN
+#undef EXTINSN
+#undef __SELF_DEF_EXTINSN
+#endif
-- 
2.7.4
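
Reader's note on the GENERAL PERMUTE NETWORKS section above: the two
instructions run the same conditional-swap stage and differ only in the order
of the stage offsets. The following is a minimal sketch of that difference, not
code from the patch. It assumes a toy 8-byte vector (TOY_VBYTES stands in for
fVBYTES()), and toy_vdelta/toy_vrdelta are hypothetical names that copy the
loop shapes of V6_vdelta and V6_vrdelta.

```c
#include <stdint.h>
#include <string.h>

#define TOY_VBYTES 8    /* assumption: toy length; the patch uses fVBYTES() */

/* Same loop shape as V6_vdelta: offsets TOY_VBYTES/2, ..., 2, 1. */
static void toy_vdelta(uint8_t *vd, const uint8_t *vu, const uint8_t *vv)
{
    uint8_t tmp[TOY_VBYTES];

    memcpy(tmp, vu, TOY_VBYTES);
    for (int offset = TOY_VBYTES >> 1; offset > 0; offset >>= 1) {
        for (int k = 0; k < TOY_VBYTES; k++) {
            /* Control bit set: take the partner byte at k ^ offset */
            vd[k] = (vv[k] & offset) ? tmp[k ^ offset] : tmp[k];
        }
        memcpy(tmp, vd, TOY_VBYTES);
    }
}

/* Same loop shape as V6_vrdelta: offsets 1, 2, ..., TOY_VBYTES/2. */
static void toy_vrdelta(uint8_t *vd, const uint8_t *vu, const uint8_t *vv)
{
    uint8_t tmp[TOY_VBYTES];

    memcpy(tmp, vu, TOY_VBYTES);
    for (int offset = 1; offset < TOY_VBYTES; offset <<= 1) {
        for (int k = 0; k < TOY_VBYTES; k++) {
            vd[k] = (vv[k] & offset) ? tmp[k ^ offset] : tmp[k];
        }
        memcpy(tmp, vd, TOY_VBYTES);
    }
}
```

With every control byte set to TOY_VBYTES - 1, both functions reverse the
vector. With only control byte 0 set to 3, the results differ in element 0,
which shows that the stage order matters.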



* [PATCH v3 25/30] Hexagon HVX (target/hexagon) instruction decoding
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (23 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 24/30] Hexagon HVX (target/hexagon) import semantics Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 23:28   ` Richard Henderson
  2021-09-20 21:24 ` [PATCH v3 26/30] Hexagon HVX (target/hexagon) import instruction encodings Taylor Simpson
                   ` (4 subsequent siblings)
  29 siblings, 1 reply; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

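Add mmvec/decode_ext_mmvec.[ch] with the HVX-specific decode checks
(.new store patch-up, execution-order shuffling, vhist detection) and
the HVX slot-mask lookup, call them from decode.c, and add the new file
to target/hexagon/meson.build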

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
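Note on the N-field handling in check_new_value(): the sketch below is one
way to read the producer search, not code from the patch. It assumes a plain
bool array in place of GET_ATTRIB(..., A_CVI), and toy_find_producer and
toy_patch_regno are hypothetical helper names. The gather and (Vx,Vy)
special cases are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: is_cvi[j] stands in for
 * GET_ATTRIB(pkt->insn[j].opcode, A_CVI).
 * Returns the index of the producer, or -1 for a bad N-field.
 */
static int toy_find_producer(const bool *is_cvi, int use_idx, int nfield)
{
    int def_off = nfield >> 1;      /* drop the odd/even register bit */

    assert(use_idx >= 0);
    for (int j = use_idx - 1; j >= 0 && def_off >= 0; j--) {
        if (!is_cvi[j]) {
            continue;               /* non-HVX instructions are not counted */
        }
        def_off--;
        if (def_off == 0) {
            return j;
        }
    }
    return -1;
}

/* Register patch-up for the common case: LSB selects the odd/even half. */
static int toy_patch_regno(int def_regnum, int nfield)
{
    return def_regnum ^ (nfield & 1);
}
```

For a packet whose instructions 0, 2 and 3 are HVX and whose store sits at
index 3, an N-field of 2 resolves to index 2 and an N-field of 4 or 5
resolves to index 0. The real function asserts instead of returning -1.
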
 target/hexagon/mmvec/decode_ext_mmvec.h |  24 ++++
 target/hexagon/decode.c                 |  24 +++-
 target/hexagon/mmvec/decode_ext_mmvec.c | 236 ++++++++++++++++++++++++++++++++
 target/hexagon/meson.build              |   1 +
 4 files changed, 283 insertions(+), 2 deletions(-)
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c
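
Note on decode_mmvec_move_cvi_to_end(): a minimal sketch of the rotation on a
toy packet, not code from the patch. ToyInsn and the toy_* functions are
hypothetical. toy_send_insn_to assumes that decode_send_insn_to() (which is
not part of this diff) moves an instruction by adjacent swaps. The endloop
adjustment is omitted.

```c
#include <stdbool.h>

typedef struct {
    int id;         /* original position in the packet */
    bool is_cvi;    /* stands in for GET_ATTRIB(opcode, A_CVI) */
} ToyInsn;

/* Assumed behaviour of decode_send_insn_to(): bubble by adjacent swaps. */
static void toy_send_insn_to(ToyInsn *insn, int start, int newloc)
{
    int direction = (start < newloc) ? 1 : -1;

    for (int i = start; i != newloc; i += direction) {
        ToyInsn tmp = insn[i];
        insn[i] = insn[i + direction];
        insn[i + direction] = tmp;
    }
}

/* Move every HVX instruction found before index max to the packet end. */
static void toy_move_cvi_to_end(ToyInsn *insn, int num_insns, int max)
{
    for (int i = 0; i < max; i++) {
        if (insn[i].is_cvi) {
            toy_send_insn_to(insn, i, num_insns - 1);
            max--;  /* the .cur/.tmp load has shifted down by one */
            i--;    /* retry this index: a new insn has rotated into it */
        }
    }
}
```

With a four-instruction packet where ids 0, 2 and 3 are HVX and id 3 is the
.tmp load (so max is 3), the resulting order is 1, 3, 0, 2: the load now
precedes both vector instructions that were ahead of it, and those two keep
their relative order.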

diff --git a/target/hexagon/mmvec/decode_ext_mmvec.h b/target/hexagon/mmvec/decode_ext_mmvec.h
new file mode 100644
index 0000000..3664b68
--- /dev/null
+++ b/target/hexagon/mmvec/decode_ext_mmvec.h
@@ -0,0 +1,24 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_DECODE_EXT_MMVEC_H
+#define HEXAGON_DECODE_EXT_MMVEC_H
+
+void mmvec_ext_decode_checks(Packet *pkt, bool disas_only);
+SlotMask mmvec_ext_decode_find_iclass_slots(int opcode);
+
+#endif
diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index d424245..653bfd7 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -22,6 +22,7 @@
 #include "decode.h"
 #include "insn.h"
 #include "printinsn.h"
+#include "mmvec/decode_ext_mmvec.h"
 
 #define fZXTN(N, M, VAL) ((VAL) & ((1LL << (N)) - 1))
 
@@ -566,8 +567,12 @@ static void decode_remove_extenders(Packet *packet)
 
 static SlotMask get_valid_slots(const Packet *pkt, unsigned int slot)
 {
-    return find_iclass_slots(pkt->insn[slot].opcode,
-                             pkt->insn[slot].iclass);
+    if (GET_ATTRIB(pkt->insn[slot].opcode, A_EXTENSION)) {
+        return mmvec_ext_decode_find_iclass_slots(pkt->insn[slot].opcode);
+    } else {
+        return find_iclass_slots(pkt->insn[slot].opcode,
+                                 pkt->insn[slot].iclass);
+    }
 }
 
 #define DECODE_NEW_TABLE(TAG, SIZE, WHATNOT)     /* NOTHING */
@@ -728,6 +733,11 @@ decode_insns_tablewalk(Insn *insn, const DectreeTable *table,
         }
         decode_op(insn, opc, encoding);
         return 1;
+    } else if (table->table[i].type == DECTREE_EXTSPACE) {
+        /*
+         * For now, HVX will be the only coproc
+         */
+        return decode_insns_tablewalk(insn, ext_trees[EXT_IDX_mmvec], encoding);
     } else {
         return 0;
     }
@@ -874,6 +884,7 @@ int decode_packet(int max_words, const uint32_t *words, Packet *pkt,
     int words_read = 0;
     bool end_of_packet = false;
     int new_insns = 0;
+    int i;
     uint32_t encoding32;
 
     /* Initialize */
@@ -901,6 +912,11 @@ int decode_packet(int max_words, const uint32_t *words, Packet *pkt,
         return 0;
     }
     pkt->encod_pkt_size_in_bytes = words_read * 4;
+    pkt->pkt_has_hvx = false;
+    for (i = 0; i < num_insns; i++) {
+        pkt->pkt_has_hvx |=
+            GET_ATTRIB(pkt->insn[i].opcode, A_CVI);
+    }
 
     /*
      * Check for :endloop in the parse bits
@@ -931,6 +947,10 @@ int decode_packet(int max_words, const uint32_t *words, Packet *pkt,
     decode_set_slot_number(pkt);
     decode_fill_newvalue_regno(pkt);
 
+    if (pkt->pkt_has_hvx) {
+        mmvec_ext_decode_checks(pkt, disas_only);
+    }
+
     if (!disas_only) {
         decode_shuffle_for_execution(pkt);
         decode_split_cmpjump(pkt);
diff --git a/target/hexagon/mmvec/decode_ext_mmvec.c b/target/hexagon/mmvec/decode_ext_mmvec.c
new file mode 100644
index 0000000..061a65a
--- /dev/null
+++ b/target/hexagon/mmvec/decode_ext_mmvec.c
@@ -0,0 +1,236 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "decode.h"
+#include "opcodes.h"
+#include "insn.h"
+#include "iclass.h"
+#include "mmvec/mmvec.h"
+#include "mmvec/decode_ext_mmvec.h"
+
+static void
+check_new_value(Packet *pkt)
+{
+    /* .new value for an MMVector store */
+    int i, j;
+    const char *reginfo;
+    const char *destletters;
+    const char *dststr = NULL;
+    uint16_t def_opcode;
+    char letter;
+    int def_regnum;
+
+    for (i = 1; i < pkt->num_insns; i++) {
+        uint16_t use_opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(use_opcode, A_DOTNEWVALUE) &&
+            GET_ATTRIB(use_opcode, A_CVI) &&
+            GET_ATTRIB(use_opcode, A_STORE)) {
+            int use_regidx = strchr(opcode_reginfo[use_opcode], 's') -
+                opcode_reginfo[use_opcode];
+            /*
+             * The N-field encodes the offset back to the instruction that
+             * produces the value.
+             * Shift off the LSB which indicates odd/even register.
+             */
+            int def_off = ((pkt->insn[i].regno[use_regidx]) >> 1);
+            int def_oreg = pkt->insn[i].regno[use_regidx] & 1;
+            int def_idx = -1;
+            for (j = i - 1; (j >= 0) && (def_off >= 0); j--) {
+                if (!GET_ATTRIB(pkt->insn[j].opcode, A_CVI)) {
+                    continue;
+                }
+                def_off--;
+                if (def_off == 0) {
+                    def_idx = j;
+                    break;
+                }
+            }
+            /*
+             * Check for a badly encoded N-field which points to an instruction
+             * out-of-range
+             */
+            g_assert(!((def_off != 0) || (def_idx < 0) ||
+                       (def_idx > (pkt->num_insns - 1))));
+
+            /* def_idx is the index of the producer */
+            def_opcode = pkt->insn[def_idx].opcode;
+            reginfo = opcode_reginfo[def_opcode];
+            destletters = "dexy";
+            for (j = 0; (letter = destletters[j]) != 0; j++) {
+                dststr = strchr(reginfo, letter);
+                if (dststr != NULL) {
+                    break;
+                }
+            }
+            if ((dststr == NULL)  && GET_ATTRIB(def_opcode, A_CVI_GATHER)) {
+                def_regnum = 0;
+                pkt->insn[i].regno[use_regidx] = def_oreg;
+                pkt->insn[i].new_value_producer_slot = pkt->insn[def_idx].slot;
+            } else {
+                if (dststr == NULL) {
+                    /* still not there, we have a bad packet */
+                    g_assert_not_reached();
+                }
+                def_regnum = pkt->insn[def_idx].regno[dststr - reginfo];
+                /* Now patch up the consumer with the register number */
+                pkt->insn[i].regno[use_regidx] = def_regnum ^ def_oreg;
+                /* special case for (Vx,Vy) */
+                dststr = strchr(reginfo, 'y');
+                if (def_oreg && strchr(reginfo, 'x') && dststr) {
+                    def_regnum = pkt->insn[def_idx].regno[dststr - reginfo];
+                    pkt->insn[i].regno[use_regidx] = def_regnum;
+                }
+                /*
+                 * We need to remember who produces this value to later
+                 * check if it was dynamically cancelled
+                 */
+                pkt->insn[i].new_value_producer_slot = pkt->insn[def_idx].slot;
+            }
+        }
+    }
+}
+
+/*
+ * We don't want to reorder slot1/slot0 with respect to each other.
+ * So in our shuffling, we don't want to move the .cur / .tmp vmem earlier.
+ * Instead, we should move the producing instruction later.
+ * But the producing instruction might feed a .new store!
+ * So we may need to move that even later.
+ */
+
+static void
+decode_mmvec_move_cvi_to_end(Packet *pkt, int max)
+{
+    int i;
+    for (i = 0; i < max; i++) {
+        if (GET_ATTRIB(pkt->insn[i].opcode, A_CVI)) {
+            int last_inst = pkt->num_insns - 1;
+            uint16_t last_opcode = pkt->insn[last_inst].opcode;
+
+            /*
+             * If the last instruction is an endloop, move to the one before it.
+             * Always keep endloop as the last instruction.
+             */
+            if ((last_opcode == J2_endloop0) ||
+                (last_opcode == J2_endloop1) ||
+                (last_opcode == J2_endloop01)) {
+                last_inst--;
+            }
+
+            decode_send_insn_to(pkt, i, last_inst);
+            max--;
+            i--;    /* Retry this index now that packet has rotated */
+        }
+    }
+}
+
+static void
+decode_shuffle_for_execution_vops(Packet *pkt)
+{
+    /*
+     * Sort for .new
+     */
+    int i;
+    for (i = 0; i < pkt->num_insns; i++) {
+        uint16_t opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(opcode, A_LOAD) &&
+            (GET_ATTRIB(opcode, A_CVI_NEW) ||
+             GET_ATTRIB(opcode, A_CVI_TMP))) {
+            /*
+             * Find prior consuming vector instructions
+             * Move to end of packet
+             */
+            decode_mmvec_move_cvi_to_end(pkt, i);
+            break;
+        }
+    }
+
+    /* Move HVX new value stores to the end of the packet */
+    for (i = 0; i < pkt->num_insns - 1; i++) {
+        uint16_t opcode = pkt->insn[i].opcode;
+        if (GET_ATTRIB(opcode, A_STORE) &&
+            GET_ATTRIB(opcode, A_CVI_NEW) &&
+            !GET_ATTRIB(opcode, A_CVI_SCATTER_RELEASE)) {
+            int last_inst = pkt->num_insns - 1;
+            uint16_t last_opcode = pkt->insn[last_inst].opcode;
+
+            /*
+             * If the last instruction is an endloop, move to the one before it.
+             * Always keep endloop as the last instruction.
+             */
+            if ((last_opcode == J2_endloop0) ||
+                (last_opcode == J2_endloop1) ||
+                (last_opcode == J2_endloop01)) {
+                last_inst--;
+            }
+
+            decode_send_insn_to(pkt, i, last_inst);
+            break;
+        }
+    }
+}
+
+static void
+check_for_vhist(Packet *pkt)
+{
+    pkt->vhist_insn = NULL;
+    for (int i = 0; i < pkt->num_insns; i++) {
+        Insn *insn = &pkt->insn[i];
+        int opcode = insn->opcode;
+        if (GET_ATTRIB(opcode, A_CVI) && GET_ATTRIB(opcode, A_CVI_4SLOT)) {
+            pkt->vhist_insn = insn;
+            return;
+        }
+    }
+}
+
+/*
+ * Public Functions
+ */
+
+SlotMask mmvec_ext_decode_find_iclass_slots(int opcode)
+{
+    if (GET_ATTRIB(opcode, A_CVI_VM)) {
+        /* HVX memory instruction */
+        if (GET_ATTRIB(opcode, A_RESTRICT_SLOT0ONLY)) {
+            return SLOTS_0;
+        } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT1ONLY)) {
+            return SLOTS_1;
+        }
+        return SLOTS_01;
+    } else if (GET_ATTRIB(opcode, A_RESTRICT_SLOT2ONLY)) {
+        return SLOTS_2;
+    } else if (GET_ATTRIB(opcode, A_CVI_VX)) {
+        /* HVX multiply instruction */
+        return SLOTS_23;
+    } else if (GET_ATTRIB(opcode, A_CVI_VS_VX)) {
+        /* HVX permute/shift instruction */
+        return SLOTS_23;
+    } else {
+        return SLOTS_0123;
+    }
+}
+
+void mmvec_ext_decode_checks(Packet *pkt, bool disas_only)
+{
+    check_new_value(pkt);
+    if (!disas_only) {
+        decode_shuffle_for_execution_vops(pkt);
+    }
+    check_for_vhist(pkt);
+}
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index cae366c..64b144b 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -174,6 +174,7 @@ hexagon_ss.add(files(
     'printinsn.c',
     'arch.c',
     'fma_emu.c',
+    'mmvec/decode_ext_mmvec.c',
     'mmvec/system_ext_mmvec.c',
 ))
 
-- 
2.7.4
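
A note for readers following decode_shuffle_for_execution_vops() above: its
second loop moves the first HVX new-value store to the end of the packet, but
keeps a trailing endloop as the last instruction. Below is a minimal standalone
sketch of that idea, assuming a plain int array in place of Packet/Insn. The
opcode enum, send_to() and move_new_store_to_end() are hypothetical names used
only for illustration; they are not part of the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for real opcodes; not QEMU definitions. */
enum { OP_ALU, OP_VSTORE_NEW, OP_ENDLOOP };

/* Move ops[from] to index 'to' (from <= to), shifting the others down. */
static void send_to(int *ops, int from, int to)
{
    assert(from <= to);
    int moved = ops[from];
    for (int i = from; i < to; i++) {
        ops[i] = ops[i + 1];
    }
    ops[to] = moved;
}

/* Move the first new-value store to the end, keeping an endloop last. */
static void move_new_store_to_end(int *ops, int n)
{
    for (int i = 0; i < n - 1; i++) {
        if (ops[i] == OP_VSTORE_NEW) {
            int last = n - 1;
            if (ops[last] == OP_ENDLOOP) {
                last--;     /* target the slot just before the endloop */
            }
            send_to(ops, i, last);
            break;          /* only the first match is moved */
        }
    }
}
```

Because the loop stops at n - 1, a store that is already last is never touched,
and a store sitting directly before an endloop is moved onto itself, which
leaves the packet unchanged.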



* [PATCH v3 26/30] Hexagon HVX (target/hexagon) import instruction encodings
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (24 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 25/30] Hexagon HVX (target/hexagon) instruction decoding Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 27/30] Hexagon HVX (tests/tcg/hexagon) vector_add_int test Taylor Simpson
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/decode.c                      |   4 +
 target/hexagon/imported/allextenc.def        |  20 +
 target/hexagon/imported/encode.def           |   1 +
 target/hexagon/imported/mmvec/encode_ext.def | 794 +++++++++++++++++++++++++++
 4 files changed, 819 insertions(+)
 create mode 100644 target/hexagon/imported/allextenc.def
 create mode 100644 target/hexagon/imported/mmvec/encode_ext.def
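
A note on reading the DEF_ENC strings in encode_ext.def: each string spells out
32 bits, most significant first. '0' and '1' are fixed opcode bits, letters
are operand fields, '-' is a don't-care bit and 'P' marks the parse bits. The
sketch below turns such a string into a mask/match pair and pulls out an
operand field. It is an illustration only and not how the QEMU decode tree is
generated. enc_to_mask_match() and extract_field() are hypothetical helpers,
and the leading "0001" in the usage note is a placeholder for the ICLASS_CJ
prefix, whose real value is defined elsewhere in the tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build mask/match from an encoding string; spaces are ignored.
 * Returns the number of bit positions seen (expected to be 32).
 */
static int enc_to_mask_match(const char *enc, uint32_t *mask, uint32_t *match)
{
    int nbits = 0;
    *mask = 0;
    *match = 0;
    for (const char *p = enc; *p; p++) {
        if (*p == ' ') {
            continue;
        }
        assert(nbits < 32);
        *mask <<= 1;
        *match <<= 1;
        if (*p == '0' || *p == '1') {
            *mask |= 1;                         /* fixed bit */
            *match |= (uint32_t)(*p - '0');
        }
        nbits++;
    }
    return nbits;
}

/* Collect the bits of 'word' whose pattern character equals 'letter'. */
static uint32_t extract_field(const char *enc, uint32_t word, char letter)
{
    uint32_t val = 0;
    int bit = 31;
    for (const char *p = enc; *p && bit >= 0; p++) {
        if (*p == ' ') {
            continue;
        }
        if (*p == letter) {
            val = (val << 1) | ((word >> bit) & 1);
        }
        bit--;
    }
    return val;
}
```

With the pattern "0001 1 100 010 vvvvv PP 0 uuuuu 000 ddddd" (the V6_vaddw row
with the placeholder prefix), an instruction word matches when
(word & mask) == match, and extract_field(enc, word, 'u') yields the Vu
register number.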

diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index 653bfd7..6f0f27b 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -47,6 +47,7 @@ enum {
         /* Name   Num Table */
 DEF_REGMAP(R_16,  16, 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23)
 DEF_REGMAP(R__8,  8,  0, 2, 4, 6, 16, 18, 20, 22)
+DEF_REGMAP(R_8,   8,  0, 1, 2, 3, 4, 5, 6, 7)
 
 #define DECODE_MAPPED_REG(OPNUM, NAME) \
     insn->regno[OPNUM] = DECODE_REGISTER_##NAME[insn->regno[OPNUM]];
@@ -158,6 +159,9 @@ static void decode_ext_init(void)
     for (i = EXT_IDX_noext; i < EXT_IDX_noext_AFTER; i++) {
         ext_trees[i] = &dectree_table_DECODE_EXT_EXT_noext;
     }
+    for (i = EXT_IDX_mmvec; i < EXT_IDX_mmvec_AFTER; i++) {
+        ext_trees[i] = &dectree_table_DECODE_EXT_EXT_mmvec;
+    }
 }
 
 typedef struct {
diff --git a/target/hexagon/imported/allextenc.def b/target/hexagon/imported/allextenc.def
new file mode 100644
index 0000000..39a3e93
--- /dev/null
+++ b/target/hexagon/imported/allextenc.def
@@ -0,0 +1,20 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define EXTNAME mmvec
+#include "mmvec/encode_ext.def"
+#undef EXTNAME
diff --git a/target/hexagon/imported/encode.def b/target/hexagon/imported/encode.def
index b9368d1..e40e7fb 100644
--- a/target/hexagon/imported/encode.def
+++ b/target/hexagon/imported/encode.def
@@ -71,6 +71,7 @@
 
 #include "encode_pp.def"
 #include "encode_subinsn.def"
+#include "allextenc.def"
 
 #ifdef __SELF_DEF_FIELD32
 #undef __SELF_DEF_FIELD32
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
new file mode 100644
index 0000000..6fbbe2c
--- /dev/null
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -0,0 +1,794 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define CONCAT(A,B) A##B
+#define EXTEXTNAME(X) CONCAT(EXT_,X)
+#define DEF_ENC(TAG,STR) DEF_EXT_ENC(TAG,EXTEXTNAME(EXTNAME),STR)
+
+
+#ifndef NO_MMVEC
+DEF_ENC(V6_extractw,  ICLASS_LD" 001 0 000sssss  PP0uuuuu  --1ddddd") /* coproc insn, returns Rd */
+#endif
+
+
+#ifndef NO_MMVEC
+
+
+
+DEF_CLASS32(ICLASS_NCJ" 1--- -------- PP------ --------",COPROC_VMEM)
+DEF_CLASS32(ICLASS_NCJ" 1000 0-0ttttt PPi--iii ---ddddd",BaseOffset_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1000 1-0ttttt PPivviii ---ddddd",BaseOffset_if_Pv_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1000 0-1ttttt PPi--iii --------",BaseOffset_VMEM_Stores1)
+DEF_CLASS32(ICLASS_NCJ" 1000 1-0ttttt PPi--iii 00------",BaseOffset_VMEM_Stores2)
+DEF_CLASS32(ICLASS_NCJ" 1000 1-1ttttt PPivviii --------",BaseOffset_if_Pv_VMEM_Stores)
+
+DEF_CLASS32(ICLASS_NCJ" 1001 0-0xxxxx PP---iii ---ddddd",PostImm_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1001 1-0xxxxx PP-vviii ---ddddd",PostImm_if_Pv_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1001 0-1xxxxx PP---iii --------",PostImm_VMEM_Stores1)
+DEF_CLASS32(ICLASS_NCJ" 1001 1-0xxxxx PP---iii 00------",PostImm_VMEM_Stores2)
+DEF_CLASS32(ICLASS_NCJ" 1001 1-1xxxxx PP-vviii --------",PostImm_if_Pv_VMEM_Stores)
+
+DEF_CLASS32(ICLASS_NCJ" 1011 0-0xxxxx PPu----- ---ddddd",PostM_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1011 1-0xxxxx PPuvv--- ---ddddd",PostM_if_Pv_VMEM_Loads)
+DEF_CLASS32(ICLASS_NCJ" 1011 0-1xxxxx PPu----- --------",PostM_VMEM_Stores1)
+DEF_CLASS32(ICLASS_NCJ" 1011 1-0xxxxx PPu----- 00------",PostM_VMEM_Stores2)
+DEF_CLASS32(ICLASS_NCJ" 1011 1-1xxxxx PPuvv--- --------",PostM_if_Pv_VMEM_Stores)
+
+DEF_CLASS32(ICLASS_NCJ" 110- 0------- PP------ --------",Z_Load)
+DEF_CLASS32(ICLASS_NCJ" 110- 1------- PP------ --------",Z_Load_if_Pv)
+
+DEF_CLASS32(ICLASS_NCJ" 1111 000ttttt PPu--0-- ---vvvvv",Gather)
+DEF_CLASS32(ICLASS_NCJ" 1111 000ttttt PPu--1-- -ssvvvvv",Gather_if_Qs)
+DEF_CLASS32(ICLASS_NCJ" 1111 001ttttt PPuvvvvv ---wwwww",Scatter)
+DEF_CLASS32(ICLASS_NCJ" 1111 001ttttt PPuvvvvv -----sss",Scatter_New)
+DEF_CLASS32(ICLASS_NCJ" 1111 1--ttttt PPuvvvvv -sswwwww",Scatter_if_Qs)
+
+
+DEF_FIELD32(ICLASS_NCJ" 1--- -!------ PP------ --------",NT,"NonTemporal")
+
+
+
+DEF_FIELDROW_DESC32(                ICLASS_NCJ" 1 000 --- ----- PP i --iii ----- ---","[#0] vmem(Rt+#s4)[:nt]")
+
+#define LDST_ENC(TAG,MAJ3,MID3,RREG,TINY6,MIN3,VREG) DEF_ENC(TAG, ICLASS_NCJ "1" #MAJ3 #MID3 #RREG "PP" #TINY6 #MIN3 #VREG)
+
+#define LDST_BO(TAGPRE,MID3,PRED,MIN3,VREG) LDST_ENC(TAGPRE##_ai, 000,MID3,ttttt,i PRED iii,MIN3,VREG)
+#define LDST_PI(TAGPRE,MID3,PRED,MIN3,VREG) LDST_ENC(TAGPRE##_pi, 001,MID3,xxxxx,- PRED iii,MIN3,VREG)
+#define LDST_PM(TAGPRE,MID3,PRED,MIN3,VREG) LDST_ENC(TAGPRE##_ppu,011,MID3,xxxxx,u PRED ---,MIN3,VREG)
+
+#define LDST_BASICLD(OP,TAGPRE) \
+    OP(TAGPRE,                000,00,000,ddddd) \
+    OP(TAGPRE##_nt,           010,00,000,ddddd) \
+    OP(TAGPRE##_cur,          000,00,001,ddddd) \
+    OP(TAGPRE##_nt_cur,       010,00,001,ddddd) \
+    OP(TAGPRE##_tmp,          000,00,010,ddddd) \
+    OP(TAGPRE##_nt_tmp,       010,00,010,ddddd)
+
+#define LDST_BASICST(OP,TAGPRE) \
+    OP(TAGPRE,           001,--,000,sssss) \
+    OP(TAGPRE##_nt,      011,--,000,sssss) \
+    OP(TAGPRE##_new,     001,--,001,-0sss) \
+    OP(TAGPRE##_srls,    001,--,001,-1---) \
+    OP(TAGPRE##_nt_new,  011,--,001,--sss) \
+
+
+#define LDST_QPREDST(OP,TAGPRE) \
+    OP(TAGPRE##_qpred,    100,vv,000,sssss) \
+    OP(TAGPRE##_nt_qpred, 110,vv,000,sssss) \
+    OP(TAGPRE##_nqpred,   100,vv,001,sssss) \
+    OP(TAGPRE##_nt_nqpred,110,vv,001,sssss) \
+
+#define LDST_CONDLD(OP,TAGPRE) \
+    OP(TAGPRE##_pred,         100,vv,010,ddddd) \
+    OP(TAGPRE##_nt_pred,      110,vv,010,ddddd) \
+    OP(TAGPRE##_npred,        100,vv,011,ddddd) \
+    OP(TAGPRE##_nt_npred,     110,vv,011,ddddd) \
+    OP(TAGPRE##_cur_pred,     100,vv,100,ddddd) \
+    OP(TAGPRE##_nt_cur_pred,  110,vv,100,ddddd) \
+    OP(TAGPRE##_cur_npred,    100,vv,101,ddddd) \
+    OP(TAGPRE##_nt_cur_npred, 110,vv,101,ddddd) \
+    OP(TAGPRE##_tmp_pred,     100,vv,110,ddddd) \
+    OP(TAGPRE##_nt_tmp_pred,  110,vv,110,ddddd) \
+    OP(TAGPRE##_tmp_npred,    100,vv,111,ddddd) \
+    OP(TAGPRE##_nt_tmp_npred, 110,vv,111,ddddd) \
+
+#define LDST_PREDST(OP,TAGPRE,NT,MIN2) \
+    OP(TAGPRE##_pred,      1 NT 1,vv,MIN2 0,sssss) \
+    OP(TAGPRE##_npred,     1 NT 1,vv,MIN2 1,sssss)
+
+#define LDST_PREDSTNEW(OP,TAGPRE,NT,MIN2) \
+    OP(TAGPRE##_pred,      1 NT 1,vv,MIN2 0,NT 0 sss) \
+    OP(TAGPRE##_npred,     1 NT 1,vv,MIN2 1,NT 1 sss)
+
+// 0.0,vv,0--,sssss: pred st
+#define LDST_BASICPREDST(OP,TAGPRE) \
+    LDST_PREDST(OP,TAGPRE,             0,00) \
+    LDST_PREDST(OP,TAGPRE##_nt,        1,00) \
+    LDST_PREDSTNEW(OP,TAGPRE##_new,    0,01) \
+    LDST_PREDSTNEW(OP,TAGPRE##_nt_new, 1,01)
+
+
+
+LDST_BASICLD(LDST_BO,V6_vL32b)
+LDST_CONDLD(LDST_BO,V6_vL32b)
+LDST_BASICLD(LDST_PI,V6_vL32b)
+LDST_CONDLD(LDST_PI,V6_vL32b)
+LDST_BASICLD(LDST_PM,V6_vL32b)
+LDST_CONDLD(LDST_PM,V6_vL32b)
+
+// Loads
+
+LDST_BO(V6_vL32Ub,000,00,111,ddddd)
+//Stores
+LDST_BASICST(LDST_BO,V6_vS32b)
+
+
+LDST_BO(V6_vS32Ub,001,--,111,sssss)
+
+
+
+
+// Byte Enabled Stores
+LDST_QPREDST(LDST_BO,V6_vS32b)
+
+// Scalar Predicated Stores
+LDST_BASICPREDST(LDST_BO,V6_vS32b)
+
+
+LDST_PREDST(LDST_BO,V6_vS32Ub,0,11)
+
+
+
+
+DEF_FIELDROW_DESC32(                ICLASS_NCJ" 1 001 --- ----- PP - ----- ddddd ---","[#1] vmem(Rx++#s3)[:nt]")
+
+// Loads
+LDST_PI(V6_vL32Ub,000,00,111,ddddd)
+
+//Stores
+LDST_BASICST(LDST_PI,V6_vS32b)
+
+
+
+LDST_PI(V6_vS32Ub,001,--,111,sssss)
+
+
+// Byte Enabled Stores
+LDST_QPREDST(LDST_PI,V6_vS32b)
+
+
+// Scalar Predicated Stores
+LDST_BASICPREDST(LDST_PI,V6_vS32b)
+
+
+LDST_PREDST(LDST_PI,V6_vS32Ub,0,11)
+
+
+
+DEF_FIELDROW_DESC32(            ICLASS_NCJ" 1 011 --- ----- PP - ----- ----- ---","[#3] vmem(Rx++#M)[:nt]")
+
+// Loads
+LDST_PM(V6_vL32Ub,000,00,111,ddddd)
+
+//Stores
+LDST_BASICST(LDST_PM,V6_vS32b)
+
+
+
+LDST_PM(V6_vS32Ub,001,--,111,sssss)
+
+// Byte Enabled Stores
+LDST_QPREDST(LDST_PM,V6_vS32b)
+
+// Scalar Predicated Stores
+LDST_BASICPREDST(LDST_PM,V6_vS32b)
+
+
+LDST_PREDST(LDST_PM,V6_vS32Ub,0,11)
+
+
+
+DEF_ENC(V6_vaddcarrysat,    ICLASS_CJ" 1 101 100 vvvvv PP 1 uuuuu 0ss ddddd") //
+DEF_ENC(V6_vaddcarryo,        ICLASS_CJ" 1 101 101 vvvvv PP 1 uuuuu 0ee ddddd") //
+DEF_ENC(V6_vsubcarryo,        ICLASS_CJ" 1 101 101 vvvvv PP 1 uuuuu 1ee ddddd") //
+DEF_ENC(V6_vsatdw,          ICLASS_CJ" 1 101 100 vvvvv PP 1 uuuuu 111 ddddd") //
+
+DEF_FIELDROW_DESC32(           ICLASS_NCJ" 1 111 --- ----- PP - ----- ----- ---","[#6] vgather,vscatter")
+DEF_ENC(V6_vgathermw,         ICLASS_NCJ" 1 111 000 ttttt PP u --000 --- vvvvv")    // vtmp.w=vmem(Rt32,Mu2,Vv32.w).w
+DEF_ENC(V6_vgathermh,         ICLASS_NCJ" 1 111 000 ttttt PP u --001 --- vvvvv")    // vtmp.h=vmem(Rt32,Mu2,Vv32.h).h
+DEF_ENC(V6_vgathermhw,         ICLASS_NCJ" 1 111 000 ttttt PP u --010 --- vvvvv")    // vtmp.h=vmem(Rt32,Mu2,Vvv32.w).h
+
+
+DEF_ENC(V6_vgathermwq,         ICLASS_NCJ" 1 111 000 ttttt PP u --100 -ss vvvvv")    // if (Qs4) vtmp.w=vmem(Rt32,Mu2,Vv32.w).w
+DEF_ENC(V6_vgathermhq,         ICLASS_NCJ" 1 111 000 ttttt PP u --101 -ss vvvvv")    // if (Qs4) vtmp.h=vmem(Rt32,Mu2,Vv32.h).h
+DEF_ENC(V6_vgathermhwq,     ICLASS_NCJ" 1 111 000 ttttt PP u --110 -ss vvvvv")    // if (Qs4) vtmp.h=vmem(Rt32,Mu2,Vvv32.w).h
+
+
+
+DEF_ENC(V6_vscattermw,         ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 000 wwwww")    // vmem(Rt32,Mu2,Vv32.w)=Vw32.w
+DEF_ENC(V6_vscattermh,         ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 001 wwwww")    // vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+DEF_ENC(V6_vscattermhw,     ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 010 wwwww")    // vmem(Rt32,Mu2,Vvv32.w)=Vw32.h
+
+DEF_ENC(V6_vscattermw_add,     ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 100 wwwww")    // vmem(Rt32,Mu2,Vv32.w) += Vw32.w
+DEF_ENC(V6_vscattermh_add,     ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 101 wwwww")    // vmem(Rt32,Mu2,Vv32.h) += Vw32.h
+DEF_ENC(V6_vscattermhw_add, ICLASS_NCJ" 1 111 001 ttttt PP u vvvvv 110 wwwww")    // vmem(Rt32,Mu2,Vvv32.w) += Vw32.h
+
+
+DEF_ENC(V6_vscattermwq,     ICLASS_NCJ" 1 111 100 ttttt PP u vvvvv 0ss wwwww")    // if (Qs4) vmem(Rt32,Mu2,Vv32.w)=Vw32.w
+DEF_ENC(V6_vscattermhq,     ICLASS_NCJ" 1 111 100 ttttt PP u vvvvv 1ss wwwww")    // if (Qs4) vmem(Rt32,Mu2,Vv32.h)=Vw32.h
+DEF_ENC(V6_vscattermhwq,     ICLASS_NCJ" 1 111 101 ttttt PP u vvvvv 0ss wwwww")    // if (Qs4) vmem(Rt32,Mu2,Vvv32.w)=Vw32.h
+
+
+
+
+
+DEF_CLASS32(ICLASS_CJ" 1--- -------- PP------ --------",COPROC_VX)
+
+
+
+/***************************************************************
+*
+*  Group #0, Uses Q6 Rt8: new in v61
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(            ICLASS_CJ" 1 000 --- ----- PP - ----- ----- ---","[#0] Vd32=(Vu32, Vv32, Rt8)")
+DEF_ENC(V6_vasrhbsat,             ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vasruwuhrndsat,         ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vasrwuhrndsat,         ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vlutvvb_nm,             ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vlutvwh_nm,             ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vasruhubrndsat,         ICLASS_CJ" 1 000 vvv vvttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vasruwuhsat,         ICLASS_CJ" 1 000 vvv vvttt PP 1 uuuuu 100 ddddd") //
+DEF_ENC(V6_vasruhubsat,            ICLASS_CJ" 1 000 vvv vvttt PP 1 uuuuu 101 ddddd") //
+
+/***************************************************************
+*
+*  Group #1, Uses Q6 Rt32
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 001 --- ----- PP - ----- ----- ---","[#1] Vd32=(Vu32, Rt32)")
+DEF_ENC(V6_vtmpyb,             ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vtmpybus,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vdmpyhb,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vrmpyub,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vrmpybus,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vdsaduh,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vdmpybus,         ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vdmpybus_dv,     ICLASS_CJ" 1 001 000 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vdmpyhsusat,     ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vdmpyhsuisat,     ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vdmpyhsat,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vdmpyhisat,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vdmpyhb_dv,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmpybus,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmpabus,         ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpahb,             ICLASS_CJ" 1 001 001 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyh,             ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vmpyhss,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vmpyhsrs,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vmpyuh,             ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vrmpybusi,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 10i ddddd") //
+DEF_ENC(V6_vrsadubi,         ICLASS_CJ" 1 001 010 ttttt PP 0 uuuuu 11i ddddd") //
+
+DEF_ENC(V6_vmpyihb,         ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vror,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vmpyuhe,         ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vmpabuu,         ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vlut4,            ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 100 ddddd") //
+
+
+DEF_ENC(V6_vasrw,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vasrh,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vaslw,             ICLASS_CJ" 1 001 011 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vaslh,             ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlsrw,             ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vlsrh,             ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vlsrb,            ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 011 ddddd") //
+
+DEF_ENC(V6_vmpauhb,            ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmpyiwub,         ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpyiwh,         ICLASS_CJ" 1 001 100 ttttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyiwb,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_lvsplatw,         ICLASS_CJ" 1 001 101 ttttt PP 0 ----0 001 ddddd") //
+
+
+
+DEF_ENC(V6_pred_scalar2,     ICLASS_CJ" 1 001 101 ttttt PP 0 ----- 010 -01dd") //
+DEF_ENC(V6_vandvrt,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 010 -10dd") //
+DEF_ENC(V6_pred_scalar2v2,     ICLASS_CJ" 1 001 101 ttttt PP 0 ----- 010 -11dd") //
+
+DEF_ENC(V6_vtmpyhb,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vandqrt,         ICLASS_CJ" 1 001 101 ttttt PP 0 --0uu 101 ddddd") //
+DEF_ENC(V6_vandnqrt,         ICLASS_CJ" 1 001 101 ttttt PP 0 --1uu 101 ddddd") //
+
+DEF_ENC(V6_vrmpyubi,         ICLASS_CJ" 1 001 101 ttttt PP 0 uuuuu 11i ddddd") //
+
+DEF_ENC(V6_vmpyub,             ICLASS_CJ" 1 001 110 ttttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_lvsplath,         ICLASS_CJ" 1 001 110 ttttt PP 0 ----- 001 ddddd") //
+DEF_ENC(V6_lvsplatb,         ICLASS_CJ" 1 001 110 ttttt PP 0 ----- 010 ddddd") //
+
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 001 --- ----- PP - ----- ----- ---","[#1] Vx32=(Vu32, Rt32)")
+DEF_ENC(V6_vtmpyb_acc,         ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vtmpybus_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vtmpyhb_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vdmpyhb_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vrmpyub_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vrmpybus_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vdmpybus_acc,     ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vdmpybus_dv_acc, ICLASS_CJ" 1 001 000 ttttt PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vdmpyhsusat_acc, ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vdmpyhsuisat_acc,ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vdmpyhisat_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vdmpyhsat_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vdmpyhb_dv_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpybus_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpabus_acc,     ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vmpahb_acc,         ICLASS_CJ" 1 001 001 ttttt PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vmpyhsat_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyuh_acc,         ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vmpyiwb_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vmpyiwh_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vrmpybusi_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 10i xxxxx") //
+DEF_ENC(V6_vrsadubi_acc,     ICLASS_CJ" 1 001 010 ttttt PP 1 uuuuu 11i xxxxx") //
+
+DEF_ENC(V6_vdsaduh_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyihb_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaslw_acc,         ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vandqrt_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 --0uu 011 xxxxx") //
+DEF_ENC(V6_vandnqrt_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 --1uu 011 xxxxx") //
+DEF_ENC(V6_vandvrt_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 100 ---xx") //
+DEF_ENC(V6_vasrw_acc,         ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vrmpyubi_acc,     ICLASS_CJ" 1 001 011 ttttt PP 1 uuuuu 11i xxxxx") //
+
+DEF_ENC(V6_vmpyub_acc,         ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyiwub_acc,    ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vmpauhb_acc,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vmpyuhe_acc,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 011 xxxxx")
+DEF_ENC(V6_vmpahhsat,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpauhuhsat,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpsuhuhsat,        ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vasrh_acc,         ICLASS_CJ" 1 001 100 ttttt PP 1 uuuuu 111 xxxxx") //
+
+
+
+
+DEF_ENC(V6_vinsertwr,        ICLASS_CJ" 1 001 101 ttttt PP 1 ----- 001 xxxxx")
+
+DEF_ENC(V6_vmpabuu_acc,        ICLASS_CJ" 1 001 101 ttttt PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaslh_acc,        ICLASS_CJ" 1 001 101 ttttt PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpyh_acc,        ICLASS_CJ" 1 001 101 ttttt PP 1 uuuuu 110 xxxxx") //
+
+
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 001 --- ----- PP - ----- ----- ---","[#1] (Vx32, Vy32, Rt32)")
+DEF_ENC(V6_vshuff,             ICLASS_CJ" 1 001 111 ttttt PP 1 yyyyy 001 xxxxx") //
+DEF_ENC(V6_vdeal,             ICLASS_CJ" 1 001 111 ttttt PP 1 yyyyy 010 xxxxx") //
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 010 --- ----- PP - ----- ----- ---","[#2] if (Ps) Vd=Vu")
+DEF_ENC(V6_vcmov,         ICLASS_CJ" 1 010 000 ----- PP - uuuuu -ss ddddd")
+DEF_ENC(V6_vncmov,         ICLASS_CJ" 1 010 001 ----- PP - uuuuu -ss ddddd")
+DEF_ENC(V6_vnccombine,     ICLASS_CJ" 1 010 010 vvvvv PP - uuuuu -ss ddddd")
+DEF_ENC(V6_vccombine,     ICLASS_CJ" 1 010 011 vvvvv PP - uuuuu -ss ddddd")
+
+DEF_ENC(V6_vrotr,       ICLASS_CJ" 1 010 100 vvvvv PP 1 uuuuu 111 ddddd")
+DEF_ENC(V6_vasr_into,   ICLASS_CJ" 1 010 101 vvvvv PP 1 uuuuu 111 xxxxx")
+
+/***************************************************************
+*
+*  Group #3, Uses Q6 Rt8
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 011 --- ----- PP - ----- ----- ---","[#3] Vd32=(Vu32, Vv32, Rt8)")
+DEF_ENC(V6_valignb,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlalignb,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vasrwh,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vasrwhsat,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vasrwhrndsat,     ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vasrwuhsat,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vasrhubsat,         ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vasrhubrndsat,     ICLASS_CJ" 1 011 vvv vvttt PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vasrhbrndsat,     ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlutvvb,            ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 001 ddddd")
+DEF_ENC(V6_vshuffvdd,         ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 011 ddddd") //
+DEF_ENC(V6_vdealvdd,         ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 100 ddddd") //
+DEF_ENC(V6_vlutvvb_oracc,    ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 101 xxxxx")
+DEF_ENC(V6_vlutvwh,            ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 110 ddddd")
+DEF_ENC(V6_vlutvwh_oracc,    ICLASS_CJ" 1 011 vvv vvttt PP 1 uuuuu 111 xxxxx")
+
+
+
+/***************************************************************
+*
+*  Group #4, No Q6 regs
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 100 --- ----- PP 0 ----- ----- ---","[#4] Vd32=(Vu32, Vv32)")
+DEF_ENC(V6_vrmpyubv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vrmpybv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vrmpybusv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vdmpyhvsat,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vmpybv,         ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmpyubv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmpybusv,     ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpyhv,         ICLASS_CJ" 1 100 000 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyuhv,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vmpyhvsrs,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vmpyhus,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vmpabusv,     ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vmpyih,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vand,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vor,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vxor,         ICLASS_CJ" 1 100 001 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vaddw,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vaddubsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vadduhsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vaddhsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaddwsat,     ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vsubb,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsubh,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vsubw,         ICLASS_CJ" 1 100 010 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vsububsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vsubuhsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vsubhsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsubwsat,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaddb_dv,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vaddh_dv,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vaddw_dv,     ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vaddubsat_dv,ICLASS_CJ" 1 100 011 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vadduhsat_dv,ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vaddhsat_dv, ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vaddwsat_dv, ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsubb_dv,     ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vsubh_dv,     ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vsubw_dv,     ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsububsat_dv,ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vsubuhsat_dv,ICLASS_CJ" 1 100 100 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vsubhsat_dv,    ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vsubwsat_dv, ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vaddubh,     ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vadduhw,     ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaddhw,         ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vsububh,     ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsubuhw,        ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vsubhw,        ICLASS_CJ" 1 100 101 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vabsdiffub,    ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vabsdiffh,     ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vabsdiffuh,     ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vabsdiffw,     ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vavgub,         ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vavguh,         ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vavgh,        ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vavgw,        ICLASS_CJ" 1 100 110 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vnavgub,        ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vnavgh,         ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vnavgw,         ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vavgubrnd,     ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vavguhrnd,     ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vavghrnd,     ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vavgwrnd,    ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmpabuuv,    ICLASS_CJ" 1 100 111 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 100 --- ----- PP 1 ----- ----- ---","[#4] Vx32=(Vu32, Vv32)")
+DEF_ENC(V6_vrmpyubv_acc,      ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vrmpybv_acc,       ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vrmpybusv_acc,    ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vdmpyhvsat_acc,    ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vmpybv_acc,         ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpyubv_acc,     ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpybusv_acc,    ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vmpyhv_acc,        ICLASS_CJ" 1 100 000 vvvvv PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vmpyuhv_acc,        ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vmpyhus_acc,     ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaddhw_acc,        ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vmpyowh_64_acc,    ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 011 xxxxx")
+DEF_ENC(V6_vmpyih_acc,         ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vmpyiewuh_acc,    ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vmpyowh_sacc,    ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vmpyowh_rnd_sacc,ICLASS_CJ" 1 100 001 vvvvv PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vmpyiewh_acc,      ICLASS_CJ" 1 100 010 vvvvv PP 1 uuuuu 000 xxxxx") //
+
+DEF_ENC(V6_vadduhw_acc,          ICLASS_CJ" 1 100 010 vvvvv PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaddubh_acc,          ICLASS_CJ" 1 100 010 vvvvv PP 1 uuuuu 101 xxxxx") //
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 100 100 ----- PP 1 ----- ----- ---","[#4] Qx4=(Vu32, Vv32)")
+// Grouped by element size (lsbs), comparison (next-lsbs) and logical operation (next-lsbs)
+DEF_ENC(V6_veqb_and,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 000xx") //
+DEF_ENC(V6_veqh_and,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 001xx") //
+DEF_ENC(V6_veqw_and,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 010xx") //
+
+DEF_ENC(V6_vgtb_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 100xx") //
+DEF_ENC(V6_vgth_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 101xx") //
+DEF_ENC(V6_vgtw_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 000 110xx") //
+
+DEF_ENC(V6_vgtub_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 001 000xx") //
+DEF_ENC(V6_vgtuh_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 001 001xx") //
+DEF_ENC(V6_vgtuw_and,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 001 010xx") //
+
+DEF_ENC(V6_veqb_or,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 000xx") //
+DEF_ENC(V6_veqh_or,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 001xx") //
+DEF_ENC(V6_veqw_or,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 010xx") //
+
+DEF_ENC(V6_vgtb_or,        ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 100xx") //
+DEF_ENC(V6_vgth_or,        ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 101xx") //
+DEF_ENC(V6_vgtw_or,        ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 010 110xx") //
+
+DEF_ENC(V6_vgtub_or,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 011 000xx") //
+DEF_ENC(V6_vgtuh_or,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 011 001xx") //
+DEF_ENC(V6_vgtuw_or,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 011 010xx") //
+
+DEF_ENC(V6_veqb_xor,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 000xx") //
+DEF_ENC(V6_veqh_xor,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 001xx") //
+DEF_ENC(V6_veqw_xor,     ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 010xx") //
+
+DEF_ENC(V6_vgtb_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 100xx") //
+DEF_ENC(V6_vgth_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 101xx") //
+DEF_ENC(V6_vgtw_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 100 110xx") //
+
+DEF_ENC(V6_vgtub_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 101 000xx") //
+DEF_ENC(V6_vgtuh_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 101 001xx") //
+DEF_ENC(V6_vgtuw_xor,    ICLASS_CJ" 1 100 100 vvvvv PP 1 uuuuu 101 010xx") //
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 100 101 ----- PP 1 ----- ----- ---","[#4] Qx4,Vd32=(Vu32, Vv32)")
+DEF_ENC(V6_vaddcarry,    ICLASS_CJ" 1 100 101 vvvvv PP 1 uuuuu 0xx ddddd") //
+DEF_ENC(V6_vsubcarry,    ICLASS_CJ" 1 100 101 vvvvv PP 1 uuuuu 1xx ddddd") //
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 100 11- ----- PP 1 ----- ----- ---","[#4] Vx32|=(Vu32, Vv32,#)")
+DEF_ENC(V6_vlutvvb_oracci,    ICLASS_CJ" 1 100 110 vvvvv PP 1 uuuuu iii xxxxx") //
+DEF_ENC(V6_vlutvwh_oracci,    ICLASS_CJ" 1 100 111 vvvvv PP 1 uuuuu iii xxxxx") //
+
+
+
+/***************************************************************
+*
+*  Group #5, Reserved/Deprecated. Uses Q6 Rx. Stupid FFT.
+*
+****************************************************************/
+
+
+
+
+/***************************************************************
+*
+*  Group #6, No Q6 regs
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 110 --0 ----- PP 0 ----- ----- ---","[#6] Vd32=Vu32")
+DEF_ENC(V6_vabsh,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vabsh_sat,     ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vabsw,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vabsw_sat,     ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vnot,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vdealh,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vdealb,         ICLASS_CJ" 1 110 --0 ---00 PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vunpackub,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vunpackuh,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vunpackb,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vunpackh,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vabsb,         ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vabsb_sat,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vshuffh,     ICLASS_CJ" 1 110 --0 ---01 PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vshuffb,     ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vzb,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vzh,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsb,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vsh,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vcl0w,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vpopcounth,     ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vcl0h,         ICLASS_CJ" 1 110 --0 ---10 PP 0 uuuuu 111 ddddd") //
+
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 110 --0 ---11 PP 0 ----- ----- ---","[#6] Qd4=Qt4, Qs4")
+DEF_ENC(V6_pred_and,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 000dd") //
+DEF_ENC(V6_pred_or,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 001dd") //
+DEF_ENC(V6_pred_not,     ICLASS_CJ" 1 110 --0 ---11 PP 0 ---ss 000 010dd") //
+DEF_ENC(V6_pred_xor,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 011dd") //
+DEF_ENC(V6_pred_or_n,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 100dd") //
+DEF_ENC(V6_pred_and_n,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 101dd") //
+DEF_ENC(V6_shuffeqh,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 110dd") //
+DEF_ENC(V6_shuffeqw,     ICLASS_CJ" 1 110 tt0 ---11 PP 0 ---ss 000 111dd") //
+
+DEF_ENC(V6_vnormamtw,        ICLASS_CJ" 1 110 --0 ---11 PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vnormamth,        ICLASS_CJ" 1 110 --0 ---11 PP 0 uuuuu 101 ddddd") //
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --1 ----- PP 0 ----- ----- ---","[#6] Vd32=Vu32,Vv32")
+DEF_ENC(V6_vlutvvbi,        ICLASS_CJ" 1 110 001 vvvvv PP 0 uuuuu iii ddddd")
+DEF_ENC(V6_vlutvwhi,        ICLASS_CJ" 1 110 011 vvvvv PP 0 uuuuu iii ddddd")
+
+DEF_ENC(V6_vaddbsat_dv,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 000 ddddd")
+DEF_ENC(V6_vsubbsat_dv,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 001 ddddd")
+DEF_ENC(V6_vadduwsat_dv,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 010 ddddd")
+DEF_ENC(V6_vsubuwsat_dv,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 011 ddddd")
+DEF_ENC(V6_vaddububb_sat,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 100 ddddd")
+DEF_ENC(V6_vsubububb_sat,    ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 101 ddddd")
+DEF_ENC(V6_vmpyewuh_64,        ICLASS_CJ" 1 110 101 vvvvv PP 0 uuuuu 110 ddddd")
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --0 ----- PP 1 ----- ----- ---","Vx32=Vu32")
+DEF_ENC(V6_vunpackob,         ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vunpackoh,         ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 001 xxxxx") //
+//DEF_ENC(V6_vunpackow,     ICLASS_CJ" 1 110 --0 ---00 PP 1 uuuuu 010 xxxxx") //
+
+DEF_ENC(V6_vhist,            ICLASS_CJ" 1 110 --0 ---00 PP 1 -000- 100 -----")
+DEF_ENC(V6_vwhist256,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -0010 100 -----")
+DEF_ENC(V6_vwhist256_sat,    ICLASS_CJ" 1 110 --0 ---00 PP 1 -0011 100 -----")
+DEF_ENC(V6_vwhist128,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -010- 100 -----")
+DEF_ENC(V6_vwhist128m,        ICLASS_CJ" 1 110 --0 ---00 PP 1 -011i 100 -----")
+
+DEF_FIELDROW_DESC32(        ICLASS_CJ" 1 110 --0 ----- PP 1 ----- ----- ---","if (Qv4) Vx32=Vu32")
+DEF_ENC(V6_vaddbq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vaddhq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vaddwq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vaddbnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 011 xxxxx") //
+DEF_ENC(V6_vaddhnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 100 xxxxx") //
+DEF_ENC(V6_vaddwnq,         ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 101 xxxxx") //
+DEF_ENC(V6_vsubbq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 110 xxxxx") //
+DEF_ENC(V6_vsubhq,             ICLASS_CJ" 1 110 vv0 ---01 PP 1 uuuuu 111 xxxxx") //
+
+DEF_ENC(V6_vsubwq,             ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 000 xxxxx") //
+DEF_ENC(V6_vsubbnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 001 xxxxx") //
+DEF_ENC(V6_vsubhnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 010 xxxxx") //
+DEF_ENC(V6_vsubwnq,         ICLASS_CJ" 1 110 vv0 ---10 PP 1 uuuuu 011 xxxxx") //
+
+DEF_ENC(V6_vhistq,            ICLASS_CJ" 1 110 vv0 ---10 PP 1 --00- 100 -----")
+DEF_ENC(V6_vwhist256q,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --010 100 -----")
+DEF_ENC(V6_vwhist256q_sat,    ICLASS_CJ" 1 110 vv0 ---10 PP 1 --011 100 -----")
+DEF_ENC(V6_vwhist128q,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --10- 100 -----")
+DEF_ENC(V6_vwhist128qm,        ICLASS_CJ" 1 110 vv0 ---10 PP 1 --11i 100 -----")
+
+
+DEF_ENC(V6_vandvqv,            ICLASS_CJ" 1 110 vv0 ---11 PP 1 uuuuu 000 ddddd")
+DEF_ENC(V6_vandvnqv,        ICLASS_CJ" 1 110 vv0 ---11 PP 1 uuuuu 001 ddddd")
+
+
+DEF_ENC(V6_vprefixqb,       ICLASS_CJ" 1 110 vv0 ---11 PP 1 --000 010 ddddd") //
+DEF_ENC(V6_vprefixqh,       ICLASS_CJ" 1 110 vv0 ---11 PP 1 --001 010 ddddd") //
+DEF_ENC(V6_vprefixqw,       ICLASS_CJ" 1 110 vv0 ---11 PP 1 --010 010 ddddd") //
+
+
+
+
+DEF_ENC(V6_vassign,            ICLASS_CJ" 1 110 --0 ---11 PP 1 uuuuu 111 ddddd")
+
+DEF_ENC(V6_valignbi,         ICLASS_CJ" 1 110 001 vvvvv PP 1 uuuuu iii ddddd")
+DEF_ENC(V6_vlalignbi,         ICLASS_CJ" 1 110 011 vvvvv PP 1 uuuuu iii ddddd")
+DEF_ENC(V6_vswap,             ICLASS_CJ" 1 110 101 vvvvv PP 1 uuuuu -tt ddddd") //
+DEF_ENC(V6_vmux,             ICLASS_CJ" 1 110 111 vvvvv PP 1 uuuuu -tt ddddd") //
+
+
+
+/***************************************************************
+*
+*  Group #7, No Q6 regs
+*
+****************************************************************/
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 111 --- ----- PP 0 ----- ----- ---","[#7] Vd32=(Vu32, Vv32)")
+DEF_ENC(V6_vaddbsat,    ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vminub,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vminuh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vminh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vminw,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmaxub,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vmaxuh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vmaxh,         ICLASS_CJ" 1 111 000 vvvvv PP 0 uuuuu 111 ddddd") //
+
+
+DEF_ENC(V6_vaddclbh,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 000 ddddd") //
+DEF_ENC(V6_vaddclbw,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 001 ddddd") //
+
+DEF_ENC(V6_vavguw,        ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 010 ddddd") //
+DEF_ENC(V6_vavguwrnd,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 011 ddddd") //
+DEF_ENC(V6_vavgb,        ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 100 ddddd") //
+DEF_ENC(V6_vavgbrnd,    ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 101 ddddd") //
+DEF_ENC(V6_vnavgb,        ICLASS_CJ" 1 111 000 vvvvv PP 1 uuuuu 110 ddddd") //
+
+
+DEF_ENC(V6_vmaxw,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vdelta,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vsubbsat,    ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vrdelta,     ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vminb,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmaxb,         ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vsatuwuh,    ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vdealb4w,     ICLASS_CJ" 1 111 001 vvvvv PP 0 uuuuu 111 ddddd") //
+
+
+DEF_ENC(V6_vmpyowh_rnd,     ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vshuffeb,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vshuffob,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vshufeh,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vshufoh,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vshufoeh,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vshufoeb,      ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vcombine,     ICLASS_CJ" 1 111 010 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vmpyieoh,     ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vadduwsat,     ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vsathub,     ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vsatwh,         ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vroundwh,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 100 ddddd")
+DEF_ENC(V6_vroundwuh,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 101 ddddd")
+DEF_ENC(V6_vroundhb,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 110 ddddd")
+DEF_ENC(V6_vroundhub,    ICLASS_CJ" 1 111 011 vvvvv PP 0 uuuuu 111 ddddd")
+
+DEF_FIELDROW_DESC32(    ICLASS_CJ" 1 111 100 ----- PP - ----- ----- ---","[#7] Qd4=(Vu32, Vv32)")
+DEF_ENC(V6_veqb,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 000dd") //
+DEF_ENC(V6_veqh,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 001dd") //
+DEF_ENC(V6_veqw,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 010dd") //
+
+DEF_ENC(V6_vgtb,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 100dd") //
+DEF_ENC(V6_vgth,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 101dd") //
+DEF_ENC(V6_vgtw,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 000 110dd") //
+
+DEF_ENC(V6_vgtub,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 001 000dd") //
+DEF_ENC(V6_vgtuh,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 001 001dd") //
+DEF_ENC(V6_vgtuw,         ICLASS_CJ" 1 111 100 vvvvv PP 0 uuuuu 001 010dd") //
+
+
+DEF_ENC(V6_vasrwv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vlsrwv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vlsrhv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vasrhv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vaslwv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vaslhv,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vaddb,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vaddh,         ICLASS_CJ" 1 111 101 vvvvv PP 0 uuuuu 111 ddddd") //
+
+
+DEF_ENC(V6_vmpyiewuh,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 000 ddddd")
+DEF_ENC(V6_vmpyiowh,    ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 001 ddddd")
+DEF_ENC(V6_vpackeb,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vpackeh,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vsubuwsat,     ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vpackhub_sat,ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 101 ddddd") //
+DEF_ENC(V6_vpackhb_sat, ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 110 ddddd") //
+DEF_ENC(V6_vpackwuh_sat,ICLASS_CJ" 1 111 110 vvvvv PP 0 uuuuu 111 ddddd") //
+
+DEF_ENC(V6_vpackwh_sat, ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 000 ddddd") //
+DEF_ENC(V6_vpackob,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 001 ddddd") //
+DEF_ENC(V6_vpackoh,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 010 ddddd") //
+DEF_ENC(V6_vrounduhub,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 011 ddddd") //
+DEF_ENC(V6_vrounduwuh,     ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 100 ddddd") //
+DEF_ENC(V6_vmpyewuh,    ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 101 ddddd")
+DEF_ENC(V6_vmpyowh,        ICLASS_CJ" 1 111 111 vvvvv PP 0 uuuuu 111 ddddd")
+
+
+#endif /* NO MMVEC */
-- 
2.7.4


* [PATCH v3 27/30] Hexagon HVX (tests/tcg/hexagon) vector_add_int test
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (25 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 26/30] Hexagon HVX (target/hexagon) import instruction encodings Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 28/30] Hexagon HVX (tests/tcg/hexagon) hvx_misc test Taylor Simpson
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 tests/tcg/hexagon/vector_add_int.c | 61 ++++++++++++++++++++++++++++++++++++++
 tests/tcg/hexagon/Makefile.target  |  3 ++
 2 files changed, 64 insertions(+)
 create mode 100644 tests/tcg/hexagon/vector_add_int.c

diff --git a/tests/tcg/hexagon/vector_add_int.c b/tests/tcg/hexagon/vector_add_int.c
new file mode 100644
index 0000000..d6010ea
--- /dev/null
+++ b/tests/tcg/hexagon/vector_add_int.c
@@ -0,0 +1,61 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+
+int gA[401];
+int gB[401];
+int gC[401];
+
+void vector_add_int(void)
+{
+  int i;
+  for (i = 0; i < 400; i++) {
+    gA[i] = gB[i] + gC[i];
+  }
+}
+
+int main(void)
+{
+  int error = 0;
+  int i;
+  for (i = 0; i < 400; i++) {
+    gB[i] = i * 2;
+    gC[i] = i * 3;
+  }
+  gA[400] = 17;
+  vector_add_int();
+  for (i = 0; i < 400; i++) {
+    if (gA[i] != i * 5) {
+        error++;
+        printf("ERROR: gB[%d] = %d\t", i, gB[i]);
+        printf("gC[%d] = %d\t", i, gC[i]);
+        printf("gA[%d] = %d\n", i, gA[i]);
+    }
+  }
+  if (gA[400] != 17) {
+    error++;
+    printf("ERROR: Overran the buffer\n");
+  }
+  if (!error) {
+    printf("PASS\n");
+    return 0;
+  } else {
+    printf("FAIL\n");
+    return 1;
+  }
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 050cd61..18b4472 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -37,7 +37,10 @@ HEX_TESTS += circ
 HEX_TESTS += brev
 HEX_TESTS += load_unpack
 HEX_TESTS += load_align
+HEX_TESTS += vector_add_int
 HEX_TESTS += atomics
 HEX_TESTS += fpstuff
 
 TESTS += $(HEX_TESTS)
+
+vector_add_int: CFLAGS += -mhvx -fvectorize
-- 
2.7.4


* [PATCH v3 28/30] Hexagon HVX (tests/tcg/hexagon) hvx_misc test
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (26 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 27/30] Hexagon HVX (tests/tcg/hexagon) vector_add_int test Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 29/30] Hexagon HVX (tests/tcg/hexagon) scatter_gather test Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 30/30] Hexagon HVX (tests/tcg/hexagon) histogram test Taylor Simpson
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Tests for
    packet semantics
    vector loads (aligned and unaligned)
    vector stores (aligned and unaligned)
    vector masked stores
    vector operations

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 tests/tcg/hexagon/hvx_misc.c      | 414 ++++++++++++++++++++++++++++++++++++++
 tests/tcg/hexagon/Makefile.target |   2 +
 2 files changed, 416 insertions(+)
 create mode 100644 tests/tcg/hexagon/hvx_misc.c

diff --git a/tests/tcg/hexagon/hvx_misc.c b/tests/tcg/hexagon/hvx_misc.c
new file mode 100644
index 0000000..d809f81
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_misc.c
@@ -0,0 +1,414 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+
+int err;
+
+static void __check(int line, uint64_t result, uint64_t expect)
+{
+    if (result != expect) {
+        printf("ERROR at line %d: 0x%016llx != 0x%016llx\n",
+               line, result, expect);
+        err++;
+    }
+}
+
+#define check(RES, EXP) __check(__LINE__, RES, EXP)
+
+#define MAX_VEC_SIZE_BYTES         128
+
+typedef union {
+    uint64_t ud[MAX_VEC_SIZE_BYTES / 8];
+    int64_t   d[MAX_VEC_SIZE_BYTES / 8];
+    uint32_t uw[MAX_VEC_SIZE_BYTES / 4];
+    int32_t   w[MAX_VEC_SIZE_BYTES / 4];
+    uint16_t uh[MAX_VEC_SIZE_BYTES / 2];
+    int16_t   h[MAX_VEC_SIZE_BYTES / 2];
+    uint8_t  ub[MAX_VEC_SIZE_BYTES / 1];
+    int8_t    b[MAX_VEC_SIZE_BYTES / 1];
+} MMVector;
+
+#define BUFSIZE      16
+#define OUTSIZE      16
+#define MASKMOD      3
+
+MMVector buffer0[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector buffer1[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector mask[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector output[OUTSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector expect[OUTSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+
+#define CHECK_OUTPUT_FUNC(FIELD, FIELDSZ) \
+static void check_output_##FIELD(int line, size_t num_vectors) \
+{ \
+    for (int i = 0; i < num_vectors; i++) { \
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / FIELDSZ; j++) { \
+            __check(line, output[i].FIELD[j], expect[i].FIELD[j]); \
+        } \
+    } \
+}
+
+CHECK_OUTPUT_FUNC(d,  8)
+CHECK_OUTPUT_FUNC(w,  4)
+CHECK_OUTPUT_FUNC(h,  2)
+CHECK_OUTPUT_FUNC(b,  1)
+
+static void init_buffers(void)
+{
+    int counter0 = 0;
+    int counter1 = 17;
+    for (int i = 0; i < BUFSIZE; i++) {
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES; j++) {
+            buffer0[i].b[j] = counter0++;
+            buffer1[i].b[j] = counter1++;
+        }
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+            mask[i].w[j] = (i + (j % MASKMOD) == 0) ? 0 : 1;
+        }
+    }
+}
+
+static void test_load_tmp(void)
+{
+    void *p0 = buffer0;
+    void *p1 = buffer1;
+    void *pout = output;
+
+    for (int i = 0; i < BUFSIZE; i++) {
+        /*
+         * Load into v12 as .tmp, then use v12 in that packet and the next.
+         * The same packet should see the newly loaded value;
+         * the next packet should see the old value (the splat of 1).
+         */
+        asm("v3 = vmem(%0 + #0)\n\t"
+            "r1 = #1\n\t"
+            "v12 = vsplat(r1)\n\t"
+            "{\n\t"
+            "    v12.tmp = vmem(%1 + #0)\n\t"
+            "    v4.w = vadd(v12.w, v3.w)\n\t"
+            "}\n\t"
+            "v4.w = vadd(v4.w, v12.w)\n\t"
+            "vmem(%2 + #0) = v4\n\t"
+            : : "r"(p0), "r"(p1), "r"(pout)
+            : "r1", "v12", "v3", "v4", "memory");
+        p0 += sizeof(MMVector);
+        p1 += sizeof(MMVector);
+        pout += sizeof(MMVector);
+
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+            expect[i].w[j] = buffer0[i].w[j] + buffer1[i].w[j] + 1;
+        }
+    }
+
+    check_output_w(__LINE__, BUFSIZE);
+}
+
+static void test_load_cur(void)
+{
+    void *p0 = buffer0;
+    void *pout = output;
+
+    for (int i = 0; i < BUFSIZE; i++) {
+        asm("{\n\t"
+            "    v2.cur = vmem(%0 + #0)\n\t"
+            "    vmem(%1 + #0) = v2\n\t"
+            "}\n\t"
+            : : "r"(p0), "r"(pout) : "v2", "memory");
+        p0 += sizeof(MMVector);
+        pout += sizeof(MMVector);
+
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+            expect[i].uw[j] = buffer0[i].uw[j];
+        }
+    }
+
+    check_output_w(__LINE__, BUFSIZE);
+}
+
+static void test_load_aligned(void)
+{
+    /* Aligned loads ignore the low bits of the address */
+    void *p0 = buffer0;
+    void *pout = output;
+    const size_t offset = 13;
+
+    p0 += offset;    /* Create an unaligned address */
+    asm("v2 = vmem(%0 + #0)\n\t"
+        "vmem(%1 + #0) = v2\n\t"
+        : : "r"(p0), "r"(pout) : "v2", "memory");
+
+    expect[0] = buffer0[0];
+
+    check_output_w(__LINE__, 1);
+}
+
+static void test_load_unaligned(void)
+{
+    void *p0 = buffer0;
+    void *pout = output;
+    const size_t offset = 12;
+
+    p0 += offset;    /* Create an unaligned address */
+    asm("v2 = vmemu(%0 + #0)\n\t"
+        "vmem(%1 + #0) = v2\n\t"
+        : : "r"(p0), "r"(pout) : "v2", "memory");
+
+    memcpy(expect, &buffer0[0].ub[offset], sizeof(MMVector));
+
+    check_output_w(__LINE__, 1);
+}
+
+static void test_store_aligned(void)
+{
+    /* Aligned stores ignore the low bits of the address */
+    void *p0 = buffer0;
+    void *pout = output;
+    const size_t offset = 13;
+
+    pout += offset;    /* Create an unaligned address */
+    asm("v2 = vmem(%0 + #0)\n\t"
+        "vmem(%1 + #0) = v2\n\t"
+        : : "r"(p0), "r"(pout) : "v2", "memory");
+
+    expect[0] = buffer0[0];
+
+    check_output_w(__LINE__, 1);
+}
+
+static void test_store_unaligned(void)
+{
+    void *p0 = buffer0;
+    void *pout = output;
+    const size_t offset = 12;
+
+    pout += offset;    /* Create an unaligned address */
+    asm("v2 = vmem(%0 + #0)\n\t"
+        "vmemu(%1 + #0) = v2\n\t"
+        : : "r"(p0), "r"(pout) : "v2", "memory");
+
+    memcpy(expect, buffer0, 2 * sizeof(MMVector));
+    memcpy(&expect[0].ub[offset], buffer0, sizeof(MMVector));
+
+    check_output_w(__LINE__, 2);
+}
+
+static void test_masked_store(bool invert)
+{
+    void *p0 = buffer0;
+    void *pmask = mask;
+    void *pout = output;
+
+    memset(expect, 0xff, sizeof(expect));
+    memset(output, 0xff, sizeof(output));
+
+    for (int i = 0; i < BUFSIZE; i++) {
+        if (invert) {
+            asm("r4 = #0\n\t"
+                "v4 = vsplat(r4)\n\t"
+                "v5 = vmem(%0 + #0)\n\t"
+                "q0 = vcmp.eq(v4.w, v5.w)\n\t"
+                "v5 = vmem(%1)\n\t"
+                "if (!q0) vmem(%2) = v5\n\t"             /* Inverted test */
+                : : "r"(pmask), "r"(p0), "r"(pout)
+                : "r4", "v4", "v5", "q0", "memory");
+        } else {
+            asm("r4 = #0\n\t"
+                "v4 = vsplat(r4)\n\t"
+                "v5 = vmem(%0 + #0)\n\t"
+                "q0 = vcmp.eq(v4.w, v5.w)\n\t"
+                "v5 = vmem(%1)\n\t"
+                "if (q0) vmem(%2) = v5\n\t"             /* Non-inverted test */
+                : : "r"(pmask), "r"(p0), "r"(pout)
+                : "r4", "v4", "v5", "q0", "memory");
+        }
+        p0 += sizeof(MMVector);
+        pmask += sizeof(MMVector);
+        pout += sizeof(MMVector);
+
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+            if (invert) {
+                if (i + (j % MASKMOD) != 0) {
+                    expect[i].w[j] = buffer0[i].w[j];
+                }
+            } else {
+                if (i + (j % MASKMOD) == 0) {
+                    expect[i].w[j] = buffer0[i].w[j];
+                }
+            }
+        }
+    }
+
+    check_output_w(__LINE__, BUFSIZE);
+}
+
+static void test_new_value_store(void)
+{
+    void *p0 = buffer0;
+    void *pout = output;
+
+    asm("{\n\t"
+        "    v2 = vmem(%0 + #0)\n\t"
+        "    vmem(%1 + #0) = v2.new\n\t"
+        "}\n\t"
+        : : "r"(p0), "r"(pout) : "v2", "memory");
+
+    expect[0] = buffer0[0];
+
+    check_output_w(__LINE__, 1);
+}
+
+static void test_max_temps(void)
+{
+    void *p0 = buffer0;
+    void *pout = output;
+
+    asm("v0 = vmem(%0 + #0)\n\t"
+        "v1 = vmem(%0 + #1)\n\t"
+        "v2 = vmem(%0 + #2)\n\t"
+        "v3 = vmem(%0 + #3)\n\t"
+        "v4 = vmem(%0 + #4)\n\t"
+        "{\n\t"
+        "    v1:0.w = vadd(v3:2.w, v1:0.w)\n\t"
+        "    v2.b = vshuffe(v3.b, v2.b)\n\t"
+        "    v3.w = vadd(v1.w, v4.w)\n\t"
+        "    v4.tmp = vmem(%0 + #5)\n\t"
+        "}\n\t"
+        "vmem(%1 + #0) = v0\n\t"
+        "vmem(%1 + #1) = v1\n\t"
+        "vmem(%1 + #2) = v2\n\t"
+        "vmem(%1 + #3) = v3\n\t"
+        "vmem(%1 + #4) = v4\n\t"
+        : : "r"(p0), "r"(pout) : "v0", "v1", "v2", "v3", "v4", "memory");
+
+    /* The first two vectors come from the vadd-pair instruction */
+    for (int i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
+        expect[0].w[i] = buffer0[0].w[i] + buffer0[2].w[i];
+        expect[1].w[i] = buffer0[1].w[i] + buffer0[3].w[i];
+    }
+    /* The third vector comes from the vshuffe instruction */
+    for (int i = 0; i < MAX_VEC_SIZE_BYTES / 2; i++) {
+        expect[2].uh[i] = (buffer0[2].uh[i] & 0xff) |
+                          (buffer0[3].uh[i] & 0xff) << 8;
+    }
+    /* The fourth vector comes from the vadd-single instruction */
+    for (int i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
+        expect[3].w[i] = buffer0[1].w[i] + buffer0[5].w[i];
+    }
+    /*
+     * The fifth vector comes from the load to v4;
+     * make sure the .tmp is dropped
+     */
+    expect[4] = buffer0[4];
+
+    check_output_b(__LINE__, 5);
+}
+
+#define OP1(ASM, EL, IN, OUT) \
+    asm("v2 = vmem(%0 + #0)\n\t" \
+        "v2" #EL " = " #ASM "(v2" #EL ")\n\t" \
+        "vmem(%1 + #0) = v2\n\t" \
+        : : "r"(IN), "r"(OUT) : "v2", "memory")
+
+#define OP2(ASM, EL, IN0, IN1, OUT) \
+    asm("v2 = vmem(%0 + #0)\n\t" \
+        "v3 = vmem(%1 + #0)\n\t" \
+        "v2" #EL " = " #ASM "(v2" #EL ", v3" #EL ")\n\t" \
+        "vmem(%2 + #0) = v2\n\t" \
+        : : "r"(IN0), "r"(IN1), "r"(OUT) : "v2", "v3", "memory")
+
+#define TEST_OP1(NAME, ASM, EL, FIELD, FIELDSZ, OP) \
+static void test_##NAME(void) \
+{ \
+    void *pin = buffer0; \
+    void *pout = output; \
+    for (int i = 0; i < BUFSIZE; i++) { \
+        OP1(ASM, EL, pin, pout); \
+        pin += sizeof(MMVector); \
+        pout += sizeof(MMVector); \
+    } \
+    for (int i = 0; i < BUFSIZE; i++) { \
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / FIELDSZ; j++) { \
+            expect[i].FIELD[j] = OP buffer0[i].FIELD[j]; \
+        } \
+    } \
+    check_output_##FIELD(__LINE__, BUFSIZE); \
+}
+
+#define TEST_OP2(NAME, ASM, EL, FIELD, FIELDSZ, OP) \
+static void test_##NAME(void) \
+{ \
+    void *p0 = buffer0; \
+    void *p1 = buffer1; \
+    void *pout = output; \
+    for (int i = 0; i < BUFSIZE; i++) { \
+        OP2(ASM, EL, p0, p1, pout); \
+        p0 += sizeof(MMVector); \
+        p1 += sizeof(MMVector); \
+        pout += sizeof(MMVector); \
+    } \
+    for (int i = 0; i < BUFSIZE; i++) { \
+        for (int j = 0; j < MAX_VEC_SIZE_BYTES / FIELDSZ; j++) { \
+            expect[i].FIELD[j] = buffer0[i].FIELD[j] OP buffer1[i].FIELD[j]; \
+        } \
+    } \
+    check_output_##FIELD(__LINE__, BUFSIZE); \
+}
+
+TEST_OP2(vadd_w, vadd, .w, w, 4, +)
+TEST_OP2(vadd_h, vadd, .h, h, 2, +)
+TEST_OP2(vadd_b, vadd, .b, b, 1, +)
+TEST_OP2(vsub_w, vsub, .w, w, 4, -)
+TEST_OP2(vsub_h, vsub, .h, h, 2, -)
+TEST_OP2(vsub_b, vsub, .b, b, 1, -)
+TEST_OP2(vxor, vxor, , d, 8, ^)
+TEST_OP2(vand, vand, , d, 8, &)
+TEST_OP2(vor, vor, , d, 8, |)
+TEST_OP1(vnot, vnot, , d, 8, ~)
+
+int main()
+{
+    init_buffers();
+
+    test_load_tmp();
+    test_load_cur();
+    test_load_aligned();
+    test_load_unaligned();
+    test_store_aligned();
+    test_store_unaligned();
+    test_masked_store(false);
+    test_masked_store(true);
+    test_new_value_store();
+    test_max_temps();
+
+    test_vadd_w();
+    test_vadd_h();
+    test_vadd_b();
+    test_vsub_w();
+    test_vsub_h();
+    test_vsub_b();
+    test_vxor();
+    test_vand();
+    test_vor();
+    test_vnot();
+
+    puts(err ? "FAIL" : "PASS");
+    return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 18b4472..7708e76 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -40,7 +40,9 @@ HEX_TESTS += load_align
 HEX_TESTS += vector_add_int
 HEX_TESTS += atomics
 HEX_TESTS += fpstuff
+HEX_TESTS += hvx_misc
 
 TESTS += $(HEX_TESTS)
 
 vector_add_int: CFLAGS += -mhvx -fvectorize
+hvx_misc: CFLAGS += -mhvx
-- 
2.7.4



* [PATCH v3 29/30] Hexagon HVX (tests/tcg/hexagon) scatter_gather test
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (27 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 28/30] Hexagon HVX (tests/tcg/hexagon) hvx_misc test Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  2021-09-20 21:24 ` [PATCH v3 30/30] Hexagon HVX (tests/tcg/hexagon) histogram test Taylor Simpson
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 tests/tcg/hexagon/scatter_gather.c | 1011 ++++++++++++++++++++++++++++++++++++
 tests/tcg/hexagon/Makefile.target  |    2 +
 2 files changed, 1013 insertions(+)
 create mode 100644 tests/tcg/hexagon/scatter_gather.c

diff --git a/tests/tcg/hexagon/scatter_gather.c b/tests/tcg/hexagon/scatter_gather.c
new file mode 100644
index 0000000..b93eb18
--- /dev/null
+++ b/tests/tcg/hexagon/scatter_gather.c
@@ -0,0 +1,1011 @@
+/*
+ *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This example tests the HVX scatter/gather instructions
+ *
+ * See section 5.13 of the V68 HVX Programmer's Reference
+ *
+ * There are 3 main classes of operations
+ *     _16                 16-bit elements and 16-bit offsets
+ *     _32                 32-bit elements and 32-bit offsets
+ *     _16_32              16-bit elements and 32-bit offsets
+ *
+ * There are also masked and accumulate versions
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <inttypes.h>
+
+typedef long HVX_Vector       __attribute__((__vector_size__(128)))
+                              __attribute__((aligned(128)));
+typedef long HVX_VectorPair   __attribute__((__vector_size__(256)))
+                              __attribute__((aligned(128)));
+typedef long HVX_VectorPred   __attribute__((__vector_size__(128)))
+                              __attribute__((aligned(128)));
+
+#define VSCATTER_16(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermh_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_MASKED(MASK, BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermhq_128B(MASK, (int)BASE, RGN, OFF, VALS)
+#define VSCATTER_32(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermw_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_32_MASKED(MASK, BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermwq_128B(MASK, (int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_32(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermhw_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_32_MASKED(MASK, BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermhwq_128B(MASK, (int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_ACC(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermh_add_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_32_ACC(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermw_add_128B((int)BASE, RGN, OFF, VALS)
+#define VSCATTER_16_32_ACC(BASE, RGN, OFF, VALS) \
+    __builtin_HEXAGON_V6_vscattermhw_add_128B((int)BASE, RGN, OFF, VALS)
+
+#define VGATHER_16(DSTADDR, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermh_128B(DSTADDR, (int)BASE, RGN, OFF)
+#define VGATHER_16_MASKED(DSTADDR, MASK, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermhq_128B(DSTADDR, MASK, (int)BASE, RGN, OFF)
+#define VGATHER_32(DSTADDR, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermw_128B(DSTADDR, (int)BASE, RGN, OFF)
+#define VGATHER_32_MASKED(DSTADDR, MASK, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermwq_128B(DSTADDR, MASK, (int)BASE, RGN, OFF)
+#define VGATHER_16_32(DSTADDR, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermhw_128B(DSTADDR, (int)BASE, RGN, OFF)
+#define VGATHER_16_32_MASKED(DSTADDR, MASK, BASE, RGN, OFF) \
+    __builtin_HEXAGON_V6_vgathermhwq_128B(DSTADDR, MASK, (int)BASE, RGN, OFF)
+
+#define VSHUFF_H(V) \
+    __builtin_HEXAGON_V6_vshuffh_128B(V)
+#define VSPLAT_H(X) \
+    __builtin_HEXAGON_V6_lvsplath_128B(X)
+#define VAND_VAL(PRED, VAL) \
+    __builtin_HEXAGON_V6_vandvrt_128B(PRED, VAL)
+#define VDEAL_H(V) \
+    __builtin_HEXAGON_V6_vdealh_128B(V)
+
+int err;
+
+/* define the number of rows/cols in a square matrix */
+#define MATRIX_SIZE 64
+
+/* define the size of the scatter buffer */
+#define SCATTER_BUFFER_SIZE (MATRIX_SIZE * MATRIX_SIZE)
+
+/* fake vtcm - put buffers together and force alignment */
+static struct {
+    unsigned short vscatter16[SCATTER_BUFFER_SIZE];
+    unsigned short vgather16[MATRIX_SIZE];
+    unsigned int   vscatter32[SCATTER_BUFFER_SIZE];
+    unsigned int   vgather32[MATRIX_SIZE];
+    unsigned short vscatter16_32[SCATTER_BUFFER_SIZE];
+    unsigned short vgather16_32[MATRIX_SIZE];
+} vtcm __attribute__((aligned(0x10000)));
+
+/* declare the arrays of reference values */
+unsigned short vscatter16_ref[SCATTER_BUFFER_SIZE];
+unsigned short vgather16_ref[MATRIX_SIZE];
+unsigned int   vscatter32_ref[SCATTER_BUFFER_SIZE];
+unsigned int   vgather32_ref[MATRIX_SIZE];
+unsigned short vscatter16_32_ref[SCATTER_BUFFER_SIZE];
+unsigned short vgather16_32_ref[MATRIX_SIZE];
+
+/* declare the arrays of offsets */
+unsigned short half_offsets[MATRIX_SIZE];
+unsigned int   word_offsets[MATRIX_SIZE];
+
+/* declare the arrays of values */
+unsigned short half_values[MATRIX_SIZE];
+unsigned short half_values_acc[MATRIX_SIZE];
+unsigned short half_values_masked[MATRIX_SIZE];
+unsigned int   word_values[MATRIX_SIZE];
+unsigned int   word_values_acc[MATRIX_SIZE];
+unsigned int   word_values_masked[MATRIX_SIZE];
+
+/* declare the arrays of predicates */
+unsigned short half_predicates[MATRIX_SIZE];
+unsigned int   word_predicates[MATRIX_SIZE];
+
+/* make this big enough for all the intrinsics */
+const size_t region_len = sizeof(vtcm);
+
+/* optionally add sync instructions */
+#define SYNC_VECTOR 1
+
+static void sync_scatter(void *addr)
+{
+#if SYNC_VECTOR
+    /*
+     * Do the scatter release followed by a dummy load to complete the
+     * synchronization.  Normally the dummy load would be deferred as
+     * long as possible to minimize stalls.
+     */
+    asm volatile("vmem(%0 + #0):scatter_release\n" : : "r"(addr));
+    /* use volatile to force the load */
+    volatile HVX_Vector vDummy = *(HVX_Vector *)addr; vDummy = vDummy;
+#endif
+}
+
+static void sync_gather(void *addr)
+{
+#if SYNC_VECTOR
+    /* use volatile to force the load */
+    volatile HVX_Vector vDummy = *(HVX_Vector *)addr; vDummy = vDummy;
+#endif
+}
+
+/* optionally print the results */
+#define PRINT_DATA 0
+
+#define FILL_CHAR       '.'
+
+/* fill vtcm scratch with FILL_CHAR */
+void prefill_vtcm_scratch(void)
+{
+    memset(&vtcm, FILL_CHAR, sizeof(vtcm));
+}
+
+/* create byte offsets to be a diagonal of the matrix with 16 bit elements */
+void create_offsets_values_preds_16(void)
+{
+    unsigned short half_element = 0;
+    unsigned short half_element_masked = 0;
+    char letter = 'A';
+    char letter_masked = '@';
+
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        half_offsets[i] = i * (2 * MATRIX_SIZE + 2);
+
+        half_element = 0;
+        half_element_masked = 0;
+        for (int j = 0; j < 2; j++) {
+            half_element |= letter << j * 8;
+            half_element_masked |= letter_masked << j * 8;
+        }
+
+        half_values[i] = half_element;
+        half_values_acc[i] = ((i % 10) << 8) + (i % 10);
+        half_values_masked[i] = half_element_masked;
+
+        letter++;
+        /* reset to 'A' */
+        if (letter == 'M') {
+            letter = 'A';
+        }
+
+        half_predicates[i] = (i % 3 == 0 || i % 5 == 0) ? ~0 : 0;
+    }
+}
+
+/* create byte offsets to be a diagonal of the matrix with 32 bit elements */
+void create_offsets_values_preds_32(void)
+{
+    unsigned int word_element = 0;
+    unsigned int word_element_masked = 0;
+    char letter = 'A';
+    char letter_masked = '&';
+
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        word_offsets[i] = i * (4 * MATRIX_SIZE + 4);
+
+        word_element = 0;
+        word_element_masked = 0;
+        for (int j = 0; j < 4; j++) {
+            word_element |= letter << j * 8;
+            word_element_masked |= letter_masked << j * 8;
+        }
+
+        word_values[i] = word_element;
+        word_values_acc[i] = ((i % 10) << 8) + (i % 10);
+        word_values_masked[i] = word_element_masked;
+
+        letter++;
+        /* reset to 'A' */
+        if (letter == 'M') {
+            letter = 'A';
+        }
+
+        word_predicates[i] = (i % 4 == 0 || i % 7 == 0) ? ~0 : 0;
+    }
+}
+
+/*
+ * create byte offsets to be a diagonal of the matrix with 16 bit elements
+ * and 32 bit offsets
+ */
+void create_offsets_values_preds_16_32(void)
+{
+    unsigned short half_element = 0;
+    unsigned short half_element_masked = 0;
+    char letter = 'D';
+    char letter_masked = '$';
+
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        word_offsets[i] = i * (2 * MATRIX_SIZE + 2);
+
+        half_element = 0;
+        half_element_masked = 0;
+        for (int j = 0; j < 2; j++) {
+            half_element |= letter << j * 8;
+            half_element_masked |= letter_masked << j * 8;
+        }
+
+        half_values[i] = half_element;
+        half_values_acc[i] = ((i % 10) << 8) + (i % 10);
+        half_values_masked[i] = half_element_masked;
+
+        letter++;
+        /* reset to 'D' */
+        if (letter == 'P') {
+            letter = 'D';
+        }
+
+        half_predicates[i] = (i % 2 == 0 || i % 13 == 0) ? ~0 : 0;
+    }
+}
+
+/* scatter the 16 bit elements using intrinsics */
+void vector_scatter_16(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+    HVX_Vector values = *(HVX_Vector *)half_values;
+
+    VSCATTER_16(&vtcm.vscatter16, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16);
+}
+
+/* scatter-accumulate the 16 bit elements using intrinsics */
+void vector_scatter_16_acc(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+    HVX_Vector values = *(HVX_Vector *)half_values_acc;
+
+    VSCATTER_16_ACC(&vtcm.vscatter16, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16);
+}
+
+/* masked scatter the 16 bit elements using intrinsics */
+void vector_scatter_16_masked(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+    HVX_Vector values = *(HVX_Vector *)half_values_masked;
+    HVX_Vector pred_reg = *(HVX_Vector *)half_predicates;
+    HVX_VectorPred preds = VAND_VAL(pred_reg, ~0);
+
+    VSCATTER_16_MASKED(preds, &vtcm.vscatter16, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16);
+}
+
+/* scatter the 32 bit elements using intrinsics */
+void vector_scatter_32(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+    HVX_Vector valueslo = *(HVX_Vector *)word_values;
+    HVX_Vector valueshi = *(HVX_Vector *)&word_values[MATRIX_SIZE / 2];
+
+    VSCATTER_32(&vtcm.vscatter32, region_len, offsetslo, valueslo);
+    VSCATTER_32(&vtcm.vscatter32, region_len, offsetshi, valueshi);
+
+    sync_scatter(vtcm.vscatter32);
+}
+
+/* scatter-acc the 32 bit elements using intrinsics */
+void vector_scatter_32_acc(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+    HVX_Vector valueslo = *(HVX_Vector *)word_values_acc;
+    HVX_Vector valueshi = *(HVX_Vector *)&word_values_acc[MATRIX_SIZE / 2];
+
+    VSCATTER_32_ACC(&vtcm.vscatter32, region_len, offsetslo, valueslo);
+    VSCATTER_32_ACC(&vtcm.vscatter32, region_len, offsetshi, valueshi);
+
+    sync_scatter(vtcm.vscatter32);
+}
+
+/* masked scatter the 32 bit elements using intrinsics */
+void vector_scatter_32_masked(void)
+{
+    /* copy the offsets and values to vectors */
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+    HVX_Vector valueslo = *(HVX_Vector *)word_values_masked;
+    HVX_Vector valueshi = *(HVX_Vector *)&word_values_masked[MATRIX_SIZE / 2];
+    HVX_Vector pred_reglo = *(HVX_Vector *)word_predicates;
+    HVX_Vector pred_reghi = *(HVX_Vector *)&word_predicates[MATRIX_SIZE / 2];
+    HVX_VectorPred predslo = VAND_VAL(pred_reglo, ~0);
+    HVX_VectorPred predshi = VAND_VAL(pred_reghi, ~0);
+
+    VSCATTER_32_MASKED(predslo, &vtcm.vscatter32, region_len, offsetslo,
+                       valueslo);
+    VSCATTER_32_MASKED(predshi, &vtcm.vscatter32, region_len, offsetshi,
+                       valueshi);
+
+    sync_scatter(vtcm.vscatter32);
+}
+
+/* scatter the 16 bit elements with 32 bit offsets using intrinsics */
+void vector_scatter_16_32(void)
+{
+    HVX_VectorPair offsets;
+    HVX_Vector values;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+
+    /* these values need to be shuffled for the scatter */
+    values = *(HVX_Vector *)half_values;
+    values = VSHUFF_H(values);
+
+    VSCATTER_16_32(&vtcm.vscatter16_32, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16_32);
+}
+
+/* scatter-acc the 16 bit elements with 32 bit offsets using intrinsics */
+void vector_scatter_16_32_acc(void)
+{
+    HVX_VectorPair offsets;
+    HVX_Vector values;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+
+    /* these values need to be shuffled for the scatter */
+    values = *(HVX_Vector *)half_values_acc;
+    values = VSHUFF_H(values);
+
+    VSCATTER_16_32_ACC(&vtcm.vscatter16_32, region_len, offsets, values);
+
+    sync_scatter(vtcm.vscatter16_32);
+}
+
+/* masked scatter the 16 bit elements with 32 bit offsets using intrinsics */
+void vector_scatter_16_32_masked(void)
+{
+    HVX_VectorPair offsets;
+    HVX_Vector values;
+    HVX_Vector pred_reg;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+
+    /* these values need to be shuffled for the scatter */
+    values = *(HVX_Vector *)half_values_masked;
+    values = VSHUFF_H(values);
+
+    pred_reg = *(HVX_Vector *)half_predicates;
+    pred_reg = VSHUFF_H(pred_reg);
+    HVX_VectorPred preds = VAND_VAL(pred_reg, ~0);
+
+    VSCATTER_16_32_MASKED(preds, &vtcm.vscatter16_32, region_len, offsets,
+                          values);
+
+    sync_scatter(vtcm.vscatter16_32);
+}
+
+/* gather the elements from the scatter16 buffer */
+void vector_gather_16(void)
+{
+    HVX_Vector *vgather = (HVX_Vector *)&vtcm.vgather16;
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+
+    VGATHER_16(vgather, &vtcm.vscatter16, region_len, offsets);
+
+    sync_gather(vgather);
+}
+
+static unsigned short gather_16_masked_init(void)
+{
+    char letter = '?';
+    return letter | (letter << 8);
+}
+
+void vector_gather_16_masked(void)
+{
+    HVX_Vector *vgather = (HVX_Vector *)&vtcm.vgather16;
+    HVX_Vector offsets = *(HVX_Vector *)half_offsets;
+    HVX_Vector pred_reg = *(HVX_Vector *)half_predicates;
+    HVX_VectorPred preds = VAND_VAL(pred_reg, ~0);
+
+    *vgather = VSPLAT_H(gather_16_masked_init());
+    VGATHER_16_MASKED(vgather, preds, &vtcm.vscatter16, region_len, offsets);
+
+    sync_gather(vgather);
+}
+
+/* gather the elements from the scatter32 buffer */
+void vector_gather_32(void)
+{
+    HVX_Vector *vgatherlo = (HVX_Vector *)&vtcm.vgather32;
+    HVX_Vector *vgatherhi =
+        (HVX_Vector *)((int)&vtcm.vgather32 + (MATRIX_SIZE * 2));
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+
+    VGATHER_32(vgatherlo, &vtcm.vscatter32, region_len, offsetslo);
+    VGATHER_32(vgatherhi, &vtcm.vscatter32, region_len, offsetshi);
+
+    sync_gather(vgatherhi);
+}
+
+static unsigned int gather_32_masked_init(void)
+{
+    char letter = '?';
+    return letter | (letter << 8) | (letter << 16) | (letter << 24);
+}
+
+void vector_gather_32_masked(void)
+{
+    HVX_Vector *vgatherlo = (HVX_Vector *)&vtcm.vgather32;
+    HVX_Vector *vgatherhi =
+        (HVX_Vector *)((int)&vtcm.vgather32 + (MATRIX_SIZE * 2));
+    HVX_Vector offsetslo = *(HVX_Vector *)word_offsets;
+    HVX_Vector offsetshi = *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2];
+    HVX_Vector pred_reglo = *(HVX_Vector *)word_predicates;
+    HVX_VectorPred predslo = VAND_VAL(pred_reglo, ~0);
+    HVX_Vector pred_reghi = *(HVX_Vector *)&word_predicates[MATRIX_SIZE / 2];
+    HVX_VectorPred predshi = VAND_VAL(pred_reghi, ~0);
+
+    *vgatherlo = VSPLAT_H(gather_32_masked_init());
+    *vgatherhi = VSPLAT_H(gather_32_masked_init());
+    VGATHER_32_MASKED(vgatherlo, predslo, &vtcm.vscatter32, region_len,
+                      offsetslo);
+    VGATHER_32_MASKED(vgatherhi, predshi, &vtcm.vscatter32, region_len,
+                      offsetshi);
+
+    sync_gather(vgatherlo);
+    sync_gather(vgatherhi);
+}
+
+/* gather the elements from the scatter16_32 buffer */
+void vector_gather_16_32(void)
+{
+    HVX_Vector *vgather;
+    HVX_VectorPair offsets;
+    HVX_Vector values;
+
+    /* get the vtcm address to gather into */
+    vgather = (HVX_Vector *)&vtcm.vgather16_32;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+
+    VGATHER_16_32(vgather, &vtcm.vscatter16_32, region_len, offsets);
+
+    /* deal the elements to get the order back */
+    values = *(HVX_Vector *)vgather;
+    values = VDEAL_H(values);
+
+    /* write it back to vtcm address */
+    *(HVX_Vector *)vgather = values;
+}
+
+void vector_gather_16_32_masked(void)
+{
+    HVX_Vector *vgather;
+    HVX_VectorPair offsets;
+    HVX_Vector pred_reg;
+    HVX_VectorPred preds;
+    HVX_Vector values;
+
+    /* get the vtcm address to gather into */
+    vgather = (HVX_Vector *)&vtcm.vgather16_32;
+
+    /* get the word offsets in a vector pair */
+    offsets = *(HVX_VectorPair *)word_offsets;
+    pred_reg = *(HVX_Vector *)half_predicates;
+    pred_reg = VSHUFF_H(pred_reg);
+    preds = VAND_VAL(pred_reg, ~0);
+
+    *vgather = VSPLAT_H(gather_16_masked_init());
+    VGATHER_16_32_MASKED(vgather, preds, &vtcm.vscatter16_32, region_len,
+                         offsets);
+
+    /* deal the elements to get the order back */
+    values = *(HVX_Vector *)vgather;
+    values = VDEAL_H(values);
+
+    /* write it back to vtcm address */
+    *(HVX_Vector *)vgather = values;
+}
+
+static void check_buffer(const char *name, void *c, void *r, size_t size)
+{
+    char *check = (char *)c;
+    char *ref = (char *)r;
+    for (int i = 0; i < size; i++) {
+        if (check[i] != ref[i]) {
+            printf("ERROR %s [%d]: 0x%x (%c) != 0x%x (%c)\n", name, i,
+                   check[i], check[i], ref[i], ref[i]);
+            err++;
+        }
+    }
+}
+
+/*
+ * These scalar functions are the C equivalents of the vector functions that
+ * use HVX
+ */
+
+/* scatter the 16 bit elements using C */
+void scalar_scatter_16(unsigned short *vscatter16)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter16[half_offsets[i] / 2] = half_values[i];
+    }
+}
+
+void check_scatter_16()
+{
+    memset(vscatter16_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16(vscatter16_ref);
+    check_buffer(__func__, vtcm.vscatter16, vscatter16_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* scatter-accumulate the 16 bit elements using C */
+void scalar_scatter_16_acc(unsigned short *vscatter16)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter16[half_offsets[i] / 2] += half_values_acc[i];
+    }
+}
+
+void check_scatter_16_acc()
+{
+    memset(vscatter16_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16(vscatter16_ref);
+    scalar_scatter_16_acc(vscatter16_ref);
+    check_buffer(__func__, vtcm.vscatter16, vscatter16_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* masked scatter the 16 bit elements using C */
+void scalar_scatter_16_masked(unsigned short *vscatter16)
+{
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        if (half_predicates[i]) {
+            vscatter16[half_offsets[i] / 2] = half_values_masked[i];
+        }
+    }
+
+}
+
+void check_scatter_16_masked()
+{
+    memset(vscatter16_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16(vscatter16_ref);
+    scalar_scatter_16_acc(vscatter16_ref);
+    scalar_scatter_16_masked(vscatter16_ref);
+    check_buffer(__func__, vtcm.vscatter16, vscatter16_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* scatter the 32 bit elements using C */
+void scalar_scatter_32(unsigned int *vscatter32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter32[word_offsets[i] / 4] = word_values[i];
+    }
+}
+
+void check_scatter_32()
+{
+    memset(vscatter32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+    scalar_scatter_32(vscatter32_ref);
+    check_buffer(__func__, vtcm.vscatter32, vscatter32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+}
+
+/* scatter-accumulate the 32 bit elements using C */
+void scalar_scatter_32_acc(unsigned int *vscatter32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter32[word_offsets[i] / 4] += word_values_acc[i];
+    }
+}
+
+void check_scatter_32_acc()
+{
+    memset(vscatter32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+    scalar_scatter_32(vscatter32_ref);
+    scalar_scatter_32_acc(vscatter32_ref);
+    check_buffer(__func__, vtcm.vscatter32, vscatter32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+}
+
+/* masked scatter the 32 bit elements using C */
+void scalar_scatter_32_masked(unsigned int *vscatter32)
+{
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        if (word_predicates[i]) {
+            vscatter32[word_offsets[i] / 4] = word_values_masked[i];
+        }
+    }
+}
+
+void check_scatter_32_masked()
+{
+    memset(vscatter32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+    scalar_scatter_32(vscatter32_ref);
+    scalar_scatter_32_acc(vscatter32_ref);
+    scalar_scatter_32_masked(vscatter32_ref);
+    check_buffer(__func__, vtcm.vscatter32, vscatter32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned int));
+}
+
+/* scatter the 16 bit elements with 32 bit offsets using C */
+void scalar_scatter_16_32(unsigned short *vscatter16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter16_32[word_offsets[i] / 2] = half_values[i];
+    }
+}
+
+void check_scatter_16_32()
+{
+    memset(vscatter16_32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16_32(vscatter16_32_ref);
+    check_buffer(__func__, vtcm.vscatter16_32, vscatter16_32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* scatter-accumulate the 16 bit elements with 32 bit offsets using C */
+void scalar_scatter_16_32_acc(unsigned short *vscatter16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vscatter16_32[word_offsets[i] / 2] += half_values_acc[i];
+    }
+}
+
+void check_scatter_16_32_acc()
+{
+    memset(vscatter16_32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16_32(vscatter16_32_ref);
+    scalar_scatter_16_32_acc(vscatter16_32_ref);
+    check_buffer(__func__, vtcm.vscatter16_32, vscatter16_32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+void scalar_scatter_16_32_masked(unsigned short *vscatter16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; i++) {
+        if (half_predicates[i]) {
+            vscatter16_32[word_offsets[i] / 2] = half_values_masked[i];
+        }
+    }
+}
+
+void check_scatter_16_32_masked()
+{
+    memset(vscatter16_32_ref, FILL_CHAR,
+           SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+    scalar_scatter_16_32(vscatter16_32_ref);
+    scalar_scatter_16_32_acc(vscatter16_32_ref);
+    scalar_scatter_16_32_masked(vscatter16_32_ref);
+    check_buffer(__func__, vtcm.vscatter16_32, vscatter16_32_ref,
+                 SCATTER_BUFFER_SIZE * sizeof(unsigned short));
+}
+
+/* gather the elements from the scatter buffer using C */
+void scalar_gather_16(unsigned short *vgather16)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vgather16[i] = vtcm.vscatter16[half_offsets[i] / 2];
+    }
+}
+
+void check_gather_16()
+{
+    memset(vgather16_ref, 0, MATRIX_SIZE * sizeof(unsigned short));
+    scalar_gather_16(vgather16_ref);
+    check_buffer(__func__, vtcm.vgather16, vgather16_ref,
+                 MATRIX_SIZE * sizeof(unsigned short));
+}
+
+void scalar_gather_16_masked(unsigned short *vgather16)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        if (half_predicates[i]) {
+            vgather16[i] = vtcm.vscatter16[half_offsets[i] / 2];
+        }
+    }
+}
+
+void check_gather_16_masked()
+{
+    memset(vgather16_ref, gather_16_masked_init(),
+           MATRIX_SIZE * sizeof(unsigned short));
+    scalar_gather_16_masked(vgather16_ref);
+    check_buffer(__func__, vtcm.vgather16, vgather16_ref,
+                 MATRIX_SIZE * sizeof(unsigned short));
+}
+
+/* gather the elements from the scatter buffer using C */
+void scalar_gather_32(unsigned int *vgather32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vgather32[i] = vtcm.vscatter32[word_offsets[i] / 4];
+    }
+}
+
+void check_gather_32(void)
+{
+    memset(vgather32_ref, 0, MATRIX_SIZE * sizeof(unsigned int));
+    scalar_gather_32(vgather32_ref);
+    check_buffer(__func__, vtcm.vgather32, vgather32_ref,
+                 MATRIX_SIZE * sizeof(unsigned int));
+}
+
+void scalar_gather_32_masked(unsigned int *vgather32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        if (word_predicates[i]) {
+            vgather32[i] = vtcm.vscatter32[word_offsets[i] / 4];
+        }
+    }
+}
+
+
+void check_gather_32_masked(void)
+{
+    memset(vgather32_ref, gather_32_masked_init(),
+           MATRIX_SIZE * sizeof(unsigned int));
+    scalar_gather_32_masked(vgather32_ref);
+    check_buffer(__func__, vtcm.vgather32,
+                 vgather32_ref, MATRIX_SIZE * sizeof(unsigned int));
+}
+
+/* gather the elements from the scatter buffer using C */
+void scalar_gather_16_32(unsigned short *vgather16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        vgather16_32[i] = vtcm.vscatter16_32[word_offsets[i] / 2];
+    }
+}
+
+void check_gather_16_32(void)
+{
+    memset(vgather16_32_ref, 0, MATRIX_SIZE * sizeof(unsigned short));
+    scalar_gather_16_32(vgather16_32_ref);
+    check_buffer(__func__, vtcm.vgather16_32, vgather16_32_ref,
+                 MATRIX_SIZE * sizeof(unsigned short));
+}
+
+void scalar_gather_16_32_masked(unsigned short *vgather16_32)
+{
+    for (int i = 0; i < MATRIX_SIZE; ++i) {
+        if (half_predicates[i]) {
+            vgather16_32[i] = vtcm.vscatter16_32[word_offsets[i] / 2];
+        }
+    }
+
+}
+
+void check_gather_16_32_masked(void)
+{
+    memset(vgather16_32_ref, gather_16_masked_init(),
+           MATRIX_SIZE * sizeof(unsigned short));
+    scalar_gather_16_32_masked(vgather16_32_ref);
+    check_buffer(__func__, vtcm.vgather16_32, vgather16_32_ref,
+                 MATRIX_SIZE * sizeof(unsigned short));
+}
+
+/* print scatter16 buffer */
+void print_scatter16_buffer(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 16 bit scatter buffer");
+
+        for (int i = 0; i < SCATTER_BUFFER_SIZE; i++) {
+            if ((i % MATRIX_SIZE) == 0) {
+                printf("\n");
+            }
+            for (int j = 0; j < 2; j++) {
+                printf("%c", (char)((vtcm.vscatter16[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the gather 16 buffer */
+void print_gather_result_16(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 16 bit gather result\n");
+
+        for (int i = 0; i < MATRIX_SIZE; i++) {
+            for (int j = 0; j < 2; j++) {
+                printf("%c", (char)((vtcm.vgather16[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the scatter32 buffer */
+void print_scatter32_buffer(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 32 bit scatter buffer");
+
+        for (int i = 0; i < SCATTER_BUFFER_SIZE; i++) {
+            if ((i % MATRIX_SIZE) == 0) {
+                printf("\n");
+            }
+            for (int j = 0; j < 4; j++) {
+                printf("%c", (char)((vtcm.vscatter32[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the gather 32 buffer */
+void print_gather_result_32(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 32 bit gather result\n");
+
+        for (int i = 0; i < MATRIX_SIZE; i++) {
+            for (int j = 0; j < 4; j++) {
+                printf("%c", (char)((vtcm.vgather32[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the scatter16_32 buffer */
+void print_scatter16_32_buffer(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 16_32 bit scatter buffer");
+
+        for (int i = 0; i < SCATTER_BUFFER_SIZE; i++) {
+            if ((i % MATRIX_SIZE) == 0) {
+                printf("\n");
+            }
+            for (int j = 0; j < 2; j++) {
+                printf("%c",
+                      (unsigned char)((vtcm.vscatter16_32[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+/* print the gather 16_32 buffer */
+void print_gather_result_16_32(void)
+{
+    if (PRINT_DATA) {
+        printf("\n\nPrinting the 16_32 bit gather result\n");
+
+        for (int i = 0; i < MATRIX_SIZE; i++) {
+            for (int j = 0; j < 2; j++) {
+                printf("%c",
+                       (unsigned char)((vtcm.vgather16_32[i] >> j * 8) & 0xff));
+            }
+            printf(" ");
+        }
+        printf("\n");
+    }
+}
+
+int main(void)
+{
+    prefill_vtcm_scratch();
+
+    /* 16 bit elements with 16 bit offsets */
+    create_offsets_values_preds_16();
+
+    vector_scatter_16();
+    print_scatter16_buffer();
+    check_scatter_16();
+
+    vector_gather_16();
+    print_gather_result_16();
+    check_gather_16();
+
+    vector_gather_16_masked();
+    print_gather_result_16();
+    check_gather_16_masked();
+
+    vector_scatter_16_acc();
+    print_scatter16_buffer();
+    check_scatter_16_acc();
+
+    vector_scatter_16_masked();
+    print_scatter16_buffer();
+    check_scatter_16_masked();
+
+    /* 32 bit elements with 32 bit offsets */
+    create_offsets_values_preds_32();
+
+    vector_scatter_32();
+    print_scatter32_buffer();
+    check_scatter_32();
+
+    vector_gather_32();
+    print_gather_result_32();
+    check_gather_32();
+
+    vector_gather_32_masked();
+    print_gather_result_32();
+    check_gather_32_masked();
+
+    vector_scatter_32_acc();
+    print_scatter32_buffer();
+    check_scatter_32_acc();
+
+    vector_scatter_32_masked();
+    print_scatter32_buffer();
+    check_scatter_32_masked();
+
+    /* 16 bit elements with 32 bit offsets */
+    create_offsets_values_preds_16_32();
+
+    vector_scatter_16_32();
+    print_scatter16_32_buffer();
+    check_scatter_16_32();
+
+    vector_gather_16_32();
+    print_gather_result_16_32();
+    check_gather_16_32();
+
+    vector_gather_16_32_masked();
+    print_gather_result_16_32();
+    check_gather_16_32_masked();
+
+    vector_scatter_16_32_acc();
+    print_scatter16_32_buffer();
+    check_scatter_16_32_acc();
+
+    vector_scatter_16_32_masked();
+    print_scatter16_32_buffer();
+    check_scatter_16_32_masked();
+
+    puts(err ? "FAIL" : "PASS");
+    return err;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 7708e76..fa1fa57 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -38,11 +38,13 @@ HEX_TESTS += brev
 HEX_TESTS += load_unpack
 HEX_TESTS += load_align
 HEX_TESTS += vector_add_int
+HEX_TESTS += scatter_gather
 HEX_TESTS += atomics
 HEX_TESTS += fpstuff
 HEX_TESTS += hvx_misc
 
 TESTS += $(HEX_TESTS)
 
+scatter_gather: CFLAGS += -mhvx
 vector_add_int: CFLAGS += -mhvx -fvectorize
 hvx_misc: CFLAGS += -mhvx
-- 
2.7.4


* [PATCH v3 30/30] Hexagon HVX (tests/tcg/hexagon) histogram test
  2021-09-20 21:23 [PATCH v3 00/30] Hexagon HVX (target/hexagon) patch series Taylor Simpson
                   ` (28 preceding siblings ...)
  2021-09-20 21:24 ` [PATCH v3 29/30] Hexagon HVX (tests/tcg/hexagon) scatter_gather test Taylor Simpson
@ 2021-09-20 21:24 ` Taylor Simpson
  29 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 21:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ale, bcain, tsimpson, richard.henderson, f4bug

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 tests/tcg/hexagon/hvx_histogram_input.h | 717 ++++++++++++++++++++++++++++++++
 tests/tcg/hexagon/hvx_histogram_row.h   |  24 ++
 tests/tcg/hexagon/hvx_histogram.c       |  88 ++++
 tests/tcg/hexagon/Makefile.target       |   5 +
 tests/tcg/hexagon/hvx_histogram_row.S   | 294 +++++++++++++
 5 files changed, 1128 insertions(+)
 create mode 100644 tests/tcg/hexagon/hvx_histogram_input.h
 create mode 100644 tests/tcg/hexagon/hvx_histogram_row.h
 create mode 100644 tests/tcg/hexagon/hvx_histogram.c
 create mode 100644 tests/tcg/hexagon/hvx_histogram_row.S

diff --git a/tests/tcg/hexagon/hvx_histogram_input.h b/tests/tcg/hexagon/hvx_histogram_input.h
new file mode 100644
index 0000000..2f91092
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_histogram_input.h
@@ -0,0 +1,717 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+    { 0x26, 0x32, 0x2e, 0x2e, 0x2d, 0x2c, 0x2d, 0x2d,
+      0x2c, 0x2e, 0x31, 0x33, 0x36, 0x39, 0x3b, 0x3f,
+      0x42, 0x46, 0x4a, 0x4c, 0x51, 0x53, 0x53, 0x54,
+      0x56, 0x57, 0x58, 0x57, 0x56, 0x52, 0x51, 0x4f,
+      0x4c, 0x49, 0x47, 0x42, 0x3e, 0x3b, 0x38, 0x35,
+      0x33, 0x30, 0x2e, 0x2c, 0x2b, 0x2a, 0x2a, 0x28,
+      0x28, 0x27, 0x27, 0x28, 0x29, 0x2a, 0x2c, 0x2e,
+      0x2f, 0x33, 0x36, 0x38, 0x3c, 0x3d, 0x40, 0x42,
+      0x43, 0x42, 0x43, 0x44, 0x43, 0x41, 0x40, 0x3b,
+      0x3b, 0x3a, 0x38, 0x35, 0x32, 0x2f, 0x2c, 0x29,
+      0x27, 0x26, 0x23, 0x21, 0x1e, 0x1c, 0x1a, 0x19,
+      0x17, 0x15, 0x15, 0x14, 0x13, 0x12, 0x11, 0x10,
+      0x0f, 0x0e, 0x0f, 0x0f, 0x0e, 0x0d, 0x0d, 0x0d,
+      0x0c, 0x0d, 0x0e, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+      0x0c, 0x0c, 0x0d, 0x0c, 0x0f, 0x0e, 0x0f, 0x0f,
+      0x0f, 0x10, 0x11, 0x12, 0x14, 0x16, 0x17, 0x19,
+      0x1c, 0x1d, 0x21, 0x25, 0x27, 0x29, 0x2b, 0x2f,
+      0x31, 0x33, 0x36, 0x38, 0x39, 0x3a, 0x3b, 0x3c,
+      0x3c, 0x3d, 0x3e, 0x3e, 0x3c, 0x3b, 0x3a, 0x39,
+      0x39, 0x3a, 0x3a, 0x3a, 0x3a, 0x3c, 0x3e, 0x43,
+      0x47, 0x4a, 0x4d, 0x51, 0x51, 0x54, 0x56, 0x56,
+      0x57, 0x56, 0x53, 0x4f, 0x4b, 0x47, 0x43, 0x41,
+      0x3e, 0x3c, 0x3a, 0x37, 0x36, 0x33, 0x32, 0x34,
+      0x34, 0x34, 0x34, 0x35, 0x36, 0x39, 0x3d, 0x3d,
+      0x3f, 0x40, 0x40, 0x40, 0x40, 0x3e, 0x40, 0x40,
+      0x42, 0x44, 0x47, 0x48, 0x4b, 0x4e, 0x56, 0x5c,
+      0x62, 0x68, 0x6f, 0x73, 0x76, 0x79, 0x7a, 0x7c,
+      0x7e, 0x7c, 0x78, 0x72, 0x6e, 0x69, 0x65, 0x60,
+      0x5b, 0x56, 0x52, 0x4d, 0x4a, 0x48, 0x47, 0x46,
+      0x44, 0x43, 0x42, 0x41, 0x41, 0x41, 0x40, 0x40,
+      0x3f, 0x3e, 0x3d, 0x3c, 0x3b, 0x3b, 0x38, 0x37,
+      0x36, 0x35, 0x36, 0x35, 0x36, 0x37, 0x38, 0x3c,
+      0x3d, 0x3f, 0x42, 0x44, 0x46, 0x48, 0x4b, 0x4c,
+      0x4e, 0x4e, 0x4d, 0x4c, 0x4a, 0x48, 0x49, 0x49,
+      0x4b, 0x4d, 0x4e, },
+    { 0x23, 0x2d, 0x29, 0x29, 0x28, 0x28, 0x29, 0x29,
+      0x28, 0x2b, 0x2d, 0x2f, 0x32, 0x34, 0x36, 0x3a,
+      0x3d, 0x41, 0x44, 0x47, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x51, 0x51, 0x4f, 0x4c, 0x4b, 0x48,
+      0x46, 0x44, 0x40, 0x3d, 0x39, 0x36, 0x34, 0x30,
+      0x2f, 0x2d, 0x2a, 0x29, 0x28, 0x27, 0x26, 0x25,
+      0x25, 0x24, 0x24, 0x24, 0x26, 0x28, 0x28, 0x2a,
+      0x2b, 0x2e, 0x32, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3d, 0x3d, 0x3e, 0x3e, 0x3e, 0x3c, 0x3b, 0x38,
+      0x37, 0x35, 0x33, 0x30, 0x2e, 0x2b, 0x27, 0x25,
+      0x24, 0x21, 0x20, 0x1d, 0x1b, 0x1a, 0x18, 0x16,
+      0x15, 0x14, 0x13, 0x12, 0x10, 0x11, 0x10, 0x0e,
+      0x0e, 0x0d, 0x0d, 0x0d, 0x0d, 0x0c, 0x0c, 0x0b,
+      0x0b, 0x0b, 0x0c, 0x0b, 0x0b, 0x09, 0x0a, 0x0b,
+      0x0b, 0x0a, 0x0a, 0x0c, 0x0c, 0x0c, 0x0d, 0x0e,
+      0x0e, 0x0f, 0x0f, 0x11, 0x12, 0x15, 0x15, 0x17,
+      0x1a, 0x1c, 0x1f, 0x22, 0x25, 0x26, 0x29, 0x2a,
+      0x2d, 0x30, 0x33, 0x34, 0x35, 0x35, 0x37, 0x37,
+      0x39, 0x3a, 0x39, 0x38, 0x37, 0x36, 0x36, 0x37,
+      0x35, 0x36, 0x35, 0x35, 0x36, 0x37, 0x3a, 0x3e,
+      0x40, 0x43, 0x48, 0x49, 0x4b, 0x4c, 0x4d, 0x4e,
+      0x4f, 0x4f, 0x4c, 0x48, 0x45, 0x41, 0x3e, 0x3b,
+      0x3a, 0x37, 0x36, 0x33, 0x32, 0x31, 0x30, 0x31,
+      0x32, 0x31, 0x31, 0x31, 0x31, 0x34, 0x37, 0x38,
+      0x3a, 0x3b, 0x3b, 0x3b, 0x3c, 0x3b, 0x3d, 0x3e,
+      0x3f, 0x40, 0x43, 0x44, 0x47, 0x4b, 0x4f, 0x56,
+      0x5a, 0x60, 0x66, 0x69, 0x6a, 0x6e, 0x71, 0x72,
+      0x73, 0x72, 0x6d, 0x69, 0x66, 0x60, 0x5c, 0x59,
+      0x54, 0x50, 0x4d, 0x48, 0x46, 0x44, 0x44, 0x43,
+      0x42, 0x41, 0x41, 0x40, 0x3f, 0x3f, 0x3e, 0x3d,
+      0x3d, 0x3d, 0x3c, 0x3a, 0x39, 0x38, 0x35, 0x35,
+      0x34, 0x34, 0x35, 0x34, 0x35, 0x36, 0x39, 0x3c,
+      0x3d, 0x3e, 0x41, 0x43, 0x44, 0x46, 0x48, 0x49,
+      0x4a, 0x49, 0x48, 0x47, 0x45, 0x43, 0x43, 0x44,
+      0x45, 0x47, 0x48, },
+    { 0x23, 0x2d, 0x2a, 0x2a, 0x29, 0x29, 0x2a, 0x2a,
+      0x29, 0x2c, 0x2d, 0x2f, 0x32, 0x34, 0x36, 0x3a,
+      0x3d, 0x40, 0x44, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x51, 0x51, 0x4f, 0x4c, 0x4b, 0x48,
+      0x46, 0x44, 0x40, 0x3d, 0x39, 0x36, 0x34, 0x30,
+      0x2f, 0x2d, 0x2a, 0x29, 0x28, 0x27, 0x26, 0x25,
+      0x25, 0x24, 0x24, 0x25, 0x26, 0x28, 0x29, 0x2a,
+      0x2b, 0x2e, 0x31, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3d, 0x3e, 0x3e, 0x3d, 0x3e, 0x3c, 0x3c, 0x3a,
+      0x37, 0x35, 0x33, 0x30, 0x2f, 0x2b, 0x28, 0x26,
+      0x24, 0x21, 0x20, 0x1e, 0x1c, 0x1b, 0x18, 0x17,
+      0x16, 0x14, 0x13, 0x12, 0x10, 0x10, 0x0f, 0x0e,
+      0x0f, 0x0e, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0c,
+      0x0b, 0x0b, 0x0c, 0x0c, 0x0c, 0x0b, 0x0b, 0x0c,
+      0x0c, 0x0b, 0x0b, 0x0c, 0x0d, 0x0c, 0x0e, 0x0e,
+      0x0e, 0x0f, 0x11, 0x11, 0x13, 0x14, 0x16, 0x18,
+      0x1a, 0x1d, 0x1f, 0x22, 0x25, 0x26, 0x29, 0x2b,
+      0x2d, 0x31, 0x33, 0x34, 0x36, 0x37, 0x38, 0x38,
+      0x39, 0x3a, 0x39, 0x38, 0x37, 0x36, 0x37, 0x37,
+      0x35, 0x36, 0x35, 0x36, 0x35, 0x38, 0x3a, 0x3e,
+      0x40, 0x41, 0x45, 0x47, 0x49, 0x4a, 0x4c, 0x4d,
+      0x4e, 0x4d, 0x4a, 0x47, 0x44, 0x40, 0x3d, 0x3b,
+      0x39, 0x37, 0x34, 0x34, 0x32, 0x31, 0x31, 0x33,
+      0x32, 0x31, 0x32, 0x33, 0x32, 0x36, 0x38, 0x39,
+      0x3b, 0x3c, 0x3c, 0x3c, 0x3d, 0x3d, 0x3e, 0x3e,
+      0x41, 0x42, 0x43, 0x45, 0x48, 0x4c, 0x50, 0x56,
+      0x5b, 0x5f, 0x62, 0x67, 0x69, 0x6c, 0x6e, 0x6e,
+      0x70, 0x6f, 0x6b, 0x67, 0x63, 0x5e, 0x5b, 0x58,
+      0x54, 0x51, 0x4e, 0x4a, 0x48, 0x46, 0x46, 0x46,
+      0x45, 0x46, 0x44, 0x43, 0x44, 0x43, 0x42, 0x42,
+      0x41, 0x40, 0x3f, 0x3e, 0x3c, 0x3b, 0x3a, 0x39,
+      0x39, 0x39, 0x38, 0x37, 0x37, 0x3a, 0x3e, 0x40,
+      0x42, 0x43, 0x47, 0x47, 0x48, 0x4a, 0x4b, 0x4c,
+      0x4c, 0x4b, 0x4a, 0x48, 0x46, 0x44, 0x43, 0x45,
+      0x45, 0x46, 0x47, },
+    { 0x21, 0x2b, 0x28, 0x28, 0x28, 0x28, 0x29, 0x29,
+      0x28, 0x2a, 0x2d, 0x30, 0x32, 0x34, 0x37, 0x3a,
+      0x3c, 0x40, 0x44, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x52, 0x51, 0x4f, 0x4b, 0x4b, 0x48,
+      0x45, 0x43, 0x3f, 0x3c, 0x39, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x2b, 0x2a, 0x28, 0x27, 0x26, 0x25,
+      0x24, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2d, 0x31, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3d, 0x3e, 0x3e, 0x3e, 0x3e, 0x3d, 0x3c, 0x3a,
+      0x37, 0x35, 0x33, 0x30, 0x2f, 0x2b, 0x28, 0x26,
+      0x25, 0x21, 0x20, 0x1e, 0x1c, 0x19, 0x19, 0x18,
+      0x17, 0x15, 0x15, 0x12, 0x11, 0x11, 0x11, 0x0f,
+      0x0e, 0x0e, 0x0e, 0x0e, 0x0d, 0x0d, 0x0d, 0x0c,
+      0x0c, 0x0c, 0x0b, 0x0b, 0x0b, 0x0b, 0x0b, 0x0b,
+      0x0c, 0x0c, 0x0c, 0x0c, 0x0e, 0x0e, 0x0f, 0x0f,
+      0x0f, 0x10, 0x11, 0x13, 0x13, 0x15, 0x16, 0x18,
+      0x1a, 0x1c, 0x1f, 0x22, 0x25, 0x28, 0x29, 0x2d,
+      0x2f, 0x32, 0x34, 0x35, 0x36, 0x37, 0x38, 0x38,
+      0x39, 0x3a, 0x39, 0x39, 0x37, 0x36, 0x37, 0x36,
+      0x35, 0x35, 0x37, 0x35, 0x36, 0x37, 0x3a, 0x3d,
+      0x3e, 0x41, 0x43, 0x46, 0x46, 0x47, 0x48, 0x49,
+      0x4a, 0x49, 0x47, 0x45, 0x42, 0x3f, 0x3d, 0x3b,
+      0x3a, 0x38, 0x36, 0x34, 0x32, 0x32, 0x32, 0x32,
+      0x32, 0x31, 0x33, 0x32, 0x34, 0x37, 0x38, 0x38,
+      0x3a, 0x3b, 0x3d, 0x3d, 0x3d, 0x3e, 0x3f, 0x41,
+      0x42, 0x44, 0x44, 0x46, 0x49, 0x4d, 0x50, 0x54,
+      0x58, 0x5c, 0x61, 0x63, 0x65, 0x69, 0x6a, 0x6c,
+      0x6d, 0x6c, 0x68, 0x64, 0x61, 0x5c, 0x59, 0x57,
+      0x53, 0x51, 0x4f, 0x4c, 0x4a, 0x48, 0x48, 0x49,
+      0x49, 0x48, 0x48, 0x48, 0x47, 0x47, 0x46, 0x46,
+      0x45, 0x44, 0x42, 0x41, 0x3f, 0x3e, 0x3c, 0x3c,
+      0x3c, 0x3d, 0x3c, 0x3c, 0x3c, 0x3e, 0x41, 0x43,
+      0x46, 0x48, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4e,
+      0x4e, 0x4d, 0x4b, 0x49, 0x47, 0x44, 0x44, 0x45,
+      0x45, 0x45, 0x46, },
+    { 0x22, 0x2b, 0x27, 0x27, 0x27, 0x27, 0x28, 0x28,
+      0x28, 0x2a, 0x2c, 0x2f, 0x30, 0x34, 0x37, 0x3b,
+      0x3d, 0x41, 0x45, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x52, 0x51, 0x4f, 0x4b, 0x4b, 0x47,
+      0x45, 0x43, 0x3f, 0x3c, 0x39, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x2b, 0x2a, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2e, 0x31, 0x34, 0x37, 0x39, 0x3a, 0x3b,
+      0x3d, 0x3e, 0x3e, 0x3f, 0x3f, 0x3d, 0x3c, 0x3a,
+      0x38, 0x36, 0x34, 0x31, 0x2e, 0x2c, 0x29, 0x26,
+      0x25, 0x22, 0x20, 0x1e, 0x1c, 0x1a, 0x19, 0x18,
+      0x16, 0x15, 0x14, 0x12, 0x10, 0x11, 0x11, 0x0f,
+      0x0e, 0x0e, 0x0e, 0x0e, 0x0d, 0x0c, 0x0d, 0x0c,
+      0x0c, 0x0c, 0x0b, 0x0b, 0x0b, 0x0b, 0x0b, 0x0b,
+      0x0c, 0x0c, 0x0c, 0x0d, 0x0d, 0x0e, 0x0f, 0x0f,
+      0x0f, 0x10, 0x11, 0x13, 0x13, 0x15, 0x15, 0x18,
+      0x19, 0x1d, 0x1f, 0x21, 0x24, 0x27, 0x2a, 0x2c,
+      0x30, 0x33, 0x35, 0x36, 0x37, 0x38, 0x39, 0x39,
+      0x3a, 0x3a, 0x39, 0x39, 0x37, 0x36, 0x37, 0x36,
+      0x36, 0x36, 0x36, 0x36, 0x36, 0x37, 0x39, 0x3a,
+      0x3d, 0x3e, 0x41, 0x43, 0x43, 0x45, 0x46, 0x46,
+      0x47, 0x46, 0x44, 0x42, 0x40, 0x3d, 0x3a, 0x39,
+      0x37, 0x36, 0x35, 0x34, 0x33, 0x32, 0x32, 0x32,
+      0x32, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38,
+      0x39, 0x3c, 0x3c, 0x3e, 0x3e, 0x3e, 0x41, 0x43,
+      0x44, 0x45, 0x46, 0x48, 0x49, 0x4c, 0x51, 0x54,
+      0x56, 0x5a, 0x5f, 0x61, 0x63, 0x65, 0x67, 0x69,
+      0x6a, 0x69, 0x67, 0x61, 0x5f, 0x5b, 0x58, 0x56,
+      0x54, 0x51, 0x50, 0x4e, 0x4c, 0x4a, 0x4b, 0x4c,
+      0x4c, 0x4b, 0x4b, 0x4b, 0x4b, 0x49, 0x4a, 0x49,
+      0x49, 0x48, 0x46, 0x44, 0x42, 0x41, 0x40, 0x3f,
+      0x3f, 0x40, 0x40, 0x40, 0x40, 0x42, 0x46, 0x49,
+      0x4b, 0x4c, 0x4f, 0x4f, 0x50, 0x52, 0x51, 0x51,
+      0x50, 0x4f, 0x4c, 0x4a, 0x48, 0x46, 0x45, 0x44,
+      0x44, 0x45, 0x46, },
+    { 0x21, 0x2a, 0x27, 0x27, 0x27, 0x27, 0x27, 0x27,
+      0x27, 0x29, 0x2d, 0x2f, 0x31, 0x34, 0x37, 0x3b,
+      0x3e, 0x41, 0x45, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x52, 0x51, 0x4f, 0x4b, 0x4b, 0x48,
+      0x45, 0x43, 0x3f, 0x3c, 0x39, 0x36, 0x33, 0x2f,
+      0x2f, 0x2d, 0x2a, 0x2a, 0x27, 0x26, 0x25, 0x24,
+      0x22, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x31, 0x34, 0x37, 0x39, 0x3a, 0x3c,
+      0x3d, 0x3e, 0x3f, 0x40, 0x3f, 0x3d, 0x3d, 0x3a,
+      0x38, 0x36, 0x34, 0x31, 0x2e, 0x2c, 0x29, 0x26,
+      0x25, 0x22, 0x21, 0x1f, 0x1d, 0x1b, 0x19, 0x18,
+      0x16, 0x14, 0x14, 0x13, 0x11, 0x11, 0x11, 0x0f,
+      0x0f, 0x0f, 0x0e, 0x0e, 0x0d, 0x0d, 0x0d, 0x0d,
+      0x0d, 0x0d, 0x0c, 0x0b, 0x0b, 0x0b, 0x0b, 0x0c,
+      0x0c, 0x0d, 0x0d, 0x0d, 0x0e, 0x0e, 0x0f, 0x0f,
+      0x0f, 0x10, 0x13, 0x13, 0x14, 0x15, 0x17, 0x19,
+      0x1a, 0x1d, 0x1f, 0x22, 0x25, 0x27, 0x2a, 0x2e,
+      0x31, 0x33, 0x35, 0x38, 0x39, 0x3a, 0x3b, 0x3b,
+      0x3c, 0x3c, 0x3b, 0x3a, 0x39, 0x38, 0x38, 0x37,
+      0x36, 0x36, 0x37, 0x36, 0x37, 0x38, 0x38, 0x3a,
+      0x3b, 0x3e, 0x40, 0x40, 0x41, 0x42, 0x43, 0x42,
+      0x43, 0x42, 0x40, 0x40, 0x3f, 0x3c, 0x3b, 0x39,
+      0x38, 0x37, 0x36, 0x35, 0x34, 0x33, 0x32, 0x33,
+      0x32, 0x32, 0x34, 0x35, 0x35, 0x36, 0x39, 0x39,
+      0x3a, 0x3c, 0x3c, 0x3f, 0x40, 0x41, 0x43, 0x45,
+      0x45, 0x47, 0x48, 0x4a, 0x4b, 0x4d, 0x50, 0x53,
+      0x56, 0x59, 0x5c, 0x5f, 0x60, 0x65, 0x64, 0x66,
+      0x68, 0x66, 0x64, 0x61, 0x5e, 0x5a, 0x59, 0x56,
+      0x54, 0x52, 0x51, 0x50, 0x4e, 0x4c, 0x4d, 0x4f,
+      0x4f, 0x4f, 0x50, 0x50, 0x4f, 0x4f, 0x4e, 0x4d,
+      0x4c, 0x4b, 0x49, 0x47, 0x45, 0x44, 0x43, 0x43,
+      0x42, 0x43, 0x44, 0x44, 0x46, 0x47, 0x49, 0x4d,
+      0x4f, 0x51, 0x53, 0x54, 0x53, 0x54, 0x54, 0x53,
+      0x53, 0x51, 0x4e, 0x4b, 0x4a, 0x47, 0x45, 0x44,
+      0x44, 0x45, 0x46, },
+    { 0x20, 0x28, 0x26, 0x26, 0x25, 0x24, 0x27, 0x27,
+      0x27, 0x29, 0x2c, 0x2e, 0x31, 0x34, 0x37, 0x3b,
+      0x3e, 0x41, 0x45, 0x48, 0x4a, 0x4c, 0x4e, 0x4e,
+      0x50, 0x51, 0x52, 0x51, 0x4f, 0x4b, 0x4a, 0x49,
+      0x45, 0x43, 0x3f, 0x3c, 0x3a, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2e, 0x31, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3d, 0x3e, 0x3f, 0x40, 0x3e, 0x3d, 0x3d, 0x3a,
+      0x38, 0x36, 0x34, 0x31, 0x2f, 0x2c, 0x29, 0x27,
+      0x25, 0x21, 0x21, 0x1f, 0x1c, 0x1d, 0x19, 0x18,
+      0x16, 0x15, 0x15, 0x13, 0x12, 0x11, 0x11, 0x0f,
+      0x0f, 0x0e, 0x0f, 0x0f, 0x0e, 0x0d, 0x0d, 0x0d,
+      0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+      0x0d, 0x0d, 0x0d, 0x0e, 0x0e, 0x0e, 0x0f, 0x10,
+      0x10, 0x10, 0x12, 0x13, 0x15, 0x16, 0x18, 0x1a,
+      0x1c, 0x1d, 0x20, 0x22, 0x25, 0x27, 0x2a, 0x2e,
+      0x30, 0x34, 0x38, 0x39, 0x3a, 0x3b, 0x3b, 0x3b,
+      0x3c, 0x3d, 0x3c, 0x3b, 0x3a, 0x39, 0x38, 0x37,
+      0x36, 0x36, 0x38, 0x37, 0x37, 0x37, 0x38, 0x3a,
+      0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 0x40, 0x40,
+      0x42, 0x40, 0x3f, 0x3e, 0x3d, 0x3b, 0x3a, 0x39,
+      0x37, 0x36, 0x36, 0x35, 0x34, 0x34, 0x33, 0x33,
+      0x33, 0x34, 0x35, 0x35, 0x35, 0x36, 0x38, 0x39,
+      0x3a, 0x3b, 0x3d, 0x3f, 0x42, 0x43, 0x45, 0x45,
+      0x46, 0x48, 0x49, 0x4b, 0x4b, 0x4d, 0x50, 0x53,
+      0x56, 0x57, 0x5a, 0x5c, 0x5e, 0x61, 0x63, 0x65,
+      0x66, 0x64, 0x62, 0x5f, 0x5c, 0x59, 0x58, 0x56,
+      0x55, 0x54, 0x52, 0x51, 0x50, 0x51, 0x51, 0x52,
+      0x52, 0x52, 0x52, 0x52, 0x51, 0x51, 0x51, 0x50,
+      0x4f, 0x4e, 0x4c, 0x4a, 0x47, 0x46, 0x45, 0x45,
+      0x45, 0x46, 0x46, 0x46, 0x4a, 0x4c, 0x4d, 0x52,
+      0x54, 0x56, 0x58, 0x58, 0x56, 0x57, 0x57, 0x56,
+      0x55, 0x53, 0x50, 0x4d, 0x49, 0x45, 0x44, 0x44,
+      0x43, 0x44, 0x45, },
+    { 0x1f, 0x27, 0x24, 0x23, 0x25, 0x24, 0x25, 0x26,
+      0x26, 0x28, 0x2b, 0x2e, 0x31, 0x34, 0x37, 0x3a,
+      0x3d, 0x41, 0x45, 0x48, 0x4b, 0x4d, 0x4f, 0x4e,
+      0x50, 0x51, 0x52, 0x50, 0x4f, 0x4b, 0x4a, 0x49,
+      0x45, 0x43, 0x3f, 0x3c, 0x3a, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x29, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x25, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x32, 0x34, 0x37, 0x39, 0x3b, 0x3c,
+      0x3e, 0x3f, 0x3f, 0x40, 0x3e, 0x3d, 0x3c, 0x3a,
+      0x38, 0x36, 0x34, 0x31, 0x30, 0x2c, 0x29, 0x28,
+      0x25, 0x23, 0x22, 0x1f, 0x1c, 0x1c, 0x18, 0x18,
+      0x16, 0x14, 0x14, 0x13, 0x11, 0x11, 0x11, 0x0f,
+      0x0f, 0x0e, 0x0f, 0x0f, 0x0e, 0x0d, 0x0d, 0x0d,
+      0x0c, 0x0c, 0x0b, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+      0x0d, 0x0e, 0x0e, 0x0f, 0x0d, 0x0f, 0x10, 0x10,
+      0x10, 0x11, 0x13, 0x14, 0x15, 0x16, 0x19, 0x1a,
+      0x1c, 0x1f, 0x20, 0x23, 0x26, 0x28, 0x2a, 0x2e,
+      0x31, 0x35, 0x38, 0x39, 0x3a, 0x3c, 0x3d, 0x3d,
+      0x3e, 0x3e, 0x3d, 0x3c, 0x3a, 0x3a, 0x39, 0x39,
+      0x38, 0x37, 0x38, 0x38, 0x37, 0x38, 0x39, 0x3a,
+      0x3c, 0x3c, 0x3d, 0x3e, 0x3f, 0x3f, 0x40, 0x3f,
+      0x41, 0x40, 0x3e, 0x3e, 0x3d, 0x3b, 0x3b, 0x39,
+      0x37, 0x37, 0x35, 0x36, 0x34, 0x34, 0x34, 0x35,
+      0x35, 0x34, 0x34, 0x35, 0x35, 0x37, 0x38, 0x39,
+      0x3a, 0x3c, 0x3f, 0x3f, 0x43, 0x43, 0x45, 0x47,
+      0x48, 0x48, 0x4a, 0x4b, 0x4e, 0x4d, 0x51, 0x53,
+      0x56, 0x58, 0x59, 0x5b, 0x5d, 0x60, 0x62, 0x63,
+      0x64, 0x63, 0x61, 0x5e, 0x5c, 0x5a, 0x57, 0x56,
+      0x55, 0x54, 0x53, 0x52, 0x51, 0x51, 0x52, 0x52,
+      0x54, 0x54, 0x55, 0x55, 0x55, 0x54, 0x54, 0x53,
+      0x52, 0x50, 0x4e, 0x4d, 0x4b, 0x4a, 0x48, 0x48,
+      0x48, 0x48, 0x4a, 0x4b, 0x4d, 0x4f, 0x52, 0x55,
+      0x58, 0x5a, 0x5b, 0x5b, 0x5b, 0x5b, 0x5a, 0x59,
+      0x58, 0x55, 0x51, 0x4e, 0x4a, 0x46, 0x45, 0x44,
+      0x44, 0x44, 0x44, },
+    { 0x1e, 0x26, 0x23, 0x23, 0x25, 0x24, 0x25, 0x26,
+      0x26, 0x28, 0x2b, 0x2e, 0x31, 0x34, 0x37, 0x3a,
+      0x3e, 0x42, 0x45, 0x48, 0x4b, 0x4d, 0x4f, 0x4f,
+      0x50, 0x51, 0x52, 0x50, 0x4f, 0x4b, 0x4a, 0x48,
+      0x46, 0x44, 0x3f, 0x3b, 0x39, 0x36, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x32, 0x34, 0x37, 0x39, 0x3b, 0x3d,
+      0x3e, 0x3f, 0x41, 0x41, 0x40, 0x3e, 0x3d, 0x3b,
+      0x38, 0x37, 0x34, 0x32, 0x30, 0x2c, 0x2a, 0x27,
+      0x26, 0x23, 0x22, 0x20, 0x1d, 0x1b, 0x1a, 0x19,
+      0x17, 0x15, 0x15, 0x13, 0x12, 0x12, 0x11, 0x0f,
+      0x11, 0x0f, 0x0e, 0x0e, 0x0d, 0x0d, 0x0d, 0x0c,
+      0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d,
+      0x0e, 0x0e, 0x0e, 0x0f, 0x10, 0x10, 0x11, 0x11,
+      0x11, 0x13, 0x16, 0x15, 0x15, 0x18, 0x1a, 0x1b,
+      0x1d, 0x20, 0x22, 0x24, 0x27, 0x29, 0x2c, 0x30,
+      0x33, 0x37, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3e,
+      0x40, 0x40, 0x40, 0x3f, 0x3e, 0x3d, 0x3c, 0x3a,
+      0x3a, 0x3a, 0x3a, 0x3a, 0x3a, 0x3a, 0x3b, 0x3d,
+      0x3d, 0x3f, 0x40, 0x40, 0x3f, 0x41, 0x41, 0x41,
+      0x41, 0x41, 0x40, 0x40, 0x3f, 0x3e, 0x3c, 0x3b,
+      0x3a, 0x39, 0x37, 0x36, 0x36, 0x35, 0x35, 0x36,
+      0x36, 0x35, 0x35, 0x36, 0x36, 0x38, 0x39, 0x39,
+      0x3b, 0x3c, 0x3e, 0x40, 0x41, 0x43, 0x45, 0x47,
+      0x48, 0x48, 0x4b, 0x4c, 0x4d, 0x4f, 0x51, 0x53,
+      0x56, 0x56, 0x59, 0x5b, 0x5d, 0x5f, 0x61, 0x62,
+      0x63, 0x63, 0x61, 0x5e, 0x5c, 0x5a, 0x59, 0x57,
+      0x56, 0x54, 0x54, 0x53, 0x52, 0x53, 0x53, 0x55,
+      0x56, 0x56, 0x57, 0x57, 0x57, 0x57, 0x56, 0x56,
+      0x55, 0x53, 0x51, 0x4f, 0x4d, 0x4b, 0x49, 0x4b,
+      0x4b, 0x4c, 0x4d, 0x4e, 0x51, 0x53, 0x55, 0x58,
+      0x5b, 0x5c, 0x60, 0x60, 0x5f, 0x5e, 0x5d, 0x5c,
+      0x5a, 0x57, 0x53, 0x4f, 0x4b, 0x46, 0x45, 0x44,
+      0x44, 0x44, 0x44, },
+    { 0x1d, 0x25, 0x22, 0x22, 0x23, 0x23, 0x24, 0x25,
+      0x25, 0x28, 0x2b, 0x2e, 0x31, 0x34, 0x37, 0x3a,
+      0x3e, 0x42, 0x45, 0x48, 0x4b, 0x4d, 0x4f, 0x4f,
+      0x50, 0x51, 0x52, 0x50, 0x4f, 0x4b, 0x4a, 0x47,
+      0x45, 0x43, 0x3f, 0x3c, 0x38, 0x35, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2b, 0x2f, 0x32, 0x34, 0x37, 0x39, 0x3c, 0x3d,
+      0x3e, 0x3f, 0x40, 0x41, 0x40, 0x3e, 0x3d, 0x3b,
+      0x39, 0x36, 0x34, 0x32, 0x30, 0x2d, 0x2a, 0x26,
+      0x26, 0x24, 0x22, 0x1f, 0x1d, 0x1c, 0x1a, 0x19,
+      0x18, 0x16, 0x15, 0x14, 0x12, 0x12, 0x12, 0x10,
+      0x10, 0x0f, 0x0e, 0x10, 0x0e, 0x0e, 0x0d, 0x0c,
+      0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0e, 0x0d, 0x0e,
+      0x0f, 0x0f, 0x0f, 0x10, 0x11, 0x11, 0x11, 0x12,
+      0x13, 0x14, 0x16, 0x16, 0x18, 0x1a, 0x1b, 0x1c,
+      0x1e, 0x21, 0x23, 0x25, 0x28, 0x2a, 0x2e, 0x32,
+      0x34, 0x38, 0x3a, 0x3c, 0x3d, 0x3f, 0x40, 0x42,
+      0x43, 0x43, 0x43, 0x42, 0x40, 0x3e, 0x3e, 0x3c,
+      0x3b, 0x3b, 0x3c, 0x3a, 0x3b, 0x3b, 0x3e, 0x3e,
+      0x40, 0x3f, 0x41, 0x41, 0x41, 0x42, 0x42, 0x43,
+      0x42, 0x41, 0x41, 0x41, 0x40, 0x3e, 0x3d, 0x3c,
+      0x3b, 0x3a, 0x39, 0x37, 0x36, 0x35, 0x36, 0x37,
+      0x35, 0x36, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b,
+      0x3b, 0x3d, 0x3e, 0x40, 0x41, 0x41, 0x44, 0x46,
+      0x48, 0x48, 0x4a, 0x4c, 0x4d, 0x4f, 0x51, 0x53,
+      0x55, 0x57, 0x59, 0x5a, 0x5b, 0x5e, 0x5f, 0x61,
+      0x62, 0x61, 0x60, 0x5e, 0x5c, 0x5a, 0x59, 0x58,
+      0x56, 0x55, 0x54, 0x53, 0x53, 0x54, 0x54, 0x55,
+      0x57, 0x57, 0x58, 0x59, 0x5a, 0x58, 0x59, 0x58,
+      0x57, 0x55, 0x53, 0x52, 0x4f, 0x4e, 0x4d, 0x4d,
+      0x4d, 0x4f, 0x51, 0x50, 0x54, 0x56, 0x59, 0x5c,
+      0x5f, 0x61, 0x64, 0x64, 0x63, 0x61, 0x5e, 0x5e,
+      0x5c, 0x59, 0x54, 0x50, 0x4c, 0x46, 0x45, 0x44,
+      0x44, 0x44, 0x44, },
+    { 0x1c, 0x24, 0x21, 0x21, 0x21, 0x22, 0x23, 0x23,
+      0x25, 0x27, 0x2a, 0x2e, 0x31, 0x33, 0x37, 0x3b,
+      0x3e, 0x42, 0x45, 0x48, 0x4b, 0x4c, 0x50, 0x4f,
+      0x50, 0x51, 0x52, 0x50, 0x4e, 0x4b, 0x4a, 0x49,
+      0x45, 0x42, 0x3f, 0x3c, 0x38, 0x35, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2b, 0x2f, 0x32, 0x34, 0x38, 0x39, 0x3c, 0x3d,
+      0x3e, 0x3e, 0x40, 0x41, 0x40, 0x3e, 0x3c, 0x3a,
+      0x39, 0x37, 0x35, 0x33, 0x30, 0x2d, 0x2b, 0x28,
+      0x26, 0x23, 0x23, 0x20, 0x1e, 0x1b, 0x19, 0x19,
+      0x17, 0x16, 0x15, 0x14, 0x12, 0x12, 0x11, 0x10,
+      0x0f, 0x0e, 0x0e, 0x10, 0x0e, 0x0d, 0x0c, 0x0c,
+      0x0c, 0x0d, 0x0d, 0x0d, 0x0d, 0x0e, 0x0d, 0x0e,
+      0x0f, 0x0f, 0x0f, 0x10, 0x11, 0x11, 0x12, 0x14,
+      0x14, 0x14, 0x16, 0x18, 0x19, 0x1b, 0x1c, 0x1e,
+      0x20, 0x23, 0x26, 0x27, 0x29, 0x2c, 0x2f, 0x33,
+      0x36, 0x38, 0x3b, 0x3e, 0x3e, 0x42, 0x43, 0x46,
+      0x46, 0x46, 0x46, 0x44, 0x42, 0x41, 0x3f, 0x3e,
+      0x3d, 0x3d, 0x3e, 0x3d, 0x3d, 0x3e, 0x3e, 0x40,
+      0x40, 0x40, 0x43, 0x43, 0x42, 0x43, 0x45, 0x43,
+      0x43, 0x43, 0x42, 0x42, 0x41, 0x40, 0x40, 0x3e,
+      0x3c, 0x3a, 0x3a, 0x38, 0x36, 0x36, 0x36, 0x36,
+      0x37, 0x37, 0x36, 0x38, 0x38, 0x39, 0x3b, 0x3b,
+      0x3e, 0x3e, 0x3e, 0x40, 0x41, 0x43, 0x45, 0x46,
+      0x46, 0x49, 0x4c, 0x4c, 0x4d, 0x4f, 0x51, 0x54,
+      0x56, 0x57, 0x58, 0x5a, 0x5c, 0x5e, 0x60, 0x60,
+      0x61, 0x61, 0x60, 0x5f, 0x5c, 0x5a, 0x59, 0x58,
+      0x57, 0x57, 0x55, 0x54, 0x53, 0x55, 0x55, 0x58,
+      0x58, 0x59, 0x5a, 0x5a, 0x5a, 0x5b, 0x5b, 0x5b,
+      0x5a, 0x59, 0x56, 0x54, 0x53, 0x4e, 0x4e, 0x50,
+      0x50, 0x51, 0x52, 0x52, 0x57, 0x59, 0x5d, 0x60,
+      0x63, 0x63, 0x66, 0x66, 0x66, 0x64, 0x63, 0x61,
+      0x60, 0x5b, 0x55, 0x51, 0x4d, 0x48, 0x45, 0x44,
+      0x43, 0x43, 0x43, },
+    { 0x1b, 0x23, 0x20, 0x21, 0x22, 0x22, 0x23, 0x24,
+      0x26, 0x27, 0x2a, 0x2e, 0x31, 0x33, 0x37, 0x3b,
+      0x3d, 0x42, 0x46, 0x49, 0x4a, 0x4c, 0x4f, 0x4f,
+      0x50, 0x50, 0x52, 0x50, 0x4e, 0x4b, 0x4b, 0x49,
+      0x45, 0x42, 0x3e, 0x3c, 0x38, 0x35, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x32, 0x35, 0x38, 0x3a, 0x3c, 0x3d,
+      0x3e, 0x3e, 0x40, 0x41, 0x40, 0x3f, 0x3d, 0x3b,
+      0x3a, 0x38, 0x36, 0x33, 0x30, 0x2d, 0x2b, 0x29,
+      0x27, 0x24, 0x24, 0x21, 0x1e, 0x1c, 0x1b, 0x1a,
+      0x18, 0x17, 0x16, 0x15, 0x13, 0x12, 0x10, 0x0f,
+      0x10, 0x0f, 0x0e, 0x0f, 0x0e, 0x0d, 0x0d, 0x0d,
+      0x0d, 0x0d, 0x0e, 0x0e, 0x0e, 0x0f, 0x0e, 0x0f,
+      0x10, 0x11, 0x11, 0x12, 0x13, 0x13, 0x14, 0x15,
+      0x15, 0x16, 0x17, 0x1a, 0x1b, 0x1d, 0x1e, 0x20,
+      0x21, 0x25, 0x27, 0x29, 0x2b, 0x2d, 0x31, 0x35,
+      0x37, 0x39, 0x3c, 0x3f, 0x40, 0x43, 0x46, 0x47,
+      0x4a, 0x49, 0x48, 0x46, 0x45, 0x43, 0x42, 0x41,
+      0x3f, 0x40, 0x3f, 0x3f, 0x40, 0x3f, 0x41, 0x43,
+      0x43, 0x43, 0x44, 0x45, 0x45, 0x45, 0x45, 0x45,
+      0x45, 0x45, 0x44, 0x43, 0x43, 0x42, 0x42, 0x40,
+      0x3e, 0x3d, 0x3c, 0x39, 0x38, 0x38, 0x38, 0x38,
+      0x38, 0x36, 0x38, 0x39, 0x39, 0x3a, 0x3c, 0x3d,
+      0x3e, 0x3e, 0x3f, 0x41, 0x42, 0x42, 0x43, 0x45,
+      0x46, 0x49, 0x4b, 0x4d, 0x4f, 0x50, 0x53, 0x54,
+      0x57, 0x58, 0x5a, 0x5c, 0x5b, 0x5e, 0x60, 0x61,
+      0x60, 0x60, 0x5f, 0x5f, 0x5d, 0x5b, 0x5b, 0x59,
+      0x58, 0x57, 0x56, 0x55, 0x55, 0x55, 0x57, 0x59,
+      0x5b, 0x5b, 0x5d, 0x5c, 0x5c, 0x5e, 0x5e, 0x5e,
+      0x5d, 0x5b, 0x59, 0x56, 0x54, 0x51, 0x51, 0x51,
+      0x52, 0x55, 0x56, 0x56, 0x5a, 0x5d, 0x5f, 0x63,
+      0x66, 0x68, 0x6b, 0x6b, 0x68, 0x67, 0x66, 0x64,
+      0x61, 0x5d, 0x57, 0x52, 0x4f, 0x49, 0x46, 0x45,
+      0x43, 0x43, 0x43, },
+    { 0x1a, 0x22, 0x1f, 0x20, 0x21, 0x22, 0x23, 0x24,
+      0x26, 0x27, 0x2a, 0x2d, 0x31, 0x33, 0x37, 0x3b,
+      0x3d, 0x41, 0x46, 0x49, 0x4a, 0x4d, 0x4f, 0x4f,
+      0x50, 0x51, 0x52, 0x50, 0x4e, 0x4b, 0x4b, 0x48,
+      0x44, 0x42, 0x3e, 0x3c, 0x39, 0x35, 0x33, 0x30,
+      0x2f, 0x2d, 0x2a, 0x28, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x27, 0x29, 0x2a,
+      0x2d, 0x2f, 0x32, 0x35, 0x39, 0x3a, 0x3c, 0x3d,
+      0x3e, 0x3f, 0x40, 0x41, 0x40, 0x3f, 0x3e, 0x3c,
+      0x3a, 0x38, 0x36, 0x33, 0x31, 0x2d, 0x2c, 0x29,
+      0x27, 0x26, 0x24, 0x21, 0x1f, 0x1d, 0x1c, 0x1a,
+      0x19, 0x18, 0x16, 0x15, 0x14, 0x13, 0x12, 0x10,
+      0x11, 0x10, 0x0f, 0x0f, 0x0f, 0x0e, 0x0e, 0x0e,
+      0x0f, 0x0f, 0x0e, 0x0e, 0x0e, 0x0f, 0x0f, 0x10,
+      0x11, 0x12, 0x12, 0x13, 0x15, 0x15, 0x16, 0x16,
+      0x17, 0x18, 0x1a, 0x1b, 0x1c, 0x1e, 0x1f, 0x21,
+      0x22, 0x25, 0x27, 0x2a, 0x2c, 0x2e, 0x33, 0x36,
+      0x39, 0x3a, 0x3d, 0x40, 0x41, 0x45, 0x47, 0x4a,
+      0x4c, 0x4d, 0x4c, 0x4a, 0x48, 0x45, 0x44, 0x41,
+      0x42, 0x42, 0x42, 0x42, 0x42, 0x43, 0x43, 0x44,
+      0x45, 0x47, 0x47, 0x48, 0x47, 0x48, 0x47, 0x47,
+      0x48, 0x48, 0x46, 0x46, 0x46, 0x43, 0x43, 0x41,
+      0x3f, 0x3e, 0x3b, 0x39, 0x38, 0x37, 0x37, 0x37,
+      0x38, 0x38, 0x37, 0x39, 0x39, 0x3a, 0x3c, 0x3e,
+      0x3e, 0x3f, 0x3f, 0x3f, 0x42, 0x43, 0x43, 0x45,
+      0x47, 0x48, 0x4b, 0x4c, 0x4e, 0x50, 0x51, 0x54,
+      0x56, 0x58, 0x5a, 0x5c, 0x5c, 0x5f, 0x5f, 0x5f,
+      0x61, 0x60, 0x5f, 0x5f, 0x5e, 0x5b, 0x5c, 0x5b,
+      0x59, 0x59, 0x57, 0x56, 0x55, 0x56, 0x57, 0x59,
+      0x5a, 0x5b, 0x5c, 0x5c, 0x5d, 0x5e, 0x5e, 0x5d,
+      0x5e, 0x5c, 0x5a, 0x57, 0x55, 0x52, 0x51, 0x52,
+      0x53, 0x55, 0x57, 0x58, 0x5c, 0x5e, 0x61, 0x65,
+      0x69, 0x6b, 0x6c, 0x6b, 0x6a, 0x69, 0x67, 0x64,
+      0x61, 0x5d, 0x59, 0x53, 0x4d, 0x48, 0x46, 0x45,
+      0x44, 0x44, 0x43, },
+    { 0x1a, 0x21, 0x1e, 0x1f, 0x20, 0x21, 0x23, 0x24,
+      0x25, 0x28, 0x2a, 0x2e, 0x31, 0x33, 0x37, 0x3b,
+      0x3e, 0x41, 0x46, 0x49, 0x4b, 0x4d, 0x4f, 0x4e,
+      0x50, 0x51, 0x51, 0x50, 0x4e, 0x4b, 0x4a, 0x48,
+      0x44, 0x42, 0x3e, 0x3c, 0x39, 0x35, 0x32, 0x30,
+      0x2f, 0x2d, 0x29, 0x27, 0x27, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x26, 0x27, 0x29, 0x2a,
+      0x2c, 0x2f, 0x32, 0x35, 0x38, 0x3b, 0x3c, 0x3e,
+      0x3f, 0x3f, 0x40, 0x41, 0x40, 0x3f, 0x3e, 0x3c,
+      0x3a, 0x39, 0x36, 0x34, 0x31, 0x2d, 0x2c, 0x29,
+      0x27, 0x26, 0x24, 0x21, 0x1f, 0x1d, 0x1c, 0x1a,
+      0x19, 0x17, 0x16, 0x15, 0x14, 0x13, 0x12, 0x10,
+      0x11, 0x10, 0x0f, 0x0f, 0x0f, 0x0e, 0x0e, 0x0e,
+      0x0e, 0x0e, 0x0e, 0x0e, 0x0e, 0x0f, 0x0f, 0x10,
+      0x11, 0x13, 0x14, 0x14, 0x15, 0x16, 0x17, 0x19,
+      0x19, 0x1a, 0x1c, 0x1d, 0x1e, 0x20, 0x22, 0x24,
+      0x25, 0x27, 0x29, 0x2c, 0x2e, 0x31, 0x35, 0x38,
+      0x3a, 0x3d, 0x41, 0x42, 0x45, 0x48, 0x4c, 0x4e,
+      0x4f, 0x4f, 0x4f, 0x4d, 0x4b, 0x49, 0x47, 0x47,
+      0x46, 0x45, 0x45, 0x45, 0x44, 0x44, 0x46, 0x47,
+      0x48, 0x49, 0x4b, 0x4b, 0x4a, 0x4b, 0x4b, 0x4a,
+      0x4b, 0x4a, 0x49, 0x49, 0x48, 0x46, 0x46, 0x44,
+      0x42, 0x41, 0x3d, 0x3b, 0x3a, 0x38, 0x38, 0x38,
+      0x37, 0x37, 0x39, 0x38, 0x3a, 0x3a, 0x3c, 0x3c,
+      0x3e, 0x40, 0x40, 0x41, 0x43, 0x43, 0x45, 0x46,
+      0x48, 0x49, 0x4b, 0x4e, 0x4f, 0x50, 0x53, 0x55,
+      0x57, 0x59, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60,
+      0x60, 0x60, 0x5f, 0x5f, 0x5e, 0x5c, 0x5b, 0x5a,
+      0x59, 0x58, 0x57, 0x57, 0x56, 0x56, 0x57, 0x58,
+      0x59, 0x5a, 0x5b, 0x5c, 0x5c, 0x5d, 0x5e, 0x5d,
+      0x5c, 0x5b, 0x58, 0x57, 0x54, 0x52, 0x52, 0x53,
+      0x54, 0x57, 0x58, 0x58, 0x5b, 0x5e, 0x62, 0x65,
+      0x69, 0x6b, 0x6d, 0x6c, 0x6a, 0x69, 0x67, 0x64,
+      0x62, 0x5e, 0x59, 0x54, 0x4d, 0x48, 0x47, 0x46,
+      0x45, 0x45, 0x44, },
+    { 0x1a, 0x21, 0x1e, 0x1f, 0x20, 0x21, 0x23, 0x24,
+      0x25, 0x28, 0x2a, 0x2e, 0x31, 0x34, 0x37, 0x3b,
+      0x3e, 0x42, 0x47, 0x49, 0x4b, 0x4d, 0x4f, 0x4f,
+      0x50, 0x51, 0x51, 0x50, 0x50, 0x4c, 0x4a, 0x47,
+      0x44, 0x42, 0x3e, 0x3c, 0x39, 0x35, 0x32, 0x31,
+      0x2f, 0x2d, 0x29, 0x27, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x25, 0x25, 0x26, 0x27, 0x29, 0x2b,
+      0x2c, 0x2f, 0x33, 0x35, 0x38, 0x3a, 0x3c, 0x3e,
+      0x40, 0x40, 0x41, 0x42, 0x41, 0x3f, 0x3f, 0x3d,
+      0x3b, 0x39, 0x36, 0x33, 0x32, 0x2e, 0x2d, 0x2a,
+      0x27, 0x26, 0x25, 0x22, 0x1f, 0x1d, 0x1c, 0x1b,
+      0x19, 0x17, 0x17, 0x16, 0x15, 0x14, 0x12, 0x11,
+      0x11, 0x11, 0x10, 0x10, 0x0f, 0x0f, 0x0f, 0x0f,
+      0x0f, 0x0f, 0x10, 0x11, 0x10, 0x11, 0x11, 0x12,
+      0x11, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1b,
+      0x1c, 0x1c, 0x1e, 0x20, 0x21, 0x22, 0x23, 0x25,
+      0x27, 0x2a, 0x2c, 0x2f, 0x31, 0x35, 0x38, 0x3b,
+      0x3d, 0x40, 0x44, 0x47, 0x49, 0x4c, 0x4f, 0x51,
+      0x53, 0x53, 0x53, 0x51, 0x50, 0x4e, 0x4c, 0x4b,
+      0x4a, 0x49, 0x49, 0x49, 0x49, 0x4a, 0x4a, 0x4d,
+      0x4e, 0x4e, 0x4f, 0x50, 0x4f, 0x50, 0x51, 0x50,
+      0x50, 0x4e, 0x4d, 0x4c, 0x4b, 0x48, 0x48, 0x47,
+      0x44, 0x42, 0x3f, 0x3d, 0x3b, 0x3a, 0x39, 0x39,
+      0x39, 0x38, 0x39, 0x3b, 0x3a, 0x3c, 0x3e, 0x3d,
+      0x40, 0x40, 0x40, 0x42, 0x42, 0x42, 0x45, 0x46,
+      0x47, 0x49, 0x4c, 0x4e, 0x50, 0x50, 0x53, 0x56,
+      0x58, 0x59, 0x5d, 0x5d, 0x5e, 0x60, 0x61, 0x61,
+      0x62, 0x61, 0x60, 0x60, 0x5e, 0x5d, 0x5d, 0x5b,
+      0x57, 0x58, 0x56, 0x55, 0x55, 0x56, 0x56, 0x59,
+      0x59, 0x58, 0x5a, 0x5a, 0x5a, 0x5c, 0x5c, 0x5c,
+      0x5b, 0x5b, 0x58, 0x57, 0x54, 0x53, 0x52, 0x53,
+      0x54, 0x57, 0x58, 0x59, 0x5c, 0x5f, 0x63, 0x67,
+      0x6b, 0x6d, 0x6e, 0x6e, 0x6b, 0x6a, 0x68, 0x64,
+      0x62, 0x5e, 0x58, 0x53, 0x4f, 0x49, 0x47, 0x46,
+      0x45, 0x45, 0x44, },
+    { 0x19, 0x20, 0x1e, 0x1e, 0x1f, 0x20, 0x22, 0x23,
+      0x25, 0x27, 0x2a, 0x2e, 0x31, 0x34, 0x37, 0x3a,
+      0x3e, 0x41, 0x46, 0x49, 0x4a, 0x4d, 0x4f, 0x4e,
+      0x50, 0x51, 0x51, 0x4f, 0x4f, 0x4d, 0x49, 0x47,
+      0x44, 0x42, 0x3e, 0x3c, 0x39, 0x36, 0x32, 0x31,
+      0x2f, 0x2d, 0x29, 0x27, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x25, 0x25, 0x26, 0x28, 0x29, 0x2b,
+      0x2c, 0x2f, 0x33, 0x35, 0x38, 0x3a, 0x3c, 0x3e,
+      0x3f, 0x3f, 0x41, 0x42, 0x41, 0x3f, 0x3f, 0x3d,
+      0x3c, 0x39, 0x36, 0x33, 0x32, 0x2e, 0x2d, 0x2a,
+      0x27, 0x26, 0x25, 0x22, 0x1f, 0x1e, 0x1d, 0x1b,
+      0x1a, 0x17, 0x17, 0x17, 0x14, 0x14, 0x12, 0x11,
+      0x11, 0x12, 0x11, 0x11, 0x10, 0x10, 0x10, 0x10,
+      0x10, 0x10, 0x11, 0x11, 0x11, 0x12, 0x13, 0x14,
+      0x14, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1c, 0x1e,
+      0x1e, 0x1f, 0x22, 0x23, 0x23, 0x24, 0x25, 0x27,
+      0x2a, 0x2d, 0x2f, 0x31, 0x35, 0x38, 0x3a, 0x3e,
+      0x41, 0x44, 0x48, 0x4b, 0x4d, 0x51, 0x53, 0x55,
+      0x57, 0x57, 0x56, 0x55, 0x54, 0x52, 0x52, 0x50,
+      0x4e, 0x50, 0x4e, 0x4d, 0x4d, 0x4d, 0x4f, 0x51,
+      0x51, 0x52, 0x54, 0x55, 0x55, 0x55, 0x57, 0x55,
+      0x54, 0x53, 0x52, 0x4e, 0x4d, 0x4b, 0x4a, 0x49,
+      0x46, 0x44, 0x41, 0x3f, 0x3d, 0x3b, 0x3a, 0x3a,
+      0x39, 0x39, 0x39, 0x39, 0x3a, 0x3b, 0x3d, 0x3e,
+      0x3f, 0x40, 0x41, 0x42, 0x44, 0x44, 0x45, 0x47,
+      0x49, 0x49, 0x4a, 0x4d, 0x50, 0x51, 0x53, 0x57,
+      0x5a, 0x5b, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x62,
+      0x63, 0x62, 0x60, 0x60, 0x5e, 0x5c, 0x5c, 0x59,
+      0x58, 0x56, 0x55, 0x55, 0x55, 0x55, 0x55, 0x54,
+      0x56, 0x56, 0x57, 0x58, 0x58, 0x59, 0x5a, 0x59,
+      0x58, 0x57, 0x56, 0x55, 0x54, 0x52, 0x53, 0x53,
+      0x53, 0x56, 0x57, 0x59, 0x5b, 0x5e, 0x62, 0x66,
+      0x6a, 0x6c, 0x6d, 0x6e, 0x6b, 0x69, 0x67, 0x64,
+      0x61, 0x5d, 0x58, 0x54, 0x50, 0x4a, 0x47, 0x46,
+      0x45, 0x45, 0x44, },
+    { 0x1a, 0x21, 0x1e, 0x1f, 0x1f, 0x20, 0x22, 0x23,
+      0x25, 0x27, 0x2b, 0x2e, 0x31, 0x34, 0x37, 0x3b,
+      0x3d, 0x42, 0x45, 0x49, 0x4a, 0x4d, 0x4e, 0x4e,
+      0x51, 0x52, 0x50, 0x4f, 0x4f, 0x4c, 0x49, 0x48,
+      0x45, 0x42, 0x3e, 0x3b, 0x39, 0x36, 0x32, 0x32,
+      0x2f, 0x2c, 0x2a, 0x28, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x25, 0x28, 0x29, 0x2b,
+      0x2d, 0x2f, 0x33, 0x35, 0x38, 0x3a, 0x3c, 0x3e,
+      0x3f, 0x3f, 0x41, 0x42, 0x41, 0x3f, 0x3e, 0x3c,
+      0x3c, 0x3a, 0x37, 0x33, 0x32, 0x2f, 0x2d, 0x2b,
+      0x28, 0x26, 0x25, 0x22, 0x20, 0x1e, 0x1d, 0x1b,
+      0x1a, 0x17, 0x17, 0x16, 0x14, 0x14, 0x12, 0x11,
+      0x12, 0x11, 0x11, 0x11, 0x11, 0x10, 0x10, 0x10,
+      0x10, 0x11, 0x12, 0x12, 0x12, 0x13, 0x14, 0x14,
+      0x16, 0x18, 0x19, 0x1a, 0x1b, 0x1d, 0x1e, 0x1f,
+      0x21, 0x22, 0x23, 0x25, 0x26, 0x26, 0x28, 0x2a,
+      0x2c, 0x2e, 0x32, 0x34, 0x39, 0x39, 0x3d, 0x41,
+      0x45, 0x47, 0x4c, 0x4e, 0x51, 0x54, 0x56, 0x58,
+      0x5b, 0x5c, 0x5a, 0x59, 0x58, 0x56, 0x55, 0x53,
+      0x53, 0x52, 0x52, 0x51, 0x52, 0x52, 0x53, 0x55,
+      0x57, 0x58, 0x5a, 0x5a, 0x59, 0x5b, 0x59, 0x59,
+      0x58, 0x57, 0x55, 0x53, 0x51, 0x4e, 0x4c, 0x4a,
+      0x48, 0x46, 0x43, 0x40, 0x3e, 0x3c, 0x3b, 0x3b,
+      0x38, 0x39, 0x38, 0x39, 0x3a, 0x3d, 0x3d, 0x3e,
+      0x3f, 0x40, 0x41, 0x43, 0x44, 0x45, 0x46, 0x48,
+      0x4a, 0x4b, 0x4d, 0x4e, 0x50, 0x52, 0x54, 0x56,
+      0x59, 0x5c, 0x5e, 0x5f, 0x60, 0x62, 0x62, 0x63,
+      0x63, 0x63, 0x61, 0x5f, 0x5e, 0x5d, 0x5c, 0x5b,
+      0x59, 0x56, 0x56, 0x55, 0x54, 0x53, 0x53, 0x54,
+      0x55, 0x54, 0x55, 0x55, 0x55, 0x57, 0x58, 0x57,
+      0x57, 0x56, 0x55, 0x54, 0x54, 0x52, 0x52, 0x53,
+      0x54, 0x55, 0x57, 0x58, 0x5b, 0x5e, 0x62, 0x65,
+      0x69, 0x6b, 0x6d, 0x6e, 0x6a, 0x69, 0x67, 0x63,
+      0x61, 0x5d, 0x58, 0x54, 0x4f, 0x4b, 0x48, 0x47,
+      0x46, 0x45, 0x45, },
+    { 0x1a, 0x21, 0x1e, 0x1f, 0x1f, 0x20, 0x22, 0x23,
+      0x25, 0x27, 0x2b, 0x2d, 0x31, 0x34, 0x37, 0x3b,
+      0x3d, 0x42, 0x45, 0x48, 0x4c, 0x4e, 0x4e, 0x4f,
+      0x51, 0x52, 0x50, 0x50, 0x4f, 0x4c, 0x4a, 0x48,
+      0x45, 0x42, 0x3f, 0x3b, 0x39, 0x36, 0x32, 0x31,
+      0x2f, 0x2c, 0x2a, 0x28, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x28, 0x29, 0x2b,
+      0x2d, 0x30, 0x33, 0x36, 0x39, 0x3b, 0x3d, 0x3f,
+      0x3f, 0x40, 0x42, 0x43, 0x42, 0x40, 0x3e, 0x3c,
+      0x3c, 0x3a, 0x37, 0x34, 0x32, 0x2f, 0x2d, 0x2c,
+      0x2a, 0x27, 0x26, 0x23, 0x20, 0x1e, 0x1d, 0x1c,
+      0x1a, 0x18, 0x18, 0x17, 0x15, 0x16, 0x14, 0x12,
+      0x12, 0x12, 0x12, 0x12, 0x12, 0x11, 0x11, 0x12,
+      0x12, 0x12, 0x13, 0x14, 0x14, 0x14, 0x15, 0x16,
+      0x17, 0x19, 0x1b, 0x1c, 0x1e, 0x20, 0x20, 0x22,
+      0x24, 0x25, 0x26, 0x27, 0x28, 0x2a, 0x2c, 0x2c,
+      0x2f, 0x32, 0x35, 0x37, 0x3b, 0x3c, 0x41, 0x45,
+      0x48, 0x4c, 0x50, 0x52, 0x54, 0x57, 0x5a, 0x5c,
+      0x5f, 0x5f, 0x5f, 0x5d, 0x5c, 0x5b, 0x5a, 0x58,
+      0x57, 0x57, 0x57, 0x56, 0x56, 0x57, 0x57, 0x5a,
+      0x5c, 0x5e, 0x5f, 0x61, 0x5f, 0x5f, 0x5f, 0x5e,
+      0x5d, 0x5c, 0x5a, 0x57, 0x55, 0x52, 0x4f, 0x4e,
+      0x4a, 0x47, 0x46, 0x42, 0x41, 0x3e, 0x3d, 0x3c,
+      0x3b, 0x3a, 0x39, 0x39, 0x3b, 0x3c, 0x3d, 0x3f,
+      0x40, 0x42, 0x42, 0x44, 0x45, 0x46, 0x49, 0x49,
+      0x4b, 0x4c, 0x4e, 0x4f, 0x51, 0x54, 0x57, 0x58,
+      0x5b, 0x5d, 0x61, 0x61, 0x61, 0x63, 0x65, 0x65,
+      0x64, 0x64, 0x62, 0x61, 0x60, 0x5e, 0x5d, 0x5c,
+      0x59, 0x58, 0x56, 0x54, 0x53, 0x53, 0x53, 0x54,
+      0x54, 0x53, 0x53, 0x54, 0x54, 0x54, 0x55, 0x55,
+      0x56, 0x55, 0x54, 0x53, 0x53, 0x52, 0x52, 0x53,
+      0x55, 0x56, 0x57, 0x58, 0x5b, 0x5e, 0x62, 0x66,
+      0x69, 0x6b, 0x6d, 0x6d, 0x6b, 0x69, 0x67, 0x64,
+      0x61, 0x5d, 0x58, 0x55, 0x50, 0x4b, 0x48, 0x47,
+      0x46, 0x46, 0x46, },
+    { 0x1a, 0x20, 0x1e, 0x1f, 0x1f, 0x21, 0x22, 0x23,
+      0x25, 0x27, 0x2b, 0x2d, 0x31, 0x34, 0x37, 0x3b,
+      0x3d, 0x42, 0x45, 0x48, 0x4c, 0x4e, 0x4f, 0x4f,
+      0x51, 0x52, 0x51, 0x50, 0x4e, 0x4b, 0x4a, 0x48,
+      0x45, 0x42, 0x3f, 0x3b, 0x38, 0x36, 0x32, 0x31,
+      0x2f, 0x2c, 0x2a, 0x28, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x27, 0x28, 0x29, 0x2b,
+      0x2e, 0x30, 0x33, 0x36, 0x39, 0x3b, 0x3d, 0x3f,
+      0x3f, 0x40, 0x41, 0x42, 0x41, 0x40, 0x3e, 0x3c,
+      0x3c, 0x3a, 0x37, 0x34, 0x33, 0x30, 0x2e, 0x2b,
+      0x29, 0x26, 0x24, 0x24, 0x20, 0x1f, 0x1d, 0x1d,
+      0x1a, 0x19, 0x17, 0x16, 0x16, 0x16, 0x16, 0x14,
+      0x13, 0x12, 0x13, 0x13, 0x13, 0x12, 0x12, 0x13,
+      0x13, 0x14, 0x15, 0x15, 0x14, 0x15, 0x16, 0x18,
+      0x19, 0x1b, 0x1c, 0x1e, 0x20, 0x21, 0x22, 0x24,
+      0x27, 0x28, 0x29, 0x2a, 0x2c, 0x2c, 0x2d, 0x2f,
+      0x32, 0x35, 0x37, 0x3a, 0x3c, 0x3e, 0x44, 0x48,
+      0x4c, 0x50, 0x54, 0x56, 0x58, 0x5b, 0x5e, 0x60,
+      0x61, 0x63, 0x62, 0x61, 0x60, 0x5f, 0x5e, 0x5e,
+      0x5c, 0x5c, 0x5b, 0x5a, 0x5a, 0x5b, 0x5c, 0x5e,
+      0x60, 0x63, 0x64, 0x65, 0x63, 0x62, 0x63, 0x63,
+      0x61, 0x60, 0x5e, 0x5b, 0x58, 0x55, 0x51, 0x4f,
+      0x4c, 0x4a, 0x47, 0x44, 0x42, 0x41, 0x3e, 0x3c,
+      0x3b, 0x3a, 0x3a, 0x3b, 0x3b, 0x3c, 0x3e, 0x3f,
+      0x40, 0x42, 0x43, 0x45, 0x46, 0x47, 0x49, 0x4a,
+      0x4c, 0x4c, 0x4f, 0x51, 0x52, 0x55, 0x58, 0x5b,
+      0x5c, 0x5f, 0x61, 0x62, 0x63, 0x64, 0x64, 0x65,
+      0x66, 0x65, 0x63, 0x62, 0x5f, 0x5e, 0x5e, 0x5c,
+      0x5b, 0x58, 0x56, 0x55, 0x54, 0x53, 0x52, 0x53,
+      0x52, 0x52, 0x52, 0x52, 0x52, 0x53, 0x55, 0x55,
+      0x55, 0x53, 0x53, 0x53, 0x52, 0x51, 0x52, 0x52,
+      0x55, 0x55, 0x58, 0x58, 0x5b, 0x5d, 0x61, 0x65,
+      0x68, 0x6a, 0x6c, 0x6b, 0x69, 0x68, 0x67, 0x64,
+      0x61, 0x5e, 0x58, 0x54, 0x4f, 0x4b, 0x49, 0x48,
+      0x47, 0x46, 0x45, },
+    { 0x19, 0x20, 0x1d, 0x1f, 0x1f, 0x20, 0x23, 0x23,
+      0x25, 0x27, 0x2b, 0x2d, 0x31, 0x34, 0x37, 0x3b,
+      0x3d, 0x42, 0x45, 0x48, 0x4c, 0x4e, 0x4f, 0x4f,
+      0x51, 0x52, 0x51, 0x50, 0x4e, 0x4b, 0x4a, 0x48,
+      0x44, 0x42, 0x3f, 0x3a, 0x38, 0x36, 0x32, 0x30,
+      0x2f, 0x2c, 0x2a, 0x28, 0x26, 0x26, 0x25, 0x24,
+      0x23, 0x24, 0x24, 0x25, 0x26, 0x28, 0x29, 0x2b,
+      0x2e, 0x30, 0x34, 0x36, 0x39, 0x3b, 0x3d, 0x3f,
+      0x3f, 0x40, 0x41, 0x42, 0x41, 0x40, 0x3e, 0x3c,
+      0x3c, 0x3a, 0x37, 0x34, 0x33, 0x30, 0x2e, 0x2b,
+      0x29, 0x27, 0x25, 0x24, 0x21, 0x1f, 0x1e, 0x1c,
+      0x1b, 0x19, 0x17, 0x16, 0x16, 0x16, 0x16, 0x14,
+      0x13, 0x12, 0x13, 0x13, 0x13, 0x13, 0x13, 0x13,
+      0x13, 0x14, 0x15, 0x14, 0x14, 0x14, 0x17, 0x19,
+      0x1a, 0x1c, 0x1e, 0x20, 0x21, 0x23, 0x24, 0x26,
+      0x29, 0x29, 0x2b, 0x2c, 0x2d, 0x2e, 0x30, 0x31,
+      0x34, 0x38, 0x3b, 0x3c, 0x3f, 0x42, 0x47, 0x4c,
+      0x50, 0x54, 0x57, 0x5b, 0x5c, 0x5e, 0x62, 0x63,
+      0x66, 0x66, 0x66, 0x65, 0x64, 0x63, 0x61, 0x62,
+      0x60, 0x60, 0x5f, 0x5e, 0x5e, 0x5f, 0x60, 0x62,
+      0x65, 0x67, 0x69, 0x6a, 0x69, 0x68, 0x69, 0x67,
+      0x66, 0x64, 0x62, 0x5f, 0x5c, 0x58, 0x54, 0x51,
+      0x4e, 0x4b, 0x49, 0x45, 0x43, 0x41, 0x40, 0x3e,
+      0x3c, 0x3a, 0x3b, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
+      0x41, 0x42, 0x44, 0x46, 0x46, 0x48, 0x49, 0x4b,
+      0x4d, 0x50, 0x51, 0x53, 0x55, 0x57, 0x58, 0x5c,
+      0x5f, 0x60, 0x63, 0x64, 0x64, 0x65, 0x66, 0x66,
+      0x66, 0x65, 0x65, 0x63, 0x61, 0x5f, 0x5e, 0x5c,
+      0x5a, 0x58, 0x56, 0x55, 0x54, 0x53, 0x52, 0x52,
+      0x53, 0x52, 0x52, 0x52, 0x52, 0x53, 0x53, 0x53,
+      0x54, 0x53, 0x53, 0x52, 0x53, 0x51, 0x53, 0x53,
+      0x55, 0x57, 0x58, 0x59, 0x5b, 0x5d, 0x62, 0x64,
+      0x68, 0x6a, 0x6c, 0x6b, 0x69, 0x68, 0x67, 0x64,
+      0x61, 0x5d, 0x57, 0x54, 0x50, 0x4a, 0x48, 0x47,
+      0x46, 0x45, 0x45, },
diff --git a/tests/tcg/hexagon/hvx_histogram_row.h b/tests/tcg/hexagon/hvx_histogram_row.h
new file mode 100644
index 0000000..6a4531a
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_histogram_row.h
@@ -0,0 +1,24 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HVX_HISTOGRAM_ROW_H
+#define HVX_HISTOGRAM_ROW_H
+
+void hvx_histogram_row(uint8_t *src, int stride, int width, int height,
+                       int *hist);
+
+#endif
diff --git a/tests/tcg/hexagon/hvx_histogram.c b/tests/tcg/hexagon/hvx_histogram.c
new file mode 100644
index 0000000..43377a9
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_histogram.c
@@ -0,0 +1,88 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include "hvx_histogram_row.h"
+
+const int vector_len = 128;
+const int width = 275;
+const int height = 20;
+const int stride = (width + vector_len - 1) & -vector_len;
+
+int err;
+
+static uint8_t input[height][stride] __attribute__((aligned(128))) = {
+#include "hvx_histogram_input.h"
+};
+
+static int result[256] __attribute__((aligned(128)));
+static int expect[256] __attribute__((aligned(128)));
+
+static void check(void)
+{
+    for (int i = 0; i < 256; i++) {
+        int res = result[i];
+        int exp = expect[i];
+        if (res != exp) {
+            printf("ERROR at %3d: 0x%04x != 0x%04x\n",
+                   i, res, exp);
+            err++;
+        }
+    }
+}
+
+static void ref_histogram(uint8_t *src, int stride, int width, int height,
+                          int *hist)
+{
+    for (int i = 0; i < 256; i++) {
+        hist[i] = 0;
+    }
+
+    for (int i = 0; i < height; i++) {
+        for (int j = 0; j < width; j++) {
+            hist[src[i * stride + j]]++;
+        }
+    }
+}
+
+static void hvx_histogram(uint8_t *src, int stride, int width, int height,
+                          int *hist)
+{
+    int n = 8192 / width;
+
+    for (int i = 0; i < 256; i++) {
+        hist[i] = 0;
+    }
+
+    for (int i = 0; i < height; i += n) {
+        int k = height - i > n ? n : height - i;
+        hvx_histogram_row(src, stride, width, k, hist);
+        src += n * stride;
+    }
+}
+
+int main()
+{
+    ref_histogram(&input[0][0], stride, width, height, expect);
+    hvx_histogram(&input[0][0], stride, width, height, result);
+    check();
+
+    puts(err ? "FAIL" : "PASS");
+    return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index fa1fa57..d4b8f72 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -42,9 +42,14 @@ HEX_TESTS += scatter_gather
 HEX_TESTS += atomics
 HEX_TESTS += fpstuff
 HEX_TESTS += hvx_misc
+HEX_TESTS += hvx_histogram
 
 TESTS += $(HEX_TESTS)
 
 scatter_gather: CFLAGS += -mhvx
 vector_add_int: CFLAGS += -mhvx -fvectorize
 hvx_misc: CFLAGS += -mhvx
+hvx_histogram: CFLAGS += -mhvx -Wno-gnu-folding-constant
+
+hvx_histogram: hvx_histogram.c hvx_histogram_row.S
+	$(CC) $(CFLAGS) $(CROSS_CC_GUEST_CFLAGS) $^ -o $@
diff --git a/tests/tcg/hexagon/hvx_histogram_row.S b/tests/tcg/hexagon/hvx_histogram_row.S
new file mode 100644
index 0000000..5e42c33
--- /dev/null
+++ b/tests/tcg/hexagon/hvx_histogram_row.S
@@ -0,0 +1,294 @@
+/*
+ *  Copyright(c) 2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+
+/*
+ * void hvx_histogram_row(uint8_t *src,     => r0
+ *                        int stride,       => r1
+ *                        int width,        => r2
+ *                        int height,       => r3
+ *                        int *hist         => r4)
+ */
+    .text
+    .p2align 2
+    .global hvx_histogram_row
+    .type hvx_histogram_row, @function
+hvx_histogram_row:
+    { r2 = lsr(r2, #7)          /* width / VLEN */
+      r5 = and(r2, #127)        /* width % VLEN */
+      v1 = #0
+      v0 = #0
+    }
+    /*
+     * Step 1: Clear the whole vector register file
+     */
+    { v3:2 = v1:0
+      v5:4 = v1:0
+      p0 = cmp.gt(r2, #0)       /* P0 = (width / VLEN > 0) */
+      p1 = cmp.eq(r5, #0)       /* P1 = (width % VLEN == 0) */
+    }
+    { q0 = vsetq(r5)
+      v7:6 = v1:0
+    }
+    { v9:8   = v1:0
+      v11:10 = v1:0
+    }
+    { v13:12 = v1:0
+      v15:14 = v1:0
+    }
+    { v17:16 = v1:0
+      v19:18 = v1:0
+    }
+    { v21:20 = v1:0
+      v23:22 = v1:0
+    }
+    { v25:24 = v1:0
+      v27:26 = v1:0
+    }
+    { v29:28 = v1:0
+      v31:30 = v1:0
+      r10 = add(r0, r1)           /* R10 = &src[1 * stride] */
+      loop1(.outerloop, r3)
+    }
+
+    /*
+     * Step 2: vhist
+     */
+    .falign
+.outerloop:
+    { if (!p0) jump .loopend
+      loop0(.innerloop, r2)
+    }
+
+    .falign
+.innerloop:
+    { v12.tmp = vmem(R0++#1)
+      vhist
+    }:endloop0
+
+    .falign
+.loopend:
+    if (p1) jump .skip       /* if (width % VLEN == 0) done with current row */
+    { v13.tmp = vmem(r0 + #0)
+      vhist(q0)
+    }
+
+    .falign
+.skip:
+    { r0 = r10                    /* R0  = &src[(i + 1) * stride] */
+      r10 = add(r10, r1)          /* R10 = &src[(i + 2) * stride] */
+    }:endloop1
+
+
+    /*
+     * Step 3: Sum up the data
+     */
+    { v0.h = vshuff(v0.h)
+      r10 = ##0x00010001
+    }
+    v1.h = vshuff(v1.h)
+    { V2.h = vshuff(v2.h)
+      v0.w = vdmpy(v0.h, r10.h):sat
+    }
+    { v3.h = vshuff(v3.h)
+      v1.w = vdmpy(v1.h, r10.h):sat
+    }
+    { v4.h = vshuff(V4.h)
+      v2.w = vdmpy(v2.h, r10.h):sat
+    }
+    { v5.h = vshuff(v5.h)
+      v3.w = vdmpy(v3.h, r10.h):sat
+    }
+    { v6.h = vshuff(v6.h)
+      v4.w = vdmpy(v4.h, r10.h):sat
+    }
+    { v7.h = vshuff(v7.h)
+      v5.w = vdmpy(v5.h, r10.h):sat
+    }
+    { v8.h = vshuff(V8.h)
+      v6.w = vdmpy(v6.h, r10.h):sat
+    }
+    { v9.h = vshuff(V9.h)
+      v7.w = vdmpy(v7.h, r10.h):sat
+    }
+    { v10.h = vshuff(v10.h)
+      v8.w = vdmpy(v8.h, r10.h):sat
+    }
+    { v11.h = vshuff(v11.h)
+      v9.w = vdmpy(v9.h, r10.h):sat
+    }
+    { v12.h = vshuff(v12.h)
+      v10.w = vdmpy(v10.h, r10.h):sat
+    }
+    { v13.h = vshuff(V13.h)
+      v11.w = vdmpy(v11.h, r10.h):sat
+    }
+    { v14.h = vshuff(v14.h)
+      v12.w = vdmpy(v12.h, r10.h):sat
+    }
+    { v15.h = vshuff(v15.h)
+      v13.w = vdmpy(v13.h, r10.h):sat
+    }
+    { v16.h = vshuff(v16.h)
+      v14.w = vdmpy(v14.h, r10.h):sat
+    }
+    { v17.h = vshuff(v17.h)
+      v15.w = vdmpy(v15.h, r10.h):sat
+    }
+    { v18.h = vshuff(v18.h)
+      v16.w = vdmpy(v16.h, r10.h):sat
+    }
+    { v19.h = vshuff(v19.h)
+      v17.w = vdmpy(v17.h, r10.h):sat
+    }
+    { v20.h = vshuff(v20.h)
+      v18.W = vdmpy(v18.h, r10.h):sat
+    }
+    { v21.h = vshuff(v21.h)
+      v19.w = vdmpy(v19.h, r10.h):sat
+    }
+    { v22.h = vshuff(v22.h)
+      v20.w = vdmpy(v20.h, r10.h):sat
+    }
+    { v23.h = vshuff(v23.h)
+      v21.w = vdmpy(v21.h, r10.h):sat
+    }
+    { v24.h = vshuff(v24.h)
+      v22.w = vdmpy(v22.h, r10.h):sat
+    }
+    { v25.h = vshuff(v25.h)
+      v23.w = vdmpy(v23.h, r10.h):sat
+    }
+    { v26.h = vshuff(v26.h)
+      v24.w = vdmpy(v24.h, r10.h):sat
+    }
+    { v27.h = vshuff(V27.h)
+      v25.w = vdmpy(v25.h, r10.h):sat
+    }
+    { v28.h = vshuff(v28.h)
+      v26.w = vdmpy(v26.h, r10.h):sat
+    }
+    { v29.h = vshuff(v29.h)
+      v27.w = vdmpy(v27.h, r10.h):sat
+    }
+    { v30.h = vshuff(v30.h)
+      v28.w = vdmpy(v28.h, r10.h):sat
+    }
+    { v31.h = vshuff(v31.h)
+      v29.w = vdmpy(v29.h, r10.h):sat
+      r28 = #32
+    }
+    { vshuff(v1, v0, r28)
+      v30.w = vdmpy(v30.h, r10.h):sat
+    }
+    { vshuff(v3, v2, r28)
+      v31.w = vdmpy(v31.h, r10.h):sat
+    }
+    { vshuff(v5, v4, r28)
+      v0.w = vadd(v1.w, v0.w)
+      v2.w = vadd(v3.w, v2.w)
+    }
+    { vshuff(v7, v6, r28)
+      r7 = #64
+    }
+    { vshuff(v9, v8, r28)
+      v4.w = vadd(v5.w, v4.w)
+      v6.w = vadd(v7.w, v6.w)
+    }
+    vshuff(v11, v10, r28)
+    { vshuff(v13, v12, r28)
+      v8.w = vadd(v9.w, v8.w)
+      v10.w = vadd(v11.w, v10.w)
+    }
+    vshuff(v15, v14, r28)
+    { vshuff(v17, v16, r28)
+      v12.w = vadd(v13.w, v12.w)
+      v14.w = vadd(v15.w, v14.w)
+    }
+    vshuff(v19, v18, r28)
+    { vshuff(v21, v20, r28)
+      v16.w = vadd(v17.w, v16.w)
+      v18.w = vadd(v19.w, v18.w)
+    }
+    vshuff(v23, v22, r28)
+    { vshuff(v25, v24, r28)
+      v20.w = vadd(v21.w, v20.w)
+      v22.w = vadd(v23.w, v22.w)
+    }
+    vshuff(v27, v26, r28)
+    { vshuff(v29, v28, r28)
+      v24.w = vadd(v25.w, v24.w)
+      v26.w = vadd(v27.w, v26.w)
+    }
+    vshuff(v31, v30, r28)
+    { v28.w = vadd(v29.w, v28.w)
+      vshuff(v2, v0, r7)
+    }
+    { v30.w = vadd(v31.w, v30.w)
+      vshuff(v6, v4, r7)
+      v0.w  = vadd(v0.w, v2.w)
+    }
+    { vshuff(v10, v8, r7)
+      v1.tmp = vmem(r4 + #0)      /* update hist[0-31] */
+      v0.w  = vadd(v0.w, v1.w)
+      vmem(r4++#1) = v0.new
+    }
+    { vshuff(v14, v12, r7)
+      v4.w  = vadd(v4.w, v6.w)
+      v8.w  = vadd(v8.w, v10.w)
+    }
+    { vshuff(v18, v16, r7)
+      v1.tmp = vmem(r4 + #0)      /* update hist[32-63] */
+      v4.w  = vadd(v4.w, v1.w)
+      vmem(r4++#1) = v4.new
+    }
+    { vshuff(v22, v20, r7)
+      v12.w = vadd(v12.w, v14.w)
+      V16.w = vadd(v16.w, v18.w)
+    }
+    { vshuff(v26, v24, r7)
+      v1.tmp = vmem(r4 + #0)      /* update hist[64-95] */
+      v8.w  = vadd(v8.w, v1.w)
+      vmem(r4++#1) = v8.new
+    }
+    { vshuff(v30, v28, r7)
+      v1.tmp = vmem(r4 + #0)      /* update hist[96-127] */
+      v12.w  = vadd(v12.w, v1.w)
+      vmem(r4++#1) = v12.new
+    }
+
+    { v20.w = vadd(v20.w, v22.w)
+      v1.tmp = vmem(r4 + #0)      /* update hist[128-159] */
+      v16.w  = vadd(v16.w, v1.w)
+      vmem(r4++#1) = v16.new
+    }
+    { v24.w = vadd(v24.w, v26.w)
+      v1.tmp = vmem(r4 + #0)      /* update hist[160-191] */
+      v20.w  = vadd(v20.w, v1.w)
+      vmem(r4++#1) = v20.new
+    }
+    { v28.w = vadd(v28.w, v30.w)
+      v1.tmp = vmem(r4 + #0)      /* update hist[192-223] */
+      v24.w  = vadd(v24.w, v1.w)
+      vmem(r4++#1) = v24.new
+    }
+    { v1.tmp = vmem(r4 + #0)      /* update hist[224-255] */
+      v28.w  = vadd(v28.w, v1.w)
+      vmem(r4++#1) = v28.new
+    }
+    jumpr r31
+    .size hvx_histogram_row, .-hvx_histogram_row
-- 
2.7.4
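
For readers following the test, the row geometry that hvx_histogram.c and
hvx_histogram_row.S rely on can be sketched in plain C.  This is an
illustration only, not part of the patch: the struct and function names are
made up here, and it assumes that both instructions in the first packet read
the incoming value of r2, so r5 ends up as width % 128.

```c
/* Hypothetical helper, not part of the patch. */
struct row_geometry {
    int stride;         /* row pitch in bytes, rounded up to a whole vector */
    int full_vectors;   /* trip count of the inner vhist loop (r2) */
    int tail_bytes;     /* bytes selected by the vsetq() mask (r5) */
    int rows_per_call;  /* n in hvx_histogram() */
};

/* vector_len must be a power of two for the mask arithmetic to hold. */
static struct row_geometry row_geometry(int width, int vector_len)
{
    struct row_geometry g;

    /* Same expression as the "stride" constant in hvx_histogram.c */
    g.stride = (width + vector_len - 1) & -vector_len;
    /* lsr(r2, #7) when vector_len == 128 */
    g.full_vectors = width / vector_len;
    /* and(r2, #127) when vector_len == 128 */
    g.tail_bytes = width & (vector_len - 1);
    /* Same expression as "n" in hvx_histogram() */
    g.rows_per_call = 8192 / width;
    return g;
}
```

With the test's width of 275 and vector_len of 128 this gives a stride of
384, two full vectors plus a 19-byte tail per row, and 29 rows per call, so
the 20-row input is handled by a single call to hvx_histogram_row.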



* Re: [PATCH v3 15/30] Hexagon HVX (target/hexagon) helper overrides - vector assign & cmov
  2021-09-20 21:24 ` [PATCH v3 15/30] Hexagon HVX (target/hexagon) helper overrides - vector assign & cmov Taylor Simpson
@ 2021-09-20 21:59   ` Philippe Mathieu-Daudé
  2021-09-20 22:11     ` Taylor Simpson
  2021-09-20 23:19   ` Richard Henderson
  1 sibling, 1 reply; 51+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-09-20 21:59 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, richard.henderson

On 9/20/21 23:24, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> ---
>  target/hexagon/gen_tcg_hvx.h | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
> index eb29566..bcd53d4 100644
> --- a/target/hexagon/gen_tcg_hvx.h
> +++ b/target/hexagon/gen_tcg_hvx.h
> @@ -126,4 +126,35 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
>      } while (0)
>  
>  
> +#define fGEN_TCG_V6_vassign(SHORTCODE) \
> +    tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
> +                     sizeof(MMVector), sizeof(MMVector))
> +
> +/* Vector conditional move */
> +#define fGEN_TCG_VEC_CMOV(PRED) \
> +    do { \
> +        TCGv lsb = tcg_temp_new(); \
> +        TCGLabel *false_label = gen_new_label(); \
> +        TCGLabel *end_label = gen_new_label(); \
> +        tcg_gen_andi_tl(lsb, PsV, 1); \
> +        tcg_gen_brcondi_tl(TCG_COND_NE, lsb, PRED, false_label); \
> +        tcg_temp_free(lsb); \
> +        tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
> +                         sizeof(MMVector), sizeof(MMVector)); \
> +        tcg_gen_br(end_label); \
> +        gen_set_label(false_label); \
> +        tcg_gen_ori_tl(hex_slot_cancelled, hex_slot_cancelled, \
> +                       1 << insn->slot); \
> +        gen_set_label(end_label); \
> +    } while (0)

Why a macro and not a (eventually inlined) function?

> +/* Vector conditional move (true) */
> +#define fGEN_TCG_V6_vcmov(SHORTCODE) \
> +    fGEN_TCG_VEC_CMOV(1)
> +
> +/* Vector conditional move (false) */
> +#define fGEN_TCG_V6_vncmov(SHORTCODE) \
> +    fGEN_TCG_VEC_CMOV(0)
> +
>  #endif
> 
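
The behaviour the quoted fGEN_TCG_VEC_CMOV macro generates can be modelled in
plain C roughly as follows.  This is a sketch for discussion, not code from
the series: the struct and function names are hypothetical, and it writes the
destination register directly instead of going through whatever staging the
real VdV_off refers to.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 128   /* HVX vector length from the cover letter */

/* Hypothetical stand-in for the relevant pieces of CPU state. */
struct model_state {
    uint8_t vreg[32][VEC_BYTES];
    uint32_t slot_cancelled;    /* one bit per packet slot */
};

/*
 * pred_sense is 1 for vcmov and 0 for vncmov (the PRED macro argument).
 * Returns true if the move was performed.
 */
static bool model_vec_cmov(struct model_state *s, int vd, int vu,
                           uint32_t ps, int pred_sense, int slot)
{
    uint32_t lsb = ps & 1;      /* only the predicate's low bit is tested */

    if (lsb != (uint32_t)pred_sense) {
        /* Condition not met: skip the move and cancel the slot. */
        s->slot_cancelled |= 1u << slot;
        return false;
    }
    /* memmove tolerates vd == vu */
    memmove(s->vreg[vd], s->vreg[vu], VEC_BYTES);
    return true;
}
```

The two branches correspond to the fall-through path and false_label in the
macro; the model only makes the control flow explicit.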



* RE: [PATCH v3 15/30] Hexagon HVX (target/hexagon) helper overrides - vector assign & cmov
  2021-09-20 21:59   ` Philippe Mathieu-Daudé
@ 2021-09-20 22:11     ` Taylor Simpson
  0 siblings, 0 replies; 51+ messages in thread
From: Taylor Simpson @ 2021-09-20 22:11 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: ale, Brian Cain, richard.henderson



> -----Original Message-----
> From: Philippe Mathieu-Daudé <philippe.mathieu.daude@gmail.com> On
> Behalf Of Philippe Mathieu-Daudé
> Sent: Monday, September 20, 2021 4:59 PM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: ale@rev.ng; Brian Cain <bcain@quicinc.com>;
> richard.henderson@linaro.org
> Subject: Re: [PATCH v3 15/30] Hexagon HVX (target/hexagon) helper
> overrides - vector assign & cmov
> 
> On 9/20/21 23:24, Taylor Simpson wrote:
> > Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> > ---
> >  target/hexagon/gen_tcg_hvx.h | 31 +++++++++++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> >
> > diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
> > index eb29566..bcd53d4 100644
> > --- a/target/hexagon/gen_tcg_hvx.h
> > +++ b/target/hexagon/gen_tcg_hvx.h
> > @@ -126,4 +126,35 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
> >      } while (0)
> >
> >
> > +#define fGEN_TCG_V6_vassign(SHORTCODE) \
> > +    tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
> > +                     sizeof(MMVector), sizeof(MMVector))
> > +
> > +/* Vector conditional move */
> > +#define fGEN_TCG_VEC_CMOV(PRED) \
> > +    do { \
> > +        TCGv lsb = tcg_temp_new(); \
> > +        TCGLabel *false_label = gen_new_label(); \
> > +        TCGLabel *end_label = gen_new_label(); \
> > +        tcg_gen_andi_tl(lsb, PsV, 1); \
> > +        tcg_gen_brcondi_tl(TCG_COND_NE, lsb, PRED, false_label); \
> > +        tcg_temp_free(lsb); \
> > +        tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
> > +                         sizeof(MMVector), sizeof(MMVector)); \
> > +        tcg_gen_br(end_label); \
> > +        gen_set_label(false_label); \
> > +        tcg_gen_ori_tl(hex_slot_cancelled, hex_slot_cancelled, \
> > +                       1 << insn->slot); \
> > +        gen_set_label(end_label); \
> > +    } while (0)
> 
> Why a macro and not a (eventually inlined) function?

I made these macros so that all of the overrides are written the same way.  This one could easily be a function, but others cannot.  For example, fGEN_TCG_VEC_CMP_OP can't - see patch 20/30.

Having said that, I can change only the ones that don't need to be macros into functions if that is preferred.


Thanks,
Taylor
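
A toy illustration of the trade-off discussed above, in plain C and outside
of QEMU.  Everything here is hypothetical (emit_move, the TOY_ names, the
offsets).  It assumes, as the unqualified use of VdV_off and VuV_off in the
quoted macro suggests, that the operands are locals of the function in which
the macro is expanded.

```c
/* Toy "code generator" op: records the last move it was asked to emit. */
struct emitted_move {
    long dst_off;
    long src_off;
};
static struct emitted_move last_move;

static void emit_move(long dst_off, long src_off)
{
    last_move.dst_off = dst_off;
    last_move.src_off = src_off;
}

/* Macro form: picks up VdV_off / VuV_off from wherever it is expanded. */
#define TOY_GEN_VASSIGN_MACRO() emit_move(VdV_off, VuV_off)

/* Function form: same operation, but the operands are passed explicitly. */
static inline void toy_gen_vassign(long VdV_off, long VuV_off)
{
    emit_move(VdV_off, VuV_off);
}

/* Stand-in for a generated per-instruction function. */
static void toy_generate_insn(int use_macro)
{
    long VdV_off = 0x100;   /* made-up operand offsets */
    long VuV_off = 0x200;

    if (use_macro) {
        TOY_GEN_VASSIGN_MACRO();
    } else {
        toy_gen_vassign(VdV_off, VuV_off);
    }
}
```

Both paths emit the same move.  The function form makes the operand
dependencies visible in its signature, while the macro form keeps every
override at the same call shape regardless of which operands it uses.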




* Re: [PATCH v3 02/30] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core
  2021-09-20 21:23 ` [PATCH v3 02/30] Hexagon HVX (target/hexagon) add Hexagon Vector eXtensions (HVX) to core Taylor Simpson
@ 2021-09-20 22:55   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 22:55 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:23 PM, Taylor Simpson wrote:
> HVX is a set of wide vector instructions.  Machine state includes
>      vector registers (VRegs)
>      vector predicate registers (QRegs)
>      temporary registers for intermediate values
>      store buffer (masked stores and scatter/gather)
> 
> Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> ---

Acked-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH v3 04/30] Hexagon HVX (target/hexagon) instruction attributes
  2021-09-20 21:23 ` [PATCH v3 04/30] Hexagon HVX (target/hexagon) instruction attributes Taylor Simpson
@ 2021-09-20 22:56   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 22:56 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:23 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> ---
>   target/hexagon/attribs_def.h.inc | 22 ++++++++++++++++++++++
>   1 file changed, 22 insertions(+)

Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 05/30] Hexagon HVX (target/hexagon) macros
  2021-09-20 21:24 ` [PATCH v3 05/30] Hexagon HVX (target/hexagon) macros Taylor Simpson
@ 2021-09-20 22:57   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 22:57 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> macros to interface with the generator
> macros referenced in instruction semantics
> 
> Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
> ---
>   target/hexagon/macros.h       |  22 +++
>   target/hexagon/mmvec/macros.h | 341 ++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 363 insertions(+)
>   create mode 100644 target/hexagon/mmvec/macros.h

Just about unreadable, but ok.
Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 06/30] Hexagon HVX (target/hexagon) import macro definitions
  2021-09-20 21:24 ` [PATCH v3 06/30] Hexagon HVX (target/hexagon) import macro definitions Taylor Simpson
@ 2021-09-20 22:58   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 22:58 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Imported from the Hexagon architecture library
>      imported/allext_macros.def       Top level macro include for all extensions
>      imported/macros.def              Scalar core macros (some HVX here)
>      imported/mmvec/macros.def        HVX macro definitions
> The macro definition files specify instruction attributes that are applied
> to each instruction that references the macro.
> 
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/imported/allext_macros.def |  25 +
>   target/hexagon/imported/macros.def        |  88 ++++
>   target/hexagon/imported/mmvec/macros.def  | 842 ++++++++++++++++++++++++++++++
>   3 files changed, 955 insertions(+)
>   create mode 100644 target/hexagon/imported/allext_macros.def
>   create mode 100755 target/hexagon/imported/mmvec/macros.def

Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 07/30] Hexagon HVX (target/hexagon) semantics generator
  2021-09-20 21:24 ` [PATCH v3 07/30] Hexagon HVX (target/hexagon) semantics generator Taylor Simpson
@ 2021-09-20 22:59   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 22:59 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Add HVX support to the semantics generator
> 
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_semantics.c | 33 +++++++++++++++++++++++++++++++++
>   target/hexagon/hex_common.py   | 13 +++++++++++++
>   2 files changed, 46 insertions(+)

Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 08/30] Hexagon HVX (target/hexagon) semantics generator - part 2
  2021-09-20 21:24 ` [PATCH v3 08/30] Hexagon HVX (target/hexagon) semantics generator - part 2 Taylor Simpson
@ 2021-09-20 23:03   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:03 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_helper_funcs.py  | 112 ++++++++++++++--
>   target/hexagon/gen_helper_protos.py |  16 ++-
>   target/hexagon/gen_tcg_funcs.py     | 258 ++++++++++++++++++++++++++++++++++--
>   3 files changed, 364 insertions(+), 22 deletions(-)

Acked-by: Richard Henderson <richard.henderson@linaro.org>



* Re: [PATCH v3 09/30] Hexagon HVX (target/hexagon) C preprocessor for decode tree
  2021-09-20 21:24 ` [PATCH v3 09/30] Hexagon HVX (target/hexagon) C preprocessor for decode tree Taylor Simpson
@ 2021-09-20 23:04   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:04 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_dectree_import.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)

Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 16/30] Hexagon HVX (target/hexagon) helper overrides - vector add & sub
  2021-09-20 21:24 ` [PATCH v3 16/30] Hexagon HVX (target/hexagon) helper overrides - vector add & sub Taylor Simpson
@ 2021-09-20 23:18   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:18 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_tcg_hvx.h | 50 ++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 50 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 15/30] Hexagon HVX (target/hexagon) helper overrides - vector assign & cmov
  2021-09-20 21:24 ` [PATCH v3 15/30] Hexagon HVX (target/hexagon) helper overrides - vector assign & cmov Taylor Simpson
  2021-09-20 21:59   ` Philippe Mathieu-Daudé
@ 2021-09-20 23:19   ` Richard Henderson
  1 sibling, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:19 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_tcg_hvx.h | 31 +++++++++++++++++++++++++++++++
>   1 file changed, 31 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 17/30] Hexagon HVX (target/hexagon) helper overrides - vector shifts
  2021-09-20 21:24 ` [PATCH v3 17/30] Hexagon HVX (target/hexagon) helper overrides - vector shifts Taylor Simpson
@ 2021-09-20 23:20   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:20 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_tcg_hvx.h | 122 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 122 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 18/30] Hexagon HVX (target/hexagon) helper overrides - vector max/min
  2021-09-20 21:24 ` [PATCH v3 18/30] Hexagon HVX (target/hexagon) helper overrides - vector max/min Taylor Simpson
@ 2021-09-20 23:20   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:20 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_tcg_hvx.h | 34 ++++++++++++++++++++++++++++++++++
>   1 file changed, 34 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 19/30] Hexagon HVX (target/hexagon) helper overrides - vector logical ops
  2021-09-20 21:24 ` [PATCH v3 19/30] Hexagon HVX (target/hexagon) helper overrides - vector logical ops Taylor Simpson
@ 2021-09-20 23:22   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:22 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> +#define fGEN_TCG_V6_pred_xor(SHORTCODE) \
> +    tcg_gen_gvec_xor(MO_64, QdV_off, QsV_off, QtV_off, \
> +                     sizeof(MMQReg), sizeof(MMQReg))
> +
> +#define fGEN_TCG_V6_pred_or_n(SHORTCODE) \
> +    do { \
> +        intptr_t tmpoff = offsetof(CPUHexagonState, qtmp); \
> +        tcg_gen_gvec_not(MO_64, tmpoff, QtV_off, \
> +                         sizeof(MMQReg), sizeof(MMQReg)); \
> +        tcg_gen_gvec_or(MO_64, QdV_off, QsV_off, tmpoff, \
> +                        sizeof(MMQReg), sizeof(MMQReg)); \
> +    } while (0)

tcg_gen_gvec_orc.

> +#define fGEN_TCG_V6_pred_and_n(SHORTCODE) \
> +    do { \
> +        intptr_t tmpoff = offsetof(CPUHexagonState, qtmp); \
> +        tcg_gen_gvec_not(MO_64, tmpoff, QtV_off, \
> +                         sizeof(MMQReg), sizeof(MMQReg)); \
> +        tcg_gen_gvec_and(MO_64, QdV_off, QsV_off, tmpoff, \
> +                         sizeof(MMQReg), sizeof(MMQReg)); \
> +    } while (0)

tcg_gen_gvec_andc.
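
A minimal sketch of why the two-step sequences quoted above collapse into
single operations, modeled on plain uint64_t arrays rather than on real TCG
gvec calls. The type QRegModel, the two-word register size, and all function
names below are assumptions made for illustration only; none of them come
from the patch. In QEMU, tcg_gen_gvec_orc and tcg_gen_gvec_andc take the same
argument list as tcg_gen_gvec_or and tcg_gen_gvec_and and compute a | ~b and
a & ~b respectively, so the qtmp scratch area would no longer be needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a predicate register modeled as two 64-bit words. */
#define QREG_WORDS 2

typedef struct {
    uint64_t w[QREG_WORDS];
} QRegModel;

/* Two-step form, as in the quoted pred_or_n: tmp = ~t, then d = s | tmp. */
QRegModel or_n_two_step(QRegModel s, QRegModel t)
{
    QRegModel tmp, d;
    for (size_t i = 0; i < QREG_WORDS; i++) {
        tmp.w[i] = ~t.w[i];
    }
    for (size_t i = 0; i < QREG_WORDS; i++) {
        d.w[i] = s.w[i] | tmp.w[i];
    }
    return d;
}

/* Single-step form: d = s | ~t, the operation an "orc" op performs. */
QRegModel or_n_orc(QRegModel s, QRegModel t)
{
    QRegModel d;
    for (size_t i = 0; i < QREG_WORDS; i++) {
        d.w[i] = s.w[i] | ~t.w[i];
    }
    return d;
}

/* Two-step form, as in the quoted pred_and_n: tmp = ~t, then d = s & tmp. */
QRegModel and_n_two_step(QRegModel s, QRegModel t)
{
    QRegModel tmp, d;
    for (size_t i = 0; i < QREG_WORDS; i++) {
        tmp.w[i] = ~t.w[i];
    }
    for (size_t i = 0; i < QREG_WORDS; i++) {
        d.w[i] = s.w[i] & tmp.w[i];
    }
    return d;
}

/* Single-step form: d = s & ~t, the operation an "andc" op performs. */
QRegModel and_n_andc(QRegModel s, QRegModel t)
{
    QRegModel d;
    for (size_t i = 0; i < QREG_WORDS; i++) {
        d.w[i] = s.w[i] & ~t.w[i];
    }
    return d;
}

/* Word-by-word comparison of two modeled registers. */
int qreg_equal(QRegModel a, QRegModel b)
{
    for (size_t i = 0; i < QREG_WORDS; i++) {
        if (a.w[i] != b.w[i]) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 when both forms agree on a few sample inputs, 0 otherwise. */
int forms_agree(void)
{
    const QRegModel samples[] = {
        { { 0, 0 } },
        { { UINT64_MAX, 0 } },
        { { 0x00ff00ff00ff00ffULL, 0xf0f0f0f0f0f0f0f0ULL } },
        { { UINT64_MAX, UINT64_MAX } },
    };
    const size_t n = sizeof(samples) / sizeof(samples[0]);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            QRegModel s = samples[i], t = samples[j];
            if (!qreg_equal(or_n_two_step(s, t), or_n_orc(s, t)) ||
                !qreg_equal(and_n_two_step(s, t), and_n_andc(s, t))) {
                return 0;
            }
        }
    }
    return 1;
}
```

The model only checks the bitwise identity; it says nothing about how the
gvec expansion is lowered on a given host.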


r~



* Re: [PATCH v3 20/30] Hexagon HVX (target/hexagon) helper overrides - vector compares
  2021-09-20 21:24 ` [PATCH v3 20/30] Hexagon HVX (target/hexagon) helper overrides - vector compares Taylor Simpson
@ 2021-09-20 23:23   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:23 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_tcg_hvx.h | 103 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 103 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 21/30] Hexagon HVX (target/hexagon) helper overrides - vector splat and abs
  2021-09-20 21:24 ` [PATCH v3 21/30] Hexagon HVX (target/hexagon) helper overrides - vector splat and abs Taylor Simpson
@ 2021-09-20 23:24   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:24 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_tcg_hvx.h | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 22/30] Hexagon HVX (target/hexagon) helper overrides - vector loads
  2021-09-20 21:24 ` [PATCH v3 22/30] Hexagon HVX (target/hexagon) helper overrides - vector loads Taylor Simpson
@ 2021-09-20 23:26   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:26 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_tcg_hvx.h | 150 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 150 insertions(+)

Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 23/30] Hexagon HVX (target/hexagon) helper overrides - vector stores
  2021-09-20 21:24 ` [PATCH v3 23/30] Hexagon HVX (target/hexagon) helper overrides - vector stores Taylor Simpson
@ 2021-09-20 23:27   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:27 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/gen_tcg_hvx.h | 218 +++++++++++++++++++++++++++++++++++++++++++
>   target/hexagon/helper.h      |   1 +
>   target/hexagon/op_helper.c   |   5 +
>   3 files changed, 224 insertions(+)

Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v3 24/30] Hexagon HVX (target/hexagon) import semantics
  2021-09-20 21:24 ` [PATCH v3 24/30] Hexagon HVX (target/hexagon) import semantics Taylor Simpson
@ 2021-09-20 23:27   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:27 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Imported from the Hexagon architecture library
>      imported/allext.idef           Top level file for all extensions
>      imported/mmvec/ext.idef        HVX instruction definitions
> 
> Support functions added to target/hexagon/genptr.c

Acked-by: Richard Henderson <richard.henderson@linaro.org>



* Re: [PATCH v3 25/30] Hexagon HVX (target/hexagon) instruction decoding
  2021-09-20 21:24 ` [PATCH v3 25/30] Hexagon HVX (target/hexagon) instruction decoding Taylor Simpson
@ 2021-09-20 23:28   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2021-09-20 23:28 UTC (permalink / raw)
  To: Taylor Simpson, qemu-devel; +Cc: ale, bcain, f4bug

On 9/20/21 2:24 PM, Taylor Simpson wrote:
> Add new file to target/hexagon/meson.build
> 
> Signed-off-by: Taylor Simpson<tsimpson@quicinc.com>
> ---
>   target/hexagon/mmvec/decode_ext_mmvec.h |  24 ++++
>   target/hexagon/decode.c                 |  24 +++-
>   target/hexagon/mmvec/decode_ext_mmvec.c | 236 ++++++++++++++++++++++++++++++++
>   target/hexagon/meson.build              |   1 +
>   4 files changed, 283 insertions(+), 2 deletions(-)
>   create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
>   create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c

Acked-by: Richard Henderson <richard.henderson@linaro.org>

r~



