* [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1)
@ 2020-04-30 18:09 Peter Maydell
  2020-04-30 18:09 ` [PATCH 01/36] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check Peter Maydell
                   ` (37 more replies)
  0 siblings, 38 replies; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patch series starts the job of converting the Arm
Neon decoder to decodetree.

Neon insns fall into three major groups:
 * the 'v8.0-and-later' extensions
 * the 'loads and stores' group
 * the 'data processing' group

This patchset converts all of the v8.0-and-later extensions
and the loads-and-stores, plus the "3-registers-same" subgroup
of the data-processing insns.

I'm working on the rest of the data-processing insns, but this is
already a fairly large chunk of conversion patches to start with.

thanks
-- PMM

Peter Maydell (36):
  target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
  target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
  target/arm: Add stubs for AArch32 Neon decodetree
  target/arm: Convert VCMLA (vector) to decodetree
  target/arm: Convert VCADD (vector) to decodetree
  target/arm: Convert V[US]DOT (vector) to decodetree
  target/arm: Convert VFM[AS]L (vector) to decodetree
  target/arm: Convert VCMLA (scalar) to decodetree
  target/arm: Convert V[US]DOT (scalar) to decodetree
  target/arm: Convert VFM[AS]L (scalar) to decodetree
  target/arm: Convert Neon load/store multiple structures to decodetree
  target/arm: Convert Neon 'load single structure to all lanes' to
    decodetree
  target/arm: Convert Neon 'load/store single structure' to decodetree
  target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
  target/arm: Convert Neon 3-reg-same logic ops to decodetree
  target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
  target/arm: Convert Neon 3-reg-same comparisons to decodetree
  target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
  target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to
    decodetree
  target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree
  target/arm: Convert Neon 3-reg-same SHA to decodetree
  target/arm: Move gen_ function typedefs to translate.h
  target/arm: Convert Neon 64-bit element 3-reg-same insns
  target/arm: Convert Neon VHADD 3-reg-same insns
  target/arm: Convert Neon VRHADD, VHSUB, VABD 3-reg-same insns to
    decodetree
  target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to
    decodetree
  target/arm: Convert Neon VABA 3-reg-same to decodetree
  target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns to decodetree
  target/arm: Convert Neon VPADD 3-reg-same insns to decodetree
  target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree
  target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to
    decodetree
  target/arm: Convert Neon VPMIN/VPMAX/VPADD float 3-reg-same insns to
    decodetree
  target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to
    decodetree
  target/arm: Convert Neon 3-reg-same compare insns to decodetree
  target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to
    decodetree
  target/arm: Convert NEON VFMA, VFMS 3-reg-same insns to decodetree

 target/arm/Makefile.objs        |   18 +
 target/arm/translate-a64.h      |    9 -
 target/arm/translate.h          |   26 +
 target/arm/translate-a64.c      |   17 -
 target/arm/translate-neon.inc.c | 1577 +++++++++++++++++++++++++++++++
 target/arm/translate-vfp.inc.c  |    6 -
 target/arm/translate.c          | 1200 +----------------------
 target/arm/neon-dp.decode       |  186 ++++
 target/arm/neon-ls.decode       |   52 +
 target/arm/neon-shared.decode   |   66 ++
 10 files changed, 1967 insertions(+), 1190 deletions(-)
 create mode 100644 target/arm/translate-neon.inc.c
 create mode 100644 target/arm/neon-dp.decode
 create mode 100644 target/arm/neon-ls.decode
 create mode 100644 target/arm/neon-shared.decode

-- 
2.20.1




* [PATCH 01/36] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 18:21   ` Richard Henderson
  2020-05-01 16:55   ` Philippe Mathieu-Daudé
  2020-04-30 18:09 ` [PATCH 02/36] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON Peter Maydell
                   ` (36 subsequent siblings)
  37 siblings, 2 replies; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Somewhere along the line we accidentally added a duplicate
"using D16-D31 when they don't exist" check to do_vfm_dp()
(probably an artifact of a patch series rebase). Remove it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-vfp.inc.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index b087bbd812e..e1a90175983 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1872,12 +1872,6 @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
         return false;
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vd | a->vn | a->vm) & 0x10)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
-- 
2.20.1
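
The check removed above relies on a small bit trick: D register numbers are 0..31, so bit 4 (0x10) is set exactly for D16..D31, and OR-ing the three operand numbers leaves bit 4 set if any one of them is a high register. A minimal sketch of that logic in isolation; both function names are hypothetical and not part of QEMU:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if any operand names D16..D31.
 * Register numbers are 0..31, so bit 4 is set only for 16..31,
 * and OR-ing the three numbers preserves any set bit 4. */
static bool any_high_dreg(int vd, int vn, int vm)
{
    return ((vd | vn | vm) & 0x10) != 0;
}

/* Hypothetical helper mirroring the removed test: the insn UNDEFs
 * when the CPU has only D0..D15 (no aa32_simd_r32) and at least
 * one operand is a high register. */
static bool undef_for_missing_dregs(bool has_simd_r32, int vd, int vn, int vm)
{
    return !has_simd_r32 && any_high_dreg(vd, vn, vm);
}
```

Because the same test already runs earlier in do_vfm_dp(), evaluating it twice can never change the outcome, which is why the second copy can simply be deleted.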




* [PATCH 02/36] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
  2020-04-30 18:09 ` [PATCH 01/36] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 18:22   ` Richard Henderson
  2020-05-01 16:56   ` Philippe Mathieu-Daudé
  2020-04-30 18:09 ` [PATCH 03/36] target/arm: Add stubs for AArch32 Neon decodetree Peter Maydell
                   ` (35 subsequent siblings)
  37 siblings, 2 replies; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

We were accidentally permitting decode of Thumb Neon insns even if
the CPU didn't have the FEATURE_NEON bit set, because the feature
check was being done before the call to disas_neon_data_insn() and
disas_neon_ls_insn() in the Arm decoder but was omitted from the
Thumb decoder.  Push the feature bit check down into the called
functions so it is done for both Arm and Thumb encodings.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index d4ad2028f12..ab5324a5aaa 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3258,6 +3258,10 @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     TCGv_i32 tmp2;
     TCGv_i64 tmp64;
 
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return 1;
+    }
+
     /* FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
@@ -5002,6 +5006,10 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     TCGv_ptr ptr1, ptr2, ptr3;
     TCGv_i64 tmp64;
 
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return 1;
+    }
+
     /* FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
@@ -10948,10 +10956,6 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
 
         if (((insn >> 25) & 7) == 1) {
             /* NEON Data processing.  */
-            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-                goto illegal_op;
-            }
-
             if (disas_neon_data_insn(s, insn)) {
                 goto illegal_op;
             }
@@ -10959,10 +10963,6 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
         }
         if ((insn & 0x0f100000) == 0x04000000) {
             /* NEON load/store.  */
-            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-                goto illegal_op;
-            }
-
             if (disas_neon_ls_insn(s, insn)) {
                 goto illegal_op;
             }
-- 
2.20.1
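
The bug fixed here is easy to reproduce in miniature: when a feature check lives in only one of two callers, the other caller bypasses it. The toy model below contrasts the old layout with the patched one. All names are hypothetical: `ToyCtx` stands in for `DisasContext`, and a nonzero return means UNDEF, as with the legacy decoder functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for DisasContext: only the Neon feature bit. */
typedef struct {
    bool has_neon;
} ToyCtx;

/* Old layout: the decoder itself does not check the feature. */
static int old_decode_neon(ToyCtx *s, uint32_t insn)
{
    (void)s;
    (void)insn;
    return 0; /* 0 == decoded OK */
}

/* Old Arm caller checked the feature before calling the decoder... */
static int old_arm_path(ToyCtx *s, uint32_t insn)
{
    if (!s->has_neon) {
        return 1; /* illegal_op */
    }
    return old_decode_neon(s, insn);
}

/* ...but the old Thumb caller did not. */
static int old_thumb_path(ToyCtx *s, uint32_t insn)
{
    return old_decode_neon(s, insn);
}

/* New layout: the check is pushed down into the decoder, so every
 * caller, Arm or Thumb, inherits it. */
static int new_decode_neon(ToyCtx *s, uint32_t insn)
{
    if (!s->has_neon) {
        return 1;
    }
    (void)insn;
    return 0;
}
```

With the old layout a CPU without FEATURE_NEON still decodes the Thumb form (`old_thumb_path` returns 0); once the check sits in the callee, both encodings UNDEF.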




* [PATCH 03/36] target/arm: Add stubs for AArch32 Neon decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
  2020-04-30 18:09 ` [PATCH 01/36] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check Peter Maydell
  2020-04-30 18:09 ` [PATCH 02/36] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 18:30   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 04/36] target/arm: Convert VCMLA (vector) to decodetree Peter Maydell
                   ` (34 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 Neon encodings.  At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.

We follow the same pattern we used for the VFP decodetree conversion
(commit 78e138bc1f672c145ef6ace74617d and following): code that deals
with Neon will gradually move out to translate-neon.inc.c,
which we #include into translate.c.

In order to share the decode files between A32 and T32, we
split Neon into 3 parts:
 * data-processing
 * load-store
 * 'shared' encodings

The first two groups of instructions have similar but not identical
A32 and T32 encodings, so we need to manually transform the T32
encoding into the A32 one before calling the decoder; the third group
covers the Neon instructions which are identical in A32 and T32.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/Makefile.objs        | 18 +++++++++++++++++
 target/arm/translate-neon.inc.c | 32 +++++++++++++++++++++++++++++
 target/arm/translate.c          | 36 +++++++++++++++++++++++++++++++--
 target/arm/neon-dp.decode       | 29 ++++++++++++++++++++++++++
 target/arm/neon-ls.decode       | 29 ++++++++++++++++++++++++++
 target/arm/neon-shared.decode   | 27 +++++++++++++++++++++++++
 6 files changed, 169 insertions(+), 2 deletions(-)
 create mode 100644 target/arm/translate-neon.inc.c
 create mode 100644 target/arm/neon-dp.decode
 create mode 100644 target/arm/neon-ls.decode
 create mode 100644 target/arm/neon-shared.decode

diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index cf26c16f5f6..775b3e24f22 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -18,6 +18,21 @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
 	  $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
 	  "GEN", $(TARGET_DIR)$@)
 
+target/arm/decode-neon-shared.inc.c: $(SRC_PATH)/target/arm/neon-shared.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_neon_shared -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-dp.inc.c: $(SRC_PATH)/target/arm/neon-dp.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_neon_dp -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-ls.inc.c: $(SRC_PATH)/target/arm/neon-ls.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_neon_ls -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
 target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
 	$(call quiet-command,\
 	  $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
@@ -49,6 +64,9 @@ target/arm/decode-t16.inc.c: $(SRC_PATH)/target/arm/t16.decode $(DECODETREE)
 	  "GEN", $(TARGET_DIR)$@)
 
 target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+target/arm/translate.o: target/arm/decode-neon-shared.inc.c
+target/arm/translate.o: target/arm/decode-neon-dp.inc.c
+target/arm/translate.o: target/arm/decode-neon-ls.inc.c
 target/arm/translate.o: target/arm/decode-vfp.inc.c
 target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
 target/arm/translate.o: target/arm/decode-a32.inc.c
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
new file mode 100644
index 00000000000..a33e81ba3ab
--- /dev/null
+++ b/target/arm/translate-neon.inc.c
@@ -0,0 +1,32 @@
+/*
+ *  ARM translation: AArch32 Neon instructions
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *  Copyright (c) 2005-2007 CodeSourcery
+ *  Copyright (c) 2007 OpenedHand, Ltd.
+ *  Copyright (c) 2020 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This file is intended to be included from translate.c; it uses
+ * some macros and definitions provided by that file.
+ * It might be possible to convert it to a standalone .c file eventually.
+ */
+
+/* Include the generated Neon decoder */
+#include "decode-neon-dp.inc.c"
+#include "decode-neon-ls.inc.c"
+#include "decode-neon-shared.inc.c"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ab5324a5aaa..bd766391e9e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1313,8 +1313,9 @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 
 #define ARM_CP_RW_BIT   (1 << 20)
 
-/* Include the VFP decoder */
+/* Include the VFP and Neon decoders */
 #include "translate-vfp.inc.c"
+#include "translate-neon.inc.c"
 
 static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
 {
@@ -10949,7 +10950,10 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
         /* Unconditional instructions.  */
         /* TODO: Perhaps merge these into one decodetree output file.  */
         if (disas_a32_uncond(s, insn) ||
-            disas_vfp_uncond(s, insn)) {
+            disas_vfp_uncond(s, insn) ||
+            disas_neon_dp(s, insn) ||
+            disas_neon_ls(s, insn) ||
+            disas_neon_shared(s, insn)) {
             return;
         }
         /* fall back to legacy decoder */
@@ -11102,6 +11106,33 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
         ARCH(6T2);
     }
 
+    if ((insn & 0xef000000) == 0xef000000) {
+        /*
+         * T32 encodings 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+         * transform into
+         * A32 encodings 0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+         */
+        uint32_t a32_insn = (insn & 0xe2ffffff) |
+            ((insn & (1 << 28)) >> 4) | (1 << 28);
+
+        if (disas_neon_dp(s, a32_insn)) {
+            return;
+        }
+    }
+
+    if ((insn & 0xff100000) == 0xf9000000) {
+        /*
+         * T32 encodings 0b1111_1001_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+         * transform into
+         * A32 encodings 0b1111_0100_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+         */
+        uint32_t a32_insn = (insn & 0x00ffffff) | 0xf4000000;
+
+        if (disas_neon_ls(s, a32_insn)) {
+            return;
+        }
+    }
+
     /*
      * TODO: Perhaps merge these into one decodetree output file.
      * Note disas_vfp is written for a32 with cond field in the
@@ -11109,6 +11140,7 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
      */
     if (disas_t32(s, insn) ||
         disas_vfp_uncond(s, insn) ||
+        disas_neon_shared(s, insn) ||
         ((insn >> 28) == 0xe && disas_vfp(s, insn))) {
         return;
     }
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
new file mode 100644
index 00000000000..c89a1a58591
--- /dev/null
+++ b/target/arm/neon-dp.decode
@@ -0,0 +1,29 @@
+# AArch32 Neon data-processing instruction descriptions
+#
+#  Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon data processing instructions where the T32 encoding
+# is a simple transformation of the A32 encoding.
+# More specifically, this file covers instructions where the A32 encoding is
+#   0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+# and the T32 encoding is
+#   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+# This file works on the A32 encoding only; calling code for T32 has to
+# transform the insn into the A32 version first.
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
new file mode 100644
index 00000000000..2b16c9256df
--- /dev/null
+++ b/target/arm/neon-ls.decode
@@ -0,0 +1,29 @@
+# AArch32 Neon load/store instruction descriptions
+#
+#  Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon load/store instructions where the T32 encoding
+# is a simple transformation of the A32 encoding.
+# More specifically, this file covers instructions where the A32 encoding is
+#   0b1111_0100_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
+# and the T32 encoding is
+#   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
+# This file works on the A32 encoding only; calling code for T32 has to
+# transform the insn into the A32 version first.
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
new file mode 100644
index 00000000000..3aea7c5e188
--- /dev/null
+++ b/target/arm/neon-shared.decode
@@ -0,0 +1,27 @@
+# AArch32 Neon instruction descriptions
+#
+#  Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon instructions whose encoding is the same for
+# both A32 and T32.
+
+# More specifically, this covers:
+# 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+# 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
-- 
2.20.1
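
The two T32-to-A32 rewrites in disas_thumb2_insn() can be checked in isolation. The sketch below copies the masks from the patch into standalone functions (the function names are hypothetical) so the bit movement is easy to test. For data-processing insns, bits [31:24] go from `111p_1111` to `1111_001p`; for load/store insns, the top byte `0xf9` becomes `0xf4`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* T32 0b111p_1111_qqqq... is a Neon data-processing insn. */
static bool is_t32_neon_dp(uint32_t insn)
{
    return (insn & 0xef000000) == 0xef000000;
}

/* T32 0b111p_1111_q... -> A32 0b1111_001p_q...
 * Keep bits 31-29, bit 25 and bits 23-0; move p (bit 28) down to
 * bit 24; then set bit 28. Bits 27 and 26 come out as zero. */
static uint32_t neon_dp_t32_to_a32(uint32_t insn)
{
    return (insn & 0xe2ffffff) | ((insn & (1u << 28)) >> 4) | (1u << 28);
}

/* T32 0b1111_1001_ppp0_... is a Neon load/store insn. */
static bool is_t32_neon_ls(uint32_t insn)
{
    return (insn & 0xff100000) == 0xf9000000;
}

/* T32 0b1111_1001_ppp0_q... -> A32 0b1111_0100_ppp0_q... */
static uint32_t neon_ls_t32_to_a32(uint32_t insn)
{
    return (insn & 0x00ffffff) | 0xf4000000;
}
```

For example, `neon_dp_t32_to_a32(0xff123456)` yields `0xf3123456`: the T32 prefix `1111_1111` (p = 1) becomes the A32 prefix `1111_0011`, and the low 24 bits pass through unchanged.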




* [PATCH 04/36] target/arm: Convert VCMLA (vector) to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (2 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 03/36] target/arm: Add stubs for AArch32 Neon decodetree Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 18:34   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 05/36] target/arm: Convert VCADD " Peter Maydell
                   ` (33 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VCMLA (vector) insns in the 3same extension group to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 11 +---------
 target/arm/neon-shared.decode   | 11 ++++++++++
 3 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index a33e81ba3ab..0baae1338a3 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -30,3 +30,40 @@
 #include "decode-neon-dp.inc.c"
 #include "decode-neon-ls.inc.c"
 #include "decode-neon-shared.inc.c"
+
+static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
+{
+    int opr_sz;
+    TCGv_ptr fpst;
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+
+    if (!dc_isar_feature(aa32_vcma, s)
+        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    fn_gvec_ptr = a->size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       fpst, opr_sz, opr_sz, a->rot,
+                       fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index bd766391e9e..17167634e29 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7048,16 +7048,7 @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xfe200f10) == 0xfc200800) {
-        /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
-        int size = extract32(insn, 20, 1);
-        data = extract32(insn, 23, 2); /* rot */
-        if (!dc_isar_feature(aa32_vcma, s)
-            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
-            return 1;
-        }
-        fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
-    } else if ((insn & 0xfea00f10) == 0xfc800800) {
+    if ((insn & 0xfea00f10) == 0xfc800800) {
         /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
         int size = extract32(insn, 20, 1);
         data = extract32(insn, 24, 1); /* rot */
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index 3aea7c5e188..d1d707a56d5 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -25,3 +25,14 @@
 # More specifically, this covers:
 # 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
 # 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+
+# VFP/Neon register fields; same as vfp.decode
+%vm_dp  5:1 0:4
+%vm_sp  0:4 5:1
+%vn_dp  7:1 16:4
+%vn_sp  16:4 7:1
+%vd_dp  22:1 12:4
+%vd_sp  12:4 22:1
+
+VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
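
In the decode file, a field such as `%vd_dp 22:1 12:4` concatenates bit 22 (the D bit) as the most significant part with bits [15:12], giving a D register number 0..31. `%vd_sp 12:4 22:1` instead puts the D bit at the bottom to form an S register number. The sketch below restates those two fields, plus the Q-register and operation-size logic from trans_VCMLA(), in plain C. `extract32()` is a local copy modelled on QEMU's helper of the same name; the other function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for QEMU's extract32(): returns the 'length'-bit
 * field of 'value' starting at bit 'start' (1 <= length <= 32). */
static uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0U >> (32 - length));
}

/* %vd_dp 22:1 12:4 : D bit is the high bit of a D register number. */
static int vd_dp(uint32_t insn)
{
    return (int)((extract32(insn, 22, 1) << 4) | extract32(insn, 12, 4));
}

/* %vd_sp 12:4 22:1 : D bit is the low bit of an S register number. */
static int vd_sp(uint32_t insn)
{
    return (int)((extract32(insn, 12, 4) << 1) | extract32(insn, 22, 1));
}

/* Mirrors "(a->vn | a->vm | a->vd) & a->q": a Q-register operation
 * needs every D register number to be even. */
static bool q_regs_ok(int vd, int vn, int vm, int q)
{
    return ((vd | vn | vm) & q) == 0;
}

/* Mirrors "opr_sz = (1 + a->q) * 8": 8 bytes for D, 16 for Q. */
static int opr_size(int q)
{
    return (1 + q) * 8;
}
```

For an insn with D = 1 and Vd = 0b0011, the same bits name D19 under `%vd_dp` but S7 under `%vd_sp`, which is why vfp.decode and neon-shared.decode need both field definitions.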




* [PATCH 05/36] target/arm: Convert VCADD (vector) to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (3 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 04/36] target/arm: Convert VCMLA (vector) to decodetree Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 18:35   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 06/36] target/arm: Convert V[US]DOT " Peter Maydell
                   ` (32 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VCADD (vector) insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 11 +---------
 target/arm/neon-shared.decode   |  3 +++
 3 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 0baae1338a3..28011e88d9e 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -67,3 +67,40 @@ static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
+{
+    int opr_sz;
+    TCGv_ptr fpst;
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+
+    if (!dc_isar_feature(aa32_vcma, s)
+        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    fn_gvec_ptr = a->size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       fpst, opr_sz, opr_sz, a->rot,
+                       fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 17167634e29..571b64aa89d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7048,16 +7048,7 @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xfea00f10) == 0xfc800800) {
-        /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
-        int size = extract32(insn, 20, 1);
-        data = extract32(insn, 24, 1); /* rot */
-        if (!dc_isar_feature(aa32_vcma, s)
-            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
-            return 1;
-        }
-        fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
-    } else if ((insn & 0xfeb00f00) == 0xfc200d00) {
+    if ((insn & 0xfeb00f00) == 0xfc200d00) {
         /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
         bool u = extract32(insn, 4, 1);
         if (!dc_isar_feature(aa32_dp, s)) {
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index d1d707a56d5..ed65dae1809 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -36,3 +36,6 @@
 
 VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
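
Each decodetree pattern line implies a fixed-bit mask and value. For VCMLA and VCADD these match the hand-written tests deleted from disas_neon_insn_3same_ext() (`0xfe200f10`/`0xfc200800` and `0xfea00f10`/`0xfc800800`). The hypothetical parser below handles only a simplified subset of the syntax (`0`, `1`, `.`, and `name:len` fields; no `%field` references or continuation lines). It is a sketch for checking patterns by hand, not a replacement for scripts/decodetree.py.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: derive (mask, value) from a simplified
 * decodetree-style pattern, most significant bit first.
 *   '0' / '1'  fixed bit (set in mask; '1' also set in value)
 *   '.'        don't-care bit
 *   name:len   named field of len bits (don't-care for matching)
 * Whitespace is ignored. Returns the number of bits consumed,
 * or -1 on a syntax error.
 */
static int pattern_to_mask_value(const char *pat, uint32_t *mask, uint32_t *value)
{
    uint32_t m = 0, v = 0;
    int bits = 0;
    const char *p = pat;

    while (*p) {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (*p == '0' || *p == '1' || *p == '.') {
            m = (m << 1) | (*p != '.');
            v = (v << 1) | (*p == '1');
            bits++;
            p++;
        } else if (isalpha((unsigned char)*p)) {
            char *end;
            long len;

            /* skip the field name, then expect ":len" */
            while (isalnum((unsigned char)*p) || *p == '_') {
                p++;
            }
            if (*p != ':') {
                return -1;
            }
            len = strtol(p + 1, &end, 10);
            if (end == p + 1 || len <= 0) {
                return -1;
            }
            for (long i = 0; i < len; i++) {
                m <<= 1;
                v <<= 1;
            }
            bits += (int)len;
            p = end;
        } else {
            return -1;
        }
    }
    *mask = m;
    *value = v;
    return bits;
}
```

Since VCMLA fixes bit 21 to 1 and VCADD fixes it to 0, no insn can match both patterns.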




* [PATCH 06/36] target/arm: Convert V[US]DOT (vector) to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (4 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 05/36] target/arm: Convert VCADD " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 18:36   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 07/36] target/arm: Convert VFM[AS]L " Peter Maydell
                   ` (31 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the V[US]DOT (vector) insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 32 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  9 +--------
 target/arm/neon-shared.decode   |  4 ++++
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 28011e88d9e..6537506c5b6 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -104,3 +104,35 @@ static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
+{
+    int opr_sz;
+    gen_helper_gvec_3 *fn_gvec;
+
+    if (!dc_isar_feature(aa32_dp, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    fn_gvec = a->u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
+    tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       opr_sz, opr_sz, 0, fn_gvec);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 571b64aa89d..1190ad17cfd 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7048,14 +7048,7 @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xfeb00f00) == 0xfc200d00) {
-        /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
-        bool u = extract32(insn, 4, 1);
-        if (!dc_isar_feature(aa32_dp, s)) {
-            return 1;
-        }
-        fn_gvec = u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
-    } else if ((insn & 0xff300f10) == 0xfc200810) {
+    if ((insn & 0xff300f10) == 0xfc200810) {
         /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
         int is_s = extract32(insn, 23, 1);
         if (!dc_isar_feature(aa32_fhm, s)) {
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index ed65dae1809..c9c641905d3 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -39,3 +39,7 @@ VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
 
 VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+# VUDOT and VSDOT
+VDOT           1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
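For reference, the two register checks in trans_VDOT above can be modelled in isolation. A Q register Qn overlays the D-register pair D(2n) and D(2n+1), so when q is 1 every D-register number must be even, which is what `(a->vn | a->vm | a->vd) & a->q` tests. The `& 0x10` test rejects D16-D31 when the simd_r32 feature is absent. This is a minimal sketch, not QEMU code; the function name `vdot_regs_ok` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the register checks in trans_VDOT. vd, vn and
 * vm are D-register numbers (0-31) as the %vd_dp/%vn_dp/%vm_dp fields
 * produce them; q is 0 (D form) or 1 (Q form).
 * Returns true if the encoding is accepted, false if it must UNDEF.
 */
static bool vdot_regs_ok(int vd, int vn, int vm, int q, bool has_simd_r32)
{
    assert(q == 0 || q == 1);

    /* Without the 32-register extension, D16-D31 do not exist. */
    if (!has_simd_r32 && ((vd | vn | vm) & 0x10)) {
        return false;
    }
    /* Q form: each operand names an even/odd D-register pair. */
    if ((vd | vn | vm) & q) {
        return false;
    }
    return true;
}
```

trans_VDOT performs both checks before vfp_access_check(), so a rejected encoding UNDEFs rather than raising the FP access trap.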




* [PATCH 07/36] target/arm: Convert VFM[AS]L (vector) to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (5 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 06/36] target/arm: Convert V[US]DOT " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 18:43   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 08/36] target/arm: Convert VCMLA (scalar) " Peter Maydell
                   ` (30 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VFM[AS]L (vector) insns to decodetree.  This is the last
insn in the legacy decoder for the 3same_ext group, so we can
delete the legacy decoder function for the group entirely.
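The two VFML patterns in the decode file differ only in how the source register fields are read. With q=0 the sources are single-precision registers (%vn_sp, %vm_sp, i.e. Vn:N and Vm:M); with q=1 they are doubleword registers (%vn_dp, %vm_dp, i.e. N:Vn and M:Vm); the destination is always %vd_dp (D:Vd). In a decodetree field such as `%vd_dp 22:1 12:4` the first segment supplies the most significant bits. This is also why trans_VFML passes a->q as the dp argument of vfp_reg_offset() for Vn and Vm. The sketch below hand-decodes those fields from an A32 word. The bit positions follow the legacy VFP_SREG_*/VFP_DREG_* macros; the struct and function names are hypothetical stand-ins for what decodetree generates.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'len' bits of 'insn' starting at bit 'pos'. */
static int field(uint32_t insn, int pos, int len)
{
    assert(len > 0 && len < 32 && pos >= 0 && pos + len <= 32);
    return (int)((insn >> pos) & ((1u << len) - 1));
}

/* D-register numbers: the one-bit field is the MSB (D:Vd, N:Vn, M:Vm). */
static int vd_dp(uint32_t insn) { return (field(insn, 22, 1) << 4) | field(insn, 12, 4); }
static int vn_dp(uint32_t insn) { return (field(insn, 7, 1) << 4) | field(insn, 16, 4); }
static int vm_dp(uint32_t insn) { return (field(insn, 5, 1) << 4) | field(insn, 0, 4); }

/* S-register numbers: the one-bit field is the LSB (Vn:N, Vm:M). */
static int vn_sp(uint32_t insn) { return (field(insn, 16, 4) << 1) | field(insn, 7, 1); }
static int vm_sp(uint32_t insn) { return (field(insn, 0, 4) << 1) | field(insn, 5, 1); }

/* Hypothetical stand-in for the generated arg_VFML. */
typedef struct {
    int vd, vn, vm, q, s;
} vfml_args;

static vfml_args decode_vfml(uint32_t insn)
{
    vfml_args a;

    a.q = field(insn, 6, 1);
    a.s = field(insn, 23, 1);   /* 0 = VFMAL, 1 = VFMSL */
    a.vd = vd_dp(insn);
    /* q selects the register-number format of both sources. */
    a.vn = a.q ? vn_dp(insn) : vn_sp(insn);
    a.vm = a.q ? vm_dp(insn) : vm_sp(insn);
    return a;
}
```

For example, 0xfc2328b5 decodes as q=0 with Vd=D2, Vn=S7 and Vm=S11; setting the Q bit (0xfc2328f5) reinterprets the same fields as Vn=D19 and Vm=D21.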

Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check, so they won't fall
into disas_coproc_insn() by mistake.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 31 +++++++++++
 target/arm/translate.c          | 92 +--------------------------------
 target/arm/neon-shared.decode   |  6 +++
 3 files changed, 38 insertions(+), 91 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 6537506c5b6..6c58abc54b5 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -136,3 +136,34 @@ static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
                        opr_sz, opr_sz, 0, fn_gvec);
     return true;
 }
+
+static bool trans_VFML(DisasContext *s, arg_VFML *a)
+{
+    int opr_sz;
+
+    if (!dc_isar_feature(aa32_fhm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (a->vd & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(a->q, a->vn),
+                       vfp_reg_offset(a->q, a->vm),
+                       cpu_env, opr_sz, opr_sz, a->s, /* is_2 == 0 */
+                       gen_helper_gvec_fmlal_a32);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 1190ad17cfd..caa18c8c56c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7032,84 +7032,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     return 0;
 }
 
-/* Advanced SIMD three registers of the same length extension.
- *  31           25    23  22    20   16   12  11   10   9    8        3     0
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
- * | 1 1 1 1 1 1 0 | op1 | D | op2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
- */
-static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
-{
-    gen_helper_gvec_3 *fn_gvec = NULL;
-    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
-    int rd, rn, rm, opr_sz;
-    int data = 0;
-    int off_rn, off_rm;
-    bool is_long = false, q = extract32(insn, 6, 1);
-    bool ptr_is_env = false;
-
-    if ((insn & 0xff300f10) == 0xfc200810) {
-        /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
-        int is_s = extract32(insn, 23, 1);
-        if (!dc_isar_feature(aa32_fhm, s)) {
-            return 1;
-        }
-        is_long = true;
-        data = is_s; /* is_2 == 0 */
-        fn_gvec_ptr = gen_helper_gvec_fmlal_a32;
-        ptr_is_env = true;
-    } else {
-        return 1;
-    }
-
-    VFP_DREG_D(rd, insn);
-    if (rd & q) {
-        return 1;
-    }
-    if (q || !is_long) {
-        VFP_DREG_N(rn, insn);
-        VFP_DREG_M(rm, insn);
-        if ((rn | rm) & q & !is_long) {
-            return 1;
-        }
-        off_rn = vfp_reg_offset(1, rn);
-        off_rm = vfp_reg_offset(1, rm);
-    } else {
-        rn = VFP_SREG_N(insn);
-        rm = VFP_SREG_M(insn);
-        off_rn = vfp_reg_offset(0, rn);
-        off_rm = vfp_reg_offset(0, rm);
-    }
-
-    if (s->fp_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
-        return 0;
-    }
-    if (!s->vfp_enabled) {
-        return 1;
-    }
-
-    opr_sz = (1 + q) * 8;
-    if (fn_gvec_ptr) {
-        TCGv_ptr ptr;
-        if (ptr_is_env) {
-            ptr = cpu_env;
-        } else {
-            ptr = get_fpstatus_ptr(1);
-        }
-        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
-                           opr_sz, opr_sz, data, fn_gvec_ptr);
-        if (!ptr_is_env) {
-            tcg_temp_free_ptr(ptr);
-        }
-    } else {
-        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
-                           opr_sz, opr_sz, data, fn_gvec);
-    }
-    return 0;
-}
-
 /* Advanced SIMD two registers and a scalar extension.
  *  31             24   23  22   20   16   12  11   10   9    8        3     0
  * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
@@ -10956,12 +10878,6 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                     }
                 }
             }
-        } else if ((insn & 0x0e000a00) == 0x0c000800
-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            if (disas_neon_insn_3same_ext(s, insn)) {
-                goto illegal_op;
-            }
-            return;
         } else if ((insn & 0x0f000a00) == 0x0e000800
                    && arm_dc_feature(s, ARM_FEATURE_V8)) {
             if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
@@ -11145,15 +11061,9 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
             }
             break;
         }
-        if ((insn & 0xfe000a00) == 0xfc000800
+        if ((insn & 0xff000a00) == 0xfe000800
             && arm_dc_feature(s, ARM_FEATURE_V8)) {
             /* The Thumb2 and ARM encodings are identical.  */
-            if (disas_neon_insn_3same_ext(s, insn)) {
-                goto illegal_op;
-            }
-        } else if ((insn & 0xff000a00) == 0xfe000800
-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            /* The Thumb2 and ARM encodings are identical.  */
             if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
                 goto illegal_op;
             }
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index c9c641905d3..90cd5c871e2 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -43,3 +43,9 @@ VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
 # VUDOT and VSDOT
 VDOT           1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+# VFM[AS]L
+VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
+               vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
+VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
-- 
2.20.1




* [PATCH 08/36] target/arm: Convert VCMLA (scalar) to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (6 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 07/36] target/arm: Convert VFM[AS]L " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:00   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 09/36] target/arm: Convert V[US]DOT " Peter Maydell
                   ` (29 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert VCMLA (scalar) in the 2reg-scalar-ext group to decodetree.
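The two VCMLA_scalar patterns split on the size bit (bit 23). For fp16 (size=0) Vm is only the 4-bit field in bits 3:0 and M (bit 5) is the element index; for fp32 (size=1) Vm is the usual M:Vm and the index is fixed at 0. trans_VCMLA_scalar then packs the index and rotation into the gvec immediate as (index << 2) | rot. A minimal sketch of that decode-and-pack step follows; the struct and function names are hypothetical stand-ins for the generated arg_VCMLA_scalar.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the arg_VCMLA_scalar fields used here. */
typedef struct {
    int vm, index, rot, size;
} vcmla_scalar_args;

static vcmla_scalar_args decode_vcmla_scalar(uint32_t insn)
{
    vcmla_scalar_args a;
    int m = (insn >> 5) & 1;

    a.rot = (insn >> 20) & 3;
    a.size = (insn >> 23) & 1;
    if (a.size == 0) {
        /* fp16: Vm is a 4-bit field (D0-D15), M is the element index */
        a.vm = insn & 0xf;
        a.index = m;
    } else {
        /* fp32: Vm is the usual M:Vm D-register number, no index */
        a.vm = (m << 4) | (insn & 0xf);
        a.index = 0;
    }
    return a;
}

/* The immediate passed to tcg_gen_gvec_3_ptr() as simd data. */
static int vcmla_scalar_data(const vcmla_scalar_args *a)
{
    assert(a->rot >= 0 && a->rot <= 3);
    assert(a->index == 0 || a->index == 1);
    return (a->index << 2) | a->rot;
}
```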

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 40 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 26 +--------------------
 target/arm/neon-shared.decode   |  5 +++++
 3 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 6c58abc54b5..92eccbf8236 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -167,3 +167,43 @@ static bool trans_VFML(DisasContext *s, arg_VFML *a)
                        gen_helper_gvec_fmlal_a32);
     return true;
 }
+
+static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
+{
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+    int opr_sz;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_vcma, s)) {
+        return false;
+    }
+    if (a->size == 0 && !dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vd | a->vn) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fn_gvec_ptr = (a->size ? gen_helper_gvec_fcmlas_idx
+                   : gen_helper_gvec_fcmlah_idx);
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       fpst, opr_sz, opr_sz,
+                       (a->index << 2) | a->rot, fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index caa18c8c56c..b82e54b7b23 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7049,31 +7049,7 @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xff000f10) == 0xfe000800) {
-        /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... */
-        int rot = extract32(insn, 20, 2);
-        int size = extract32(insn, 23, 1);
-        int index;
-
-        if (!dc_isar_feature(aa32_vcma, s)) {
-            return 1;
-        }
-        if (size == 0) {
-            if (!dc_isar_feature(aa32_fp16_arith, s)) {
-                return 1;
-            }
-            /* For fp16, rm is just Vm, and index is M.  */
-            rm = extract32(insn, 0, 4);
-            index = extract32(insn, 5, 1);
-        } else {
-            /* For fp32, rm is the usual M:Vm, and index is 0.  */
-            VFP_DREG_M(rm, insn);
-            index = 0;
-        }
-        data = (index << 2) | rot;
-        fn_gvec_ptr = (size ? gen_helper_gvec_fcmlas_idx
-                       : gen_helper_gvec_fcmlah_idx);
-    } else if ((insn & 0xffb00f00) == 0xfe200d00) {
+    if ((insn & 0xffb00f00) == 0xfe200d00) {
         /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
         int u = extract32(insn, 4, 1);
 
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index 90cd5c871e2..c11d755ed14 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -49,3 +49,8 @@ VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
                vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
 VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
+
+VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
+               vn=%vn_dp vd=%vd_dp size=0
+VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
-- 
2.20.1




* [PATCH 09/36] target/arm: Convert V[US]DOT (scalar) to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (7 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 08/36] target/arm: Convert VCMLA (scalar) " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:01   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 10/36] target/arm: Convert VFM[AS]L " Peter Maydell
                   ` (28 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the V[US]DOT (scalar) insns in the 2reg-scalar-ext group
to decodetree.
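In the VDOT_scalar pattern the index is bit 5, U is bit 4, and the scalar register is the 4-bit rm field (so only D0-D15 can be named); the index alone is passed as the gvec immediate. The arithmetic itself can be sketched as a reference model: each 32-bit lane of the destination accumulates the dot product of the four bytes in the matching lane of Vn with the four bytes of 32-bit element `index` of Dm. This models only the signed D-register form and is not QEMU's gvec_sdot_idx_b helper; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the arg_VDOT_scalar fields used. */
typedef struct {
    int rm, index, u, q;
} vdot_scalar_args;

static vdot_scalar_args decode_vdot_scalar(uint32_t insn)
{
    vdot_scalar_args a;

    a.rm = insn & 0xf;            /* rm:4, bits 3:0 (D0-D15 only) */
    a.u = (insn >> 4) & 1;        /* u:1, bit 4: 1 = VUDOT, 0 = VSDOT */
    a.index = (insn >> 5) & 1;    /* index:1, bit 5 */
    a.q = (insn >> 6) & 1;        /* q:1, bit 6 */
    return a;
}

/*
 * Reference model of VSDOT.S8 Dd, Dn, Dm[index] (D form, two lanes):
 * d[lane] += sum over k of n[4 * lane + k] * m[4 * index + k].
 */
static void sdot_idx_d(int32_t d[2], const int8_t n[8], const int8_t m[8],
                       int index)
{
    assert(index == 0 || index == 1);
    for (int lane = 0; lane < 2; lane++) {
        int32_t sum = 0;
        for (int k = 0; k < 4; k++) {
            sum += n[4 * lane + k] * m[4 * index + k];
        }
        d[lane] += sum;
    }
}
```

Because the same Dm element is reused for every lane, the Q form simply applies this to four lanes instead of two.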

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 35 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 13 +-----------
 target/arm/neon-shared.decode   |  3 +++
 3 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 92eccbf8236..7cc6ccb0697 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -207,3 +207,38 @@ static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
+{
+    gen_helper_gvec_3 *fn_gvec;
+    int opr_sz;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_dp, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vd | a->vn) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fn_gvec = a->u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->rm),
+                       opr_sz, opr_sz, a->index, fn_gvec);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index b82e54b7b23..af2714292ea 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7049,18 +7049,7 @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xffb00f00) == 0xfe200d00) {
-        /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
-        int u = extract32(insn, 4, 1);
-
-        if (!dc_isar_feature(aa32_dp, s)) {
-            return 1;
-        }
-        fn_gvec = u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
-        /* rm is just Vm, and index is M.  */
-        data = extract32(insn, 5, 1); /* index */
-        rm = extract32(insn, 0, 4);
-    } else if ((insn & 0xffa00f10) == 0xfe000810) {
+    if ((insn & 0xffa00f10) == 0xfe000810) {
         /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
         int is_s = extract32(insn, 20, 1);
         int vm20 = extract32(insn, 0, 3);
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index c11d755ed14..63a46c63c07 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -54,3 +54,6 @@ VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp size=0
 VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
+
+VDOT_scalar    1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1




* [PATCH 10/36] target/arm: Convert VFM[AS]L (scalar) to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (8 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 09/36] target/arm: Convert V[US]DOT " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:06   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 11/36] target/arm: Convert Neon load/store multiple structures " Peter Maydell
                   ` (27 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
to decodetree. These are the last ones in the group, so we can remove
all the legacy decode for the group.
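The VFM[AS]L (scalar) encodings reuse the Vm and M bits differently depending on Q, which is what the two helper fields in the decode file express. With q=0, rm = Vm<2:0>:M names a single-precision register (S0-S15) and Vm<3> is the index; with q=1, rm = Vm<2:0> names D0-D7 and the index is M:Vm<3>. trans_VFML_scalar then passes (index << 2) | s as data, leaving the is_2 bit clear. The sketch below reproduces the legacy decode that this patch removes; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the generated arg_VFML_scalar. */
typedef struct {
    int rm, index, q, s;
} vfml_scalar_args;

static vfml_scalar_args decode_vfml_scalar(uint32_t insn)
{
    vfml_scalar_args a;
    int vm20 = insn & 7;          /* Vm<2:0>, bits 2:0 */
    int vm3 = (insn >> 3) & 1;    /* Vm<3>, bit 3 */
    int m = (insn >> 5) & 1;      /* M, bit 5 */

    a.q = (insn >> 6) & 1;
    a.s = (insn >> 20) & 1;       /* 0 = VFMAL, 1 = VFMSL */
    if (a.q) {
        /* %vfml_scalar_q1_index 5:1 3:1 -> index = M:Vm<3>; rm = Vm<2:0> */
        a.rm = vm20;
        a.index = (m << 1) | vm3;
    } else {
        /* %vfml_scalar_q0_rm 0:3 5:1 -> rm = Vm<2:0>:M; index = Vm<3> */
        a.rm = (vm20 << 1) | m;
        a.index = vm3;
    }
    return a;
}

/* simd data for the helper; bit 1 (is_2) stays 0 as in the patch. */
static int vfml_scalar_data(const vfml_scalar_args *a)
{
    assert(a->index >= 0 && a->index <= 3);
    return (a->index << 2) | a->s;
}
```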

Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check, so they won't fall
into disas_coproc_insn() by mistake.
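The fall-through described above can be modelled as follows for an insn in the T32 coprocessor space that the decodetree decoders have already rejected. Both the old 3same_ext (0xfc/0xfd) and 2reg_scalar_ext (0xfe) encodings have bits 25:24 != 3, bit 11 set and bit 9 clear, so they miss the Neon data and VFP branches; because bit 28 is set they reach illegal_op rather than disas_coproc_insn(). This is a simplified sketch, not QEMU code. The Neon data test is the one shown in the hunk below, but the VFP test `((insn >> 8) & 0xe) == 10` is an assumption about the surrounding translate.c that should be checked against the tree.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    ROUTE_NEON_DATA,   /* disas_neon_data_insn() after T32->A32 rewrite */
    ROUTE_VFP,         /* disas_vfp_insn() (assumed branch) */
    ROUTE_ILLEGAL,     /* goto illegal_op */
    ROUTE_COPROC,      /* disas_coproc_insn() */
} t32_cp_route;

/*
 * Simplified, hypothetical model of the tail of the coprocessor case
 * in disas_thumb2_insn() once both legacy extension groups are gone.
 */
static t32_cp_route route_t32_cp_space(uint32_t insn)
{
    if (((insn >> 24) & 3) == 3) {
        return ROUTE_NEON_DATA;
    } else if (((insn >> 8) & 0xe) == 10) {
        return ROUTE_VFP;
    } else if (insn & (1u << 28)) {
        /* 0xfc.../0xfe... space: never a real coprocessor insn */
        return ROUTE_ILLEGAL;
    }
    return ROUTE_COPROC;
}
```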

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c |  32 ++++++++++
 target/arm/translate.c          | 107 +-------------------------------
 target/arm/neon-shared.decode   |   7 +++
 3 files changed, 40 insertions(+), 106 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 7cc6ccb0697..b06542b8b83 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -242,3 +242,35 @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
+{
+    int opr_sz;
+
+    if (!dc_isar_feature(aa32_fhm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd & 0x10) || (a->q && (a->vn & 0x10)))) {
+        return false;
+    }
+
+    if (a->vd & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(a->q, a->vn),
+                       vfp_reg_offset(a->q, a->rm),
+                       cpu_env, opr_sz, opr_sz,
+                       (a->index << 2) | a->s, /* is_2 == 0 */
+                       gen_helper_gvec_fmlal_idx_a32);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index af2714292ea..90f2f37908b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -2610,8 +2610,6 @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
 }
 
 #define VFP_REG_SHR(x, n) (((n) > 0) ? (x) >> (n) : (x) << -(n))
-#define VFP_SREG(insn, bigbit, smallbit) \
-  ((VFP_REG_SHR(insn, bigbit - 1) & 0x1e) | (((insn) >> (smallbit)) & 1))
 #define VFP_DREG(reg, insn, bigbit, smallbit) do { \
     if (dc_isar_feature(aa32_simd_r32, s)) { \
         reg = (((insn) >> (bigbit)) & 0x0f) \
@@ -2622,11 +2620,8 @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
         reg = ((insn) >> (bigbit)) & 0x0f; \
     }} while (0)
 
-#define VFP_SREG_D(insn) VFP_SREG(insn, 12, 22)
 #define VFP_DREG_D(reg, insn) VFP_DREG(reg, insn, 12, 22)
-#define VFP_SREG_N(insn) VFP_SREG(insn, 16,  7)
 #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
-#define VFP_SREG_M(insn) VFP_SREG(insn,  0,  5)
 #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
 
 static void gen_neon_dup_low16(TCGv_i32 var)
@@ -7032,94 +7027,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     return 0;
 }
 
-/* Advanced SIMD two registers and a scalar extension.
- *  31             24   23  22   20   16   12  11   10   9    8        3     0
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
- * | 1 1 1 1 1 1 1 0 | o1 | D | o2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
- *
- */
-
-static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
-{
-    gen_helper_gvec_3 *fn_gvec = NULL;
-    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
-    int rd, rn, rm, opr_sz, data;
-    int off_rn, off_rm;
-    bool is_long = false, q = extract32(insn, 6, 1);
-    bool ptr_is_env = false;
-
-    if ((insn & 0xffa00f10) == 0xfe000810) {
-        /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
-        int is_s = extract32(insn, 20, 1);
-        int vm20 = extract32(insn, 0, 3);
-        int vm3 = extract32(insn, 3, 1);
-        int m = extract32(insn, 5, 1);
-        int index;
-
-        if (!dc_isar_feature(aa32_fhm, s)) {
-            return 1;
-        }
-        if (q) {
-            rm = vm20;
-            index = m * 2 + vm3;
-        } else {
-            rm = vm20 * 2 + m;
-            index = vm3;
-        }
-        is_long = true;
-        data = (index << 2) | is_s; /* is_2 == 0 */
-        fn_gvec_ptr = gen_helper_gvec_fmlal_idx_a32;
-        ptr_is_env = true;
-    } else {
-        return 1;
-    }
-
-    VFP_DREG_D(rd, insn);
-    if (rd & q) {
-        return 1;
-    }
-    if (q || !is_long) {
-        VFP_DREG_N(rn, insn);
-        if (rn & q & !is_long) {
-            return 1;
-        }
-        off_rn = vfp_reg_offset(1, rn);
-        off_rm = vfp_reg_offset(1, rm);
-    } else {
-        rn = VFP_SREG_N(insn);
-        off_rn = vfp_reg_offset(0, rn);
-        off_rm = vfp_reg_offset(0, rm);
-    }
-    if (s->fp_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
-        return 0;
-    }
-    if (!s->vfp_enabled) {
-        return 1;
-    }
-
-    opr_sz = (1 + q) * 8;
-    if (fn_gvec_ptr) {
-        TCGv_ptr ptr;
-        if (ptr_is_env) {
-            ptr = cpu_env;
-        } else {
-            ptr = get_fpstatus_ptr(1);
-        }
-        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
-                           opr_sz, opr_sz, data, fn_gvec_ptr);
-        if (!ptr_is_env) {
-            tcg_temp_free_ptr(ptr);
-        }
-    } else {
-        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
-                           opr_sz, opr_sz, data, fn_gvec);
-    }
-    return 0;
-}
-
 static int disas_coproc_insn(DisasContext *s, uint32_t insn)
 {
     int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
@@ -10843,12 +10750,6 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                     }
                 }
             }
-        } else if ((insn & 0x0f000a00) == 0x0e000800
-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
-                goto illegal_op;
-            }
-            return;
         }
         goto illegal_op;
     }
@@ -11026,13 +10927,7 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
             }
             break;
         }
-        if ((insn & 0xff000a00) == 0xfe000800
-            && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            /* The Thumb2 and ARM encodings are identical.  */
-            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
-                goto illegal_op;
-            }
-        } else if (((insn >> 24) & 3) == 3) {
+        if (((insn >> 24) & 3) == 3) {
             /* Translate into the equivalent ARM encoding.  */
             insn = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
             if (disas_neon_data_insn(s, insn)) {
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index 63a46c63c07..f297ba8cdfc 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -57,3 +57,10 @@ VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
 
 VDOT_scalar    1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+%vfml_scalar_q0_rm 0:3 5:1
+%vfml_scalar_q1_index 5:1 3:1
+VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
+               rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
+VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
+               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
-- 
2.20.1




* [PATCH 11/36] target/arm: Convert Neon load/store multiple structures to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (9 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 10/36] target/arm: Convert VFM[AS]L " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:09   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 12/36] target/arm: Convert Neon 'load single structure to all lanes' " Peter Maydell
                   ` (26 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon "load/store multiple structures" insns to decodetree.
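The loop nest in trans_VLDST_multiple and the base register writeback can be modelled outside TCG. The itype field indexes neon_ls_element_type[] to get nregs, interleave and spacing. The outer loop walks nregs, the middle loop walks the elements of one register, and the inner loop steps across the interleaved registers. The total transfer of nregs * interleave * 8 bytes is also the post-increment applied when Rm is 13. The following minimal sketch (not QEMU code; names are hypothetical) reports which D register and element a given access touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copy of neon_ls_element_type[] from the patch. */
static const struct {
    int nregs;
    int interleave;
    int spacing;
} ls_type[11] = {
    {1, 4, 1}, {1, 4, 2}, {4, 1, 1}, {2, 2, 2}, {1, 3, 1}, {1, 3, 2},
    {3, 1, 1}, {1, 1, 1}, {1, 2, 1}, {1, 2, 2}, {2, 1, 1}
};

/* Bytes transferred, which is also the Rm == 13 writeback amount. */
static int ls_stride(int itype)
{
    assert(itype >= 0 && itype <= 10);
    return ls_type[itype].nregs * ls_type[itype].interleave * 8;
}

/*
 * Walk the same loop nest as trans_VLDST_multiple and report the
 * D register, element number and access size (log2 bytes, after the
 * little-endian promotion) of access number 'step'.
 * Returns false if 'step' is past the last access.
 */
static bool ls_access(int itype, int size, bool big_endian, int vd, int step,
                      int *reg_out, int *elt_out, int *size_out)
{
    assert(itype >= 0 && itype <= 10);
    int nregs = ls_type[itype].nregs;
    int interleave = ls_type[itype].interleave;
    int spacing = ls_type[itype].spacing;
    bool le = !big_endian || size == 0;   /* bytes are always LE */
    int i = 0;

    if (interleave == 1 && le) {
        size = 3;                          /* promote to 64-bit accesses */
    }
    for (int reg = 0; reg < nregs; reg++) {
        for (int n = 0; n < (8 >> size); n++) {
            for (int xs = 0; xs < interleave; xs++) {
                if (i++ == step) {
                    *reg_out = vd + reg + spacing * xs;
                    *elt_out = n;
                    *size_out = size;
                    return true;
                }
            }
        }
    }
    return false;
}

/* Base register writeback as done by gen_neon_ldst_base_update(). */
static uint32_t ls_base_update(uint32_t base, int rm, uint32_t rm_value,
                               int stride)
{
    if (rm == 15) {
        return base;                 /* no writeback */
    } else if (rm == 13) {
        return base + stride;        /* post-increment by transfer size */
    }
    return base + rm_value;          /* post-index by register Rm */
}
```

For example, VLD4.8 (itype 0) touches D0-D3 element 0 before moving to element 1, while a single-register VLD1 (itype 7) in little-endian mode collapses into one 64-bit access.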

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 124 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  91 +----------------------
 target/arm/neon-ls.decode       |   7 ++
 3 files changed, 133 insertions(+), 89 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index b06542b8b83..966c0d92012 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -274,3 +274,127 @@ static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
                        gen_helper_gvec_fmlal_idx_a32);
     return true;
 }
+
+static struct {
+    int nregs;
+    int interleave;
+    int spacing;
+} const neon_ls_element_type[11] = {
+    {1, 4, 1},
+    {1, 4, 2},
+    {4, 1, 1},
+    {2, 2, 2},
+    {1, 3, 1},
+    {1, 3, 2},
+    {3, 1, 1},
+    {1, 1, 1},
+    {1, 2, 1},
+    {1, 2, 2},
+    {2, 1, 1}
+};
+
+static void gen_neon_ldst_base_update(DisasContext *s, int rm, int rn,
+                                      int stride)
+{
+    if (rm != 15) {
+        TCGv_i32 base;
+
+        base = load_reg(s, rn);
+        if (rm == 13) {
+            tcg_gen_addi_i32(base, base, stride);
+        } else {
+            TCGv_i32 index;
+            index = load_reg(s, rm);
+            tcg_gen_add_i32(base, base, index);
+            tcg_temp_free_i32(index);
+        }
+        store_reg(s, rn, base);
+    }
+}
+
+static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
+{
+    /* Neon load/store multiple structures */
+    int nregs, interleave, spacing, reg, n;
+    MemOp endian = s->be_data;
+    int mmu_idx = get_mem_index(s);
+    int size = a->size;
+    TCGv_i64 tmp64;
+    TCGv_i32 addr, tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+    if (a->itype > 10) {
+        return false;
+    }
+    /* Catch UNDEF cases for bad values of align field */
+    switch (a->itype & 0xc) {
+    case 4:
+        if (a->align >= 2) {
+            return false;
+        }
+        break;
+    case 8:
+        if (a->align == 3) {
+            return false;
+        }
+        break;
+    default:
+        break;
+    }
+    nregs = neon_ls_element_type[a->itype].nregs;
+    interleave = neon_ls_element_type[a->itype].interleave;
+    spacing = neon_ls_element_type[a->itype].spacing;
+    if (size == 3 && (interleave | spacing) != 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /* For our purposes, bytes are always little-endian.  */
+    if (size == 0) {
+        endian = MO_LE;
+    }
+    /*
+     * Consecutive little-endian elements from a single register
+     * can be promoted to a larger little-endian operation.
+     */
+    if (interleave == 1 && endian == MO_LE) {
+        size = 3;
+    }
+    tmp64 = tcg_temp_new_i64();
+    addr = tcg_temp_new_i32();
+    tmp = tcg_const_i32(1 << size);
+    load_reg_var(s, addr, a->rn);
+    for (reg = 0; reg < nregs; reg++) {
+        for (n = 0; n < 8 >> size; n++) {
+            int xs;
+            for (xs = 0; xs < interleave; xs++) {
+                int tt = a->vd + reg + spacing * xs;
+
+                if (a->l) {
+                    gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
+                    neon_store_element64(tt, n, size, tmp64);
+                } else {
+                    neon_load_element64(tmp64, tt, n, size);
+                    gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
+                }
+                tcg_gen_add_i32(addr, addr, tmp);
+            }
+        }
+    }
+    tcg_temp_free_i32(addr);
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i64(tmp64);
+
+    gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 90f2f37908b..3f97635a524 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3214,45 +3214,19 @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
 }
 
 
-static struct {
-    int nregs;
-    int interleave;
-    int spacing;
-} const neon_ls_element_type[11] = {
-    {1, 4, 1},
-    {1, 4, 2},
-    {4, 1, 1},
-    {2, 2, 2},
-    {1, 3, 1},
-    {1, 3, 2},
-    {3, 1, 1},
-    {1, 1, 1},
-    {1, 2, 1},
-    {1, 2, 2},
-    {2, 1, 1}
-};
-
 /* Translate a NEON load/store element instruction.  Return nonzero if the
    instruction is invalid.  */
 static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
 {
     int rd, rn, rm;
-    int op;
     int nregs;
-    int interleave;
-    int spacing;
     int stride;
     int size;
     int reg;
     int load;
-    int n;
     int vec_size;
-    int mmu_idx;
-    MemOp endian;
     TCGv_i32 addr;
     TCGv_i32 tmp;
-    TCGv_i32 tmp2;
-    TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -3274,70 +3248,9 @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     rn = (insn >> 16) & 0xf;
     rm = insn & 0xf;
     load = (insn & (1 << 21)) != 0;
-    endian = s->be_data;
-    mmu_idx = get_mem_index(s);
     if ((insn & (1 << 23)) == 0) {
-        /* Load store all elements.  */
-        op = (insn >> 8) & 0xf;
-        size = (insn >> 6) & 3;
-        if (op > 10)
-            return 1;
-        /* Catch UNDEF cases for bad values of align field */
-        switch (op & 0xc) {
-        case 4:
-            if (((insn >> 5) & 1) == 1) {
-                return 1;
-            }
-            break;
-        case 8:
-            if (((insn >> 4) & 3) == 3) {
-                return 1;
-            }
-            break;
-        default:
-            break;
-        }
-        nregs = neon_ls_element_type[op].nregs;
-        interleave = neon_ls_element_type[op].interleave;
-        spacing = neon_ls_element_type[op].spacing;
-        if (size == 3 && (interleave | spacing) != 1) {
-            return 1;
-        }
-        /* For our purposes, bytes are always little-endian.  */
-        if (size == 0) {
-            endian = MO_LE;
-        }
-        /* Consecutive little-endian elements from a single register
-         * can be promoted to a larger little-endian operation.
-         */
-        if (interleave == 1 && endian == MO_LE) {
-            size = 3;
-        }
-        tmp64 = tcg_temp_new_i64();
-        addr = tcg_temp_new_i32();
-        tmp2 = tcg_const_i32(1 << size);
-        load_reg_var(s, addr, rn);
-        for (reg = 0; reg < nregs; reg++) {
-            for (n = 0; n < 8 >> size; n++) {
-                int xs;
-                for (xs = 0; xs < interleave; xs++) {
-                    int tt = rd + reg + spacing * xs;
-
-                    if (load) {
-                        gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
-                        neon_store_element64(tt, n, size, tmp64);
-                    } else {
-                        neon_load_element64(tmp64, tt, n, size);
-                        gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
-                    }
-                    tcg_gen_add_i32(addr, addr, tmp2);
-                }
-            }
-        }
-        tcg_temp_free_i32(addr);
-        tcg_temp_free_i32(tmp2);
-        tcg_temp_free_i64(tmp64);
-        stride = nregs * interleave * 8;
+        /* Load store all elements -- handled already by decodetree */
+        return 1;
     } else {
         size = (insn >> 10) & 3;
         if (size == 3) {
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
index 2b16c9256df..dd03d5a37bd 100644
--- a/target/arm/neon-ls.decode
+++ b/target/arm/neon-ls.decode
@@ -27,3 +27,10 @@
 #   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
 # This file works on the A32 encoding only; calling code for T32 has to
 # transform the insn into the A32 version first.
+
+%vd_dp  22:1 12:4
+
+# Neon load/store multiple structures
+
+VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
+               vd=%vd_dp
-- 
2.20.1
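
The removed neon_ls_element_type[] table and its UNDEF checks are what the itype, size and align fields of the VLDST_multiple pattern now feed. Below is a rough standalone model of those rules and of the post-index writeback (Rm == 15: none; Rm == 13: add the transfer size; otherwise add Rm). It is plain C, not QEMU code. LdstGeometry, ldst_multiple_ok() and the other helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical standalone copy of the removed neon_ls_element_type[] table,
 * indexed by itype (insn bits [11:8]). */
typedef struct {
    int nregs;      /* number of register groups transferred */
    int interleave; /* elements interleaved per structure (the n of VLDn) */
    int spacing;    /* distance between the D registers of one structure */
} LdstGeometry;

static const LdstGeometry ldst_geometry[11] = {
    {1, 4, 1}, {1, 4, 2}, {4, 1, 1}, {2, 2, 2},
    {1, 3, 1}, {1, 3, 2}, {3, 1, 1}, {1, 1, 1},
    {1, 2, 1}, {1, 2, 2}, {2, 1, 1},
};

/* Same UNDEF rules as the removed C decoder; align is insn bits [5:4]. */
static bool ldst_multiple_ok(int itype, int size, int align)
{
    assert(itype >= 0 && itype <= 15 && size >= 0 && size <= 3);
    if (itype > 10) {
        return false;                   /* itype 11..15 are treated as UNDEF */
    }
    if ((itype & 0xc) == 4 && (align & 2)) {
        return false;                   /* itype 4..7: align<1> must be 0 */
    }
    if ((itype & 0xc) == 8 && align == 3) {
        return false;                   /* itype 8..10: align 0b11 UNDEFs */
    }
    /* 64-bit elements only for forms with no interleave and unit spacing */
    return !(size == 3 &&
             (ldst_geometry[itype].interleave |
              ldst_geometry[itype].spacing) != 1);
}

/* Bytes moved, i.e. the post-index amount used when Rm == 13. */
static int ldst_multiple_bytes(int itype)
{
    return ldst_geometry[itype].nregs * ldst_geometry[itype].interleave * 8;
}

/* D register accessed for structure element xs of register group reg. */
static int ldst_multiple_dreg(int rd, int itype, int reg, int xs)
{
    return rd + reg + ldst_geometry[itype].spacing * xs;
}

/* Post-index writeback: Rm == 15 none, Rm == 13 add the transfer size,
 * any other Rm adds that register's value. */
static uint32_t ldst_writeback(uint32_t rn_val, int rm, uint32_t rm_val,
                               int bytes)
{
    if (rm == 15) {
        return rn_val;
    }
    return rm == 13 ? rn_val + (uint32_t)bytes : rn_val + rm_val;
}
```

For example, itype 1 is a VLD4/VST4 with register spacing 2, so the fourth element of a structure based at D0 goes to D6, and a post-indexed form with Rm == 13 advances the base by 32 bytes.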




* [PATCH 12/36] target/arm: Convert Neon 'load single structure to all lanes' to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (10 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 11/36] target/arm: Convert Neon load/store multiple structures " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:17   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 13/36] target/arm: Convert Neon 'load/store single structure' " Peter Maydell
                   ` (25 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon "load single structure to all lanes" insns to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 73 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 55 +------------------------
 target/arm/neon-ls.decode       |  5 +++
 3 files changed, 80 insertions(+), 53 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 966c0d92012..e60e9559bad 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -398,3 +398,76 @@ static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
     gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
     return true;
 }
+
+static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
+{
+    /* Neon load single structure to all lanes */
+    int reg, stride, vec_size;
+    int vd = a->vd;
+    int size = a->size;
+    int nregs = a->n + 1;
+    TCGv_i32 addr, tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (size == 3) {
+        if (nregs != 4 || a->a == 0) {
+            return false;
+        }
+        /* For VLD4 size == 3 a == 1 means 32 bits at 16 byte alignment */
+        size = 2;
+    }
+    if (nregs == 1 && a->a == 1 && size == 0) {
+        return false;
+    }
+    if (nregs == 3 && a->a == 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * VLD1 to all lanes: T bit indicates how many Dregs to write.
+     * VLD2/3/4 to all lanes: T bit indicates register stride.
+     */
+    stride = a->t ? 2 : 1;
+    vec_size = nregs == 1 ? stride * 8 : 8;
+
+    tmp = tcg_temp_new_i32();
+    addr = tcg_temp_new_i32();
+    load_reg_var(s, addr, a->rn);
+    for (reg = 0; reg < nregs; reg++) {
+        gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
+                        s->be_data | size);
+        if ((vd & 1) && vec_size == 16) {
+            /*
+             * We cannot write 16 bytes at once because the
+             * destination is unaligned.
+             */
+            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+                                 8, 8, tmp);
+            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
+                             neon_reg_offset(vd, 0), 8, 8);
+        } else {
+            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+                                 vec_size, vec_size, tmp);
+        }
+        tcg_gen_addi_i32(addr, addr, 1 << size);
+        vd += stride;
+    }
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(addr);
+
+    gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << size) * nregs);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 3f97635a524..a9cad04ba91 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3224,7 +3224,6 @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     int size;
     int reg;
     int load;
-    int vec_size;
     TCGv_i32 addr;
     TCGv_i32 tmp;
 
@@ -3254,58 +3253,8 @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     } else {
         size = (insn >> 10) & 3;
         if (size == 3) {
-            /* Load single element to all lanes.  */
-            int a = (insn >> 4) & 1;
-            if (!load) {
-                return 1;
-            }
-            size = (insn >> 6) & 3;
-            nregs = ((insn >> 8) & 3) + 1;
-
-            if (size == 3) {
-                if (nregs != 4 || a == 0) {
-                    return 1;
-                }
-                /* For VLD4 size==3 a == 1 means 32 bits at 16 byte alignment */
-                size = 2;
-            }
-            if (nregs == 1 && a == 1 && size == 0) {
-                return 1;
-            }
-            if (nregs == 3 && a == 1) {
-                return 1;
-            }
-            addr = tcg_temp_new_i32();
-            load_reg_var(s, addr, rn);
-
-            /* VLD1 to all lanes: bit 5 indicates how many Dregs to write.
-             * VLD2/3/4 to all lanes: bit 5 indicates register stride.
-             */
-            stride = (insn & (1 << 5)) ? 2 : 1;
-            vec_size = nregs == 1 ? stride * 8 : 8;
-
-            tmp = tcg_temp_new_i32();
-            for (reg = 0; reg < nregs; reg++) {
-                gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
-                                s->be_data | size);
-                if ((rd & 1) && vec_size == 16) {
-                    /* We cannot write 16 bytes at once because the
-                     * destination is unaligned.
-                     */
-                    tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
-                                         8, 8, tmp);
-                    tcg_gen_gvec_mov(0, neon_reg_offset(rd + 1, 0),
-                                     neon_reg_offset(rd, 0), 8, 8);
-                } else {
-                    tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
-                                         vec_size, vec_size, tmp);
-                }
-                tcg_gen_addi_i32(addr, addr, 1 << size);
-                rd += stride;
-            }
-            tcg_temp_free_i32(tmp);
-            tcg_temp_free_i32(addr);
-            stride = (1 << size) * nregs;
+            /* Load single element to all lanes -- handled by decodetree  */
+            return 1;
         } else {
             /* Single element.  */
             int idx = (insn >> 4) & 0xf;
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
index dd03d5a37bd..f0ab6d2c987 100644
--- a/target/arm/neon-ls.decode
+++ b/target/arm/neon-ls.decode
@@ -34,3 +34,8 @@
 
 VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
                vd=%vd_dp
+
+# Neon load single element to all lanes
+
+VLD_all_lanes  1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
+               vd=%vd_dp
-- 
2.20.1
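
trans_VLD_all_lanes() above combines three small rules: the UNDEF checks (including the VLD4 "size 3 means 32-bit elements" special case), how T selects either the destination width for VLD1 or the register stride for VLD2/3/4, and the replication of one loaded element across every lane. Below is a plain-C model of those rules, with no TCG involved. The function names are hypothetical, and a uint64_t stands in for one D register.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors trans_VLD_all_lanes()'s UNDEF checks. Returns the effective
 * log2 element size, or -1 if the encoding UNDEFs. */
static int vld_all_lanes_size(int n, int size, int a)
{
    int nregs = n + 1;

    if (size == 3) {
        if (nregs != 4 || a == 0) {
            return -1;
        }
        size = 2;   /* VLD4, size 3, a 1: 32-bit elements, 16-byte aligned */
    }
    if (nregs == 1 && a == 1 && size == 0) {
        return -1;
    }
    if (nregs == 3 && a == 1) {
        return -1;
    }
    return size;
}

/* Bytes written per destination: VLD1 with T set fills two D registers. */
static int vld_all_lanes_vec_size(int nregs, int t)
{
    int stride = t ? 2 : 1;
    return nregs == 1 ? stride * 8 : 8;
}

/* Post-index amount for Rm == 13: one element per register. */
static int vld_all_lanes_bytes(int nregs, int size)
{
    return (1 << size) * nregs;
}

/* Replicate the low (8 << size) bits of elt across a 64-bit D register,
 * modelling an 8-byte dup into one register. */
static uint64_t dup_to_dreg(int size, uint64_t elt)
{
    assert(size >= 0 && size <= 3);
    int bits = 8 << size;
    uint64_t mask = bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
    uint64_t result = 0;

    for (int i = 0; i < 64; i += bits) {
        result |= (elt & mask) << i;
    }
    return result;
}
```

When vec_size is 16 and vd is odd, D(vd):D(vd+1) is not an aligned Q register, which is why the patch dups 8 bytes into vd and then copies them to vd + 1. In this model that case is simply two D registers holding the same dup_to_dreg() value.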




* [PATCH 13/36] target/arm: Convert Neon 'load/store single structure' to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (11 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 12/36] target/arm: Convert Neon 'load single structure to all lanes' " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:32   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 14/36] target/arm: Convert Neon 3-reg-same VADD/VSUB " Peter Maydell
                   ` (24 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon "load/store single structure to one lane" insns to
decodetree.

As this is the last set of insns in the Neon load/store group,
we can remove the whole disas_neon_ls_insn() function, together with
its callers in disas_arm_insn() and disas_thumb2_insn().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c |  89 +++++++++++++++++++
 target/arm/translate.c          | 147 --------------------------------
 target/arm/neon-ls.decode       |  11 +++
 3 files changed, 100 insertions(+), 147 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index e60e9559bad..c881d1cf607 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -26,6 +26,11 @@
  * It might be possible to convert it to a standalone .c file eventually.
  */
 
+static inline int plus1(DisasContext *s, int x)
+{
+    return x + 1;
+}
+
 /* Include the generated Neon decoder */
 #include "decode-neon-dp.inc.c"
 #include "decode-neon-ls.inc.c"
@@ -471,3 +476,87 @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
 
     return true;
 }
+
+static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
+{
+    /* Neon load/store single structure to one lane */
+    int reg;
+    int nregs = a->n + 1;
+    int vd = a->vd;
+    TCGv_i32 addr, tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    /* Catch the UNDEF cases. This is unavoidably a bit messy. */
+    switch (nregs) {
+    case 1:
+        if (((a->align & (1 << a->size)) != 0) ||
+            (a->size == 2 && ((a->align & 3) == 1 || (a->align & 3) == 2))) {
+            return false;
+        }
+        break;
+    case 3:
+        if ((a->align & 1) != 0) {
+            return false;
+        }
+        /* fall through */
+    case 2:
+        if (a->size == 2 && (a->align & 2) != 0) {
+            return false;
+        }
+        break;
+    case 4:
+        if ((a->size == 2) && ((a->align & 3) == 3)) {
+            return false;
+        }
+        break;
+    default:
+        abort();
+    }
+    if ((vd + a->stride * (nregs - 1)) > 31) {
+        /*
+         * Attempts to write off the end of the register file are
+         * UNPREDICTABLE; we choose to UNDEF because otherwise we would
+         * access off the end of the array that holds the register data.
+         */
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    addr = tcg_temp_new_i32();
+    load_reg_var(s, addr, a->rn);
+    /*
+     * TODO: if we implemented alignment exceptions, we should check
+     * addr against the alignment encoded in a->align here.
+     */
+    for (reg = 0; reg < nregs; reg++) {
+        if (a->l) {
+            gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
+                            s->be_data | a->size);
+            neon_store_element(vd, a->reg_idx, a->size, tmp);
+        } else { /* Store */
+            neon_load_element(tmp, vd, a->reg_idx, a->size);
+            gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
+                            s->be_data | a->size);
+        }
+        vd += a->stride;
+        tcg_gen_addi_i32(addr, addr, 1 << a->size);
+    }
+    tcg_temp_free_i32(addr);
+    tcg_temp_free_i32(tmp);
+
+    gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << a->size) * nregs);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index a9cad04ba91..8c059af0e7e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3213,140 +3213,6 @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
     tcg_temp_free_i32(rd);
 }
 
-
-/* Translate a NEON load/store element instruction.  Return nonzero if the
-   instruction is invalid.  */
-static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
-{
-    int rd, rn, rm;
-    int nregs;
-    int stride;
-    int size;
-    int reg;
-    int load;
-    TCGv_i32 addr;
-    TCGv_i32 tmp;
-
-    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return 1;
-    }
-
-    /* FIXME: this access check should not take precedence over UNDEF
-     * for invalid encodings; we will generate incorrect syndrome information
-     * for attempts to execute invalid vfp/neon encodings with FP disabled.
-     */
-    if (s->fp_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
-        return 0;
-    }
-
-    if (!s->vfp_enabled)
-      return 1;
-    VFP_DREG_D(rd, insn);
-    rn = (insn >> 16) & 0xf;
-    rm = insn & 0xf;
-    load = (insn & (1 << 21)) != 0;
-    if ((insn & (1 << 23)) == 0) {
-        /* Load store all elements -- handled already by decodetree */
-        return 1;
-    } else {
-        size = (insn >> 10) & 3;
-        if (size == 3) {
-            /* Load single element to all lanes -- handled by decodetree  */
-            return 1;
-        } else {
-            /* Single element.  */
-            int idx = (insn >> 4) & 0xf;
-            int reg_idx;
-            switch (size) {
-            case 0:
-                reg_idx = (insn >> 5) & 7;
-                stride = 1;
-                break;
-            case 1:
-                reg_idx = (insn >> 6) & 3;
-                stride = (insn & (1 << 5)) ? 2 : 1;
-                break;
-            case 2:
-                reg_idx = (insn >> 7) & 1;
-                stride = (insn & (1 << 6)) ? 2 : 1;
-                break;
-            default:
-                abort();
-            }
-            nregs = ((insn >> 8) & 3) + 1;
-            /* Catch the UNDEF cases. This is unavoidably a bit messy. */
-            switch (nregs) {
-            case 1:
-                if (((idx & (1 << size)) != 0) ||
-                    (size == 2 && ((idx & 3) == 1 || (idx & 3) == 2))) {
-                    return 1;
-                }
-                break;
-            case 3:
-                if ((idx & 1) != 0) {
-                    return 1;
-                }
-                /* fall through */
-            case 2:
-                if (size == 2 && (idx & 2) != 0) {
-                    return 1;
-                }
-                break;
-            case 4:
-                if ((size == 2) && ((idx & 3) == 3)) {
-                    return 1;
-                }
-                break;
-            default:
-                abort();
-            }
-            if ((rd + stride * (nregs - 1)) > 31) {
-                /* Attempts to write off the end of the register file
-                 * are UNPREDICTABLE; we choose to UNDEF because otherwise
-                 * the neon_load_reg() would write off the end of the array.
-                 */
-                return 1;
-            }
-            tmp = tcg_temp_new_i32();
-            addr = tcg_temp_new_i32();
-            load_reg_var(s, addr, rn);
-            for (reg = 0; reg < nregs; reg++) {
-                if (load) {
-                    gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
-                                    s->be_data | size);
-                    neon_store_element(rd, reg_idx, size, tmp);
-                } else { /* Store */
-                    neon_load_element(tmp, rd, reg_idx, size);
-                    gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
-                                    s->be_data | size);
-                }
-                rd += stride;
-                tcg_gen_addi_i32(addr, addr, 1 << size);
-            }
-            tcg_temp_free_i32(addr);
-            tcg_temp_free_i32(tmp);
-            stride = nregs * (1 << size);
-        }
-    }
-    if (rm != 15) {
-        TCGv_i32 base;
-
-        base = load_reg(s, rn);
-        if (rm == 13) {
-            tcg_gen_addi_i32(base, base, stride);
-        } else {
-            TCGv_i32 index;
-            index = load_reg(s, rm);
-            tcg_gen_add_i32(base, base, index);
-            tcg_temp_free_i32(index);
-        }
-        store_reg(s, rn, base);
-    }
-    return 0;
-}
-
 static inline void gen_neon_narrow(int size, TCGv_i32 dest, TCGv_i64 src)
 {
     switch (size) {
@@ -10596,13 +10462,6 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
             }
             return;
         }
-        if ((insn & 0x0f100000) == 0x04000000) {
-            /* NEON load/store.  */
-            if (disas_neon_ls_insn(s, insn)) {
-                goto illegal_op;
-            }
-            return;
-        }
         if ((insn & 0x0e000f00) == 0x0c000100) {
             if (arm_dc_feature(s, ARM_FEATURE_IWMMXT)) {
                 /* iWMMXt register transfer.  */
@@ -10807,12 +10666,6 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
         }
         break;
     case 12:
-        if ((insn & 0x01100000) == 0x01000000) {
-            if (disas_neon_ls_insn(s, insn)) {
-                goto illegal_op;
-            }
-            break;
-        }
         goto illegal_op;
     default:
     illegal_op:
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
index f0ab6d2c987..c7b03a72e8d 100644
--- a/target/arm/neon-ls.decode
+++ b/target/arm/neon-ls.decode
@@ -39,3 +39,14 @@ VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
 
 VLD_all_lanes  1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
                vd=%vd_dp
+
+# Neon load/store single structure to one lane
+%imm1_5_p1 5:1 !function=plus1
+%imm1_6_p1 6:1 !function=plus1
+
+VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 00 n:2 reg_idx:3 align:1 rm:4 \
+               vd=%vd_dp size=0 stride=1
+VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 01 n:2 reg_idx:2 align:2 rm:4 \
+               vd=%vd_dp size=1 stride=%imm1_5_p1
+VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 10 n:2 reg_idx:1 align:3 rm:4 \
+               vd=%vd_dp size=2 stride=%imm1_6_p1
-- 
2.20.1
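
The three VLDST_single patterns split insn bits [7:4] differently depending on size, and they derive the register stride through plus1(). The sketch below decodes those fields by hand and then applies the same UNDEF checks as trans_VLDST_single(). It is plain C, not decodetree output. LdstSingle, field() and the other helpers are hypothetical names, and the code assumes the fixed opcode bits have already been matched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the fields decodetree extracts for VLDST_single. */
typedef struct {
    int vd, rn, rm, l, n, size, reg_idx, align, stride;
} LdstSingle;

static int field(uint32_t insn, int pos, int len)
{
    return (int)((insn >> pos) & ((1u << len) - 1));
}

/* Assumes insn already matches 1111_0100_1xx0 in bits [31:20].
 * Returns false when bits [11:10] == 3, which is VLD_all_lanes instead. */
static bool decode_ldst_single(uint32_t insn, LdstSingle *a)
{
    a->size = field(insn, 10, 2);
    if (a->size == 3) {
        return false;
    }
    a->vd = (field(insn, 22, 1) << 4) | field(insn, 12, 4);   /* %vd_dp */
    a->rn = field(insn, 16, 4);
    a->rm = field(insn, 0, 4);
    a->l = field(insn, 21, 1);
    a->n = field(insn, 8, 2);
    /* bits [7:4] split into reg_idx:(3 - size) and align:(1 + size) */
    a->reg_idx = field(insn, 5 + a->size, 3 - a->size);
    a->align = field(insn, 4, 1 + a->size);
    /* size 0: stride 1; size 1: bit 5 plus 1; size 2: bit 6 plus 1 */
    a->stride = a->size == 0 ? 1 : field(insn, 4 + a->size, 1) + 1;
    return true;
}

/* Same UNDEF cases as trans_VLDST_single(). */
static bool ldst_single_ok(const LdstSingle *a)
{
    int nregs = a->n + 1;

    switch (nregs) {
    case 1:
        if ((a->align & (1 << a->size)) ||
            (a->size == 2 && ((a->align & 3) == 1 || (a->align & 3) == 2))) {
            return false;
        }
        break;
    case 3:
        if (a->align & 1) {
            return false;
        }
        /* fall through */
    case 2:
        if (a->size == 2 && (a->align & 2)) {
            return false;
        }
        break;
    case 4:
        if (a->size == 2 && (a->align & 3) == 3) {
            return false;
        }
        break;
    }
    /* Running off the end of D0-D31 is UNPREDICTABLE; the patch UNDEFs. */
    return a->vd + a->stride * (nregs - 1) <= 31;
}
```

For example, 0xF4A1088F (VLD1.32 {d0[1]}, [r1]) decodes to size 2, reg_idx 1, align 0 and stride 1, and it passes the checks. Setting D and Vd to 31 together with n = 1 fails, because the second register would be D32.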




* [PATCH 14/36] target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (12 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 13/36] target/arm: Convert Neon 'load/store single structure' " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:36   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 15/36] target/arm: Convert Neon 3-reg-same logic ops " Peter Maydell
                   ` (23 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.

Note that we don't need the neon_3r_sizes[op] check here because all
size values are OK for VADD and VSUB; we'll add this when we convert
the first insn that has size restrictions.

For this we need one of the GVecGen*Fn typedefs currently in
translate-a64.h; move them all to translate.h as a block so they
are visible to the 32-bit decoder.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.h      |  9 --------
 target/arm/translate.h          |  9 ++++++++
 target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 14 ++++--------
 target/arm/neon-dp.decode       | 17 +++++++++++++++
 5 files changed, 68 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 4c2c91ae1b2..f02fbb63a4a 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -115,13 +115,4 @@ static inline int vec_full_reg_size(DisasContext *s)
 
 bool disas_sve(DisasContext *, uint32_t);
 
-/* Note that the gvec expanders operate on offsets + sizes.  */
-typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
-typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
-                         uint32_t, uint32_t);
-typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
-                        uint32_t, uint32_t, uint32_t);
-typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
-                        uint32_t, uint32_t, uint32_t);
-
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 98b319f3f69..95b43e7ab65 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -305,4 +305,13 @@ void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 #define dc_isar_feature(name, ctx) \
     ({ DisasContext *ctx_ = (ctx); isar_feature_##name(ctx_->isar); })
 
+/* Note that the gvec expanders operate on offsets + sizes.  */
+typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
+                         uint32_t, uint32_t);
+typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index c881d1cf607..bd9e697b3e2 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -560,3 +560,41 @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 
     return true;
 }
+
+static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
+{
+    int vec_size = a->q ? 16 : 8;
+    int rd_ofs = neon_reg_offset(a->vd, 0);
+    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rm_ofs = neon_reg_offset(a->vm, 0);
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fn(a->size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
+    return true;
+}
+
+#define DO_3SAME(INSN, FUNC)                                            \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        return do_3same(s, a, FUNC);                                    \
+    }
+
+DO_3SAME(VADD, tcg_gen_gvec_add)
+DO_3SAME(VSUB, tcg_gen_gvec_sub)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 8c059af0e7e..81a0df78e40 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4885,16 +4885,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 0;
 
-        case NEON_3R_VADD_VSUB:
-            if (u) {
-                tcg_gen_gvec_sub(size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            } else {
-                tcg_gen_gvec_add(size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            }
-            return 0;
-
         case NEON_3R_VQADD:
             tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
                            rn_ofs, rm_ofs, vec_size, vec_size,
@@ -4970,6 +4960,10 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
                            u ? &ushl_op[size] : &sshl_op[size]);
             return 0;
+
+        case NEON_3R_VADD_VSUB:
+            /* Already handled by decodetree */
+            return 1;
         }
 
         if (size == 3) {
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index c89a1a58591..a61b1e88476 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -18,6 +18,10 @@
 #
 # This file is processed by scripts/decodetree.py
 #
+# VFP/Neon register fields; same as vfp.decode
+%vm_dp  5:1 0:4
+%vn_dp  7:1 16:4
+%vd_dp  22:1 12:4
 
 # Encodings for Neon data processing instructions where the T32 encoding
 # is a simple transformation of the A32 encoding.
@@ -27,3 +31,16 @@
 #   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 # This file works on the A32 encoding only; calling code for T32 has to
 # transform the insn into the A32 version first.
+
+######################################################################
+# 3-reg-same grouping:
+# 1111 001 U 0 D sz:2 Vn:4 Vd:4 opc:4 N Q M op Vm:4
+######################################################################
+
+&3same vm vn vd q size
+
+@3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
+VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
-- 
2.20.1
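
The %vd_dp, %vn_dp and %vm_dp definitions each build a 5-bit D register number from a high bit and a separate 4-bit field. do_3same() then rejects D16-D31 when aa32_simd_r32 is absent, and it rejects odd register numbers when Q is set. The following plain-C sketch reproduces that extraction and those two checks. ThreeSame, dreg_field() and three_same_regs_ok() are hypothetical names, not decodetree-generated code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Concatenate bit hipos (as bit 4) with the 4-bit field at lopos, as
 * "22:1 12:4", "7:1 16:4" and "5:1 0:4" do in the decode file. */
static int dreg_field(uint32_t insn, int hipos, int lopos)
{
    return (int)((((insn >> hipos) & 1u) << 4) | ((insn >> lopos) & 0xfu));
}

typedef struct {
    int vd, vn, vm, q, size;
} ThreeSame;

/* 1111 001 U 0 D sz:2 Vn:4 Vd:4 opc:4 N Q M op Vm:4 */
static ThreeSame decode_3same(uint32_t insn)
{
    ThreeSame a = {
        .vd = dreg_field(insn, 22, 12),
        .vn = dreg_field(insn, 7, 16),
        .vm = dreg_field(insn, 5, 0),
        .q = (int)((insn >> 6) & 1u),
        .size = (int)((insn >> 20) & 3u),
    };
    return a;
}

/* do_3same()'s register checks: D16-D31 need 32 D registers, and a
 * Q-register operation needs even D register numbers. */
static bool three_same_regs_ok(const ThreeSame *a, bool has_d32)
{
    if (!has_d32 && ((a->vd | a->vn | a->vm) & 0x10)) {
        return false;
    }
    return ((a->vd | a->vn | a->vm) & a->q) == 0;
}
```

With Q set, D register n belongs to Q register n / 2, so a single OR of the three register numbers masked with q is enough to catch any odd operand.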




* [PATCH 15/36] target/arm: Convert Neon 3-reg-same logic ops to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (13 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 14/36] target/arm: Convert Neon 3-reg-same VADD/VSUB " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:39   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 16/36] target/arm: Convert Neon 3-reg-same VMAX/VMIN " Peter Maydell
                   ` (22 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
Note that for the logic ops the 'size' field is part of the opcode
decode rather than an element size, and the operations themselves are
always bitwise, so the @3same_logic format fixes size=0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 19 +++++++++++++++++
 target/arm/translate.c          | 38 +--------------------------------
 target/arm/neon-dp.decode       | 12 +++++++++++
 3 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index bd9e697b3e2..507f0abe801 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -598,3 +598,22 @@ static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
 
 DO_3SAME(VADD, tcg_gen_gvec_add)
 DO_3SAME(VSUB, tcg_gen_gvec_sub)
+DO_3SAME(VAND, tcg_gen_gvec_and)
+DO_3SAME(VBIC, tcg_gen_gvec_andc)
+DO_3SAME(VORR, tcg_gen_gvec_or)
+DO_3SAME(VORN, tcg_gen_gvec_orc)
+DO_3SAME(VEOR, tcg_gen_gvec_xor)
+
+/* These insns are all gvec_bitsel but with the inputs in various orders. */
+#define DO_3SAME_BITSEL(INSN, O1, O2, O3)                               \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_bitsel(vece, rd_ofs, O1, O2, O3, oprsz, maxsz);    \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
+DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
+DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 81a0df78e40..a3eaf9a82b7 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4848,43 +4848,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 1;
 
-        case NEON_3R_LOGIC: /* Logic ops.  */
-            switch ((u << 2) | size) {
-            case 0: /* VAND */
-                tcg_gen_gvec_and(0, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-                break;
-            case 1: /* VBIC */
-                tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
-                                  vec_size, vec_size);
-                break;
-            case 2: /* VORR */
-                tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
-                                vec_size, vec_size);
-                break;
-            case 3: /* VORN */
-                tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-                break;
-            case 4: /* VEOR */
-                tcg_gen_gvec_xor(0, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-                break;
-            case 5: /* VBSL */
-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
-                                    vec_size, vec_size);
-                break;
-            case 6: /* VBIT */
-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
-                                    vec_size, vec_size);
-                break;
-            case 7: /* VBIF */
-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
-                                    vec_size, vec_size);
-                break;
-            }
-            return 0;
-
         case NEON_3R_VQADD:
             tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
                            rn_ofs, rm_ofs, vec_size, vec_size,
@@ -4962,6 +4925,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 0;
 
         case NEON_3R_VADD_VSUB:
+        case NEON_3R_LOGIC:
             /* Already handled by decodetree */
             return 1;
         }
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index a61b1e88476..f62dbaa72d5 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -42,5 +42,17 @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+@3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
+
+VAND_3s          1111 001 0 0 . 00 .... .... 0001 ... 1 .... @3same_logic
+VBIC_3s          1111 001 0 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+VORR_3s          1111 001 0 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+VORN_3s          1111 001 0 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+VEOR_3s          1111 001 1 0 . 00 .... .... 0001 ... 1 .... @3same_logic
+VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
-- 
2.20.1
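
The three bitwise-select forms converted above differ only in which register supplies the select mask, which is why a single DO_3SAME_BITSEL macro with permuted offsets can cover all of them. The scalar sketch below models one 64-bit lane. It assumes that tcg_gen_gvec_bitsel(vece, d, a, b, c, ...) computes d = (b & a) | (c & ~a), as QEMU's tcg-op-gvec.h documents it. The model_* functions are hypothetical illustrations, not QEMU code.

```c
#include <stdint.h>

/* Model of the bitsel primitive: bits of b where a is 1, bits of c where a is 0. */
static uint64_t bitsel(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & a) | (c & ~a);
}

/* VBSL: Vd is the mask; take Vn where Vd is 1 and Vm where Vd is 0. */
static uint64_t model_vbsl(uint64_t vd, uint64_t vn, uint64_t vm)
{
    return bitsel(vd, vn, vm);
}

/* VBIT: insert Vn bits into Vd wherever Vm is 1; keep Vd elsewhere. */
static uint64_t model_vbit(uint64_t vd, uint64_t vn, uint64_t vm)
{
    return bitsel(vm, vn, vd);
}

/* VBIF: insert Vn bits into Vd wherever Vm is 0; keep Vd elsewhere. */
static uint64_t model_vbif(uint64_t vd, uint64_t vn, uint64_t vm)
{
    return bitsel(vm, vd, vn);
}
```

The argument orders match the DO_3SAME_BITSEL invocations: rd/rn/rm for VBSL, rm/rn/rd for VBIT and rm/rd/rn for VBIF.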




* [PATCH 16/36] target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (14 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 15/36] target/arm: Convert Neon 3-reg-same logic ops " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:45   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 17/36] target/arm: Convert Neon 3-reg-same comparisons " Peter Maydell
                   ` (21 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon 3-reg-same VMAX and VMIN insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 14 ++++++++++++++
 target/arm/translate.c          | 21 ++-------------------
 target/arm/neon-dp.decode       |  5 +++++
 3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 507f0abe801..ab1740201c4 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -617,3 +617,17 @@ DO_3SAME(VEOR, tcg_gen_gvec_xor)
 DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
 DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
 DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
+
+#define DO_3SAME_NO_SZ_3(INSN, FUNC)                                    \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        if (a->size == 3) {                                             \
+            return false;                                               \
+        }                                                               \
+        return do_3same(s, a, FUNC);                                    \
+    }
+
+DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
+DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
+DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
+DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index a3eaf9a82b7..a22aee802a5 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4899,25 +4899,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
             return 0;
 
-        case NEON_3R_VMAX:
-            if (u) {
-                tcg_gen_gvec_umax(size, rd_ofs, rn_ofs, rm_ofs,
-                                  vec_size, vec_size);
-            } else {
-                tcg_gen_gvec_smax(size, rd_ofs, rn_ofs, rm_ofs,
-                                  vec_size, vec_size);
-            }
-            return 0;
-        case NEON_3R_VMIN:
-            if (u) {
-                tcg_gen_gvec_umin(size, rd_ofs, rn_ofs, rm_ofs,
-                                  vec_size, vec_size);
-            } else {
-                tcg_gen_gvec_smin(size, rd_ofs, rn_ofs, rm_ofs,
-                                  vec_size, vec_size);
-            }
-            return 0;
-
         case NEON_3R_VSHL:
             /* Note the operation is vshl vd,vm,vn */
             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
@@ -4926,6 +4907,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 
         case NEON_3R_VADD_VSUB:
         case NEON_3R_LOGIC:
+        case NEON_3R_VMAX:
+        case NEON_3R_VMIN:
             /* Already handled by decodetree */
             return 1;
         }
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index f62dbaa72d5..b721d39c7ba 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -54,5 +54,10 @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
 VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 
+VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
+VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
+VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
+VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
+
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
-- 
2.20.1
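
DO_3SAME_NO_SZ_3 rejects size == 3 because VMAX/VMIN have no 64-bit element form. When the trans_ function returns false, the legacy decoder's "Already handled by decodetree" case returns 1, so the encoding ends up UNDEFINED. The sketch below is a hypothetical scalar model of the lane-wise semantics on one 64-bit D register, not the TCG expansion that tcg_gen_gvec_smax and its siblings generate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 'bits' bits of x (bits <= 32) using only
 * well-defined arithmetic. */
static int64_t sext(uint64_t x, unsigned bits)
{
    uint64_t sign = 1ull << (bits - 1);
    return (int64_t)(x ^ sign) - (int64_t)sign;
}

/*
 * Hypothetical model of VMAX/VMIN on a 64-bit D register.
 * size follows the Neon encoding: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit lanes.
 * size == 3 is UNDEFINED for these insns, so the model returns false,
 * mirroring the trans_ functions generated by DO_3SAME_NO_SZ_3.
 */
static bool model_vmaxmin(unsigned size, bool is_unsigned, bool is_min,
                          uint64_t n, uint64_t m, uint64_t *d)
{
    if (size == 3) {
        return false;
    }

    unsigned esize = 8u << size;            /* lane width in bits */
    uint64_t mask = (1ull << esize) - 1;    /* esize <= 32: no overflow */
    uint64_t result = 0;

    for (unsigned shift = 0; shift < 64; shift += esize) {
        uint64_t a = (n >> shift) & mask;
        uint64_t b = (m >> shift) & mask;
        bool a_gt_b = is_unsigned ? a > b : sext(a, esize) > sext(b, esize);
        /* max keeps a when a > b; min keeps a when a <= b */
        uint64_t pick = (a_gt_b != is_min) ? a : b;
        result |= pick << shift;
    }
    *d = result;
    return true;
}
```

For example, with lane 0 holding 0xFF and 0x01, VMAX.S8 picks 0x01 (because -1 < 1) while VMAX.U8 picks 0xFF. This is the same split that the VMAX_S_3s and VMAX_U_3s patterns encode through bit 24.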




* [PATCH 17/36] target/arm: Convert Neon 3-reg-same comparisons to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (15 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 16/36] target/arm: Convert Neon 3-reg-same VMAX/VMIN " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:48   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 18/36] target/arm: Convert Neon 3-reg-same VQADD/VQSUB " Peter Maydell
                   ` (20 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon comparison ops in the 3-reg-same grouping
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 22 ++++++++++++++++++++++
 target/arm/translate.c          | 23 +++--------------------
 target/arm/neon-dp.decode       |  8 ++++++++
 3 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index ab1740201c4..952e4456f5e 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -631,3 +631,25 @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
 DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
 DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
 DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+
+#define DO_3SAME_CMP(INSN, COND)                                        \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_cmp(COND, vece, rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz); \
+    }                                                                   \
+    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
+
+DO_3SAME_CMP(VCGT_S, TCG_COND_GT)
+DO_3SAME_CMP(VCGT_U, TCG_COND_GTU)
+DO_3SAME_CMP(VCGE_S, TCG_COND_GE)
+DO_3SAME_CMP(VCGE_U, TCG_COND_GEU)
+DO_3SAME_CMP(VCEQ, TCG_COND_EQ)
+
+static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                         uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
+}
+DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index a22aee802a5..7e4a57157c2 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4879,26 +4879,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                            u ? &mls_op[size] : &mla_op[size]);
             return 0;
 
-        case NEON_3R_VTST_VCEQ:
-            if (u) { /* VCEQ */
-                tcg_gen_gvec_cmp(TCG_COND_EQ, size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            } else { /* VTST */
-                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                               vec_size, vec_size, &cmtst_op[size]);
-            }
-            return 0;
-
-        case NEON_3R_VCGT:
-            tcg_gen_gvec_cmp(u ? TCG_COND_GTU : TCG_COND_GT, size,
-                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
-            return 0;
-
-        case NEON_3R_VCGE:
-            tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size,
-                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
-            return 0;
-
         case NEON_3R_VSHL:
             /* Note the operation is vshl vd,vm,vn */
             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
@@ -4909,6 +4889,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_LOGIC:
         case NEON_3R_VMAX:
         case NEON_3R_VMIN:
+        case NEON_3R_VTST_VCEQ:
+        case NEON_3R_VCGT:
+        case NEON_3R_VCGE:
             /* Already handled by decodetree */
             return 1;
         }
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index b721d39c7ba..b89ea6819a9 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -54,6 +54,11 @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
 VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 
+VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
+VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
+VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
+VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
+
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
@@ -61,3 +66,6 @@ VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
 
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
+
+VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
+VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
-- 
2.20.1
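
Each comparison writes an all-ones lane where its condition holds and an all-zeros lane where it does not. VTST is the odd one out: it tests (Vn & Vm) != 0 through cmtst_op instead of a TCG_COND_* condition, which is why it needs its own gen_VTST_3s wrapper. The following is a hypothetical scalar model of the six converted comparisons on one D register, with lanes of 8 << size bits. The enum values are illustrative stand-ins, not QEMU's TCGCond.

```c
#include <stdint.h>

/* Hypothetical condition selector, loosely mirroring the TCG_COND_* values
 * passed to DO_3SAME_CMP, plus a TST case for VTST. */
enum model_cond { M_EQ, M_GT, M_GTU, M_GE, M_GEU, M_TST };

/* Sign-extend the low 'bits' bits of x (bits <= 32). */
static int64_t sext(uint64_t x, unsigned bits)
{
    uint64_t sign = 1ull << (bits - 1);
    return (int64_t)(x ^ sign) - (int64_t)sign;
}

/* Lane-wise compare of two 64-bit registers, size 0..2.  A result lane
 * is all ones when the condition holds and all zeros otherwise. */
static uint64_t model_cmp(enum model_cond cond, unsigned size,
                          uint64_t n, uint64_t m)
{
    unsigned esize = 8u << size;
    uint64_t mask = (1ull << esize) - 1;
    uint64_t result = 0;

    for (unsigned shift = 0; shift < 64; shift += esize) {
        uint64_t a = (n >> shift) & mask;
        uint64_t b = (m >> shift) & mask;
        int hold;

        switch (cond) {
        case M_EQ:  hold = a == b; break;
        case M_GT:  hold = sext(a, esize) > sext(b, esize); break;
        case M_GTU: hold = a > b; break;
        case M_GE:  hold = sext(a, esize) >= sext(b, esize); break;
        case M_GEU: hold = a >= b; break;
        case M_TST: hold = (a & b) != 0; break;
        default:    hold = 0; break;
        }
        if (hold) {
            result |= mask << shift;
        }
    }
    return result;
}
```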




* [PATCH 18/36] target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (16 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 17/36] target/arm: Convert Neon 3-reg-same comparisons " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:50   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 19/36] target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL " Peter Maydell
                   ` (19 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VQADD/VQSUB insns in the 3-reg-same grouping
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 15 +++++++++++++++
 target/arm/translate.c          | 14 ++------------
 target/arm/neon-dp.decode       |  6 ++++++
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 952e4456f5e..854ab70cd79 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -653,3 +653,18 @@ static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
     tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
 }
 DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
+
+#define DO_3SAME_GVEC4(INSN, OPARRAY)                                   \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),           \
+                       rn_ofs, rm_ofs, oprsz, maxsz, &OPARRAY[vece]);   \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_GVEC4(VQADD_S, sqadd_op)
+DO_3SAME_GVEC4(VQADD_U, uqadd_op)
+DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
+DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 7e4a57157c2..538e4be8f1b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4848,18 +4848,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 1;
 
-        case NEON_3R_VQADD:
-            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
-                           rn_ofs, rm_ofs, vec_size, vec_size,
-                           (u ? uqadd_op : sqadd_op) + size);
-            return 0;
-
-        case NEON_3R_VQSUB:
-            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
-                           rn_ofs, rm_ofs, vec_size, vec_size,
-                           (u ? uqsub_op : sqsub_op) + size);
-            return 0;
-
         case NEON_3R_VMUL: /* VMUL */
             if (u) {
                 /* Polynomial case allows only P8.  */
@@ -4892,6 +4880,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VTST_VCEQ:
         case NEON_3R_VCGT:
         case NEON_3R_VCGE:
+        case NEON_3R_VQADD:
+        case NEON_3R_VQSUB:
             /* Already handled by decodetree */
             return 1;
         }
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index b89ea6819a9..ab59b349aaa 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -42,6 +42,9 @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
+VQADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
+
 @3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
 
@@ -54,6 +57,9 @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
 VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 
+VQSUB_S_3s       1111 001 0 0 . .. .... .... 0010 . . . 1 .... @3same
+VQSUB_U_3s       1111 001 1 0 . .. .... .... 0010 . . . 1 .... @3same
+
 VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
-- 
2.20.1
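
The saturating ops go through tcg_gen_gvec_4 instead of tcg_gen_gvec_3 because they need a fourth operand: the offset of vfp.qc, where the expansion records that some lane saturated (the cumulative QC flag). The hypothetical scalar model of VQADD.S8 below, on one D register, shows the behaviour that the sqadd_op expansion must reproduce. The sticky QC flag is modelled as a bool that is set but never cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read byte lane 'shift' of reg as a signed value in [-128, 127]. */
static int lane_s8(uint64_t reg, unsigned shift)
{
    int v = (int)((reg >> shift) & 0xFF);
    return v >= 0x80 ? v - 0x100 : v;
}

/* Hypothetical model of VQADD.S8: add each byte lane with signed
 * saturation; set *qc if any lane saturates (QC is sticky). */
static uint64_t model_vqadd_s8(uint64_t n, uint64_t m, bool *qc)
{
    uint64_t result = 0;

    for (unsigned shift = 0; shift < 64; shift += 8) {
        int sum = lane_s8(n, shift) + lane_s8(m, shift);

        if (sum > INT8_MAX) {
            sum = INT8_MAX;
            *qc = true;
        } else if (sum < INT8_MIN) {
            sum = INT8_MIN;
            *qc = true;
        }
        /* conversion to uint8_t is well-defined modulo 256 */
        result |= (uint64_t)(uint8_t)sum << shift;
    }
    return result;
}
```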




* [PATCH 19/36] target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (17 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 18/36] target/arm: Convert Neon 3-reg-same VQADD/VQSUB " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 19:58   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 20/36] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
                   ` (18 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
3-reg-same grouping to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 28 +++------------------
 target/arm/neon-dp.decode       |  9 +++++++
 3 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 854ab70cd79..50b77b6d714 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -631,6 +631,7 @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
 DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
 DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
 DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+DO_3SAME_NO_SZ_3(VMUL, tcg_gen_gvec_mul)
 
 #define DO_3SAME_CMP(INSN, COND)                                        \
     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
@@ -668,3 +669,46 @@ DO_3SAME_GVEC4(VQADD_S, sqadd_op)
 DO_3SAME_GVEC4(VQADD_U, uqadd_op)
 DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
 DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
+
+static void gen_VMUL_p_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                           uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz,
+                       0, gen_helper_gvec_pmul_b);
+}
+
+static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
+{
+    if (a->size != 0) {
+        return false;
+    }
+    return do_3same(s, a, gen_VMUL_p_3s);
+}
+
+#define DO_3SAME_GVEC3_NO_SZ_3(INSN, OPARRAY)                           \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,                          \
+                       oprsz, maxsz, &OPARRAY[vece]);                   \
+    }                                                                   \
+    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
+
+
+DO_3SAME_GVEC3_NO_SZ_3(VMLA, mla_op)
+DO_3SAME_GVEC3_NO_SZ_3(VMLS, mls_op)
+
+#define DO_3SAME_GVEC3_SHIFT(INSN, OPARRAY)                             \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        /* Note the operation is vshl vd,vm,vn */                       \
+        tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs,                          \
+                       oprsz, maxsz, &OPARRAY[vece]);                   \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_GVEC3_SHIFT(VSHL_S, sshl_op)
+DO_3SAME_GVEC3_SHIFT(VSHL_U, ushl_op)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 538e4be8f1b..ad60b7190f9 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4848,31 +4848,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 1;
 
-        case NEON_3R_VMUL: /* VMUL */
-            if (u) {
-                /* Polynomial case allows only P8.  */
-                if (size != 0) {
-                    return 1;
-                }
-                tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
-                                   0, gen_helper_gvec_pmul_b);
-            } else {
-                tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            }
-            return 0;
-
-        case NEON_3R_VML: /* VMLA, VMLS */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
-                           u ? &mls_op[size] : &mla_op[size]);
-            return 0;
-
-        case NEON_3R_VSHL:
-            /* Note the operation is vshl vd,vm,vn */
-            tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
-                           u ? &ushl_op[size] : &sshl_op[size]);
-            return 0;
-
         case NEON_3R_VADD_VSUB:
         case NEON_3R_LOGIC:
         case NEON_3R_VMAX:
@@ -4882,6 +4857,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VCGE:
         case NEON_3R_VQADD:
         case NEON_3R_VQSUB:
+        case NEON_3R_VMUL:
+        case NEON_3R_VML:
+        case NEON_3R_VSHL:
             /* Already handled by decodetree */
             return 1;
         }
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index ab59b349aaa..ec3a92fe753 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -65,6 +65,9 @@ VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
 VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
 
+VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same
+VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same
+
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
@@ -75,3 +78,9 @@ VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
 
 VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
 VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
+
+VMLA_3s          1111 001 0 0 . .. .... .... 1001 . . . 0 .... @3same
+VMLS_3s          1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
+
+VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
+VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
-- 
2.20.1
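
trans_VMUL_p_3s accepts only size == 0 because polynomial VMUL exists only for 8-bit elements (the removed legacy code notes "Polynomial case allows only P8"). Polynomial multiplication is carry-less: partial products are combined with XOR instead of addition. The sketch below is a hypothetical model of what gen_helper_gvec_pmul_b computes for each byte lane. It assumes that the product is truncated to its low 8 bits.

```c
#include <stdint.h>

/* Carry-less (GF(2)[x]) multiply of two 8-bit polynomials, keeping only
 * the low 8 bits of the 15-bit product. */
static uint8_t pmul8(uint8_t a, uint8_t b)
{
    uint8_t r = 0;

    for (int i = 0; i < 8; i++) {
        if (b & (1u << i)) {
            r ^= (uint8_t)(a << i);  /* XOR instead of add: no carries */
        }
    }
    return r;
}

/* Hypothetical model of VMUL.P8 on one 64-bit D register. */
static uint64_t model_vmul_p8(uint64_t n, uint64_t m)
{
    uint64_t result = 0;

    for (unsigned shift = 0; shift < 64; shift += 8) {
        uint8_t a = (uint8_t)(n >> shift);
        uint8_t b = (uint8_t)(m >> shift);
        result |= (uint64_t)pmul8(a, b) << shift;
    }
    return result;
}
```

For example, pmul8(3, 3) is 5 (the polynomial x^2 + 1) and not the integer product 9.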




* [PATCH 20/36] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (18 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 19/36] target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 20:03   ` Richard Henderson
  2020-04-30 20:28   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 21/36] target/arm: Convert Neon 3-reg-same SHA " Peter Maydell
                   ` (17 subsequent siblings)
  37 siblings, 2 replies; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VQRDMLAH and VQRDMLSH insns in the 3-reg-same group
to decodetree.  These don't use do_3same() because they operate on
VFP double registers, whose offsets differ from the ones do_3same()
computes with neon_reg_offset().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 57 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 36 ++-------------------
 target/arm/neon-dp.decode       |  3 ++
 3 files changed, 62 insertions(+), 34 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 50b77b6d714..c8beb048fa2 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -712,3 +712,60 @@ DO_3SAME_GVEC3_NO_SZ_3(VMLS, mls_op)
 
 DO_3SAME_GVEC3_SHIFT(VSHL_S, sshl_op)
 DO_3SAME_GVEC3_SHIFT(VSHL_U, ushl_op)
+
+static bool do_vqrdmlah(DisasContext *s, arg_3same *a,
+                        gen_helper_gvec_3_ptr *fn)
+{
+    int vec_size = a->q ? 16 : 8;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
+        !dc_isar_feature(aa32_rdm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!fn) {
+        return false; /* bad size */
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       cpu_env, vec_size, vec_size, 0, fn);
+    return true;
+}
+
+static bool trans_VQRDMLAH_3s(DisasContext *s, arg_3same *a)
+{
+    static gen_helper_gvec_3_ptr * const fns[] = {
+        NULL,
+        gen_helper_gvec_qrdmlah_s16,
+        gen_helper_gvec_qrdmlah_s32,
+        NULL,
+    };
+    return do_vqrdmlah(s, a, fns[a->size]);
+}
+
+static bool trans_VQRDMLSH_3s(DisasContext *s, arg_3same *a)
+{
+    static gen_helper_gvec_3_ptr * const fns[] = {
+        NULL,
+        gen_helper_gvec_qrdmlsh_s16,
+        gen_helper_gvec_qrdmlsh_s32,
+        NULL,
+    };
+    return do_vqrdmlah(s, a, fns[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ad60b7190f9..adc42362469 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3629,22 +3629,6 @@ static const uint8_t neon_2rm_sizes[] = {
     [NEON_2RM_VCVT_UF] = 0x4,
 };
 
-
-/* Expand v8.1 simd helper.  */
-static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn,
-                         int q, int rd, int rn, int rm)
-{
-    if (dc_isar_feature(aa32_rdm, s)) {
-        int opr_sz = (1 + q) * 8;
-        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
-                           vfp_reg_offset(1, rn),
-                           vfp_reg_offset(1, rm), cpu_env,
-                           opr_sz, opr_sz, 0, fn);
-        return 0;
-    }
-    return 1;
-}
-
 static void gen_ceq0_i32(TCGv_i32 d, TCGv_i32 a)
 {
     tcg_gen_setcondi_i32(TCG_COND_EQ, d, a, 0);
@@ -4818,15 +4802,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             if (!u) {
                 break;  /* VPADD */
             }
-            /* VQRDMLAH */
-            switch (size) {
-            case 1:
-                return do_v81_helper(s, gen_helper_gvec_qrdmlah_s16,
-                                     q, rd, rn, rm);
-            case 2:
-                return do_v81_helper(s, gen_helper_gvec_qrdmlah_s32,
-                                     q, rd, rn, rm);
-            }
+            /* VQRDMLAH : handled by decodetree */
             return 1;
 
         case NEON_3R_VFM_VQRDMLSH:
@@ -4837,15 +4813,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 }
                 break;
             }
-            /* VQRDMLSH */
-            switch (size) {
-            case 1:
-                return do_v81_helper(s, gen_helper_gvec_qrdmlsh_s16,
-                                     q, rd, rn, rm);
-            case 2:
-                return do_v81_helper(s, gen_helper_gvec_qrdmlsh_s32,
-                                     q, rd, rn, rm);
-            }
+            /* VQRDMLSH : handled by decodetree */
             return 1;
 
         case NEON_3R_VADD_VSUB:
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index ec3a92fe753..ce0db476c88 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -84,3 +84,6 @@ VMLS_3s          1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
 
 VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
 VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
+
+VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
+VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
-- 
2.20.1
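
The fns[] tables in trans_VQRDMLAH_3s and trans_VQRDMLSH_3s provide only 16-bit and 32-bit helpers. Their NULL entries make do_vqrdmlah() return false, and therefore UNDEF, for sizes 0 and 3. The per-lane arithmetic for the 16-bit case is sketched below. The formula is restated from the Armv8.1-A RDM pseudocode as an assumption, and model_qrdmlah_s16 is a hypothetical name, so this is an illustration rather than a copy of gen_helper_gvec_qrdmlah_s16.

```c
#include <stdbool.h>
#include <stdint.h>

/* Floor division by 2^16: C's '/' truncates toward zero, so adjust
 * inexact negative quotients down by one. */
static int64_t floor_div_65536(int64_t x)
{
    int64_t q = x / 65536;

    if (x < 0 && x % 65536 != 0) {
        q -= 1;
    }
    return q;
}

/*
 * Hypothetical model of one 16-bit lane of VQRDMLAH (subtract == false)
 * or VQRDMLSH (subtract == true):
 *   accum  = (d << 16) +/- 2 * n * m + (1 << 15)
 *   result = SignedSaturate16(floor(accum / 2^16))
 * Saturation sets the sticky *qc flag.
 */
static int16_t model_qrdmlah_s16(int16_t d, int16_t n, int16_t m,
                                 bool subtract, bool *qc)
{
    int64_t prod = 2 * (int64_t)n * m;        /* magnitude at most 2^31 */
    int64_t accum = (int64_t)d * 65536        /* d << 16 without UB */
                    + (subtract ? -prod : prod)
                    + (1 << 15);              /* rounding constant */
    int64_t res = floor_div_65536(accum);

    if (res > INT16_MAX) {
        res = INT16_MAX;
        *qc = true;
    } else if (res < INT16_MIN) {
        res = INT16_MIN;
        *qc = true;
    }
    return (int16_t)res;
}
```

With Q15 inputs of 0x4000 (0.5) for both n and m, VQRDMLAH adds 8192 (0.25) to the accumulator. With n = m = -32768, the doubled product overflows the 16-bit range, so the result saturates to 32767 and sets QC.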




* [PATCH 21/36] target/arm: Convert Neon 3-reg-same SHA to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (19 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 20/36] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 20:30   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 22/36] target/arm: Move gen_ function typedefs to translate.h Peter Maydell
                   ` (16 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon SHA instructions in the 3-reg-same group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 139 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  46 +----------
 target/arm/neon-dp.decode       |  10 +++
 3 files changed, 151 insertions(+), 44 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index c8beb048fa2..161313ad879 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -769,3 +769,142 @@ static bool trans_VQRDMLSH_3s(DisasContext *s, arg_3same *a)
     };
     return do_vqrdmlah(s, a, fns[a->size]);
 }
+
+static bool trans_SHA1_3s(DisasContext *s, arg_SHA1_3s *a)
+{
+    TCGv_ptr ptr1, ptr2, ptr3;
+    TCGv_i32 tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
+        !dc_isar_feature(aa32_sha1, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    ptr1 = vfp_reg_ptr(true, a->vd);
+    ptr2 = vfp_reg_ptr(true, a->vn);
+    ptr3 = vfp_reg_ptr(true, a->vm);
+    tmp = tcg_const_i32(a->optype);
+    gen_helper_crypto_sha1_3reg(ptr1, ptr2, ptr3, tmp);
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_ptr(ptr1);
+    tcg_temp_free_ptr(ptr2);
+    tcg_temp_free_ptr(ptr3);
+
+    return true;
+}
+
+static bool trans_SHA256H_3s(DisasContext *s, arg_SHA256H_3s *a)
+{
+    TCGv_ptr ptr1, ptr2, ptr3;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
+        !dc_isar_feature(aa32_sha2, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    ptr1 = vfp_reg_ptr(true, a->vd);
+    ptr2 = vfp_reg_ptr(true, a->vn);
+    ptr3 = vfp_reg_ptr(true, a->vm);
+    gen_helper_crypto_sha256h(ptr1, ptr2, ptr3);
+    tcg_temp_free_ptr(ptr1);
+    tcg_temp_free_ptr(ptr2);
+    tcg_temp_free_ptr(ptr3);
+
+    return true;
+}
+
+static bool trans_SHA256H2_3s(DisasContext *s, arg_SHA256H2_3s *a)
+{
+    TCGv_ptr ptr1, ptr2, ptr3;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
+        !dc_isar_feature(aa32_sha2, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    ptr1 = vfp_reg_ptr(true, a->vd);
+    ptr2 = vfp_reg_ptr(true, a->vn);
+    ptr3 = vfp_reg_ptr(true, a->vm);
+    gen_helper_crypto_sha256h2(ptr1, ptr2, ptr3);
+    tcg_temp_free_ptr(ptr1);
+    tcg_temp_free_ptr(ptr2);
+    tcg_temp_free_ptr(ptr3);
+
+    return true;
+}
+
+static bool trans_SHA256SU1_3s(DisasContext *s, arg_SHA256SU1_3s *a)
+{
+    TCGv_ptr ptr1, ptr2, ptr3;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
+        !dc_isar_feature(aa32_sha2, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    ptr1 = vfp_reg_ptr(true, a->vd);
+    ptr2 = vfp_reg_ptr(true, a->vn);
+    ptr3 = vfp_reg_ptr(true, a->vm);
+    gen_helper_crypto_sha256su1(ptr1, ptr2, ptr3);
+    tcg_temp_free_ptr(ptr1);
+    tcg_temp_free_ptr(ptr2);
+    tcg_temp_free_ptr(ptr3);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index adc42362469..160638e2a7c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4711,7 +4711,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int vec_size;
     uint32_t imm;
     TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
-    TCGv_ptr ptr1, ptr2, ptr3;
+    TCGv_ptr ptr1, ptr2;
     TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -4755,49 +4755,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         }
         switch (op) {
-        case NEON_3R_SHA:
-            /* The SHA-1/SHA-256 3-register instructions require special
-             * treatment here, as their size field is overloaded as an
-             * op type selector, and they all consume their input in a
-             * single pass.
-             */
-            if (!q) {
-                return 1;
-            }
-            if (!u) { /* SHA-1 */
-                if (!dc_isar_feature(aa32_sha1, s)) {
-                    return 1;
-                }
-                ptr1 = vfp_reg_ptr(true, rd);
-                ptr2 = vfp_reg_ptr(true, rn);
-                ptr3 = vfp_reg_ptr(true, rm);
-                tmp4 = tcg_const_i32(size);
-                gen_helper_crypto_sha1_3reg(ptr1, ptr2, ptr3, tmp4);
-                tcg_temp_free_i32(tmp4);
-            } else { /* SHA-256 */
-                if (!dc_isar_feature(aa32_sha2, s) || size == 3) {
-                    return 1;
-                }
-                ptr1 = vfp_reg_ptr(true, rd);
-                ptr2 = vfp_reg_ptr(true, rn);
-                ptr3 = vfp_reg_ptr(true, rm);
-                switch (size) {
-                case 0:
-                    gen_helper_crypto_sha256h(ptr1, ptr2, ptr3);
-                    break;
-                case 1:
-                    gen_helper_crypto_sha256h2(ptr1, ptr2, ptr3);
-                    break;
-                case 2:
-                    gen_helper_crypto_sha256su1(ptr1, ptr2, ptr3);
-                    break;
-                }
-            }
-            tcg_temp_free_ptr(ptr1);
-            tcg_temp_free_ptr(ptr2);
-            tcg_temp_free_ptr(ptr3);
-            return 0;
-
         case NEON_3R_VPADD_VQRDMLAH:
             if (!u) {
                 break;  /* VPADD */
@@ -4828,6 +4785,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VMUL:
         case NEON_3R_VML:
         case NEON_3R_VSHL:
+        case NEON_3R_SHA:
             /* Already handled by decodetree */
             return 1;
         }
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index ce0db476c88..f22606b2bd5 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -86,4 +86,14 @@ VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
 VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
 
 VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
+
+SHA1_3s          1111 001 0 0 . optype:2 .... .... 1100 . 1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+SHA256H_3s       1111 001 1 0 . 00 .... .... 1100 . 1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+SHA256H2_3s      1111 001 1 0 . 01 .... .... 1100 . 1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+SHA256SU1_3s     1111 001 1 0 . 10 .... .... 1100 . 1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
 VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
-- 
2.20.1
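For readers following the decode change, the SHA 3-register encoding can be modelled outside QEMU as a plain bitfield classifier. This is a minimal sketch, not QEMU code. `field()`, `ShaOp` and `classify_sha3()` are hypothetical names, and `has_d32` stands in for the `aa32_simd_r32` feature test. The `aa32_sha1`/`aa32_sha2` feature checks are left out. The sketch shows three things: the size bits doubling as the SHA-1 op type, register numbers built as D:Vd, N:Vn and M:Vm, and the cases that UNDEF.

```c
#include <assert.h>
#include <stdint.h>

/* Extract LEN bits starting at bit POS, in the spirit of QEMU's extract32(). */
static uint32_t field(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & ((1u << len) - 1);
}

/* Hypothetical result type for this toy classifier. */
typedef enum {
    SHA_UNDEF, SHA1C, SHA1P, SHA1M, SHA1SU0, SHA256H, SHA256H2, SHA256SU1
} ShaOp;

/*
 * Hypothetical classifier for A32 encodings of the form
 * 1111 001 U 0 D sz Vn Vd 1100 N Q M 0 Vm.  has_d32 models the
 * aa32_simd_r32 feature; the aa32_sha1/aa32_sha2 checks are omitted.
 */
static ShaOp classify_sha3(uint32_t insn, int has_d32)
{
    assert(field(insn, 8, 4) == 0xc && field(insn, 4, 1) == 0);

    uint32_t u = field(insn, 24, 1);
    uint32_t sz = field(insn, 20, 2);
    uint32_t q = field(insn, 6, 1);
    /* Register numbers are D:Vd, N:Vn and M:Vm (cf. %vd_dp etc.). */
    uint32_t vd = (field(insn, 22, 1) << 4) | field(insn, 12, 4);
    uint32_t vn = (field(insn, 7, 1) << 4) | field(insn, 16, 4);
    uint32_t vm = (field(insn, 5, 1) << 4) | field(insn, 0, 4);

    if (!q) {
        return SHA_UNDEF;     /* only the Q-register form exists */
    }
    if (!has_d32 && ((vd | vn | vm) & 0x10)) {
        return SHA_UNDEF;     /* D16-D31 not implemented */
    }
    if ((vd | vn | vm) & 1) {
        return SHA_UNDEF;     /* Q registers need even D numbers */
    }
    if (!u) {
        /* The size field is overloaded as the SHA-1 op type. */
        static const ShaOp sha1_ops[4] = { SHA1C, SHA1P, SHA1M, SHA1SU0 };
        return sha1_ops[sz];
    }
    static const ShaOp sha256_ops[4] = {
        SHA256H, SHA256H2, SHA256SU1, SHA_UNDEF   /* sz == 3 is UNDEF */
    };
    return sha256_ops[sz];
}
```

With this model, 0xF2000C40 (all register fields zero, Q=1) classifies as SHA1C. 0xF3300C40 (U=1, size=3) classifies as UNDEF, which matches the removed `size == 3` check.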



^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 22/36] target/arm: Move gen_ function typedefs to translate.h
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (20 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 21/36] target/arm: Convert Neon 3-reg-same SHA " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 20:32   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns Peter Maydell
                   ` (15 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

We're going to want at least some of the NeonGen* typedefs
for the refactored 32-bit Neon decoder, so move them all
to translate.h since it makes more sense to keep them in
one group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.h     | 17 +++++++++++++++++
 target/arm/translate-a64.c | 17 -----------------
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index 95b43e7ab65..cb7925ea461 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -314,4 +314,21 @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
 typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
+/* Function prototype for gen_ functions for calling Neon helpers */
+typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
+typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
+typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
+typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
+typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
+typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
+typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index efb1c4adc4e..a896f9c4b83 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -70,23 +70,6 @@ typedef struct AArch64DecodeTable {
     AArch64DecodeFn *disas_fn;
 } AArch64DecodeTable;
 
-/* Function prototype for gen_ functions for calling Neon helpers */
-typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
-typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
-typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
-typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
-typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
-typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
-typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
-typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
-typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
-typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
-typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
-
 /* initialize TCG globals.  */
 void a64_translate_init(void)
 {
-- 
2.20.1
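The moved typedefs name function types rather than pointer types. A caller therefore declares `NeonGenTwoOpFn *fn`, and can build a `static NeonGenTwoOpFn * const fns[]` table indexed by the Neon size field, as later patches in this series do. The sketch below uses the same C idiom with plain integers in place of `TCGv_i32`. `TwoOpFn`, the `add_*` functions and `pick_add()` are hypothetical stand-ins, not QEMU helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* A function *type*, not a pointer type, in the style of NeonGenTwoOpFn. */
typedef uint32_t TwoOpFn(uint32_t, uint32_t);

/* Hypothetical helper: add four 8-bit lanes without cross-lane carries. */
static uint32_t add_u8x4(uint32_t a, uint32_t b)
{
    uint32_t low = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);
    return low ^ ((a ^ b) & 0x80808080u);
}

/* Same trick for two 16-bit lanes. */
static uint32_t add_u16x2(uint32_t a, uint32_t b)
{
    uint32_t low = (a & 0x7fff7fffu) + (b & 0x7fff7fffu);
    return low ^ ((a ^ b) & 0x80008000u);
}

static uint32_t add_u32(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Select a helper by Neon size field; size 3 (64-bit) has no entry here. */
static TwoOpFn *pick_add(unsigned size)
{
    static TwoOpFn * const fns[] = { add_u8x4, add_u16x2, add_u32 };

    return size < sizeof(fns) / sizeof(fns[0]) ? fns[size] : NULL;
}
```

Returning NULL for size 3 plays the role of the `if (a->size > 2) { return false; }` checks in the Neon trans functions.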



* [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (21 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 22/36] target/arm: Move gen_ function typedefs to translate.h Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 20:54   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 24/36] target/arm: Convert Neon VHADD " Peter Maydell
                   ` (14 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the 64-bit element insns in the 3-reg-same group
to decodetree. This covers VQSHL, VRSHL and VQRSHL where
size==0b11.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 62 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 38 ++------------------
 target/arm/neon-dp.decode       | 11 ++++++
 3 files changed, 75 insertions(+), 36 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 161313ad879..bc5afb368e3 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -908,3 +908,65 @@ static bool trans_SHA256SU1_3s(DisasContext *s, arg_SHA256SU1_3s *a)
 
     return true;
 }
+
+static bool do_3same_64(DisasContext *s, arg_3same *a, NeonGenTwo64OpFn *fn)
+{
+    /* Handle 3-reg-same operations to be performed 64 bits at a time */
+    TCGv_i64 rn, rm, rd;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn = tcg_temp_new_i64();
+    rm = tcg_temp_new_i64();
+    rd = tcg_temp_new_i64();
+
+    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
+        neon_load_reg64(rn, a->vn + pass);
+        neon_load_reg64(rm, a->vm + pass);
+        fn(rd, rm, rn);
+        neon_store_reg64(rd, a->vd + pass);
+    }
+
+    tcg_temp_free_i64(rn);
+    tcg_temp_free_i64(rm);
+    tcg_temp_free_i64(rd);
+
+    return true;
+}
+
+#define DO_3SAME_64(INSN, FUNC)                                         \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        return do_3same_64(s, a, FUNC);                                 \
+    }
+
+#define DO_3SAME_64_ENV(INSN, FUNC)                                     \
+    static void gen_##INSN##_3s(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m)     \
+    {                                                                   \
+        FUNC(d, cpu_env, n, m);                                         \
+    }                                                                   \
+    DO_3SAME_64(INSN, gen_##INSN##_3s)
+
+DO_3SAME_64(VRSHL_S64, gen_helper_neon_rshl_s64)
+DO_3SAME_64(VRSHL_U64, gen_helper_neon_rshl_u64)
+DO_3SAME_64_ENV(VQSHL_S64, gen_helper_neon_qshl_s64)
+DO_3SAME_64_ENV(VQSHL_U64, gen_helper_neon_qshl_u64)
+DO_3SAME_64_ENV(VQRSHL_S64, gen_helper_neon_qrshl_s64)
+DO_3SAME_64_ENV(VQRSHL_U64, gen_helper_neon_qrshl_u64)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 160638e2a7c..fb64eb3a800 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4791,42 +4791,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
 
         if (size == 3) {
-            /* 64-bit element instructions. */
-            for (pass = 0; pass < (q ? 2 : 1); pass++) {
-                neon_load_reg64(cpu_V0, rn + pass);
-                neon_load_reg64(cpu_V1, rm + pass);
-                switch (op) {
-                case NEON_3R_VQSHL:
-                    if (u) {
-                        gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
-                                                 cpu_V1, cpu_V0);
-                    } else {
-                        gen_helper_neon_qshl_s64(cpu_V0, cpu_env,
-                                                 cpu_V1, cpu_V0);
-                    }
-                    break;
-                case NEON_3R_VRSHL:
-                    if (u) {
-                        gen_helper_neon_rshl_u64(cpu_V0, cpu_V1, cpu_V0);
-                    } else {
-                        gen_helper_neon_rshl_s64(cpu_V0, cpu_V1, cpu_V0);
-                    }
-                    break;
-                case NEON_3R_VQRSHL:
-                    if (u) {
-                        gen_helper_neon_qrshl_u64(cpu_V0, cpu_env,
-                                                  cpu_V1, cpu_V0);
-                    } else {
-                        gen_helper_neon_qrshl_s64(cpu_V0, cpu_env,
-                                                  cpu_V1, cpu_V0);
-                    }
-                    break;
-                default:
-                    abort();
-                }
-                neon_store_reg64(cpu_V0, rd + pass);
-            }
-            return 0;
+            /* 64-bit element instructions: handled by decodetree */
+            return 1;
         }
         pairwise = 0;
         switch (op) {
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index f22606b2bd5..a4932e550ed 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -68,6 +68,17 @@ VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
 VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same
 VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same
 
+# Insns operating on 64-bit elements (size!=0b11 handled elsewhere)
+@3same_64        .... ... . . . 11 .... .... .... . q:1 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=3
+
+VQSHL_S64_3s     1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_64
+VQSHL_U64_3s     1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_64
+VRSHL_S64_3s     1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_64
+VRSHL_U64_3s     1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_64
+VQRSHL_S64_3s    1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_64
+VQRSHL_U64_3s    1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_64
+
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
-- 
2.20.1
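`do_3same_64()` handles one D register per pass, or two for the Q form, and calls `fn(rd, rm, rn)`. The operands are swapped because these shift insns take the value to be shifted from Vm and the shift count from the low byte of Vn. The toy model below mirrors that loop over a plain array of 64-bit D registers. It is a hedged sketch: `dreg`, `toy_3same_64()` and `toy_ushl64()` are hypothetical. `toy_ushl64()` is deliberately a simplified plain shift, without the rounding and saturation that the real VRSHL/VQSHL/VQRSHL helpers implement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy register file: D0-D31 as 64-bit values (QEMU uses CPU state + TCG). */
static uint64_t dreg[32];

typedef uint64_t Two64OpFn(uint64_t val, uint64_t shiftop);

/*
 * Hypothetical simplified shift: move VAL left by the signed low byte of
 * SHIFTOP, or right if that byte is negative.  No rounding or saturation.
 */
static uint64_t toy_ushl64(uint64_t val, uint64_t shiftop)
{
    int shift = (int8_t)shiftop;

    if (shift >= 64 || shift <= -64) {
        return 0;
    }
    return shift < 0 ? val >> -shift : val << shift;
}

/* Mirrors do_3same_64(): returns false where the real code would UNDEF. */
static bool toy_3same_64(int vd, int vn, int vm, int q, Two64OpFn *fn)
{
    if ((vd | vn | vm) & q) {
        return false;   /* Q form needs even D register numbers */
    }
    for (int pass = 0; pass < (q ? 2 : 1); pass++) {
        /* Operands reversed: Vm supplies the value, Vn the shift count. */
        dreg[vd + pass] = fn(dreg[vm + pass], dreg[vn + pass]);
    }
    return true;
}
```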



* [PATCH 24/36] target/arm: Convert Neon VHADD 3-reg-same insns
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (22 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 20:59   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 25/36] target/arm: Convert Neon VRHADD, VHSUB, VABD 3-reg-same insns to decodetree Peter Maydell
                   ` (13 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VHADD insns in the 3-reg-same group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 62 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  4 +--
 target/arm/neon-dp.decode       |  2 ++
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index bc5afb368e3..7a602d76566 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -970,3 +970,65 @@ DO_3SAME_64_ENV(VQSHL_S64, gen_helper_neon_qshl_s64)
 DO_3SAME_64_ENV(VQSHL_U64, gen_helper_neon_qshl_u64)
 DO_3SAME_64_ENV(VQRSHL_S64, gen_helper_neon_qrshl_s64)
 DO_3SAME_64_ENV(VQRSHL_U64, gen_helper_neon_qrshl_u64)
+
+static bool do_3same_32(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
+{
+    /* Operations handled elementwise 32 bits at a time */
+    TCGv_i32 tmp, tmp2;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        tmp = neon_load_reg(a->vn, pass);
+        tmp2 = neon_load_reg(a->vm, pass);
+        fn(tmp, tmp, tmp2);
+        tcg_temp_free_i32(tmp2);
+        neon_store_reg(a->vd, pass, tmp);
+    }
+    return true;
+}
+
+#define DO_3SAME_32(INSN, func)                                         \
+    static bool trans_##INSN##_S_3s(DisasContext *s, arg_3same *a)      \
+    {                                                                   \
+        static NeonGenTwoOpFn * const fns[] = {                         \
+            gen_helper_neon_##func##_s8,                                \
+            gen_helper_neon_##func##_s16,                               \
+            gen_helper_neon_##func##_s32,                               \
+        };                                                              \
+        if (a->size > 2) {                                              \
+            return false;                                               \
+        }                                                               \
+        return do_3same_32(s, a, fns[a->size]);                         \
+    }                                                                   \
+    static bool trans_##INSN##_U_3s(DisasContext *s, arg_3same *a)      \
+    {                                                                   \
+        static NeonGenTwoOpFn * const fns[] = {                         \
+            gen_helper_neon_##func##_u8,                                \
+            gen_helper_neon_##func##_u16,                               \
+            gen_helper_neon_##func##_u32,                               \
+        };                                                              \
+        if (a->size > 2) {                                              \
+            return false;                                               \
+        }                                                               \
+        return do_3same_32(s, a, fns[a->size]);                         \
+    }
+
+DO_3SAME_32(VHADD, hadd)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index fb64eb3a800..67616fc218a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4786,6 +4786,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VML:
         case NEON_3R_VSHL:
         case NEON_3R_SHA:
+        case NEON_3R_VHADD:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4866,9 +4867,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VHADD:
-            GEN_NEON_INTEGER_OP(hadd);
-            break;
         case NEON_3R_VRHADD:
             GEN_NEON_INTEGER_OP(rhadd);
             break;
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index a4932e550ed..055004df4e8 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -42,6 +42,8 @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VHADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 0 .... @3same
+VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
 VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
 VQADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
 
-- 
2.20.1
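The `gen_helper_neon_hadd_*` helpers each operate on one 32-bit chunk holding four 8-bit, two 16-bit or one 32-bit lane. That is why `do_3same_32()` makes 2 or 4 passes per insn. The sketch below models the signed 8-bit case. VHADD computes (a + b) / 2 rounded toward minus infinity in a wider type, so the intermediate sum cannot overflow. `floor_half()` and `toy_hadd_s8()` are hypothetical names, and this is not QEMU's implementation.

```c
#include <stdint.h>

/* Floor of s / 2 without relying on right-shifting negative values. */
static int32_t floor_half(int32_t s)
{
    return (s - (s < 0)) / 2;
}

/* Hypothetical model of VHADD.S8 on four lanes packed into 32 bits. */
static uint32_t toy_hadd_s8(uint32_t a, uint32_t b)
{
    uint32_t res = 0;

    for (int lane = 0; lane < 4; lane++) {
        int32_t x = (int8_t)(a >> (8 * lane));
        int32_t y = (int8_t)(b >> (8 * lane));
        /* x + y is computed in 32 bits, so it cannot overflow. */
        uint8_t h = (uint8_t)floor_half(x + y);
        res |= (uint32_t)h << (8 * lane);
    }
    return res;
}
```

For example, lanes 0x7f + 0x7f give 0x7f rather than wrapping, and 0xff + 0x00 (that is, -1 + 0) gives 0xff (-1).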



* [PATCH 25/36] target/arm: Convert Neon VRHADD, VHSUB, VABD 3-reg-same insns to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (23 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 24/36] target/arm: Convert Neon VHADD " Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-04-30 18:09 ` [PATCH 26/36] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL " Peter Maydell
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VRHADD, VHSUB and VABD 3-reg-same insns to
decodetree.  (These are all the other insns in 3-reg-same which were
using GEN_NEON_INTEGER_OP() and which are not pairwise or
reversed-operands.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c |  3 +++
 target/arm/translate.c          | 12 +++---------
 target/arm/neon-dp.decode       |  9 +++++++++
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 7a602d76566..bdd5f33214e 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1032,3 +1032,6 @@ static bool do_3same_32(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
     }
 
 DO_3SAME_32(VHADD, hadd)
+DO_3SAME_32(VHSUB, hsub)
+DO_3SAME_32(VRHADD, rhadd)
+DO_3SAME_32(VABD, abd)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 67616fc218a..29301061ca5 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4787,6 +4787,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VSHL:
         case NEON_3R_SHA:
         case NEON_3R_VHADD:
+        case NEON_3R_VRHADD:
+        case NEON_3R_VHSUB:
+        case NEON_3R_VABD:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4867,12 +4870,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VRHADD:
-            GEN_NEON_INTEGER_OP(rhadd);
-            break;
-        case NEON_3R_VHSUB:
-            GEN_NEON_INTEGER_OP(hsub);
-            break;
         case NEON_3R_VQSHL:
             GEN_NEON_INTEGER_OP_ENV(qshl);
             break;
@@ -4882,9 +4879,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VQRSHL:
             GEN_NEON_INTEGER_OP_ENV(qrshl);
             break;
-        case NEON_3R_VABD:
-            GEN_NEON_INTEGER_OP(abd);
-            break;
         case NEON_3R_VABA:
             GEN_NEON_INTEGER_OP(abd);
             tcg_temp_free_i32(tmp2);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 055004df4e8..4b15e52221b 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -47,6 +47,9 @@ VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
 VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
 VQADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
 
+VRHADD_S_3s      1111 001 0 0 . .. .... .... 0001 . . . 0 .... @3same
+VRHADD_U_3s      1111 001 1 0 . .. .... .... 0001 . . . 0 .... @3same
+
 @3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
 
@@ -59,6 +62,9 @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
 VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 
+VHSUB_S_3s       1111 001 0 0 . .. .... .... 0010 . . . 0 .... @3same
+VHSUB_U_3s       1111 001 1 0 . .. .... .... 0010 . . . 0 .... @3same
+
 VQSUB_S_3s       1111 001 0 0 . .. .... .... 0010 . . . 1 .... @3same
 VQSUB_U_3s       1111 001 1 0 . .. .... .... 0010 . . . 1 .... @3same
 
@@ -86,6 +92,9 @@ VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
 VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
 
+VABD_S_3s        1111 001 0 0 . .. .... .... 0111 . . . 0 .... @3same
+VABD_U_3s        1111 001 1 0 . .. .... .... 0111 . . . 0 .... @3same
+
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
 
-- 
2.20.1
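The per-lane arithmetic for the unsigned 8-bit forms of two of the newly converted insns can be sketched as below. VRHADD adds a rounding 1 before halving, and VABD returns the absolute difference. Both are evaluated in a wider type so intermediate results cannot wrap. `map_u8x4()`, the `*_lane()` functions and the `toy_*` wrappers are hypothetical. This is not QEMU's helper code.

```c
#include <stdint.h>

/* Hypothetical: apply OP to each of four unsigned 8-bit lanes. */
static uint32_t map_u8x4(uint32_t a, uint32_t b,
                         uint32_t (*op)(uint32_t, uint32_t))
{
    uint32_t res = 0;

    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xff;
        uint32_t y = (b >> (8 * lane)) & 0xff;
        res |= (op(x, y) & 0xff) << (8 * lane);
    }
    return res;
}

/* VRHADD.U8 lane: (x + y + 1) >> 1, with no 8-bit overflow. */
static uint32_t rhadd_lane(uint32_t x, uint32_t y)
{
    return (x + y + 1) >> 1;
}

/* VABD.U8 lane: |x - y|. */
static uint32_t abd_lane(uint32_t x, uint32_t y)
{
    return x > y ? x - y : y - x;
}

static uint32_t toy_rhadd_u8(uint32_t a, uint32_t b)
{
    return map_u8x4(a, b, rhadd_lane);
}

static uint32_t toy_abd_u8(uint32_t a, uint32_t b)
{
    return map_u8x4(a, b, abd_lane);
}
```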



* [PATCH 26/36] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (24 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 25/36] target/arm: Convert Neon VRHADD, VHSUB, VABD 3-reg-same insns to decodetree Peter Maydell
@ 2020-04-30 18:09 ` Peter Maydell
  2020-05-01  1:55   ` Richard Henderson
  2020-04-30 18:09 ` [PATCH 27/36] target/arm: Convert Neon VABA 3-reg-same " Peter Maydell
                   ` (11 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:09 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VQSHL, VRSHL and VQRSHL insns in the 3-reg-same
group to decodetree. We have already implemented the size==0b11
case of these insns; this commit handles the remaining sizes.

TODO: find out from rth why decodetree insists on VSHL going
into the group...
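The comment on the DO_3SAME_QS32 macro in this patch relies on how a decodetree `{ ... }` group behaves. Patterns in a group are tried in order, and if a matching pattern's `trans_` function returns false, decoding moves on to the next pattern. The toy dispatcher below sketches only that fallthrough under a simplified model. `ToyInsn`, `GroupEntry`, `try_group()` and the handlers are hypothetical and are not decodetree's generated code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: just the fields this toy needs from a 3-reg-same insn. */
typedef struct {
    unsigned size;     /* bits 21:20 */
    bool bad_regs;     /* e.g. odd register number in a Q-form insn */
} ToyInsn;

typedef struct {
    bool (*matches)(const ToyInsn *);   /* the pattern's fixed bits */
    bool (*trans)(const ToyInsn *);     /* false => try next in group */
    const char *name;
} GroupEntry;

static bool match_size3(const ToyInsn *i)
{
    return i->size == 3;                /* like @3same_64 (size=3) */
}

static bool match_any(const ToyInsn *i)
{
    (void)i;
    return true;                        /* like @3same (any size) */
}

/* Like trans_VQSHL_S64_3s(): may UNDEF by returning false. */
static bool trans_64(const ToyInsn *i)
{
    return !i->bad_regs;
}

/* Like trans_VQSHL_S_3s(): must return false, not assert, for size 3. */
static bool trans_32(const ToyInsn *i)
{
    return i->size <= 2 && !i->bad_regs;
}

/* Try entries in order; NULL means nothing accepted the insn (UNDEF). */
static const char *try_group(const GroupEntry *g, size_t n, const ToyInsn *i)
{
    for (size_t k = 0; k < n; k++) {
        if (g[k].matches(i) && g[k].trans(i)) {
            return g[k].name;
        }
    }
    return NULL;
}

static const GroupEntry vqshl_group[] = {
    { match_size3, trans_64, "VQSHL_S64_3s" },
    { match_any,   trans_32, "VQSHL_S_3s" },
};
```

A size==0b11 insn that the 64-bit handler rejects falls through to `trans_32()`. That handler then has to return false as well, so the whole group UNDEFs.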

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 93 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 23 ++------
 target/arm/neon-dp.decode       | 30 ++++++++---
 3 files changed, 120 insertions(+), 26 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index bdd5f33214e..084c78eea58 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1035,3 +1035,96 @@ DO_3SAME_32(VHADD, hadd)
 DO_3SAME_32(VHSUB, hsub)
 DO_3SAME_32(VRHADD, rhadd)
 DO_3SAME_32(VABD, abd)
+
+static bool do_3same_qs32(DisasContext *s, arg_3same *a, NeonGenTwoOpEnvFn *fn)
+{
+    /*
+     * Saturating shift operations handled elementwise 32 bits at a
+     * time which need to pass cpu_env to the helper and where the rn
+     * and rm operands are reversed from the usual do_3same() order.
+     */
+    TCGv_i32 tmp, tmp2;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (a->size == 3) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        /* Note reversal of operand order */
+        tmp = neon_load_reg(a->vm, pass);
+        tmp2 = neon_load_reg(a->vn, pass);
+        fn(tmp, cpu_env, tmp, tmp2);
+        tcg_temp_free_i32(tmp2);
+        neon_store_reg(a->vd, pass, tmp);
+    }
+    return true;
+}
+
+/*
+ * Handling for shifts with sizes 8/16/32 bits. 64-bit shifts are
+ * covered by the *_S64_3s and *_U64_3s patterns and the grouping in
+ * the decode file means those functions are called first for
+ * size==0b11. Note that we must 'return false' here for the
+ * size==0b11 case rather than asserting, because where the 64-bit
+ * function has an UNDEF case and returns false the decoder will fall
+ * through to trying these functions.
+ */
+#define DO_3SAME_QS32(INSN, func)                                       \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        static NeonGenTwoOpEnvFn * const fns[] = {                      \
+            gen_helper_neon_##func##8,                                  \
+            gen_helper_neon_##func##16,                                 \
+            gen_helper_neon_##func##32,                                 \
+        };                                                              \
+        if (a->size > 2) {                                              \
+            return false;                                               \
+        }                                                               \
+        return do_3same_qs32(s, a, fns[a->size]);                       \
+    }
+
+DO_3SAME_QS32(VQSHL_S, qshl_s)
+DO_3SAME_QS32(VQSHL_U, qshl_u)
+DO_3SAME_QS32(VQRSHL_S, qrshl_s)
+DO_3SAME_QS32(VQRSHL_U, qrshl_u)
+
+#define DO_3SAME_SHIFT32(INSN, func)                                    \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        static NeonGenTwoOpFn * const fns[] = {                         \
+            gen_helper_neon_##func##8,                                  \
+            gen_helper_neon_##func##16,                                 \
+            gen_helper_neon_##func##32,                                 \
+        };                                                              \
+        int rtmp;                                                       \
+        if (a->size > 2) {                                              \
+            return false;                                               \
+        }                                                               \
+        /* Shift operand order is reversed */                           \
+        rtmp = a->vn;                                                   \
+        a->vn = a->vm;                                                  \
+        a->vm = rtmp;                                                   \
+        return do_3same_32(s, a, fns[a->size]);                         \
+    }
+
+DO_3SAME_SHIFT32(VRSHL_S, rshl_s)
+DO_3SAME_SHIFT32(VRSHL_U, rshl_u)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 29301061ca5..4406fe54647 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4790,6 +4790,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VRHADD:
         case NEON_3R_VHSUB:
         case NEON_3R_VABD:
+        case NEON_3R_VQSHL:
+        case NEON_3R_VRSHL:
+        case NEON_3R_VQRSHL:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4800,17 +4803,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         pairwise = 0;
         switch (op) {
-        case NEON_3R_VQSHL:
-        case NEON_3R_VRSHL:
-        case NEON_3R_VQRSHL:
-            {
-                int rtmp;
-                /* Shift instruction operands are reversed.  */
-                rtmp = rn;
-                rn = rm;
-                rm = rtmp;
-            }
-            break;
         case NEON_3R_VPADD_VQRDMLAH:
         case NEON_3R_VPMAX:
         case NEON_3R_VPMIN:
@@ -4870,15 +4862,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VQSHL:
-            GEN_NEON_INTEGER_OP_ENV(qshl);
-            break;
-        case NEON_3R_VRSHL:
-            GEN_NEON_INTEGER_OP(rshl);
-            break;
-        case NEON_3R_VQRSHL:
-            GEN_NEON_INTEGER_OP_ENV(qrshl);
-            break;
         case NEON_3R_VABA:
             GEN_NEON_INTEGER_OP(abd);
             tcg_temp_free_i32(tmp2);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 4b15e52221b..ae442071ef1 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -80,12 +80,30 @@ VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same
 @3same_64        .... ... . . . 11 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=3
 
-VQSHL_S64_3s     1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_64
-VQSHL_U64_3s     1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_64
-VRSHL_S64_3s     1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_64
-VRSHL_U64_3s     1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_64
-VQRSHL_S64_3s    1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_64
-VQRSHL_U64_3s    1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_64
+{
+  VQSHL_S64_3s   1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_64
+  VQSHL_S_3s     1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same
+}
+{
+  VQSHL_U64_3s   1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_64
+  VQSHL_U_3s     1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same
+}
+{
+  VRSHL_S64_3s   1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_64
+  VRSHL_S_3s     1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same
+}
+{
+  VRSHL_U64_3s   1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_64
+  VRSHL_U_3s     1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same
+}
+{
+  VQRSHL_S64_3s  1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_64
+  VQRSHL_S_3s    1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same
+}
+{
+  VQRSHL_U64_3s  1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_64
+  VQRSHL_U_3s    1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same
+}
 
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
-- 
2.20.1
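The "Shift operand order is reversed" comment in DO_3SAME_SHIFT32 above comes from the architectural operand order. In `VRSHL.U8 Dd, Dm, Dn` the lanes of Dm are shifted by signed counts taken from the least significant byte of the corresponding lanes of Dn, while QEMU's per-element helpers take (value, shift). The per-element code, like the old decoder, passes the Vn element as the first helper argument, so vn and vm must be swapped first. Below is a minimal plain-C sketch of unsigned 8-bit VRSHL on one D register. `rshl_u8_model` and `vrshl_u8_d_model` are hypothetical names, not QEMU functions, and the saturation done by VQSHL/VQRSHL is not modelled.

```c
#include <stdint.h>

/*
 * Hypothetical model of one unsigned 8-bit VRSHL lane (not QEMU code).
 * Positive shift counts shift left; negative counts do a rounding right
 * shift. Only the signed low byte of the shift element is used.
 */
static uint8_t rshl_u8_model(uint8_t value, int8_t shift)
{
    if (shift >= 8 || shift < -8) {
        return 0;                           /* every bit is shifted out */
    }
    if (shift >= 0) {
        return (uint8_t)(value << shift);   /* bits above bit 7 are lost */
    }
    int n = -shift;                         /* right shift by 1..8 */
    /* Add half the weight of the last discarded bit, then shift. */
    return (uint8_t)((value + (1u << (n - 1))) >> n);
}

/*
 * VRSHL.U8 Dd, Dm, Dn: the values come from Dm and the shift counts from
 * Dn, so a (value, shift) helper is called as f(Dm lane, Dn lane).
 */
static uint64_t vrshl_u8_d_model(uint64_t dm_value, uint64_t dn_shift)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t value = (uint8_t)(dm_value >> (8 * lane));
        /* two's-complement reinterpretation of the shift byte */
        int8_t shift = (int8_t)(uint8_t)(dn_shift >> (8 * lane));
        result |= (uint64_t)rshl_u8_model(value, shift) << (8 * lane);
    }
    return result;
}
```

Note the shift == -8 case: the rounding constant can carry into bit 8, so the result may be 1 rather than 0.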



* [PATCH 27/36] target/arm: Convert Neon VABA 3-reg-same to decodetree
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VABA insn in the 3-reg-same group to decodetree.
This is the only insn in this group which does an integer
accumulate into the destination register.
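As a plain-C illustration of that accumulate (a sketch, not QEMU code; `vaba_u8_model` is a hypothetical name), unsigned 8-bit VABA on one D register adds the absolute difference of each pair of source lanes into the matching destination lane, wrapping within the lane. This is why do_vaba() loads Vd as a third input. The signed form differs only in the absolute-difference step, which is why trans_VABA_S_3s can reuse the same add helpers as trans_VABA_U_3s.

```c
#include <stdint.h>

/*
 * Hypothetical reference model of VABA.U8 Dd, Dn, Dm on one 64-bit
 * D register: Dd[i] += |Dn[i] - Dm[i]| for each of the eight byte lanes.
 */
static uint64_t vaba_u8_model(uint64_t d, uint64_t n, uint64_t m)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t dd = (uint8_t)(d >> (8 * lane));
        uint8_t dn = (uint8_t)(n >> (8 * lane));
        uint8_t dm = (uint8_t)(m >> (8 * lane));
        /* absolute difference of the source lanes (the VABD step) */
        uint8_t abd = (uint8_t)(dn > dm ? dn - dm : dm - dn);
        /* accumulate; the sum wraps modulo 2^8 and never carries into
         * the next lane */
        uint8_t acc = (uint8_t)(dd + abd);
        result |= (uint64_t)acc << (8 * lane);
    }
    return result;
}
```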

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  7 +--
 target/arm/neon-dp.decode       |  3 ++
 3 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 084c78eea58..4692448fc5f 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1128,3 +1128,79 @@ DO_3SAME_QS32(VQRSHL_U,qrshl_u)
 
 DO_3SAME_SHIFT32(VRSHL_S, rshl_s)
 DO_3SAME_SHIFT32(VRSHL_U, rshl_u)
+
+static bool do_vaba(DisasContext *s, arg_3same *a,
+                    NeonGenTwoOpFn *abd_fn, NeonGenTwoOpFn *add_fn)
+{
+    /* VABA: handled elementwise 32 bits at a time, accumulating */
+    TCGv_i32 tmp, tmp2;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        tmp = neon_load_reg(a->vn, pass);
+        tmp2 = neon_load_reg(a->vm, pass);
+        abd_fn(tmp, tmp, tmp2);
+        tcg_temp_free_i32(tmp2);
+        tmp2 = neon_load_reg(a->vd, pass);
+        add_fn(tmp, tmp, tmp2);
+        tcg_temp_free_i32(tmp2);
+        neon_store_reg(a->vd, pass, tmp);
+    }
+    return true;
+}
+
+static bool trans_VABA_S_3s(DisasContext *s, arg_3same *a)
+{
+    static NeonGenTwoOpFn * const abd_fns[] = {
+        gen_helper_neon_abd_s8,
+        gen_helper_neon_abd_s16,
+        gen_helper_neon_abd_s32,
+    };
+    static NeonGenTwoOpFn * const add_fns[] = {
+        gen_helper_neon_add_u8,
+        gen_helper_neon_add_u16,
+        tcg_gen_add_i32,
+    };
+
+    if (a->size > 2) {
+        return false;
+    }
+    return do_vaba(s, a, abd_fns[a->size], add_fns[a->size]);
+}
+
+static bool trans_VABA_U_3s(DisasContext *s, arg_3same *a)
+{
+    static NeonGenTwoOpFn * const abd_fns[] = {
+        gen_helper_neon_abd_u8,
+        gen_helper_neon_abd_u16,
+        gen_helper_neon_abd_u32,
+    };
+    static NeonGenTwoOpFn * const add_fns[] = {
+        gen_helper_neon_add_u8,
+        gen_helper_neon_add_u16,
+        tcg_gen_add_i32,
+    };
+
+    if (a->size > 2) {
+        return false;
+    }
+    return do_vaba(s, a, abd_fns[a->size], add_fns[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4406fe54647..b04643cec9a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4793,6 +4793,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VQSHL:
         case NEON_3R_VRSHL:
         case NEON_3R_VQRSHL:
+        case NEON_3R_VABA:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4862,12 +4863,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VABA:
-            GEN_NEON_INTEGER_OP(abd);
-            tcg_temp_free_i32(tmp2);
-            tmp2 = neon_load_reg(rd, pass);
-            gen_neon_add(size, tmp, tmp2);
-            break;
         case NEON_3R_VPMAX:
             GEN_NEON_INTEGER_OP(pmax);
             break;
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index ae442071ef1..d91f944f84a 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -113,6 +113,9 @@ VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
 VABD_S_3s        1111 001 0 0 . .. .... .... 0111 . . . 0 .... @3same
 VABD_U_3s        1111 001 1 0 . .. .... .... 0111 . . . 0 .... @3same
 
+VABA_S_3s        1111 001 0 0 . .. .... .... 0111 . . . 1 .... @3same
+VABA_U_3s        1111 001 1 0 . .. .... .... 0111 . . . 1 .... @3same
+
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
 
-- 
2.20.1



* [PATCH 28/36] target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns to decodetree
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

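Convert the Neon integer VPMAX and VPMIN 3-reg-same insns to
decodetree. These are 'pairwise' operations: each result element is
computed from a pair of adjacent elements within one source register,
rather than from corresponding elements of Vn and Vm.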
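To see why do_3same_pair() computes both halves before storing either, here is a plain-C sketch of unsigned 8-bit VPMAX on D registers held as two 32-bit words. The layout (first half of the result from pairs in Dn, second half from pairs in Dm) is the architectural VPMAX behaviour; `pmax_u8_model` and `vpmax_u8_d_model` are hypothetical names, not QEMU's helpers.

```c
#include <stdint.h>

/*
 * Hypothetical model of a 32-bit-at-a-time pairwise helper for unsigned
 * 8-bit VPMAX: bytes 0-1 of the result are the maxima of adjacent byte
 * pairs in 'a', bytes 2-3 those of adjacent byte pairs in 'b'.
 */
static uint32_t pmax_u8_model(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 2; i++) {
        uint8_t a0 = (uint8_t)(a >> (16 * i));
        uint8_t a1 = (uint8_t)(a >> (16 * i + 8));
        uint8_t b0 = (uint8_t)(b >> (16 * i));
        uint8_t b1 = (uint8_t)(b >> (16 * i + 8));
        r |= (uint32_t)(a0 > a1 ? a0 : a1) << (8 * i);
        r |= (uint32_t)(b0 > b1 ? b0 : b1) << (8 * i + 16);
    }
    return r;
}

/*
 * VPMAX.U8 Dd, Dn, Dm with each D register held as two 32-bit words
 * ([0] = low). Both results are computed before either is stored, so the
 * model stays correct when d aliases n or m.
 */
static void vpmax_u8_d_model(uint32_t d[2], const uint32_t n[2],
                             const uint32_t m[2])
{
    uint32_t lo = pmax_u8_model(n[0], n[1]);
    uint32_t hi = pmax_u8_model(m[0], m[1]);
    d[0] = lo;
    d[1] = hi;
}
```

Storing `lo` into d[0] before reading m[0] would give a wrong high half whenever Dd and Dm are the same register.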

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 16 +-------
 target/arm/neon-dp.decode       |  9 +++++
 3 files changed, 82 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 4692448fc5f..cd4c9dd6f28 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1204,3 +1204,74 @@ static bool trans_VABA_U_3s(DisasContext *s, arg_3same *a)
     }
     return do_vaba(s, a, abd_fns[a->size], add_fns[a->size]);
 }
+
+static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
+{
+    /* Operations handled pairwise 32 bits at a time */
+    TCGv_i32 tmp, tmp2, tmp3;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (a->size == 3) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    assert(a->q == 0); /* enforced by decode patterns */
+
+    /*
+     * Note that we have to be careful not to clobber the source operands
+     * in the "vm == vd" case by storing the result of the first pass too
+     * early. Since Q is 0 there are always just two passes, so instead
+     * of a complicated loop over each pass we just unroll.
+     */
+    tmp = neon_load_reg(a->vn, 0);
+    tmp2 = neon_load_reg(a->vn, 1);
+    fn(tmp, tmp, tmp2);
+    tcg_temp_free_i32(tmp2);
+
+    tmp3 = neon_load_reg(a->vm, 0);
+    tmp2 = neon_load_reg(a->vm, 1);
+    fn(tmp3, tmp3, tmp2);
+    tcg_temp_free_i32(tmp2);
+
+    neon_store_reg(a->vd, 0, tmp);
+    neon_store_reg(a->vd, 1, tmp3);
+    return true;
+}
+
+#define DO_3SAME_PAIR(INSN, func)                                       \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        static NeonGenTwoOpFn * const fns[] = {                         \
+            gen_helper_neon_##func##8,                                  \
+            gen_helper_neon_##func##16,                                 \
+            gen_helper_neon_##func##32,                                 \
+        };                                                              \
+        if (a->size > 2) {                                              \
+            return false;                                               \
+        }                                                               \
+        return do_3same_pair(s, a, fns[a->size]);                       \
+    }
+
+/* 32-bit pairwise ops end up the same as the elementwise versions.  */
+#define gen_helper_neon_pmax_s32  tcg_gen_smax_i32
+#define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
+#define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
+#define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
+
+DO_3SAME_PAIR(VPMAX_S, pmax_s)
+DO_3SAME_PAIR(VPMIN_S, pmin_s)
+DO_3SAME_PAIR(VPMAX_U, pmax_u)
+DO_3SAME_PAIR(VPMIN_U, pmin_u)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index b04643cec9a..4bbdddaa30c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3011,12 +3011,6 @@ static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
     }
 }
 
-/* 32-bit pairwise ops end up the same as the elementwise versions.  */
-#define gen_helper_neon_pmax_s32  tcg_gen_smax_i32
-#define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
-#define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
-#define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
-
 #define GEN_NEON_INTEGER_OP_ENV(name) do { \
     switch ((size << 1) | u) { \
     case 0: \
@@ -4794,6 +4788,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VRSHL:
         case NEON_3R_VQRSHL:
         case NEON_3R_VABA:
+        case NEON_3R_VPMAX:
+        case NEON_3R_VPMIN:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4805,8 +4801,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         pairwise = 0;
         switch (op) {
         case NEON_3R_VPADD_VQRDMLAH:
-        case NEON_3R_VPMAX:
-        case NEON_3R_VPMIN:
             pairwise = 1;
             break;
         case NEON_3R_FLOAT_ARITH:
@@ -4863,12 +4857,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VPMAX:
-            GEN_NEON_INTEGER_OP(pmax);
-            break;
-        case NEON_3R_VPMIN:
-            GEN_NEON_INTEGER_OP(pmin);
-            break;
         case NEON_3R_VQDMULH_VQRDMULH: /* Multiply high.  */
             if (!u) { /* VQDMULH */
                 switch (size) {
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index d91f944f84a..e47998899ce 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -42,6 +42,9 @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+@3same_q0        .... ... . . . size:2 .... .... .... . 0 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
+
 VHADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 0 .... @3same
 VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
 VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
@@ -128,6 +131,12 @@ VMLS_3s          1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
 VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
 VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
 
+VPMAX_S_3s       1111 001 0 0 . .. .... .... 1010 . . . 0 .... @3same_q0
+VPMAX_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 0 .... @3same_q0
+
+VPMIN_S_3s       1111 001 0 0 . .. .... .... 1010 . . . 1 .... @3same_q0
+VPMIN_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 1 .... @3same_q0
+
 VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
 
 SHA1_3s          1111 001 0 0 . optype:2 .... .... 1100 . 1 . 0 .... \
-- 
2.20.1



* [PATCH 29/36] target/arm: Convert Neon VPADD 3-reg-same insns to decodetree
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon integer VPADD 3-reg-same insns to decodetree.  These
are 'pairwise' operations.  (Note that VQRDMLAH, which shares the
same primary opcode but has U=1, has already been converted.)
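The `#define gen_helper_neon_padd_u32 tcg_gen_add_i32` in this patch works because, with 32-bit elements, each 32-bit chunk holds exactly one element, so a pairwise add of two chunks is an ordinary add. A minimal sketch contrasting the 16-bit and 32-bit cases, using hypothetical model functions in place of the real helpers:

```c
#include <stdint.h>

/*
 * Hypothetical model of a 16-bit pairwise add on two 32-bit chunks: the
 * low half of the result sums the two halves of 'a', the high half sums
 * the two halves of 'b'; each sum wraps modulo 2^16.
 */
static uint32_t padd_u16_model(uint32_t a, uint32_t b)
{
    uint16_t lo = (uint16_t)((uint16_t)a + (uint16_t)(a >> 16));
    uint16_t hi = (uint16_t)((uint16_t)b + (uint16_t)(b >> 16));
    return ((uint32_t)hi << 16) | lo;
}

/*
 * With 32-bit elements each chunk is a single element, so the "pair" is
 * just (a, b) and the pairwise add is a plain 32-bit add.
 */
static uint32_t padd_u32_model(uint32_t a, uint32_t b)
{
    return a + b;
}
```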

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c |  2 ++
 target/arm/translate.c          | 19 +------------------
 target/arm/neon-dp.decode       |  2 ++
 3 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index cd4c9dd6f28..31a8e4ef486 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1270,8 +1270,10 @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
 #define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
 #define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
 #define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
+#define gen_helper_neon_padd_u32  tcg_gen_add_i32
 
 DO_3SAME_PAIR(VPMAX_S, pmax_s)
 DO_3SAME_PAIR(VPMIN_S, pmin_s)
 DO_3SAME_PAIR(VPMAX_U, pmax_u)
 DO_3SAME_PAIR(VPMIN_U, pmin_u)
+DO_3SAME_PAIR(VPADD, padd_u)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4bbdddaa30c..f583cc900e1 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4749,13 +4749,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         }
         switch (op) {
-        case NEON_3R_VPADD_VQRDMLAH:
-            if (!u) {
-                break;  /* VPADD */
-            }
-            /* VQRDMLAH : handled by decodetree */
-            return 1;
-
         case NEON_3R_VFM_VQRDMLSH:
             if (!u) {
                 /* VFM, VFMS */
@@ -4790,6 +4783,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VABA:
         case NEON_3R_VPMAX:
         case NEON_3R_VPMIN:
+        case NEON_3R_VPADD_VQRDMLAH:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4800,9 +4794,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         pairwise = 0;
         switch (op) {
-        case NEON_3R_VPADD_VQRDMLAH:
-            pairwise = 1;
-            break;
         case NEON_3R_FLOAT_ARITH:
             pairwise = (u && size < 2); /* if VPADD (float) */
             break;
@@ -4880,14 +4871,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 }
             }
             break;
-        case NEON_3R_VPADD_VQRDMLAH:
-            switch (size) {
-            case 0: gen_helper_neon_padd_u8(tmp, tmp, tmp2); break;
-            case 1: gen_helper_neon_padd_u16(tmp, tmp, tmp2); break;
-            case 2: tcg_gen_add_i32(tmp, tmp, tmp2); break;
-            default: abort();
-            }
-            break;
         case NEON_3R_FLOAT_ARITH: /* Floating point arithmetic. */
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index e47998899ce..acaf278cc8d 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -137,6 +137,8 @@ VPMAX_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 0 .... @3same_q0
 VPMIN_S_3s       1111 001 0 0 . .. .... .... 1010 . . . 1 .... @3same_q0
 VPMIN_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 1 .... @3same_q0
 
+VPADD_3s         1111 001 0 0 . .. .... .... 1011 . . . 1 .... @3same_q0
+
 VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
 
 SHA1_3s          1111 001 0 0 . optype:2 .... .... 1100 . 1 . 0 .... \
-- 
2.20.1



* [PATCH 30/36] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VQDMULH and VQRDMULH 3-reg-same insns to
decodetree. These are the last integer operations in the
3-reg-same group.
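For reference, here is a plain-C sketch of one signed 16-bit lane of VQDMULH and VQRDMULH. The `qc` flag models the sticky FPSCR.QC saturation bit; updating it is why the QEMU helpers take cpu_env, which in turn is why the small gen_VQDMULH_s16()-style wrappers are needed to fit the NeonGenTwoOpFn signature. `qdmulh_s16_model` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one signed 16-bit lane of VQDMULH (round = false)
 * or VQRDMULH (round = true): the high half of 2 * a * b, optionally
 * rounded, saturated to int16_t. *qc is set if saturation occurs.
 */
static int16_t qdmulh_s16_model(int16_t a, int16_t b, bool round, bool *qc)
{
    int64_t prod = 2 * (int64_t)a * b;
    if (round) {
        prod += INT64_C(1) << 15;
    }
    /* >> on a negative int64_t is an arithmetic shift on GCC and Clang */
    int64_t high = prod >> 16;
    /* Only a == b == INT16_MIN can exceed the range, and only upwards. */
    if (high > INT16_MAX) {
        *qc = true;
        return INT16_MAX;
    }
    return (int16_t)high;
}
```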

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 24 +-----------------
 target/arm/neon-dp.decode       |  3 +++
 3 files changed, 48 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 31a8e4ef486..2fab547840d 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1277,3 +1277,47 @@ DO_3SAME_PAIR(VPMIN_S, pmin_s)
 DO_3SAME_PAIR(VPMAX_U, pmax_u)
 DO_3SAME_PAIR(VPMIN_U, pmin_u)
 DO_3SAME_PAIR(VPADD, padd_u)
+
+static void gen_VQDMULH_s16(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    gen_helper_neon_qdmulh_s16(rd, cpu_env, rn, rm);
+}
+
+static void gen_VQDMULH_s32(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    gen_helper_neon_qdmulh_s32(rd, cpu_env, rn, rm);
+}
+
+static bool trans_VQDMULH_3s(DisasContext *s, arg_3same *a)
+{
+    static NeonGenTwoOpFn * const fns[] = {
+        gen_VQDMULH_s16, gen_VQDMULH_s32,
+    };
+
+    if (a->size != 1 && a->size != 2) {
+        return false;
+    }
+    return do_3same_32(s, a, fns[a->size - 1]);
+}
+
+static void gen_VQRDMULH_s16(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    gen_helper_neon_qrdmulh_s16(rd, cpu_env, rn, rm);
+}
+
+static void gen_VQRDMULH_s32(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    gen_helper_neon_qrdmulh_s32(rd, cpu_env, rn, rm);
+}
+
+static bool trans_VQRDMULH_3s(DisasContext *s, arg_3same *a)
+{
+    static NeonGenTwoOpFn * const fns[] = {
+        gen_VQRDMULH_s16, gen_VQRDMULH_s32,
+    };
+
+    if (a->size != 1 && a->size != 2) {
+        return false;
+    }
+    return do_3same_32(s, a, fns[a->size - 1]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index f583cc900e1..9fec1889613 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4784,6 +4784,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VPMAX:
         case NEON_3R_VPMIN:
         case NEON_3R_VPADD_VQRDMLAH:
+        case NEON_3R_VQDMULH_VQRDMULH:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4848,29 +4849,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VQDMULH_VQRDMULH: /* Multiply high.  */
-            if (!u) { /* VQDMULH */
-                switch (size) {
-                case 1:
-                    gen_helper_neon_qdmulh_s16(tmp, cpu_env, tmp, tmp2);
-                    break;
-                case 2:
-                    gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                    break;
-                default: abort();
-                }
-            } else { /* VQRDMULH */
-                switch (size) {
-                case 1:
-                    gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
-                    break;
-                case 2:
-                    gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                    break;
-                default: abort();
-                }
-            }
-            break;
         case NEON_3R_FLOAT_ARITH: /* Floating point arithmetic. */
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index acaf278cc8d..8ceedd8b8d8 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -137,6 +137,9 @@ VPMAX_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 0 .... @3same_q0
 VPMIN_S_3s       1111 001 0 0 . .. .... .... 1010 . . . 1 .... @3same_q0
 VPMIN_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 1 .... @3same_q0
 
+VQDMULH_3s       1111 001 0 0 . .. .... .... 1011 . . . 0 .... @3same
+VQRDMULH_3s      1111 001 1 0 . .. .... .... 1011 . . . 0 .... @3same
+
 VPADD_3s         1111 001 0 0 . .. .... .... 1011 . . . 1 .... @3same_q0
 
 VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
-- 
2.20.1



* [PATCH 31/36] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to decodetree
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon floating-point VADD, VSUB and VABD 3-reg-same insns
to decodetree.
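A plain-C sketch of how the new @3same_fp patterns carve up the encoding: bit 21 (the high bit of the old 2-bit size field) selects the operation, while bit 20 becomes the 1-bit 'size' (0 = f32, 1 = fp16, which trans_*_fp_3s currently rejects). The classifier and enum are hypothetical, written only to mirror the VADD_fp_3s, VSUB_fp_3s and VABD_fp_3s lines in neon-dp.decode.

```c
#include <stdint.h>

/*
 * Hypothetical classifier (not QEMU code). Fixed bits of the patterns:
 * [31:25] = 1111001, [23] = 0, [11:8] = 1101, [4] = 0.
 * U is bit 24; bit 21 is part of the opcode; bit 20 is 'size'.
 */
enum fp3s_op { FP3S_OTHER, FP3S_VADD, FP3S_VSUB, FP3S_VABD };

static enum fp3s_op classify_fp3s(uint32_t insn, int *size)
{
    if ((insn & 0xFE800F10u) != 0xF2000D00u) {
        return FP3S_OTHER;
    }
    unsigned u = (insn >> 24) & 1;
    unsigned op = (insn >> 21) & 1;
    *size = (int)((insn >> 20) & 1);

    if (!u) {
        return op ? FP3S_VSUB : FP3S_VADD;
    }
    /* U=1, bit 21=0 is float VPADD, still left to the old decoder here */
    return op ? FP3S_VABD : FP3S_OTHER;
}
```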

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 54 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 10 ++----
 target/arm/neon-dp.decode       |  8 +++++
 3 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 2fab547840d..6a27b7673c2 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1321,3 +1321,57 @@ static bool trans_VQRDMULH_3s(DisasContext *s, arg_3same *a)
     }
     return do_3same_32(s, a, fns[a->size - 1]);
 }
+
+static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
+{
+    /* FP operations handled elementwise 32 bits at a time */
+    TCGv_i32 tmp, tmp2;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    TCGv_ptr fpstatus = get_fpstatus_ptr(1);
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        tmp = neon_load_reg(a->vn, pass);
+        tmp2 = neon_load_reg(a->vm, pass);
+        fn(tmp, tmp, tmp2, fpstatus);
+        tcg_temp_free_i32(tmp2);
+        neon_store_reg(a->vd, pass, tmp);
+    }
+    tcg_temp_free_ptr(fpstatus);
+    return true;
+}
+
+/*
+ * For all the functions using this macro, size == 1 means fp16,
+ * which is an architecture extension we don't implement yet.
+ */
+#define DO_3S_FP(INSN,FUNC)                                         \
+    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
+    {                                                               \
+        if (a->size != 0) {                                         \
+            /* TODO fp16 support */                                 \
+            return false;                                           \
+        }                                                           \
+        return do_3same_fp(s, a, FUNC);                             \
+    }
+
+DO_3S_FP(VADD, gen_helper_vfp_adds)
+DO_3S_FP(VSUB, gen_helper_vfp_subs)
+DO_3S_FP(VABD, gen_helper_neon_abd_f32)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9fec1889613..c944cbf20af 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4797,6 +4797,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         switch (op) {
         case NEON_3R_FLOAT_ARITH:
             pairwise = (u && size < 2); /* if VPADD (float) */
+            if (!pairwise) {
+                return 1; /* handled by decodetree */
+            }
             break;
         case NEON_3R_FLOAT_MINMAX:
             pairwise = u; /* if VPMIN/VPMAX (float) */
@@ -4853,16 +4856,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
             switch ((u << 2) | size) {
-            case 0: /* VADD */
             case 4: /* VPADD */
                 gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
                 break;
-            case 2: /* VSUB */
-                gen_helper_vfp_subs(tmp, tmp, tmp2, fpstatus);
-                break;
-            case 6: /* VABD */
-                gen_helper_neon_abd_f32(tmp, tmp, tmp2, fpstatus);
-                break;
             default:
                 abort();
             }
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 8ceedd8b8d8..9d6a17d6f04 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -45,6 +45,10 @@
 @3same_q0        .... ... . . . size:2 .... .... .... . 0 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
 
+# For FP insns the high bit of 'size' is used as part of opcode decode
+@3same_fp        .... ... . . . . size:1 .... .... .... . q:1 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
 VHADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 0 .... @3same
 VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
 VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
@@ -154,3 +158,7 @@ SHA256SU1_3s     1111 001 1 0 . 10 .... .... 1100 . 1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
 VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
+
+VADD_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 0 .... @3same_fp
+VSUB_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
+VABD_fp_3s       1111 001 1 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
-- 
2.20.1



* [PATCH 32/36] target/arm: Convert Neon VPMIN/VPMAX/VPADD float 3-reg-same insns to decodetree
From: Peter Maydell @ 2020-04-30 18:09 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon float VPMIN, VPMAX and VPADD 3-reg-same insns to
decodetree. These are the only remaining 'pairwise' operations,
so we can delete the pairwise-specific bits of the old decoder's
for-each-element loop now.
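The new do_3same_fp_pair() is one generic routine, with the per-insn operation passed in by the DO_3S_FP_PAIR macro (for example gen_helper_vfp_adds for VPADD). A minimal plain-C sketch of that dispatch shape on a D register of two floats follows. All names are hypothetical, and `fp_max` ignores the NaN and signed-zero rules of the real VPMAX.

```c
/*
 * Hypothetical model of DO_3S_FP_PAIR-style dispatch (not QEMU code):
 * one generic pairwise routine, with the operation passed as a
 * function pointer.
 */
typedef float fp_binop(float, float);

static float fp_add(float a, float b) { return a + b; }

/* NaN and signed-zero handling of the real VPMAX is not modelled. */
static float fp_max(float a, float b) { return a > b ? a : b; }

static void fp_pair_d_model(float d[2], const float n[2], const float m[2],
                            fp_binop *fn)
{
    /* Read all inputs before writing d, which may alias n or m. */
    float lo = fn(n[0], n[1]);
    float hi = fn(m[0], m[1]);
    d[0] = lo;
    d[1] = hi;
}
```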

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 63 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 63 +++++----------------------------
 target/arm/neon-dp.decode       |  5 +++
 3 files changed, 76 insertions(+), 55 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 6a27b7673c2..30832309924 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1375,3 +1375,66 @@ static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 DO_3S_FP(VADD, gen_helper_vfp_adds)
 DO_3S_FP(VSUB, gen_helper_vfp_subs)
 DO_3S_FP(VABD, gen_helper_neon_abd_f32)
+
+static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
+{
+    /* FP operations handled pairwise 32 bits at a time */
+    TCGv_i32 tmp, tmp2, tmp3;
+    TCGv_ptr fpstatus;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    assert(a->q == 0); /* enforced by decode patterns */
+
+    /*
+     * Note that we have to be careful not to clobber the source operands
+     * in the "vm == vd" case by storing the result of the first pass too
+     * early. Since Q is 0 there are always just two passes, so instead
+     * of a complicated loop over each pass we just unroll.
+     */
+    fpstatus = get_fpstatus_ptr(1);
+    tmp = neon_load_reg(a->vn, 0);
+    tmp2 = neon_load_reg(a->vn, 1);
+    fn(tmp, tmp, tmp2, fpstatus);
+    tcg_temp_free_i32(tmp2);
+
+    tmp3 = neon_load_reg(a->vm, 0);
+    tmp2 = neon_load_reg(a->vm, 1);
+    fn(tmp3, tmp3, tmp2, fpstatus);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_ptr(fpstatus);
+
+    neon_store_reg(a->vd, 0, tmp);
+    neon_store_reg(a->vd, 1, tmp3);
+    return true;
+}
+
+/*
+ * For all the functions using this macro, size == 1 means fp16,
+ * which is an architecture extension we don't implement yet.
+ */
+#define DO_3S_FP_PAIR(INSN,FUNC)                                    \
+    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
+    {                                                               \
+        if (a->size != 0) {                                         \
+            /* TODO fp16 support */                                 \
+            return false;                                           \
+        }                                                           \
+        return do_3same_fp_pair(s, a, FUNC);                        \
+    }
+
+DO_3S_FP_PAIR(VPADD, gen_helper_vfp_adds)
+DO_3S_FP_PAIR(VPMAX, gen_helper_vfp_maxs)
+DO_3S_FP_PAIR(VPMIN, gen_helper_vfp_mins)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index c944cbf20af..5ae982ee253 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4700,7 +4700,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int shift;
     int pass;
     int count;
-    int pairwise;
     int u;
     int vec_size;
     uint32_t imm;
@@ -4785,6 +4784,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VPMIN:
         case NEON_3R_VPADD_VQRDMLAH:
         case NEON_3R_VQDMULH_VQRDMULH:
+        case NEON_3R_FLOAT_ARITH:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4793,16 +4793,11 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             /* 64-bit element instructions: handled by decodetree */
             return 1;
         }
-        pairwise = 0;
         switch (op) {
-        case NEON_3R_FLOAT_ARITH:
-            pairwise = (u && size < 2); /* if VPADD (float) */
-            if (!pairwise) {
-                return 1; /* handled by decodetree */
-            }
-            break;
         case NEON_3R_FLOAT_MINMAX:
-            pairwise = u; /* if VPMIN/VPMAX (float) */
+            if (u) {
+                return 1; /* VPMIN/VPMAX handled by decodetree */
+            }
             break;
         case NEON_3R_FLOAT_CMP:
             if (!u && size) {
@@ -4830,41 +4825,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             break;
         }
 
-        if (pairwise && q) {
-            /* All the pairwise insns UNDEF if Q is set */
-            return 1;
-        }
-
         for (pass = 0; pass < (q ? 4 : 2); pass++) {
 
-        if (pairwise) {
-            /* Pairwise.  */
-            if (pass < 1) {
-                tmp = neon_load_reg(rn, 0);
-                tmp2 = neon_load_reg(rn, 1);
-            } else {
-                tmp = neon_load_reg(rm, 0);
-                tmp2 = neon_load_reg(rm, 1);
-            }
-        } else {
-            /* Elementwise.  */
-            tmp = neon_load_reg(rn, pass);
-            tmp2 = neon_load_reg(rm, pass);
-        }
+        /* Elementwise.  */
+        tmp = neon_load_reg(rn, pass);
+        tmp2 = neon_load_reg(rm, pass);
         switch (op) {
-        case NEON_3R_FLOAT_ARITH: /* Floating point arithmetic. */
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            switch ((u << 2) | size) {
-            case 4: /* VPADD */
-                gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
-                break;
-            default:
-                abort();
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
         case NEON_3R_FLOAT_MULTIPLY:
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
@@ -4955,22 +4921,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         tcg_temp_free_i32(tmp2);
 
-        /* Save the result.  For elementwise operations we can put it
-           straight into the destination register.  For pairwise operations
-           we have to be careful to avoid clobbering the source operands.  */
-        if (pairwise && rd == rm) {
-            neon_store_scratch(pass, tmp);
-        } else {
-            neon_store_reg(rd, pass, tmp);
-        }
+        neon_store_reg(rd, pass, tmp);
 
         } /* for pass */
-        if (pairwise && rd == rm) {
-            for (pass = 0; pass < (q ? 4 : 2); pass++) {
-                tmp = neon_load_scratch(pass);
-                neon_store_reg(rd, pass, tmp);
-            }
-        }
         /* End of 3 register same size operations.  */
     } else if (insn & (1 << 4)) {
         if ((insn & 0x00380080) != 0) {
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 9d6a17d6f04..378c2dd5105 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -48,6 +48,8 @@
 # For FP insns the high bit of 'size' is used as part of opcode decode
 @3same_fp        .... ... . . . . size:1 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+@3same_fp_q0     .... ... . . . . size:1 .... .... .... . 0 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
 
 VHADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 0 .... @3same
 VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
@@ -161,4 +163,7 @@ VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
 
 VADD_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 0 .... @3same_fp
 VSUB_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
+VPADD_fp_3s      1111 001 1 0 . 0 . .... .... 1101 ... 0 .... @3same_fp_q0
 VABD_fp_3s       1111 001 1 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
+VPMAX_fp_3s      1111 001 1 0 . 0 . .... .... 1111 ... 0 .... @3same_fp_q0
+VPMIN_fp_3s      1111 001 1 0 . 1 . .... .... 1111 ... 0 .... @3same_fp_q0
-- 
2.20.1




* [PATCH 33/36] target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (31 preceding siblings ...)
  2020-04-30 18:09 ` [PATCH 32/36] target/arm: Convert Neon VPMIN/VPMAX/VPADD float " Peter Maydell
@ 2020-04-30 18:10 ` Peter Maydell
  2020-05-01  4:07   ` Richard Henderson
  2020-04-30 18:10 ` [PATCH 34/36] target/arm: Convert Neon 3-reg-same compare " Peter Maydell
                   ` (4 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon fp VMUL, VMLA, and VMLS 3-reg-same insns to
decodetree.

Since VMLA and VMLS accumulate into the destination register, we add
a reads_vd parameter to do_3same_fp() which tells it to load the
old value into vd before calling the callback function, in the same
way that the translate-vfp.inc.c do_vfp_3op_sp() and do_vfp_3op_dp()
functions work.
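The reads_vd contract can be modelled in plain C (hypothetical names; floats stand in for TCG temporaries): the callback always writes its result into a destination slot, and that slot holds the old Vd element when reads_vd is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Callback in the style of VFPGen3OpSPFn: writes its result to *d. */
typedef void op3_fn(float *d, float n, float m);

static void op_mul(float *d, float n, float m) { *d = n * m; }

/* Accumulating ops read the old *d, like gen_VMLA_fp_3s/gen_VMLS_fp_3s. */
static void op_mla(float *d, float n, float m) { *d = *d + n * m; }
static void op_mls(float *d, float n, float m) { *d = *d - n * m; }

/*
 * Model of the per-element loop: 2 passes for a D register, 4 for Q.
 * When reads_vd is false the destination slot starts as a copy of the
 * Vn element (the patch passes tmp as both output and first input).
 */
static void model_3same_fp(float *vd, const float *vn, const float *vm,
                           bool q, op3_fn *fn, bool reads_vd)
{
    for (int pass = 0; pass < (q ? 4 : 2); pass++) {
        float dst = reads_vd ? vd[pass] : vn[pass];
        fn(&dst, vn[pass], vm[pass]);
        vd[pass] = dst;
    }
}
```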

This conversion fixes in passing an underdecoding for VMUL
(originally reported by Fredrik Strupe <fredrik@strupe.net>): bit 1
of the 'size' field must be 0.  The old decoder didn't enforce this,
but the decodetree pattern does.
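Decodetree turns each pattern line into a mask of fixed bits plus the value those bits must have. The sketch below uses a hypothetical helper (not decodetree itself) to derive that pair from the VMUL_fp_3s pattern in this patch, and shows that an encoding with bit 21 (bit 1 of 'size') set no longer matches it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Derive (mask, value) from a decodetree-style bit pattern: '0' and '1'
 * are fixed bits, '.' is a don't-care bit, spaces are ignored. The first
 * significant character is bit 31 and there must be exactly 32 of them.
 */
static void pattern_to_mask(const char *pat, uint32_t *mask, uint32_t *value)
{
    int bit = 31;

    *mask = 0;
    *value = 0;
    for (const char *p = pat; *p != '\0'; p++) {
        if (*p == ' ') {
            continue;
        }
        assert(bit >= 0);
        if (*p != '.') {
            *mask |= 1u << bit;
            if (*p == '1') {
                *value |= 1u << bit;
            }
        }
        bit--;
    }
    assert(bit == -1);
}

static bool insn_matches(const char *pat, uint32_t insn)
{
    uint32_t mask, value;

    pattern_to_mask(pat, &mask, &value);
    return (insn & mask) == value;
}
```

For example, 0xF3010D12 (VMUL.F32 d0, d1, d2) matches the VMUL_fp_3s pattern, while 0xF3210D12, the same word with bit 21 set, does not.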

The gen_VMLA_fp_3s() function performs the addition operation
with the operands in the opposite order to the old decoder:
since Neon always uses 'default NaN mode', float32_add operations
are commutative, so there is no behaviour difference, but putting
them this way around matches the Arm ARM pseudocode and the
required operation order for the subtraction in gen_VMLS_fp_3s().
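The commutativity argument rests on the Arm NaN-selection rule. With default NaN mode off, a NaN result propagates the first signalling NaN operand, otherwise the first quiet NaN operand (quietened), so swapping operands can change the result. With it on, every NaN result is the single default NaN. The sketch below models only that rule on raw float32 bit patterns; it is a simplification of the architecture's FPProcessNaNs pseudocode, not QEMU's softfloat code, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F32_DEFAULT_NAN 0x7fc00000u /* Arm default NaN: positive, quiet, zero payload */
#define F32_QUIET_BIT   0x00400000u

static bool f32_is_nan(uint32_t x)
{
    return (x & 0x7f800000u) == 0x7f800000u && (x & 0x007fffffu) != 0;
}

static bool f32_is_snan(uint32_t x)
{
    return f32_is_nan(x) && !(x & F32_QUIET_BIT);
}

/*
 * NaN result of a two-operand op when at least one operand is a NaN:
 * the first SNaN wins, then the first QNaN, and the winner is quietened.
 * In default NaN mode (dn) the result is always the default NaN.
 */
static uint32_t f32_pick_nan(uint32_t a, uint32_t b, bool dn)
{
    uint32_t r;

    assert(f32_is_nan(a) || f32_is_nan(b));
    if (dn) {
        return F32_DEFAULT_NAN;
    }
    if (f32_is_snan(a)) {
        r = a;
    } else if (f32_is_snan(b)) {
        r = b;
    } else if (f32_is_nan(a)) {
        r = a;
    } else {
        r = b;
    }
    return r | F32_QUIET_BIT;
}
```

Subtraction is different even for ordinary numbers: vd - (vn * vm) and (vn * vm) - vd have opposite signs, so gen_VMLS_fp_3s() must keep vd as the first operand.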

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 49 +++++++++++++++++++++++++++------
 target/arm/translate.c          | 17 +-----------
 target/arm/neon-dp.decode       |  3 ++
 3 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 30832309924..47879bbb6c9 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1322,9 +1322,15 @@ static bool trans_VQRDMULH_3s(DisasContext *s, arg_3same *a)
     return do_3same_32(s, a, fns[a->size - 1]);
 }
 
-static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
+static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn,
+                        bool reads_vd)
 {
-    /* FP operations handled elementwise 32 bits at a time */
+    /*
+     * FP operations handled elementwise 32 bits at a time.
+     * If reads_vd is true then the old value of Vd will be
+     * loaded before calling the callback function. This is
+     * used for multiply-accumulate type operations.
+     */
     TCGv_i32 tmp, tmp2;
     int pass;
 
@@ -1350,9 +1356,16 @@ static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
         tmp = neon_load_reg(a->vn, pass);
         tmp2 = neon_load_reg(a->vm, pass);
-        fn(tmp, tmp, tmp2, fpstatus);
+        if (reads_vd) {
+            TCGv_i32 tmp_rd = neon_load_reg(a->vd, pass);
+            fn(tmp_rd, tmp, tmp2, fpstatus);
+            neon_store_reg(a->vd, pass, tmp_rd);
+            tcg_temp_free_i32(tmp);
+        } else {
+            fn(tmp, tmp, tmp2, fpstatus);
+            neon_store_reg(a->vd, pass, tmp);
+        }
         tcg_temp_free_i32(tmp2);
-        neon_store_reg(a->vd, pass, tmp);
     }
     tcg_temp_free_ptr(fpstatus);
     return true;
@@ -1362,19 +1375,37 @@ static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
  * For all the functions using this macro, size == 1 means fp16,
  * which is an architecture extension we don't implement yet.
  */
-#define DO_3S_FP(INSN,FUNC)                                         \
+#define DO_3S_FP(INSN,FUNC,READS_VD)                                \
     static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
     {                                                               \
         if (a->size != 0) {                                         \
             /* TODO fp16 support */                                 \
             return false;                                           \
         }                                                           \
-        return do_3same_fp(s, a, FUNC);                             \
+        return do_3same_fp(s, a, FUNC, READS_VD);                   \
     }
 
-DO_3S_FP(VADD, gen_helper_vfp_adds)
-DO_3S_FP(VSUB, gen_helper_vfp_subs)
-DO_3S_FP(VABD, gen_helper_neon_abd_f32)
+DO_3S_FP(VADD, gen_helper_vfp_adds, false)
+DO_3S_FP(VSUB, gen_helper_vfp_subs, false)
+DO_3S_FP(VABD, gen_helper_neon_abd_f32, false)
+DO_3S_FP(VMUL, gen_helper_vfp_muls, false)
+
+static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_muls(vn, vn, vm, fpstatus);
+    gen_helper_vfp_adds(vd, vd, vn, fpstatus);
+}
+
+static void gen_VMLS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_muls(vn, vn, vm, fpstatus);
+    gen_helper_vfp_subs(vd, vd, vn, fpstatus);
+}
+
+DO_3S_FP(VMLA, gen_VMLA_fp_3s, true)
+DO_3S_FP(VMLS, gen_VMLS_fp_3s, true)
 
 static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 5ae982ee253..57343daa10a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4785,6 +4785,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VPADD_VQRDMLAH:
         case NEON_3R_VQDMULH_VQRDMULH:
         case NEON_3R_FLOAT_ARITH:
+        case NEON_3R_FLOAT_MULTIPLY:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4831,22 +4832,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         tmp = neon_load_reg(rn, pass);
         tmp2 = neon_load_reg(rm, pass);
         switch (op) {
-        case NEON_3R_FLOAT_MULTIPLY:
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
-            if (!u) {
-                tcg_temp_free_i32(tmp2);
-                tmp2 = neon_load_reg(rd, pass);
-                if (size == 0) {
-                    gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
-                } else {
-                    gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
-                }
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
         case NEON_3R_FLOAT_CMP:
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 378c2dd5105..96866c03db4 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -165,5 +165,8 @@ VADD_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 0 .... @3same_fp
 VSUB_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
 VPADD_fp_3s      1111 001 1 0 . 0 . .... .... 1101 ... 0 .... @3same_fp_q0
 VABD_fp_3s       1111 001 1 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
+VMLA_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
+VMLS_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 1 .... @3same_fp
+VMUL_fp_3s       1111 001 1 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
 VPMAX_fp_3s      1111 001 1 0 . 0 . .... .... 1111 ... 0 .... @3same_fp_q0
 VPMIN_fp_3s      1111 001 1 0 . 1 . .... .... 1111 ... 0 .... @3same_fp_q0
-- 
2.20.1




* [PATCH 34/36] target/arm: Convert Neon 3-reg-same compare insns to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (32 preceding siblings ...)
  2020-04-30 18:10 ` [PATCH 33/36] target/arm: Convert Neon fp VMUL, VMLA, VMLS " Peter Maydell
@ 2020-04-30 18:10 ` Peter Maydell
  2020-05-01  4:09   ` Richard Henderson
  2020-04-30 18:10 ` [PATCH 35/36] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS " Peter Maydell
                   ` (3 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon fp 3-reg-same compare insns VCGE, VCGT,
VCEQ, VACGE and VACGT to decodetree.
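Each of these compares writes an all-ones or all-zeros mask into the result lane rather than setting flags, and the VAC* forms compare absolute values. A plain-C model of the per-lane results (hypothetical helper names; QEMU's gen_helper_neon_*_f32 helpers also update the floating-point exception flags, which this ignores):

```c
#include <assert.h>
#include <math.h>   /* NAN, used by the usage example */
#include <stdint.h>

/* Lane result: all ones if the comparison holds, else zero. */
static uint32_t lane_mask(int cond)
{
    return cond ? 0xffffffffu : 0u;
}

/* Magnitude without libm; -0.0f stays -0.0f, which compares equal to 0. */
static float abs_f32(float x)
{
    return x < 0.0f ? -x : x;
}

/* Ordered comparisons: C relational operators are false for any NaN operand. */
static uint32_t model_vceq(float n, float m) { return lane_mask(n == m); }
static uint32_t model_vcge(float n, float m) { return lane_mask(n >= m); }
static uint32_t model_vcgt(float n, float m) { return lane_mask(n > m); }

/* Absolute compares: |n| against |m|. */
static uint32_t model_vacge(float n, float m) { return lane_mask(abs_f32(n) >= abs_f32(m)); }
static uint32_t model_vacgt(float n, float m) { return lane_mask(abs_f32(n) > abs_f32(m)); }
```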

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c |  5 +++++
 target/arm/translate.c          | 39 ++-------------------------------
 target/arm/neon-dp.decode       |  5 +++++
 3 files changed, 12 insertions(+), 37 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 47879bbb6c9..29a3f7677c7 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1389,6 +1389,11 @@ DO_3S_FP(VADD, gen_helper_vfp_adds, false)
 DO_3S_FP(VSUB, gen_helper_vfp_subs, false)
 DO_3S_FP(VABD, gen_helper_neon_abd_f32, false)
 DO_3S_FP(VMUL, gen_helper_vfp_muls, false)
+DO_3S_FP(VCEQ, gen_helper_neon_ceq_f32, false)
+DO_3S_FP(VCGE, gen_helper_neon_cge_f32, false)
+DO_3S_FP(VCGT, gen_helper_neon_cgt_f32, false)
+DO_3S_FP(VACGE, gen_helper_neon_acge_f32, false)
+DO_3S_FP(VACGT, gen_helper_neon_acgt_f32, false)
 
 static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
                             TCGv_ptr fpstatus)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 57343daa10a..c68dbe126eb 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4786,6 +4786,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VQDMULH_VQRDMULH:
         case NEON_3R_FLOAT_ARITH:
         case NEON_3R_FLOAT_MULTIPLY:
+        case NEON_3R_FLOAT_CMP:
+        case NEON_3R_FLOAT_ACMP:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4800,17 +4802,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 return 1; /* VPMIN/VPMAX handled by decodetree */
             }
             break;
-        case NEON_3R_FLOAT_CMP:
-            if (!u && size) {
-                /* no encoding for U=0 C=1x */
-                return 1;
-            }
-            break;
-        case NEON_3R_FLOAT_ACMP:
-            if (!u) {
-                return 1;
-            }
-            break;
         case NEON_3R_FLOAT_MISC:
             /* VMAXNM/VMINNM in ARMv8 */
             if (u && !arm_dc_feature(s, ARM_FEATURE_V8)) {
@@ -4832,32 +4823,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         tmp = neon_load_reg(rn, pass);
         tmp2 = neon_load_reg(rm, pass);
         switch (op) {
-        case NEON_3R_FLOAT_CMP:
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            if (!u) {
-                gen_helper_neon_ceq_f32(tmp, tmp, tmp2, fpstatus);
-            } else {
-                if (size == 0) {
-                    gen_helper_neon_cge_f32(tmp, tmp, tmp2, fpstatus);
-                } else {
-                    gen_helper_neon_cgt_f32(tmp, tmp, tmp2, fpstatus);
-                }
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
-        case NEON_3R_FLOAT_ACMP:
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            if (size == 0) {
-                gen_helper_neon_acge_f32(tmp, tmp, tmp2, fpstatus);
-            } else {
-                gen_helper_neon_acgt_f32(tmp, tmp, tmp2, fpstatus);
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
         case NEON_3R_FLOAT_MINMAX:
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 96866c03db4..e90c7a9afe9 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -168,5 +168,10 @@ VABD_fp_3s       1111 001 1 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
 VMLA_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
 VMLS_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 1 .... @3same_fp
 VMUL_fp_3s       1111 001 1 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
+VCEQ_fp_3s       1111 001 0 0 . 0 . .... .... 1110 ... 0 .... @3same_fp
+VCGE_fp_3s       1111 001 1 0 . 0 . .... .... 1110 ... 0 .... @3same_fp
+VACGE_fp_3s      1111 001 1 0 . 0 . .... .... 1110 ... 1 .... @3same_fp
+VCGT_fp_3s       1111 001 1 0 . 1 . .... .... 1110 ... 0 .... @3same_fp
+VACGT_fp_3s      1111 001 1 0 . 1 . .... .... 1110 ... 1 .... @3same_fp
 VPMAX_fp_3s      1111 001 1 0 . 0 . .... .... 1111 ... 0 .... @3same_fp_q0
 VPMIN_fp_3s      1111 001 1 0 . 1 . .... .... 1111 ... 0 .... @3same_fp_q0
-- 
2.20.1




* [PATCH 35/36] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (33 preceding siblings ...)
  2020-04-30 18:10 ` [PATCH 34/36] target/arm: Convert Neon 3-reg-same compare " Peter Maydell
@ 2020-04-30 18:10 ` Peter Maydell
  2020-05-01  4:13   ` Richard Henderson
  2020-04-30 18:10 ` [PATCH 36/36] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns " Peter Maydell
                   ` (2 subsequent siblings)
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS 3-reg-same
insns to decodetree. (These are all the remaining non-accumulation
instructions in this group.)
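VRECPS and VRSQRTS are Newton-Raphson step instructions: per lane, VRECPS computes 2.0 - n*m and VRSQRTS computes (3.0 - n*m) / 2.0 (the architecture's special cases for infinity times zero are omitted here). A plain-C sketch, with hypothetical names, of how software chains each step to refine an estimate:

```c
#include <assert.h>

/* VRECPS per-lane arithmetic (inf * 0 special case omitted). */
static float model_vrecps(float n, float m) { return 2.0f - n * m; }

/* VRSQRTS per-lane arithmetic (inf * 0 special case omitted). */
static float model_vrsqrts(float n, float m) { return (3.0f - n * m) / 2.0f; }

/* Refine an estimate x of 1/d: x' = x * (2 - d * x). */
static float refine_recip(float d, float x, int steps)
{
    for (int i = 0; i < steps; i++) {
        x = x * model_vrecps(d, x);
    }
    return x;
}

/* Refine an estimate x of 1/sqrt(d): x' = x * (3 - (d * x) * x) / 2. */
static float refine_rsqrt(float d, float x, int steps)
{
    for (int i = 0; i < steps; i++) {
        x = x * model_vrsqrts(d * x, x);
    }
    return x;
}
```

VMAXNM/VMINNM, by contrast, differ from VMAX/VMIN only in NaN handling: when exactly one operand is a quiet NaN they return the numeric operand (IEEE 754-2008 maxNum/minNum), which is why they get their own ARMv8-gated trans functions.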

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 60 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 42 ++---------------------
 target/arm/neon-dp.decode       |  6 ++++
 3 files changed, 68 insertions(+), 40 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 29a3f7677c7..00b0b252e13 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1394,6 +1394,8 @@ DO_3S_FP(VCGE, gen_helper_neon_cge_f32, false)
 DO_3S_FP(VCGT, gen_helper_neon_cgt_f32, false)
 DO_3S_FP(VACGE, gen_helper_neon_acge_f32, false)
 DO_3S_FP(VACGT, gen_helper_neon_acgt_f32, false)
+DO_3S_FP(VMAX, gen_helper_vfp_maxs, false)
+DO_3S_FP(VMIN, gen_helper_vfp_mins, false)
 
 static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
                             TCGv_ptr fpstatus)
@@ -1412,6 +1414,64 @@ static void gen_VMLS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
 DO_3S_FP(VMLA, gen_VMLA_fp_3s, true)
 DO_3S_FP(VMLS, gen_VMLS_fp_3s, true)
 
+static bool trans_VMAXNM_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
+        return false;
+    }
+
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_fp(s, a, gen_helper_vfp_maxnums, false);
+}
+
+static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
+        return false;
+    }
+
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_fp(s, a, gen_helper_vfp_minnums, false);
+}
+
+static void gen_VRECPS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm)
+{
+    gen_helper_recps_f32(vd, vn, vm, cpu_env);
+}
+
+static bool trans_VRECPS_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_32(s, a, gen_VRECPS_fp_3s);
+}
+
+static void gen_VRSQRTS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm)
+{
+    gen_helper_rsqrts_f32(vd, vn, vm, cpu_env);
+}
+
+static bool trans_VRSQRTS_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_32(s, a, gen_VRSQRTS_fp_3s);
+}
+
 static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 {
     /* FP operations handled pairwise 32 bits at a time */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index c68dbe126eb..d34a96e9018 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4788,6 +4788,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_FLOAT_MULTIPLY:
         case NEON_3R_FLOAT_CMP:
         case NEON_3R_FLOAT_ACMP:
+        case NEON_3R_FLOAT_MINMAX:
+        case NEON_3R_FLOAT_MISC:
             /* Already handled by decodetree */
             return 1;
         }
@@ -4797,17 +4799,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         }
         switch (op) {
-        case NEON_3R_FLOAT_MINMAX:
-            if (u) {
-                return 1; /* VPMIN/VPMAX handled by decodetree */
-            }
-            break;
-        case NEON_3R_FLOAT_MISC:
-            /* VMAXNM/VMINNM in ARMv8 */
-            if (u && !arm_dc_feature(s, ARM_FEATURE_V8)) {
-                return 1;
-            }
-            break;
         case NEON_3R_VFM_VQRDMLSH:
             if (!dc_isar_feature(aa32_simdfmac, s)) {
                 return 1;
@@ -4823,35 +4814,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         tmp = neon_load_reg(rn, pass);
         tmp2 = neon_load_reg(rm, pass);
         switch (op) {
-        case NEON_3R_FLOAT_MINMAX:
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            if (size == 0) {
-                gen_helper_vfp_maxs(tmp, tmp, tmp2, fpstatus);
-            } else {
-                gen_helper_vfp_mins(tmp, tmp, tmp2, fpstatus);
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
-        case NEON_3R_FLOAT_MISC:
-            if (u) {
-                /* VMAXNM/VMINNM */
-                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-                if (size == 0) {
-                    gen_helper_vfp_maxnums(tmp, tmp, tmp2, fpstatus);
-                } else {
-                    gen_helper_vfp_minnums(tmp, tmp, tmp2, fpstatus);
-                }
-                tcg_temp_free_ptr(fpstatus);
-            } else {
-                if (size == 0) {
-                    gen_helper_recps_f32(tmp, tmp, tmp2, cpu_env);
-                } else {
-                    gen_helper_rsqrts_f32(tmp, tmp, tmp2, cpu_env);
-              }
-            }
-            break;
         case NEON_3R_VFM_VQRDMLSH:
         {
             /* VFMA, VFMS: fused multiply-add */
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index e90c7a9afe9..c4a90e70753 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -173,5 +173,11 @@ VCGE_fp_3s       1111 001 1 0 . 0 . .... .... 1110 ... 0 .... @3same_fp
 VACGE_fp_3s      1111 001 1 0 . 0 . .... .... 1110 ... 1 .... @3same_fp
 VCGT_fp_3s       1111 001 1 0 . 1 . .... .... 1110 ... 0 .... @3same_fp
 VACGT_fp_3s      1111 001 1 0 . 1 . .... .... 1110 ... 1 .... @3same_fp
+VMAX_fp_3s       1111 001 0 0 . 0 . .... .... 1111 ... 0 .... @3same_fp
+VMIN_fp_3s       1111 001 0 0 . 1 . .... .... 1111 ... 0 .... @3same_fp
 VPMAX_fp_3s      1111 001 1 0 . 0 . .... .... 1111 ... 0 .... @3same_fp_q0
 VPMIN_fp_3s      1111 001 1 0 . 1 . .... .... 1111 ... 0 .... @3same_fp_q0
+VRECPS_fp_3s     1111 001 0 0 . 0 . .... .... 1111 ... 1 .... @3same_fp
+VRSQRTS_fp_3s    1111 001 0 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
+VMAXNM_fp_3s     1111 001 1 0 . 0 . .... .... 1111 ... 1 .... @3same_fp
+VMINNM_fp_3s     1111 001 1 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
-- 
2.20.1




* [PATCH 36/36] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns to decodetree
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (34 preceding siblings ...)
  2020-04-30 18:10 ` [PATCH 35/36] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS " Peter Maydell
@ 2020-04-30 18:10 ` Peter Maydell
  2020-05-01  4:14   ` Richard Henderson
  2020-05-01  7:32 ` [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) no-reply
  2020-05-04 12:04 ` Peter Maydell
  37 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-04-30 18:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon floating point VFMA and VFMS insns to decodetree.
These are the last insns in the 3-reg-same group so we can
remove all the support/loop code from the old decoder.
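VFMA/VFMS need their own callbacks, rather than reusing the VMLA/VMLS multiply-then-add sequence, because a fused multiply-add rounds only once. The sketch below contrasts the two in plain C with hypothetical names. To avoid depending on libm's fmaf() it emulates the fused result in double, which is exact for the inputs used here (the float product fits in a double and the double sum needs no rounding) but is not a general fmaf() replacement. The volatile temporary stops the compiler from contracting the unfused version into an FMA itself.

```c
#include <assert.h>

/* Unfused, as VMLA: round the product to float, then round the sum. */
static float mla_unfused(float d, float n, float m)
{
    volatile float prod = n * m; /* volatile: forbid FMA contraction */
    return d + prod;
}

/*
 * Fused, as VFMA: d + n * m with a single rounding. Emulated in double,
 * which is only correct when the double sum is exact (true for the
 * values in the usage example); real code would call fmaf().
 */
static float fma_emulated(float d, float n, float m)
{
    double exact = (double)d + (double)n * (double)m;
    return (float)exact;
}

/* VFMS negates the Vn operand first: d + (-n) * m, as gen_VFMS_fp_3s does. */
static float fms_emulated(float d, float n, float m)
{
    return fma_emulated(d, -n, m);
}
```

With n = m = 1 + 2^-12 and d = -(1 + 2^-11), the exact product 1 + 2^-11 + 2^-24 is a rounding tie in float and rounds to 1 + 2^-11, so the unfused result is 0 while the fused result is 2^-24.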

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c |  41 ++++++++
 target/arm/translate.c          | 176 +-------------------------------
 target/arm/neon-dp.decode       |   3 +
 3 files changed, 46 insertions(+), 174 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 00b0b252e13..1a4c718b6e4 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1472,6 +1472,47 @@ static bool trans_VRSQRTS_fp_3s(DisasContext *s, arg_3same *a)
     return do_3same_32(s, a, gen_VRSQRTS_fp_3s);
 }
 
+static void gen_VFMA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_muladds(vd, vn, vm, vd, fpstatus);
+}
+
+static bool trans_VFMA_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (!dc_isar_feature(aa32_simdfmac, s)) {
+        return false;
+    }
+
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_fp(s, a, gen_VFMA_fp_3s, true);
+}
+
+static void gen_VFMS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_negs(vn, vn);
+    gen_helper_vfp_muladds(vd, vn, vm, vd, fpstatus);
+}
+
+static bool trans_VFMS_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (!dc_isar_feature(aa32_simdfmac, s)) {
+        return false;
+    }
+
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_fp(s, a, gen_VFMS_fp_3s, true);
+}
+
 static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 {
     /* FP operations handled pairwise 32 bits at a time */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d34a96e9018..f392a15ffbf 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3391,78 +3391,6 @@ static void gen_neon_narrow_op(int op, int u, int size,
     }
 }
 
-/* Symbolic constants for op fields for Neon 3-register same-length.
- * The values correspond to bits [11:8,4]; see the ARM ARM DDI0406B
- * table A7-9.
- */
-#define NEON_3R_VHADD 0
-#define NEON_3R_VQADD 1
-#define NEON_3R_VRHADD 2
-#define NEON_3R_LOGIC 3 /* VAND,VBIC,VORR,VMOV,VORN,VEOR,VBIF,VBIT,VBSL */
-#define NEON_3R_VHSUB 4
-#define NEON_3R_VQSUB 5
-#define NEON_3R_VCGT 6
-#define NEON_3R_VCGE 7
-#define NEON_3R_VSHL 8
-#define NEON_3R_VQSHL 9
-#define NEON_3R_VRSHL 10
-#define NEON_3R_VQRSHL 11
-#define NEON_3R_VMAX 12
-#define NEON_3R_VMIN 13
-#define NEON_3R_VABD 14
-#define NEON_3R_VABA 15
-#define NEON_3R_VADD_VSUB 16
-#define NEON_3R_VTST_VCEQ 17
-#define NEON_3R_VML 18 /* VMLA, VMLS */
-#define NEON_3R_VMUL 19
-#define NEON_3R_VPMAX 20
-#define NEON_3R_VPMIN 21
-#define NEON_3R_VQDMULH_VQRDMULH 22
-#define NEON_3R_VPADD_VQRDMLAH 23
-#define NEON_3R_SHA 24 /* SHA1C,SHA1P,SHA1M,SHA1SU0,SHA256H{2},SHA256SU1 */
-#define NEON_3R_VFM_VQRDMLSH 25 /* VFMA, VFMS, VQRDMLSH */
-#define NEON_3R_FLOAT_ARITH 26 /* float VADD, VSUB, VPADD, VABD */
-#define NEON_3R_FLOAT_MULTIPLY 27 /* float VMLA, VMLS, VMUL */
-#define NEON_3R_FLOAT_CMP 28 /* float VCEQ, VCGE, VCGT */
-#define NEON_3R_FLOAT_ACMP 29 /* float VACGE, VACGT, VACLE, VACLT */
-#define NEON_3R_FLOAT_MINMAX 30 /* float VMIN, VMAX */
-#define NEON_3R_FLOAT_MISC 31 /* float VRECPS, VRSQRTS, VMAXNM/MINNM */
-
-static const uint8_t neon_3r_sizes[] = {
-    [NEON_3R_VHADD] = 0x7,
-    [NEON_3R_VQADD] = 0xf,
-    [NEON_3R_VRHADD] = 0x7,
-    [NEON_3R_LOGIC] = 0xf, /* size field encodes op type */
-    [NEON_3R_VHSUB] = 0x7,
-    [NEON_3R_VQSUB] = 0xf,
-    [NEON_3R_VCGT] = 0x7,
-    [NEON_3R_VCGE] = 0x7,
-    [NEON_3R_VSHL] = 0xf,
-    [NEON_3R_VQSHL] = 0xf,
-    [NEON_3R_VRSHL] = 0xf,
-    [NEON_3R_VQRSHL] = 0xf,
-    [NEON_3R_VMAX] = 0x7,
-    [NEON_3R_VMIN] = 0x7,
-    [NEON_3R_VABD] = 0x7,
-    [NEON_3R_VABA] = 0x7,
-    [NEON_3R_VADD_VSUB] = 0xf,
-    [NEON_3R_VTST_VCEQ] = 0x7,
-    [NEON_3R_VML] = 0x7,
-    [NEON_3R_VMUL] = 0x7,
-    [NEON_3R_VPMAX] = 0x7,
-    [NEON_3R_VPMIN] = 0x7,
-    [NEON_3R_VQDMULH_VQRDMULH] = 0x6,
-    [NEON_3R_VPADD_VQRDMLAH] = 0x7,
-    [NEON_3R_SHA] = 0xf, /* size field encodes op type */
-    [NEON_3R_VFM_VQRDMLSH] = 0x7, /* For VFM, size bit 1 encodes op */
-    [NEON_3R_FLOAT_ARITH] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_MULTIPLY] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_CMP] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_ACMP] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_MINMAX] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_MISC] = 0x5, /* size bit 1 encodes op */
-};
-
 /* Symbolic constants for op fields for Neon 2-register miscellaneous.
  * The values correspond to bits [17:16,10:7]; see the ARM ARM DDI0406B
  * table A7-13.
@@ -4735,108 +4663,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     rm_ofs = neon_reg_offset(rm, 0);
 
     if ((insn & (1 << 23)) == 0) {
-        /* Three register same length.  */
-        op = ((insn >> 7) & 0x1e) | ((insn >> 4) & 1);
-        /* Catch invalid op and bad size combinations: UNDEF */
-        if ((neon_3r_sizes[op] & (1 << size)) == 0) {
-            return 1;
-        }
-        /* All insns of this form UNDEF for either this condition or the
-         * superset of cases "Q==1"; we catch the latter later.
-         */
-        if (q && ((rd | rn | rm) & 1)) {
-            return 1;
-        }
-        switch (op) {
-        case NEON_3R_VFM_VQRDMLSH:
-            if (!u) {
-                /* VFM, VFMS */
-                if (size == 1) {
-                    return 1;
-                }
-                break;
-            }
-            /* VQRDMLSH : handled by decodetree */
-            return 1;
-
-        case NEON_3R_VADD_VSUB:
-        case NEON_3R_LOGIC:
-        case NEON_3R_VMAX:
-        case NEON_3R_VMIN:
-        case NEON_3R_VTST_VCEQ:
-        case NEON_3R_VCGT:
-        case NEON_3R_VCGE:
-        case NEON_3R_VQADD:
-        case NEON_3R_VQSUB:
-        case NEON_3R_VMUL:
-        case NEON_3R_VML:
-        case NEON_3R_VSHL:
-        case NEON_3R_SHA:
-        case NEON_3R_VHADD:
-        case NEON_3R_VRHADD:
-        case NEON_3R_VHSUB:
-        case NEON_3R_VABD:
-        case NEON_3R_VQSHL:
-        case NEON_3R_VRSHL:
-        case NEON_3R_VQRSHL:
-        case NEON_3R_VABA:
-        case NEON_3R_VPMAX:
-        case NEON_3R_VPMIN:
-        case NEON_3R_VPADD_VQRDMLAH:
-        case NEON_3R_VQDMULH_VQRDMULH:
-        case NEON_3R_FLOAT_ARITH:
-        case NEON_3R_FLOAT_MULTIPLY:
-        case NEON_3R_FLOAT_CMP:
-        case NEON_3R_FLOAT_ACMP:
-        case NEON_3R_FLOAT_MINMAX:
-        case NEON_3R_FLOAT_MISC:
-            /* Already handled by decodetree */
-            return 1;
-        }
-
-        if (size == 3) {
-            /* 64-bit element instructions: handled by decodetree */
-            return 1;
-        }
-        switch (op) {
-        case NEON_3R_VFM_VQRDMLSH:
-            if (!dc_isar_feature(aa32_simdfmac, s)) {
-                return 1;
-            }
-            break;
-        default:
-            break;
-        }
-
-        for (pass = 0; pass < (q ? 4 : 2); pass++) {
-
-        /* Elementwise.  */
-        tmp = neon_load_reg(rn, pass);
-        tmp2 = neon_load_reg(rm, pass);
-        switch (op) {
-        case NEON_3R_VFM_VQRDMLSH:
-        {
-            /* VFMA, VFMS: fused multiply-add */
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            TCGv_i32 tmp3 = neon_load_reg(rd, pass);
-            if (size) {
-                /* VFMS */
-                gen_helper_vfp_negs(tmp, tmp);
-            }
-            gen_helper_vfp_muladds(tmp, tmp, tmp2, tmp3, fpstatus);
-            tcg_temp_free_i32(tmp3);
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
-        default:
-            abort();
-        }
-        tcg_temp_free_i32(tmp2);
-
-        neon_store_reg(rd, pass, tmp);
-
-        } /* for pass */
-        /* End of 3 register same size operations.  */
+        /* Three register same length: handled by decodetree */
+        return 1;
     } else if (insn & (1 << 4)) {
         if ((insn & 0x00380080) != 0) {
             /* Two registers and shift.  */
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index c4a90e70753..3c5a9f0d0e0 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -159,6 +159,9 @@ SHA256H2_3s      1111 001 1 0 . 01 .... .... 1100 . 1 . 0 .... \
 SHA256SU1_3s     1111 001 1 0 . 10 .... .... 1100 . 1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VFMA_fp_3s       1111 001 0 0 . 0 . .... .... 1100 ... 1 .... @3same_fp
+VFMS_fp_3s       1111 001 0 0 . 1 . .... .... 1100 ... 1 .... @3same_fp
+
 VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
 
 VADD_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 0 .... @3same_fp

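The deleted legacy decoder derived a 5-bit op index from insn bits [11:8] and [4]. It then used neon_3r_sizes[] as a bitmask of the permitted values of the size field, bits [21:20]. The standalone C sketch below reproduces that arithmetic. The function names and LEGACY_OP_VFM are illustrative only and are not QEMU identifiers. For an A32 VFMA.F32 word such as 0xf2000c10, bits [11:8] = 1100 and bit 4 = 1, which gives op 25 (NEON_3R_VFM_VQRDMLSH).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copy of the removed NEON_3R_VFM_VQRDMLSH value. */
enum { LEGACY_OP_VFM = 25 };

/* op = bits [11:8] shifted left by one, with bit 4 placed in bit 0. */
static unsigned legacy_3same_op(uint32_t insn)
{
    return ((insn >> 7) & 0x1e) | ((insn >> 4) & 1);
}

/* The size field is bits [21:20]. */
static unsigned legacy_3same_size(uint32_t insn)
{
    return (insn >> 20) & 3;
}

/* A neon_3r_sizes[] entry is a bitmask: bit N set means size N is allowed. */
static bool legacy_size_ok(uint8_t size_mask, unsigned size)
{
    assert(size < 4);
    return (size_mask & (1u << size)) != 0;
}
```

A mask of 0x7 still admits size 1. The old VFM path rejected that case separately, with its explicit `size == 1` check.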



* Re: [PATCH 01/36] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
  2020-04-30 18:09 ` [PATCH 01/36] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check Peter Maydell
@ 2020-04-30 18:21   ` Richard Henderson
  2020-05-01 16:55   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 18:21 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Somewhere along the line we accidentally added a duplicate
> "using D16-D31 when they don't exist" check to do_vfm_dp()
> (probably an artifact of a patchseries rebase). Remove it.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-vfp.inc.c | 6 ------
>  1 file changed, 6 deletions(-)

My fault.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 02/36] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
  2020-04-30 18:09 ` [PATCH 02/36] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON Peter Maydell
@ 2020-04-30 18:22   ` Richard Henderson
  2020-05-01 16:56   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 18:22 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> We were accidentally permitting decode of Thumb Neon insns even if
> the CPU didn't have the FEATURE_NEON bit set, because the feature
> check was being done before the call to disas_neon_data_insn() and
> disas_neon_ls_insn() in the Arm decoder but was omitted from the
> Thumb decoder.  Push the feature bit check down into the called
> functions so it is done for both Arm and Thumb encodings.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 03/36] target/arm: Add stubs for AArch32 Neon decodetree
  2020-04-30 18:09 ` [PATCH 03/36] target/arm: Add stubs for AArch32 Neon decodetree Peter Maydell
@ 2020-04-30 18:30   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 18:30 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Add the infrastructure for building and invoking a decodetree decoder
> for the AArch32 Neon encodings.  At the moment the new decoder covers
> nothing, so we always fall back to the existing hand-written decode.
> 
> We follow the same pattern we did for the VFP decodetree conversion
> (commit 78e138bc1f672c145ef6ace74617d and following): code that deals
> with Neon will be moving gradually out to translate-neon.inc.c,
> which we #include into translate.c.
> 
> In order to share the decode files between A32 and T32, we
> split Neon into 3 parts:
>  * data-processing
>  * load-store
>  * 'shared' encodings
> 
> The first two groups of instructions have similar but not identical
> A32 and T32 encodings, so we need to manually transform the T32
> encoding into the A32 one before calling the decoder; the third group
> covers the Neon instructions which are identical in A32 and T32.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/Makefile.objs        | 18 +++++++++++++++++
>  target/arm/translate-neon.inc.c | 32 +++++++++++++++++++++++++++++
>  target/arm/translate.c          | 36 +++++++++++++++++++++++++++++++--
>  target/arm/neon-dp.decode       | 29 ++++++++++++++++++++++++++
>  target/arm/neon-ls.decode       | 29 ++++++++++++++++++++++++++
>  target/arm/neon-shared.decode   | 27 +++++++++++++++++++++++++
>  6 files changed, 169 insertions(+), 2 deletions(-)
>  create mode 100644 target/arm/translate-neon.inc.c
>  create mode 100644 target/arm/neon-dp.decode
>  create mode 100644 target/arm/neon-ls.decode
>  create mode 100644 target/arm/neon-shared.decode

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

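Patch 03 notes that T32 Neon data-processing and load/store encodings must be rewritten into their A32 form before the shared decoder runs. The following sketch assumes the ARM ARM layouts: T32 data-processing 111U 1111 ... versus A32 1111 001U ..., and T32 load/store 1111 1001 xxx0 ... versus A32 1111 0100 xxx0 .... Under those layouts the rewrite is pure bit manipulation. The helper names are hypothetical, and this is not necessarily the exact code in translate.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: T32 Neon data-processing (111U 1111 ...) -> A32 (1111 001U ...). */
static uint32_t t32_neon_dp_to_a32(uint32_t insn)
{
    assert((insn & 0xef000000u) == 0xef000000u);
    /*
     * Keep bits [31:29], [25] and [23:0], move U from bit 28 down to
     * bit 24, then set bit 28 so the top byte reads 1111 001U.
     */
    return (insn & 0xe2ffffffu) | ((insn & (1u << 28)) >> 4) | (1u << 28);
}

/* Hypothetical: T32 Neon load/store (1111 1001 xxx0 ...) -> A32 (1111 0100 xxx0 ...). */
static uint32_t t32_neon_ls_to_a32(uint32_t insn)
{
    assert((insn & 0xff100000u) == 0xf9000000u);
    return (insn & 0x00ffffffu) | 0xf4000000u;
}
```

For example, the T32 word 0xef000c10 maps to the A32 VFMA.F32 word 0xf2000c10.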


* Re: [PATCH 04/36] target/arm: Convert VCMLA (vector) to decodetree
  2020-04-30 18:09 ` [PATCH 04/36] target/arm: Convert VCMLA (vector) to decodetree Peter Maydell
@ 2020-04-30 18:34   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 18:34 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the VCMLA (vector) insns in the 3same extension group to
> decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 11 +---------
>  target/arm/neon-shared.decode   | 11 ++++++++++
>  3 files changed, 49 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 05/36] target/arm: Convert VCADD (vector) to decodetree
  2020-04-30 18:09 ` [PATCH 05/36] target/arm: Convert VCADD " Peter Maydell
@ 2020-04-30 18:35   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 18:35 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the VCADD (vector) insns to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 11 +---------
>  target/arm/neon-shared.decode   |  3 +++
>  3 files changed, 41 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 06/36] target/arm: Convert V[US]DOT (vector) to decodetree
  2020-04-30 18:09 ` [PATCH 06/36] target/arm: Convert V[US]DOT " Peter Maydell
@ 2020-04-30 18:36   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 18:36 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the V[US]DOT (vector) insns to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 32 ++++++++++++++++++++++++++++++++
>  target/arm/translate.c          |  9 +--------
>  target/arm/neon-shared.decode   |  4 ++++
>  3 files changed, 37 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 07/36] target/arm: Convert VFM[AS]L (vector) to decodetree
  2020-04-30 18:09 ` [PATCH 07/36] target/arm: Convert VFM[AS]L " Peter Maydell
@ 2020-04-30 18:43   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 18:43 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the VFM[AS]L (vector) insns to decodetree.  This is the last
> insn in the legacy decoder for the 3same_ext group, so we can
> delete the legacy decoder function for the group entirely.
> 
> Note that in disas_thumb2_insn() the parts of this encoding space
> where the decodetree decoder returns false will correctly be directed
> to illegal_op by the "(insn & (1 << 28))" check so they won't fall
> into disas_coproc_insn() by mistake.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 31 +++++++++++
>  target/arm/translate.c          | 92 +--------------------------------
>  target/arm/neon-shared.decode   |  6 +++
>  3 files changed, 38 insertions(+), 91 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 08/36] target/arm: Convert VCMLA (scalar) to decodetree
  2020-04-30 18:09 ` [PATCH 08/36] target/arm: Convert VCMLA (scalar) " Peter Maydell
@ 2020-04-30 19:00   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:00 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert VCMLA (scalar) in the 2reg-scalar-ext group to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 40 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 26 +--------------------
>  target/arm/neon-shared.decode   |  5 +++++
>  3 files changed, 46 insertions(+), 25 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 09/36] target/arm: Convert V[US]DOT (scalar) to decodetree
  2020-04-30 18:09 ` [PATCH 09/36] target/arm: Convert V[US]DOT " Peter Maydell
@ 2020-04-30 19:01   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:01 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the V[US]DOT (scalar) insns in the 2reg-scalar-ext group
> to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 35 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 13 +-----------
>  target/arm/neon-shared.decode   |  3 +++
>  3 files changed, 39 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 10/36] target/arm: Convert VFM[AS]L (scalar) to decodetree
  2020-04-30 18:09 ` [PATCH 10/36] target/arm: Convert VFM[AS]L " Peter Maydell
@ 2020-04-30 19:06   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:06 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
> to decodetree. These are the last ones in the group so we can remove
> all the legacy decode for the group.
> 
> Note that in disas_thumb2_insn() the parts of this encoding space
> where the decodetree decoder returns false will correctly be directed
> to illegal_op by the "(insn & (1 << 28))" check so they won't fall
> into disas_coproc_insn() by mistake.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c |  32 ++++++++++
>  target/arm/translate.c          | 107 +-------------------------------
>  target/arm/neon-shared.decode   |   7 +++
>  3 files changed, 40 insertions(+), 106 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 11/36] target/arm: Convert Neon load/store multiple structures to decodetree
  2020-04-30 18:09 ` [PATCH 11/36] target/arm: Convert Neon load/store multiple structures " Peter Maydell
@ 2020-04-30 19:09   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:09 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon "load/store multiple structures" insns to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 124 ++++++++++++++++++++++++++++++++
>  target/arm/translate.c          |  91 +----------------------
>  target/arm/neon-ls.decode       |   7 ++
>  3 files changed, 133 insertions(+), 89 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 12/36] target/arm: Convert Neon 'load single structure to all lanes' to decodetree
  2020-04-30 18:09 ` [PATCH 12/36] target/arm: Convert Neon 'load single structure to all lanes' " Peter Maydell
@ 2020-04-30 19:17   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:17 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon "load single structure to all lanes" insns to
> decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 73 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 55 +------------------------
>  target/arm/neon-ls.decode       |  5 +++
>  3 files changed, 80 insertions(+), 53 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 13/36] target/arm: Convert Neon 'load/store single structure' to decodetree
  2020-04-30 18:09 ` [PATCH 13/36] target/arm: Convert Neon 'load/store single structure' " Peter Maydell
@ 2020-04-30 19:32   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:32 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon "load/store single structure to one lane" insns to
> decodetree.
> 
> As this is the last set of insns in the neon load/store group,
> we can remove the whole disas_neon_ls_insn() function.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c |  89 +++++++++++++++++++
>  target/arm/translate.c          | 147 --------------------------------
>  target/arm/neon-ls.decode       |  11 +++
>  3 files changed, 100 insertions(+), 147 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 14/36] target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
  2020-04-30 18:09 ` [PATCH 14/36] target/arm: Convert Neon 3-reg-same VADD/VSUB " Peter Maydell
@ 2020-04-30 19:36   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:36 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.
> 
> Note that we don't need the neon_3r_sizes[op] check here because all
> size values are OK for VADD and VSUB; we'll add this when we convert
> the first insn that has size restrictions.
> 
> For this we need one of the GVecGen*Fn typedefs currently in
> translate-a64.h; move them all to translate.h as a block so they
> are visible to the 32-bit decoder.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-a64.h      |  9 --------
>  target/arm/translate.h          |  9 ++++++++
>  target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 14 ++++--------
>  target/arm/neon-dp.decode       | 17 +++++++++++++++
>  5 files changed, 68 insertions(+), 19 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 15/36] target/arm: Convert Neon 3-reg-same logic ops to decodetree
  2020-04-30 18:09 ` [PATCH 15/36] target/arm: Convert Neon 3-reg-same logic ops " Peter Maydell
@ 2020-04-30 19:39   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:39 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
> Note that for the logic ops the 'size' field forms part of their
> decode and the actual operations are always bitwise.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 19 +++++++++++++++++
>  target/arm/translate.c          | 38 +--------------------------------
>  target/arm/neon-dp.decode       | 12 +++++++++++
>  3 files changed, 32 insertions(+), 37 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

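For the logic ops, bit 24 (U) and the size field [21:20] select the operation rather than an element width. Every operation is bitwise, so one 64-bit chunk can be processed at a time. The sketch below assumes the ARM ARM 3-reg-same mapping: U=0 selects VAND/VBIC/VORR/VORN, and U=1 selects VEOR/VBSL/VBIT/VBIF. The function is a hypothetical illustration, not the trans_* code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: one 64-bit chunk of a Neon 3-reg-same logic op.
 * u is bit 24, size is bits [21:20], d is the old destination value.
 */
static uint64_t neon_logic_op(unsigned u, unsigned size,
                              uint64_t d, uint64_t n, uint64_t m)
{
    assert(u < 2 && size < 4);
    switch ((u << 2) | size) {
    case 0: return n & m;               /* VAND */
    case 1: return n & ~m;              /* VBIC */
    case 2: return n | m;               /* VORR (VMOV when Vn == Vm) */
    case 3: return n | ~m;              /* VORN */
    case 4: return n ^ m;               /* VEOR */
    case 5: return (n & d) | (m & ~d);  /* VBSL: Vd bits select Vn or Vm */
    case 6: return (d & ~m) | (n & m);  /* VBIT: insert Vn where Vm is 1 */
    default: return (d & m) | (n & ~m); /* VBIF: insert Vn where Vm is 0 */
    }
}
```

VBSL, VBIT and VBIF also read the old Vd, which is why they take d as an input while the other four ignore it.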


* Re: [PATCH 16/36] target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
  2020-04-30 18:09 ` [PATCH 16/36] target/arm: Convert Neon 3-reg-same VMAX/VMIN " Peter Maydell
@ 2020-04-30 19:45   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:45 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon 3-reg-same VMAX and VMIN insns to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 14 ++++++++++++++
>  target/arm/translate.c          | 21 ++-------------------
>  target/arm/neon-dp.decode       |  5 +++++
>  3 files changed, 21 insertions(+), 19 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 17/36] target/arm: Convert Neon 3-reg-same comparisons to decodetree
  2020-04-30 18:09 ` [PATCH 17/36] target/arm: Convert Neon 3-reg-same comparisons " Peter Maydell
@ 2020-04-30 19:48   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:48 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon comparison ops in the 3-reg-same grouping
> to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 22 ++++++++++++++++++++++
>  target/arm/translate.c          | 23 +++--------------------
>  target/arm/neon-dp.decode       |  8 ++++++++
>  3 files changed, 33 insertions(+), 20 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 18/36] target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
  2020-04-30 18:09 ` [PATCH 18/36] target/arm: Convert Neon 3-reg-same VQADD/VQSUB " Peter Maydell
@ 2020-04-30 19:50   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:50 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon VQADD/VQSUB insns in the 3-reg-same grouping
> to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 15 +++++++++++++++
>  target/arm/translate.c          | 14 ++------------
>  target/arm/neon-dp.decode       |  6 ++++++
>  3 files changed, 23 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 19/36] target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
  2020-04-30 18:09 ` [PATCH 19/36] target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL " Peter Maydell
@ 2020-04-30 19:58   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 19:58 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
> 3-reg-same grouping to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 28 +++------------------
>  target/arm/neon-dp.decode       |  9 +++++++
>  3 files changed, 56 insertions(+), 25 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 20/36] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree
  2020-04-30 18:09 ` [PATCH 20/36] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
@ 2020-04-30 20:03   ` Richard Henderson
  2020-04-30 20:28   ` Richard Henderson
  1 sibling, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 20:03 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon VQRDMLAH and VQRDMLSH insns in the 3-reg-same group
> to decodetree.  These don't use do_3same() because they want to
> operate on VFP double registers, whose offsets are different from the
> neon_reg_offset() calculations do_3same does.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 57 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 36 ++-------------------
>  target/arm/neon-dp.decode       |  3 ++
>  3 files changed, 62 insertions(+), 34 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 20/36] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree
  2020-04-30 18:09 ` [PATCH 20/36] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
  2020-04-30 20:03   ` Richard Henderson
@ 2020-04-30 20:28   ` Richard Henderson
  2020-05-01 14:23     ` Peter Maydell
  1 sibling, 1 reply; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 20:28 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> These don't use do_3same() because they want to
> operate on VFP double registers, whose offsets are different from the
> neon_reg_offset() calculations do_3same does.

Actually, no: it's just a roundabout way of computing the same register offset.

vfp_reg_offset(true, reg)

->  vfp.zregs[reg >> 1].d[reg & 1];

neon_reg_offset(reg, 0)

->  vfp_reg_offset(false, 2 * reg + 0)
->  vfp.zregs[(2 * reg) >> 2].d[((2 * reg) >> 1) & 1]
    + ((2 * reg) & 1) * offsetof(lower/upper)
->  vfp.zregs[reg >> 1].d[reg & 1] + 0


r~

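Richard's derivation can be checked mechanically. The sketch below models only the zregs array of the CPU state, using hypothetical Mock* types. As the derivation's final "+ 0" implies, it assumes that the low 32-bit half of each 64-bit element sits at offset 0, with the upper half at offset 4. It then compares vfp_reg_offset(true, reg) with neon_reg_offset(reg, 0) for all 32 D registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the relevant part of the CPU state. */
typedef struct { uint64_t d[2]; } MockZReg;
typedef struct { MockZReg zregs[32]; } MockCPUState;

/* Assumed offsets of the 32-bit halves within one 64-bit element. */
#define MOCK_LOWER_OFS 0
#define MOCK_UPPER_OFS 4

static size_t mock_vfp_reg_offset(bool dp, unsigned reg)
{
    size_t base = offsetof(MockCPUState, zregs);

    assert(reg < 64);
    if (dp) {
        /* D<reg> is zregs[reg >> 1].d[reg & 1]. */
        return base + (reg >> 1) * sizeof(MockZReg)
               + (reg & 1) * sizeof(uint64_t);
    }
    /* S<reg> is one 32-bit half of zregs[reg >> 2].d[(reg >> 1) & 1]. */
    return base + (reg >> 2) * sizeof(MockZReg)
           + ((reg >> 1) & 1) * sizeof(uint64_t)
           + ((reg & 1) ? MOCK_UPPER_OFS : MOCK_LOWER_OFS);
}

/* Mirrors neon_reg_offset(reg, n): treat D<reg> as the S-register pair 2*reg+n. */
static size_t mock_neon_reg_offset(unsigned reg, unsigned n)
{
    return mock_vfp_reg_offset(false, 2 * reg + n);
}

/* True if both formulas give the same offset for every D register. */
static bool mock_offsets_agree(void)
{
    for (unsigned reg = 0; reg < 32; reg++) {
        if (mock_vfp_reg_offset(true, reg) != mock_neon_reg_offset(reg, 0)) {
            return false;
        }
    }
    return true;
}
```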


* Re: [PATCH 21/36] target/arm: Convert Neon 3-reg-same SHA to decodetree
  2020-04-30 18:09 ` [PATCH 21/36] target/arm: Convert Neon 3-reg-same SHA " Peter Maydell
@ 2020-04-30 20:30   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 20:30 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon SHA instructions in the 3-reg-same group
> to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 139 ++++++++++++++++++++++++++++++++
>  target/arm/translate.c          |  46 +----------
>  target/arm/neon-dp.decode       |  10 +++
>  3 files changed, 151 insertions(+), 44 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

I have patches to convert these helpers to a gvec prototype, which will let us
re-use do_3same.  I'll fix that up when I rebase on top of your patch set.


r~



* Re: [PATCH 22/36] target/arm: Move gen_ function typedefs to translate.h
  2020-04-30 18:09 ` [PATCH 22/36] target/arm: Move gen_ function typedefs to translate.h Peter Maydell
@ 2020-04-30 20:32   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 20:32 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> We're going to want at least some of the NeonGen* typedefs
> for the refactored 32-bit Neon decoder, so move them all
> to translate.h since it makes more sense to keep them in
> one group.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate.h     | 17 +++++++++++++++++
>  target/arm/translate-a64.c | 17 -----------------
>  2 files changed, 17 insertions(+), 17 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-04-30 18:09 ` [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns Peter Maydell
@ 2020-04-30 20:54   ` Richard Henderson
  2020-05-01 15:36     ` Peter Maydell
  2020-05-01 15:54     ` Peter Maydell
  0 siblings, 2 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 20:54 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> +
> +    rn = tcg_temp_new_i64();
> +    rm = tcg_temp_new_i64();
> +    rd = tcg_temp_new_i64();
> +
> +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
> +        neon_load_reg64(rn, a->vn + pass);
> +        neon_load_reg64(rm, a->vm + pass);
> +        fn(rd, rm, rn);
> +        neon_store_reg64(rd, a->vd + pass);
> +    }
> +
> +    tcg_temp_free_i64(rn);
> +    tcg_temp_free_i64(rm);
> +    tcg_temp_free_i64(rd);
> +
> +    return true;
> +}
> +
> +#define DO_3SAME_64(INSN, FUNC)                                         \
> +    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
> +    {                                                                   \
> +        return do_3same_64(s, a, FUNC);                                 \
> +    }

You can morph this into the gvec interface like so:

#define DO_3SAME_64(INSN, FUNC)                                         \
    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
                                uint32_t rn_ofs, uint32_t rm_ofs,       \
                                uint32_t oprsz, uint32_t maxsz)         \
    {                                                                   \
        static const GVecGen3 op = { .fni8 = FUNC };                    \
        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,                          \
                       oprsz, maxsz, &op);                              \
    }                                                                   \
    DO_3SAME(INSN, gen_##INSN##_3s)

The .fni8 function tells gvec that we have a helper that processes the
operation in 8-byte chunks.  It will handle the pass loop for you.

There's also a .fni4 member, for those neon helpers that operate on 4-byte
quantities, fwiw.


r~

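The effect of a .fni8 callback on tcg_gen_gvec_3() can be modelled in plain C. The expander walks oprsz bytes in 8-byte chunks and calls the per-chunk function on each one, which is the work the hand-written pass loop in do_3same_64() was doing. In QEMU the callback emits TCG ops at translate time rather than computing values, and the expander also takes care of maxsz. The Fni8Fn type and apply_fni8() below are hypothetical and model only the runtime effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-chunk operation on one 64-bit slice of each operand. */
typedef uint64_t Fni8Fn(uint64_t n, uint64_t m);

/* Model of the chunk loop: each 8-byte slice of d = fn(slice of n, slice of m). */
static void apply_fni8(uint8_t *d, const uint8_t *n, const uint8_t *m,
                       size_t oprsz, Fni8Fn *fn)
{
    assert(oprsz % 8 == 0);
    for (size_t i = 0; i < oprsz; i += 8) {
        uint64_t a, b, r;

        memcpy(&a, n + i, sizeof(a));   /* memcpy avoids unaligned access */
        memcpy(&b, m + i, sizeof(b));
        r = fn(a, b);
        memcpy(d + i, &r, sizeof(r));
    }
}

/* Example per-chunk op: 64-bit wrapping add, as VADD.I64 performs. */
static uint64_t add_u64(uint64_t n, uint64_t m)
{
    return n + m;
}
```

A D-register operation corresponds to oprsz = 8 and a Q-register operation to oprsz = 16, matching the (a->q ? 2 : 1) pass count in the quoted loop.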


* Re: [PATCH 24/36] target/arm: Convert Neon VHADD 3-reg-same insns
  2020-04-30 18:09 ` [PATCH 24/36] target/arm: Convert Neon VHADD " Peter Maydell
@ 2020-04-30 20:59   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-04-30 20:59 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
> +        tmp = neon_load_reg(a->vn, pass);
> +        tmp2 = neon_load_reg(a->vm, pass);
> +        fn(tmp, tmp, tmp2);
> +        tcg_temp_free_i32(tmp2);
> +        neon_store_reg(a->vd, pass, tmp);
> +    }
> +    return true;
> +}
> +
> +#define DO_3SAME_32(INSN, func)                                         \
> +    static bool trans_##INSN##_S_3s(DisasContext *s, arg_3same *a)      \
> +    {                                                                   \
> +        static NeonGenTwoOpFn * const fns[] = {                         \
> +            gen_helper_neon_##func##_s8,                                \
> +            gen_helper_neon_##func##_s16,                               \
> +            gen_helper_neon_##func##_s32,                               \
> +        };                                                              \
> +        if (a->size > 2) {                                              \
> +            return false;                                               \
> +        }                                                               \
> +        return do_3same_32(s, a, fns[a->size]);                         \
> +    }                                                                   \

Right, I just talked about the .fni4 hook vs the DO_3SAME_64 patch.


r~



* Re: [PATCH 26/36] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to decodetree
  2020-04-30 18:09 ` [PATCH 26/36] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL " Peter Maydell
@ 2020-05-01  1:55   ` Richard Henderson
  2020-05-01 18:10     ` Peter Maydell
  0 siblings, 1 reply; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  1:55 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> +static bool do_3same_qs32(DisasContext *s, arg_3same *a, NeonGenTwoOpEnvFn *fn)
> +{
> +    /*
> +     * Saturating shift operations handled elementwise 32 bits at a
> +     * time which need to pass cpu_env to the helper and where the rn
> +     * and rm operands are reversed from the usual do_3same() order.
> +     */

Perhaps better to handle this as you did in "Convert Neon 64-bit element
3-reg-same insns", by adding a shim expander that adds env?

It would appear we can then merge

> +{
> +  VQSHL_S64_3s   1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_64
> +  VQSHL_S_3s     1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same
> +}

back into a single pattern:

void gen_gvec_srshl(unsigned vece, uint32_t rd_ofs,
                    uint32_t rn_ofs, uint32_t rm_ofs,
                    uint32_t oprsz, uint32_t maxsz)
{
    static const GVecGen3 ops[4] = {
        { .fni4 = gen_helper_neon_rshl_s8 },
        { .fni4 = gen_helper_neon_rshl_s16 },
        { .fni4 = gen_helper_neon_rshl_s32 },
        { .fni8 = gen_helper_neon_rshl_s64 }
    };
    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
                   oprsz, maxsz, &ops[vece]);
}

I'm not 100% sure how best to handle the swapped operands issue.  I don't think
we want to do it here in gen_gvec_srshl, because we don't have the same reverse
operand problem in the aarch64 encoding, and I'm looking forward to re-using
this generator function in aa64 and sve2.

Maybe it would be better to have

@3same     .... ... . . . size:2 .... .... .... . q:1 . . .... \
           &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
@3same_rev .... ... . . . size:2 .... .... .... . q:1 . . .... \
           &3same vn=%vm_dp vm=%vn_dp vd=%vd_dp

and swap the operands to "normal" during decode.

FWIW, over in sve.decode, I prepared for reversed operands from the start (to
handle things like SUBR), so the formats have the register names in order:
@rd_rn_rm vs @rd_rm_rn.
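A rough model of the decode-time swap suggested above, assuming the AArch32 register shift forms compute Vd = shift(Vm, by Vn): with a @3same_rev-style format the generator can be written once, in normal d = op(n, m) order, and reused unchanged. All types and functions here (arg_3same as simplified, RawFields, decode_3same, decode_3same_rev, gen_shl) are hypothetical stand-ins for decodetree's generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified version of the decodetree argument struct (hypothetical). */
typedef struct {
    int vd, vn, vm;
} arg_3same;

/* Raw register-number fields as they appear in the encoding. */
typedef struct {
    int d_field, n_field, m_field;
} RawFields;

/* Model of @3same: fields map straight across. */
static void decode_3same(const RawFields *f, arg_3same *a)
{
    a->vd = f->d_field;
    a->vn = f->n_field;
    a->vm = f->m_field;
}

/* Model of @3same_rev: Vn and Vm are swapped during decode. */
static void decode_3same_rev(const RawFields *f, arg_3same *a)
{
    a->vd = f->d_field;
    a->vn = f->m_field;
    a->vm = f->n_field;
}

/* Generator in "normal" order, as AArch64/SVE2 would want: d = n << m. */
static void gen_shl(uint32_t *regs, const arg_3same *a)
{
    assert(regs[a->vm] < 32);   /* model only handles in-range counts */
    regs[a->vd] = regs[a->vn] << regs[a->vm];
}
```

With the A32 fields {d=0, n=1, m=2}, value 5 in register 2 and shift count 3 in register 1, decode_3same_rev makes gen_shl produce 5 << 3 = 40, which is the AArch32 meaning; the plain format would compute 3 << 5 instead.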


r~



* Re: [PATCH 27/36] target/arm: Convert Neon VABA 3-reg-same to decodetree
  2020-04-30 18:09 ` [PATCH 27/36] target/arm: Convert Neon VABA 3-reg-same " Peter Maydell
@ 2020-05-01  2:29   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  2:29 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
> +        tmp = neon_load_reg(a->vn, pass);
> +        tmp2 = neon_load_reg(a->vm, pass);
> +        abd_fn(tmp, tmp, tmp2);
> +        tcg_temp_free_i32(tmp2);
> +        tmp2 = neon_load_reg(a->vd, pass);
> +        add_fn(tmp, tmp, tmp2);
> +        tcg_temp_free_i32(tmp2);
> +        neon_store_reg(a->vd, pass, tmp);
> +    }
> +    return true;
> +}
> +
> +static bool trans_VABA_S_3s(DisasContext *s, arg_3same *a)
> +{
> +    static NeonGenTwoOpFn * const abd_fns[] = {
> +        gen_helper_neon_abd_s8,
> +        gen_helper_neon_abd_s16,
> +        gen_helper_neon_abd_s32,
> +    };
> +    static NeonGenTwoOpFn * const add_fns[] = {
> +        gen_helper_neon_add_u8,
> +        gen_helper_neon_add_u16,
> +        tcg_gen_add_i32,
> +    };

This can be packaged into one operation.  E.g.

static void gen_aba_s8(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m)
{
    TCGv_i32 t = tcg_temp_new_i32();

    gen_helper_neon_abd_s8(t, n, m);
    gen_helper_neon_add_u8(d, d, t);
    tcg_temp_free_i32(t);
}

static const GVecGen3 op = {
    .fni4 = gen_aba_s8,
    .load_dest = true
};

etc.

FWIW, this is one that I've fully converted on my sve2 branch.  aba(n,m,a) =
max(n,m) - min(n,m) + a -- four fully vectorized operations.  So anything that
allows a drop-in replacement would be nice.  But whatever is easiest for you.
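For reference, a plain-C model of what gen_aba_s8 computes: four signed byte lanes packed into 32 bits, with the absolute difference (written as max(n,m) - min(n,m)) accumulated into d modulo 256 per lane, with no carry between lanes, as with neon_add_u8. The function names are hypothetical; this is a reference model for checking results, not translator code.

```c
#include <assert.h>
#include <stdint.h>

/* Signed absolute difference of one 8-bit lane: max(n,m) - min(n,m). */
static uint8_t abd_s8_lane(int8_t n, int8_t m)
{
    return (uint8_t)(n > m ? n - m : m - n);
}

/*
 * Four signed byte lanes packed in a 32-bit word: d += |n - m| per lane,
 * wrapping modulo 256 within each lane (no carry into the next lane).
 */
static uint32_t aba_s8(uint32_t d, uint32_t n, uint32_t m)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        int shift = lane * 8;
        int8_t nl = (int8_t)(n >> shift);
        int8_t ml = (int8_t)(m >> shift);
        uint8_t dl = (uint8_t)(d >> shift);
        uint8_t sum = (uint8_t)(dl + abd_s8_lane(nl, ml));
        result |= (uint32_t)sum << shift;
    }
    return result;
}
```

For example, lanes n = -128 and m = 127 give an absolute difference of 255, so accumulating into a lane holding 1 wraps to 0 without disturbing the neighbouring lane.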


r~



* Re: [PATCH 28/36] target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns to decodetree
  2020-04-30 18:09 ` [PATCH 28/36] target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns " Peter Maydell
@ 2020-05-01  3:36   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  3:36 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon integer VPMAX and VPMIN 3-reg-same insns to
> decodetree. These are 'pairwise' operations.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 16 +-------
>  target/arm/neon-dp.decode       |  9 +++++
>  3 files changed, 82 insertions(+), 14 deletions(-)


Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 29/36] target/arm: Convert Neon VPADD 3-reg-same insns to decodetree
  2020-04-30 18:09 ` [PATCH 29/36] target/arm: Convert Neon VPADD " Peter Maydell
@ 2020-05-01  3:39   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  3:39 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon integer VPADD 3-reg-same insns to decodetree.  These
> are 'pairwise' operations.  (Note that VQRDMLAH, which shares the
> same primary opcode but has U=1, has already been converted.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c |  2 ++
>  target/arm/translate.c          | 19 +------------------
>  target/arm/neon-dp.decode       |  2 ++
>  3 files changed, 5 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 30/36] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree
  2020-04-30 18:09 ` [PATCH 30/36] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same " Peter Maydell
@ 2020-05-01  3:47   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  3:47 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon VQDMULH and VQRDMULH 3-reg-same insns to
> decodetree. These are the last integer operations in the
> 3-reg-same group.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 24 +-----------------
>  target/arm/neon-dp.decode       |  3 +++
>  3 files changed, 48 insertions(+), 23 deletions(-)

Modulo the other do_3same_32 comments,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 31/36] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to decodetree
  2020-04-30 18:09 ` [PATCH 31/36] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns " Peter Maydell
@ 2020-05-01  3:57   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  3:57 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> +    TCGv_ptr fpstatus = get_fpstatus_ptr(1);
> +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
> +        tmp = neon_load_reg(a->vn, pass);
> +        tmp2 = neon_load_reg(a->vm, pass);
> +        fn(tmp, tmp, tmp2, fpstatus);
> +        tcg_temp_free_i32(tmp2);
> +        neon_store_reg(a->vd, pass, tmp);
> +    }
> +    tcg_temp_free_ptr(fpstatus);
> +    return true;
> +}
> +
> +/*
> + * For all the functions using this macro, size == 1 means fp16,
> + * which is an architecture extension we don't implement yet.
> + */
> +#define DO_3S_FP(INSN,FUNC)                                         \
> +    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
> +    {                                                               \
> +        if (a->size != 0) {                                         \
> +            /* TODO fp16 support */                                 \
> +            return false;                                           \
> +        }                                                           \
> +        return do_3same_fp(s, a, FUNC);                             \
> +    }

We already have helper_gvec_fadd_s and helper_gvec_fsub_s to handle the whole
vector with one call.  Use them with tcg_gen_gvec_3_ptr, with the status pointer
as the 4th argument.

Interestingly, I can't find any current use of these helpers.  I must have been
starting on that translation but got stopped?  There's no current full-vector
helper for abd_f32, but it would take very few lines to add it.
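A sketch of how small a full-vector abd_f32 helper could be, modelled in standalone C. It follows the shape of the gvec "_ptr" helpers (vd, vn, vm, status pointer, then a size word), but model_gvec_fabd_s and ModelFloatStatus are hypothetical: a real QEMU helper would presumably decode oprsz from desc with simd_oprsz() and use softfloat's float32_sub and float32_abs with the float_status, whereas this model takes oprsz directly and uses host float arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for float_status; here it only counts operations. */
typedef struct {
    int ops;
} ModelFloatStatus;

/* Clear the sign bit, like float32_abs (also correct for -0.0 and NaN). */
static float abs_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof(bits));
    bits &= 0x7fffffffu;
    memcpy(&x, &bits, sizeof(x));
    return x;
}

/*
 * Whole-vector FABD: one call processes every float32 element of the
 * operation, with the status pointer passed as the 4th argument.
 */
static void model_gvec_fabd_s(void *vd, void *vn, void *vm, void *stat,
                              uint32_t oprsz)
{
    float *d = vd, *n = vn, *m = vm;
    ModelFloatStatus *fpst = stat;

    assert(oprsz % sizeof(float) == 0);
    for (uint32_t i = 0; i < oprsz / sizeof(float); i++) {
        d[i] = abs_f32(n[i] - m[i]);
        fpst->ops++;
    }
}
```

Because the loop lives inside the helper, the translator emits a single call per instruction instead of one call per 32-bit pass.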


r~



* Re: [PATCH 32/36] target/arm: Convert Neon VPMIN/VPMAX/VPADD float 3-reg-same insns to decodetree
  2020-04-30 18:09 ` [PATCH 32/36] target/arm: Convert Neon VPMIN/VPMAX/VPADD float " Peter Maydell
@ 2020-05-01  3:59   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  3:59 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:09 AM, Peter Maydell wrote:
> Convert the Neon float VPMIN, VPMAX and VPADD 3-reg-same insns to
> decodetree. These are the only remaining 'pairwise' operations,
> so we can delete the pairwise-specific bits of the old decoder's
> for-each-element loop now.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 63 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 63 +++++----------------------------
>  target/arm/neon-dp.decode       |  5 +++
>  3 files changed, 76 insertions(+), 55 deletions(-)
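A plain-C model of the "pairwise" semantics for the D-register float case (e.g. VPADD.F32 Dd, Dn, Dm): each result lane combines two adjacent lanes of a single source, with Dn feeding the low lane and Dm the high lane. Computing both results before writing Dd is what keeps Dd == Dn or Dd == Dm safe, which is the aliasing hazard an element-by-element loop has to watch for. DReg, PairOp, model_pairwise_f32 and the callbacks are hypothetical names, and max_f32 ignores NaN handling.

```c
#include <assert.h>

/* A 64-bit D register holding two float32 lanes. */
typedef struct {
    float lane[2];
} DReg;

typedef float PairOp(float a, float b);

/*
 * Pairwise op: Dd[0] = op(Dn[0], Dn[1]), Dd[1] = op(Dm[0], Dm[1]).
 * Both results are computed before Dd is written, so Dd may alias
 * Dn or Dm.
 */
static void model_pairwise_f32(DReg *d, const DReg *n, const DReg *m,
                               PairOp *op)
{
    float lo = op(n->lane[0], n->lane[1]);
    float hi = op(m->lane[0], m->lane[1]);
    d->lane[0] = lo;
    d->lane[1] = hi;
}

static float add_f32(float a, float b) { return a + b; }
static float max_f32(float a, float b) { return a > b ? a : b; } /* no NaN rules */
```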

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 33/36] target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to decodetree
  2020-04-30 18:10 ` [PATCH 33/36] target/arm: Convert Neon fp VMUL, VMLA, VMLS " Peter Maydell
@ 2020-05-01  4:07   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  4:07 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:10 AM, Peter Maydell wrote:
> Convert the Neon fp VMUL, VMLA, and VMLS 3-reg-same insns to
> decodetree.
> 
> Since VMLA and VMLS accumulate into the destination register, we add
> a reads_vd parameter to do_3same_fp() which tells it to load the
> old value into vd before calling the callback function, in the same
> way that the translate-vfp.inc.c do_vfp_3op_sp() and do_vfp_3op_dp()
> functions work.
> 
> This conversion fixes in passing an underdecoding for VMUL
> (originally reported by Fredrik Strupe <fredrik@strupe.net>): bit 1
> of the 'size' field must be 0.  The old decoder didn't enforce this,
> but the decodetree pattern does.
> 
> The gen_VMLA_fp_reg() function performs the addition operation
> with the operands in the opposite order to the old decoder:
> since Neon sets 'default NaN mode', float32_add operations are
> commutative, so there is no behaviour difference, but putting
> them this way around matches the Arm ARM pseudocode and the
> required operation order for the subtraction in gen_VMLS_fp_reg().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 49 +++++++++++++++++++++++++++------
>  target/arm/translate.c          | 17 +-----------
>  target/arm/neon-dp.decode       |  3 ++
>  3 files changed, 44 insertions(+), 25 deletions(-)
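A simplified model of the reads_vd mechanism described in the commit message: when the flag is set, the old Vd element is loaded and handed to the callback, so VMLA and VMLS are just callbacks that combine it with n * m, while VMUL ignores it. The VMLS callback shows why operand order matters for the subtraction (vd - n*m). model_3same_fp and the callbacks are hypothetical; host floats stand in for softfloat, and a host compiler may fuse the multiply and add into an FMA unless contraction is disabled, whereas Neon VMLA rounds the product first.

```c
#include <assert.h>
#include <stdbool.h>

/* Callback receives the old Vd element (or 0 if not loaded), then n, m. */
typedef float Fp3Fn(float vd_old, float n, float m);

/* Model of do_3same_fp() with a reads_vd parameter. */
static void model_3same_fp(float *vd, const float *vn, const float *vm,
                           int nelem, bool reads_vd, Fp3Fn *fn)
{
    for (int i = 0; i < nelem; i++) {
        float old = reads_vd ? vd[i] : 0.0f;
        vd[i] = fn(old, vn[i], vm[i]);
    }
}

static float vmul(float d, float n, float m) { (void)d; return n * m; }
static float vmla(float d, float n, float m) { return d + n * m; } /* d + product */
static float vmls(float d, float n, float m) { return d - n * m; } /* d - product */
```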

Note that we do have helper_gvec_fmul_s, similar to fadd before, but currently
no mla.

Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 34/36] target/arm: Convert Neon 3-reg-same compare insns to decodetree
  2020-04-30 18:10 ` [PATCH 34/36] target/arm: Convert Neon 3-reg-same compare " Peter Maydell
@ 2020-05-01  4:09   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  4:09 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:10 AM, Peter Maydell wrote:
> Convert the Neon fp 3-reg-same compare insns VCGE, VCGT,
> VCEQ, VACGE and VACGT to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c |  5 +++++
>  target/arm/translate.c          | 39 ++-------------------------------
>  target/arm/neon-dp.decode       |  5 +++++
>  3 files changed, 12 insertions(+), 37 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 35/36] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree
  2020-04-30 18:10 ` [PATCH 35/36] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS " Peter Maydell
@ 2020-05-01  4:13   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  4:13 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:10 AM, Peter Maydell wrote:
> Convert the Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS 3-reg-same
> insns to decodetree. (These are all the remaining non-accumulation
> instructions in this group.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 60 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 42 ++---------------------
>  target/arm/neon-dp.decode       |  6 ++++
>  3 files changed, 68 insertions(+), 40 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 36/36] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns to decodetree
  2020-04-30 18:10 ` [PATCH 36/36] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns " Peter Maydell
@ 2020-05-01  4:14   ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01  4:14 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 4/30/20 11:10 AM, Peter Maydell wrote:
> Convert the Neon floating point VFMA and VFMS insns to decodetree.
> These are the last insns in the 3-reg-same group so we can
> remove all the support/loop code from the old decoder.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c |  41 ++++++++
>  target/arm/translate.c          | 176 +-------------------------------
>  target/arm/neon-dp.decode       |   3 +
>  3 files changed, 46 insertions(+), 174 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1)
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (35 preceding siblings ...)
  2020-04-30 18:10 ` [PATCH 36/36] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns " Peter Maydell
@ 2020-05-01  7:32 ` no-reply
  2020-05-04 12:04 ` Peter Maydell
  37 siblings, 0 replies; 85+ messages in thread
From: no-reply @ 2020-05-01  7:32 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200430181003.21682-1-peter.maydell@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200430181003.21682-1-peter.maydell@linaro.org
Subject: [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1)
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
dda62df target/arm: Convert NEON VFMA, VFMS 3-reg-same insns to decodetree
9920691 target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree
c81e71a target/arm: Convert Neon 3-reg-same compare insns to decodetree
e02542d target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to decodetree
f7c2ba7 target/arm: Convert Neon VPMIN/VPMAX/VPADD float 3-reg-same insns to decodetree
f09c375 target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to decodetree
c80424b target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree
b5595cc target/arm: Convert Neon VPADD 3-reg-same insns to decodetree
e8df4c7 target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns to decodetree
ca8faad target/arm: Convert Neon VABA 3-reg-same to decodetree
6672323 target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to decodetree
3337f5d target/arm: Convert Neon VRHADD, VHSUB, VABD 3-reg-same insns to decodetree
e28906c target/arm: Convert Neon VHADD 3-reg-same insns
c0f2111 target/arm: Convert Neon 64-bit element 3-reg-same insns
e07354a target/arm: Move gen_ function typedefs to translate.h
a9c75a1 target/arm: Convert Neon 3-reg-same SHA to decodetree
d327f96 target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree
ff01a94 target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
40471d2 target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
e040a02 target/arm: Convert Neon 3-reg-same comparisons to decodetree
7bc446d target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
a27e8ba target/arm: Convert Neon 3-reg-same logic ops to decodetree
3a38072 target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
96ba901 target/arm: Convert Neon 'load/store single structure' to decodetree
aaa004f target/arm: Convert Neon 'load single structure to all lanes' to decodetree
29e5dea target/arm: Convert Neon load/store multiple structures to decodetree
0a891d9 target/arm: Convert VFM[AS]L (scalar) to decodetree
bfd4b39 target/arm: Convert V[US]DOT (scalar) to decodetree
d4e50c7 target/arm: Convert VCMLA (scalar) to decodetree
66295d9 target/arm: Convert VFM[AS]L (vector) to decodetree
2139cca target/arm: Convert V[US]DOT (vector) to decodetree
8c99e49 target/arm: Convert VCADD (vector) to decodetree
1877996 target/arm: Convert VCMLA (vector) to decodetree
7cf4ee8 target/arm: Add stubs for AArch32 Neon decodetree
bc4a94c target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
38340b0 target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check

=== OUTPUT BEGIN ===
1/36 Checking commit 38340b07257e (target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check)
2/36 Checking commit bc4a94c2f447 (target/arm: Don't allow Thumb Neon insns without FEATURE_NEON)
3/36 Checking commit 7cf4ee82da7a (target/arm: Add stubs for AArch32 Neon decodetree)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#67: 
new file mode 100644

total: 0 errors, 1 warnings, 208 lines checked

Patch 3/36 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/36 Checking commit 1877996e0e6a (target/arm: Convert VCMLA (vector) to decodetree)
5/36 Checking commit 8c99e490843c (target/arm: Convert VCADD (vector) to decodetree)
6/36 Checking commit 2139ccaa801e (target/arm: Convert V[US]DOT (vector) to decodetree)
7/36 Checking commit 66295d9f8f44 (target/arm: Convert VFM[AS]L (vector) to decodetree)
8/36 Checking commit d4e50c793916 (target/arm: Convert VCMLA (scalar) to decodetree)
9/36 Checking commit bfd4b39ba7d7 (target/arm: Convert V[US]DOT (scalar) to decodetree)
10/36 Checking commit 0a891d9b9d00 (target/arm: Convert VFM[AS]L (scalar) to decodetree)
11/36 Checking commit 29e5dea9adba (target/arm: Convert Neon load/store multiple structures to decodetree)
12/36 Checking commit aaa004fcacb2 (target/arm: Convert Neon 'load single structure to all lanes' to decodetree)
13/36 Checking commit 96ba9016e79b (target/arm: Convert Neon 'load/store single structure' to decodetree)
14/36 Checking commit 3a38072d19d7 (target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree)
15/36 Checking commit a27e8bab0827 (target/arm: Convert Neon 3-reg-same logic ops to decodetree)
16/36 Checking commit 7bc446da4a45 (target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree)
17/36 Checking commit e040a02db745 (target/arm: Convert Neon 3-reg-same comparisons to decodetree)
18/36 Checking commit 40471d203e61 (target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree)
19/36 Checking commit ff01a94e5d20 (target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree)
WARNING: Block comments use a leading /* on a separate line
#88: FILE: target/arm/translate-neon.inc.c:707:
+        /* Note the operation is vshl vd,vm,vn */                       \

total: 0 errors, 1 warnings, 111 lines checked

Patch 19/36 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
20/36 Checking commit d327f96131a7 (target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree)
21/36 Checking commit a9c75a1d4182 (target/arm: Convert Neon 3-reg-same SHA to decodetree)
22/36 Checking commit e07354aca256 (target/arm: Move gen_ function typedefs to translate.h)
23/36 Checking commit c0f2111a7739 (target/arm: Convert Neon 64-bit element 3-reg-same insns)
24/36 Checking commit e28906c17abf (target/arm: Convert Neon VHADD 3-reg-same insns)
25/36 Checking commit 3337f5dfa7d6 (target/arm: Convert Neon VRHADD, VHSUB, VABD 3-reg-same insns to decodetree)
26/36 Checking commit 6672323224ab (target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to decodetree)
ERROR: space required after that ',' (ctx:VxV)
#133: FILE: target/arm/translate-neon.inc.c:1105:
+DO_3SAME_QS32(VQSHL_S,qshl_s)
                      ^

ERROR: space required after that ',' (ctx:VxV)
#134: FILE: target/arm/translate-neon.inc.c:1106:
+DO_3SAME_QS32(VQSHL_U,qshl_u)
                      ^

ERROR: space required after that ',' (ctx:VxV)
#135: FILE: target/arm/translate-neon.inc.c:1107:
+DO_3SAME_QS32(VQRSHL_S,qrshl_s)
                       ^

ERROR: space required after that ',' (ctx:VxV)
#136: FILE: target/arm/translate-neon.inc.c:1108:
+DO_3SAME_QS32(VQRSHL_U,qrshl_u)
                       ^

WARNING: Block comments use a leading /* on a separate line
#150: FILE: target/arm/translate-neon.inc.c:1122:
+        /* Shift operand order is reversed */                           \

total: 4 errors, 1 warnings, 173 lines checked

Patch 26/36 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

27/36 Checking commit ca8faade49db (target/arm: Convert Neon VABA 3-reg-same to decodetree)
28/36 Checking commit e8df4c7e9300 (target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns to decodetree)
29/36 Checking commit b5595ccef501 (target/arm: Convert Neon VPADD 3-reg-same insns to decodetree)
30/36 Checking commit c80424b4f834 (target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree)
31/36 Checking commit f09c375fa01f (target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to decodetree)
ERROR: space required after that ',' (ctx:VxV)
#84: FILE: target/arm/translate-neon.inc.c:1365:
+#define DO_3S_FP(INSN,FUNC)                                         \
                      ^

WARNING: Block comments use a leading /* on a separate line
#88: FILE: target/arm/translate-neon.inc.c:1369:
+            /* TODO fp16 support */                                 \

total: 1 errors, 1 warnings, 99 lines checked

Patch 31/36 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

32/36 Checking commit f7c2ba79ec7a (target/arm: Convert Neon VPMIN/VPMAX/VPADD float 3-reg-same insns to decodetree)
ERROR: space required after that ',' (ctx:VxV)
#94: FILE: target/arm/translate-neon.inc.c:1428:
+#define DO_3S_FP_PAIR(INSN,FUNC)                                    \
                           ^

WARNING: Block comments use a leading /* on a separate line
#98: FILE: target/arm/translate-neon.inc.c:1432:
+            /* TODO fp16 support */                                 \

ERROR: suspect code indent for conditional statements (8, 8)
#156: FILE: target/arm/translate.c:4828:
         for (pass = 0; pass < (q ? 4 : 2); pass++) {
[...]
+        /* Elementwise.  */

total: 2 errors, 1 warnings, 181 lines checked

Patch 32/36 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

33/36 Checking commit e02542d0d861 (target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to decodetree)
ERROR: space required after that ',' (ctx:VxV)
#90: FILE: target/arm/translate-neon.inc.c:1378:
+#define DO_3S_FP(INSN,FUNC,READS_VD)                                \
                      ^

ERROR: space required after that ',' (ctx:VxV)
#90: FILE: target/arm/translate-neon.inc.c:1378:
+#define DO_3S_FP(INSN,FUNC,READS_VD)                                \
                           ^

total: 2 errors, 0 warnings, 114 lines checked

Patch 33/36 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

34/36 Checking commit c81e71ad337e (target/arm: Convert Neon 3-reg-same compare insns to decodetree)
35/36 Checking commit 992069157a4b (target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree)
36/36 Checking commit dda62df05bcf (target/arm: Convert NEON VFMA, VFMS 3-reg-same insns to decodetree)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200430181003.21682-1-peter.maydell@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 20/36] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree
  2020-04-30 20:28   ` Richard Henderson
@ 2020-05-01 14:23     ` Peter Maydell
  0 siblings, 0 replies; 85+ messages in thread
From: Peter Maydell @ 2020-05-01 14:23 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Thu, 30 Apr 2020 at 21:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 4/30/20 11:09 AM, Peter Maydell wrote:
> > These don't use do_3same() because they want to
> > operate on VFP double registers, whose offsets are different from the
> > neon_reg_offset() calculations do_3same does.
>
> Actually, no, it's a roundabout way of computing the same register offset.

So it is. I could have sworn I'd written this using
do_3same first time around and found it didn't work,
but maybe I'm misremembering a change I had to make to
some other patch.

thanks
-- PMM



* Re: [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-04-30 20:54   ` Richard Henderson
@ 2020-05-01 15:36     ` Peter Maydell
  2020-05-01 15:50       ` Richard Henderson
  2020-05-01 15:54     ` Peter Maydell
  1 sibling, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-05-01 15:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Thu, 30 Apr 2020 at 21:54, Richard Henderson
<richard.henderson@linaro.org> wrote:
> You can morph this into the gvec interface like so:
>
> #define DO_3SAME_64(INSN, FUNC) \
>     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,
>                                 uint32_t rn_ofs, uint32_t rm_ofs,
>                                 uint32_t oprsz, uint32_t maxsz)
>     {
>         static const GVecGen3 op = { .fni8 = FUNC };
>         tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
>                        oprsz, maxsz, &op);
>     }
>     DO_3SAME(INSN, gen_##INSN##_3s)
>
> The .fni8 function tells gvec that we have a helper that processes the
> operation in 8 byte chunks.  It will handle the pass loop for you.
>
> There's also a .fni4 member, for those neon helpers that operate on 4-byte
> quantities, fwiw.

Is there a version of this that works on functions that need
to be passed the cpu_env, or do I have to create a trampoline
function that just calls the real helper function passing it
the extra argument ?

thanks
-- PMM



* Re: [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-05-01 15:36     ` Peter Maydell
@ 2020-05-01 15:50       ` Richard Henderson
  2020-05-01 15:57         ` Peter Maydell
  0 siblings, 1 reply; 85+ messages in thread
From: Richard Henderson @ 2020-05-01 15:50 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 5/1/20 8:36 AM, Peter Maydell wrote:
> On Thu, 30 Apr 2020 at 21:54, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> You can morph this into the gvec interface like so:
>>
>> #define DO_3SAME_64(INSN, FUNC) \
>>     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,
>>                                 uint32_t rn_ofs, uint32_t rm_ofs,
>>                                 uint32_t oprsz, uint32_t maxsz)
>>     {
>>         static const GVecGen3 op = { .fni8 = FUNC };
>>         tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
>>                        oprsz, maxsz, &op);
>>     }
>>     DO_3SAME(INSN, gen_##INSN##_3s)
>>
>> The .fni8 function tells gvec that we have a helper that processes the
>> operation in 8 byte chunks.  It will handle the pass loop for you.
>>
>> There's also a .fni4 member, for those neon helpers that operate on 4-byte
>> quantities, fwiw.
> 
> Is there a version of this that works on functions that need
> to be passed the cpu_env, or do I have to create a trampoline
> function that just calls the real helper function passing it
> the extra argument?

A trampoline is required.

The original intention of the hook is to expand some inline tcg ops.  That it
can be used to call a helper is a happy accident.  For a helper that needs env,
ideally we would use tcg_gen_gvec_ptr and handle the vector with one call.


r~
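In TCG terms, the trampoline described above is a small wrapper with the `.fni8` signature whose body calls the real `gen_helper_*` function, adding the global `cpu_env`. The plain-C model below shows the same shape using values instead of TCG ops. `Env`, `g_env`, `helper_qadd_u64()` and the other names are hypothetical, and a saturating add stands in for whichever env-needing helper is actually involved. The last function sketches the `tcg_gen_gvec_ptr` alternative: one out-of-line call that receives env plus the whole vector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for CPUARMState: only the sticky saturation flag. */
typedef struct {
    int qc;
} Env;

/* Stand-in for the global cpu_env pointer that a trampoline can reach. */
static Env *g_env;

/* An env-needing helper: unsigned saturating add that sets QC on overflow. */
static uint64_t helper_qadd_u64(Env *env, uint64_t a, uint64_t b)
{
    uint64_t r = a + b;

    if (r < a) {            /* wrapped around: saturate and flag it */
        env->qc = 1;
        return UINT64_MAX;
    }
    return r;
}

/* The callback shape the generic expander accepts: no env argument. */
typedef void fni8_fn(uint64_t *d, uint64_t n, uint64_t m);

/* Trampoline: fits fni8_fn and forwards the global env to the helper. */
static void qadd_u64_tramp(uint64_t *d, uint64_t n, uint64_t m)
{
    *d = helper_qadd_u64(g_env, n, m);
}

/* Generic per-lane driver that only knows about fni8_fn. */
static void expand_lanes(uint64_t *d, const uint64_t *n, const uint64_t *m,
                         int lanes, fni8_fn *fn)
{
    for (int i = 0; i < lanes; i++) {
        fn(&d[i], n[i], m[i]);
    }
}

/*
 * Alternative in the spirit of tcg_gen_gvec_ptr: a single call that is
 * given env and pointers to the whole vectors, and loops internally.
 */
static void helper_qadd_u64_vec(Env *env, uint64_t *d, const uint64_t *n,
                                const uint64_t *m, int lanes)
{
    for (int i = 0; i < lanes; i++) {
        d[i] = helper_qadd_u64(env, n[i], m[i]);
    }
}
```

The trampoline costs one extra call per chunk. The whole-vector form makes one call per instruction, which is why it is the preferred shape for helpers that need env.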



* Re: [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-04-30 20:54   ` Richard Henderson
  2020-05-01 15:36     ` Peter Maydell
@ 2020-05-01 15:54     ` Peter Maydell
  2020-05-01 16:13       ` Richard Henderson
  1 sibling, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-05-01 15:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Thu, 30 Apr 2020 at 21:54, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 4/30/20 11:09 AM, Peter Maydell wrote:
> > +
> > +    rn = tcg_temp_new_i64();
> > +    rm = tcg_temp_new_i64();
> > +    rd = tcg_temp_new_i64();
> > +
> > +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
> > +        neon_load_reg64(rn, a->vn + pass);
> > +        neon_load_reg64(rm, a->vm + pass);
> > +        fn(rd, rm, rn);
> > +        neon_store_reg64(rd, a->vd + pass);
> > +    }
> > +
> > +    tcg_temp_free_i64(rn);
> > +    tcg_temp_free_i64(rm);
> > +    tcg_temp_free_i64(rd);
> > +
> > +    return true;
> > +}
> > +
> > +#define DO_3SAME_64(INSN, FUNC)                                         \
> > +    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
> > +    {                                                                   \
> > +        return do_3same_64(s, a, FUNC);                                 \
> > +    }
>
> You can morph this into the gvec interface like so:
>
> #define DO_3SAME_64(INSN, FUNC) \
>     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
>                                 uint32_t rn_ofs, uint32_t rm_ofs, \
>                                 uint32_t oprsz, uint32_t maxsz) \
>     { \
>         static const GVecGen3 op = { .fni8 = FUNC }; \
>         tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, \
>                        oprsz, maxsz, &op); \
>     } \
>     DO_3SAME(INSN, gen_##INSN##_3s)
>
> The .fni8 function tells gvec that we have a helper that processes the
> operation in 8 byte chunks.  It will handle the pass loop for you.

This doesn't quite work, because these are shift ops and
so the operands are passed to the helper in the order
rd, rm, rn. Reshuffling the order of arguments to
tcg_gen_gvec_3() fixes this, though.

I guess I should call the macro DO_3SAME_SHIFT64; I hadn't
noticed it was shift-specific because the only thing we use
it for is shifts.

thanks
-- PMM
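Here is a plain-C model of the ordering problem and of the argument reshuffle described above. A generic expander always hands the per-lane function its sources in argument order. So passing the Dm source first (in QEMU terms, `rm_ofs` before `rn_ofs` in the `tcg_gen_gvec_3()` call) is enough to fix it. `expand_3()` and `shl_u64()` are hypothetical names. `shl_u64()` assumes the VSHL (register) rule: the value comes from Dm, and the shift count is the signed bottom byte of the Dn element.

```c
#include <assert.h>
#include <stdint.h>

typedef void fni8_fn(uint64_t *d, uint64_t a, uint64_t b);

/*
 * Model of a VSHL.U64 lane: shift val by the signed low byte of cnt.
 * Positive counts shift left, negative counts shift right, and a shift
 * of 64 or more in either direction gives 0.
 */
static void shl_u64(uint64_t *d, uint64_t val, uint64_t cnt)
{
    int sh = (int)(cnt & 0xff);

    if (sh >= 128) {
        sh -= 256;          /* sign-extend the low byte */
    }
    if (sh >= 64 || sh <= -64) {
        *d = 0;
    } else if (sh >= 0) {
        *d = val << sh;
    } else {
        *d = val >> -sh;
    }
}

/* Generic expander: lane i gets fn(&d[i], a[i], b[i]), in that order. */
static void expand_3(uint64_t *d, const uint64_t *a, const uint64_t *b,
                     int lanes, fni8_fn *fn)
{
    for (int i = 0; i < lanes; i++) {
        fn(&d[i], a[i], b[i]);
    }
}
```

Calling `expand_3(d, rm, rn, ...)` shifts the Dm value by the Dn count. Calling `expand_3(d, rn, rm, ...)` silently computes the wrong thing, because the expander has no idea which operand is the count.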



* Re: [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-05-01 15:50       ` Richard Henderson
@ 2020-05-01 15:57         ` Peter Maydell
  2020-05-01 16:12           ` Richard Henderson
  0 siblings, 1 reply; 85+ messages in thread
From: Peter Maydell @ 2020-05-01 15:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Fri, 1 May 2020 at 16:50, Richard Henderson
<richard.henderson@linaro.org> wrote:
> The original intention of the hook is to expand some inline tcg ops.  That it
> can be used to call a helper is a happy accident.  For a helper that needs env,
> ideally we would use tcg_gen_gvec_ptr and handle the vector with one call.

The inconsistency where half the helpers need to be passed cpu_env
and the other half don't is really irritating for writing code
that calls them. Lots of ought-to-be-common code ends up needing
two versions :-(

-- PMM



* Re: [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-05-01 15:57         ` Peter Maydell
@ 2020-05-01 16:12           ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01 16:12 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 5/1/20 8:57 AM, Peter Maydell wrote:
> On Fri, 1 May 2020 at 16:50, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> The original intention of the hook is to expand some inline tcg ops.  That it
>> can be used to call a helper is a happy accident.  For a helper that needs env,
>> ideally we would use tcg_gen_gvec_ptr and handle the vector with one call.
> 
> The inconsistency where half the helpers need to be passed cpu_env
> and the other half don't is really irritating for writing code
> that calls them. Lots of ought-to-be-common code ends up needing
> two versions :-(

Yep.  Lots of room for additional cleanup here.


r~



* Re: [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-05-01 15:54     ` Peter Maydell
@ 2020-05-01 16:13       ` Richard Henderson
  0 siblings, 0 replies; 85+ messages in thread
From: Richard Henderson @ 2020-05-01 16:13 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 5/1/20 8:54 AM, Peter Maydell wrote:
> On Thu, 30 Apr 2020 at 21:54, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 4/30/20 11:09 AM, Peter Maydell wrote:
>>> +
>>> +    rn = tcg_temp_new_i64();
>>> +    rm = tcg_temp_new_i64();
>>> +    rd = tcg_temp_new_i64();
>>> +
>>> +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
>>> +        neon_load_reg64(rn, a->vn + pass);
>>> +        neon_load_reg64(rm, a->vm + pass);
>>> +        fn(rd, rm, rn);
>>> +        neon_store_reg64(rd, a->vd + pass);
>>> +    }
>>> +
>>> +    tcg_temp_free_i64(rn);
>>> +    tcg_temp_free_i64(rm);
>>> +    tcg_temp_free_i64(rd);
>>> +
>>> +    return true;
>>> +}
>>> +
>>> +#define DO_3SAME_64(INSN, FUNC)                                         \
>>> +    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
>>> +    {                                                                   \
>>> +        return do_3same_64(s, a, FUNC);                                 \
>>> +    }
>>
>> You can morph this into the gvec interface like so:
>>
>> #define DO_3SAME_64(INSN, FUNC) \
>>     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
>>                                 uint32_t rn_ofs, uint32_t rm_ofs, \
>>                                 uint32_t oprsz, uint32_t maxsz) \
>>     { \
>>         static const GVecGen3 op = { .fni8 = FUNC }; \
>>         tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, \
>>                        oprsz, maxsz, &op); \
>>     } \
>>     DO_3SAME(INSN, gen_##INSN##_3s)
>>
>> The .fni8 function tells gvec that we have a helper that processes the
>> operation in 8 byte chunks.  It will handle the pass loop for you.
> 
> This doesn't quite work, because these are shift ops and
> so the operands are passed to the helper in the order
> rd, rm, rn. Reshuffling the order of arguments to
> tcg_gen_gvec_3() fixes this, though.
> 
> I guess I should call the macro DO_3SAME_SHIFT64; I hadn't
> noticed it was shift-specific because the only thing we use
> it for is shifts.

See my reply to patch 26.  I think we should swap these operands during decode.


r~




* Re: [PATCH 01/36] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
  2020-04-30 18:09 ` [PATCH 01/36] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check Peter Maydell
  2020-04-30 18:21   ` Richard Henderson
@ 2020-05-01 16:55   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 85+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-05-01 16:55 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson


On 4/30/20 8:09 PM, Peter Maydell wrote:
> Somewhere along theline we accidentally added a duplicate

"the line"?

> "using D16-D31 when they don't exist" check to do_vfm_dp()
> (probably an artifact of a patchseries rebase). Remove it.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/translate-vfp.inc.c | 6 ------
>   1 file changed, 6 deletions(-)
> 
> diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
> index b087bbd812e..e1a90175983 100644
> --- a/target/arm/translate-vfp.inc.c
> +++ b/target/arm/translate-vfp.inc.c
> @@ -1872,12 +1872,6 @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
>           return false;
>       }
>   
> -    /* UNDEF accesses to D16-D31 if they don't exist. */
> -    if (!dc_isar_feature(aa32_simd_r32, s) &&
> -        ((a->vd | a->vn | a->vm) & 0x10)) {
> -        return false;
> -    }
> -
>       if (!vfp_access_check(s)) {
>           return true;
>       }
> 

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
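The removed condition packs a three-way range check into a single bit test. D register numbers are 5 bits (0..31), and bit 4 (0x10) is set exactly for D16..D31. OR-ing the three numbers together therefore sets bit 4 if and only if at least one operand lies in the upper bank. Below is a standalone sketch. The helper names `uses_high_dregs()` and `dregs_ok()` are hypothetical, and `has_simd_r32` stands in for `dc_isar_feature(aa32_simd_r32, s)`.

```c
#include <assert.h>
#include <stdbool.h>

/* True if any of the three 5-bit register numbers is in D16..D31. */
static bool uses_high_dregs(int vd, int vn, int vm)
{
    return ((vd | vn | vm) & 0x10) != 0;
}

/* Model of the check: false means UNDEF (upper bank used but absent). */
static bool dregs_ok(bool has_simd_r32, int vd, int vn, int vm)
{
    return has_simd_r32 || !uses_high_dregs(vd, vn, vm);
}
```

Because the check is a pure function of its inputs, running it twice, as the duplicated copy in `do_vfm_dp()` did, is harmless but redundant.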



* Re: [PATCH 02/36] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
  2020-04-30 18:09 ` [PATCH 02/36] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON Peter Maydell
  2020-04-30 18:22   ` Richard Henderson
@ 2020-05-01 16:56   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 85+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-05-01 16:56 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson

On 4/30/20 8:09 PM, Peter Maydell wrote:
> We were accidentally permitting decode of Thumb Neon insns even if
> the CPU didn't have the FEATURE_NEON bit set, because the feature
> check was being done before the call to disas_neon_data_insn() and
> disas_neon_ls_insn() in the Arm decoder but was omitted from the
> Thumb decoder.  Push the feature bit check down into the called
> functions so it is done for both Arm and Thumb encodings.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/translate.c | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index d4ad2028f12..ab5324a5aaa 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -3258,6 +3258,10 @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
>       TCGv_i32 tmp2;
>       TCGv_i64 tmp64;
>   
> +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
> +        return 1;
> +    }
> +
>       /* FIXME: this access check should not take precedence over UNDEF
>        * for invalid encodings; we will generate incorrect syndrome information
>        * for attempts to execute invalid vfp/neon encodings with FP disabled.
> @@ -5002,6 +5006,10 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
>       TCGv_ptr ptr1, ptr2, ptr3;
>       TCGv_i64 tmp64;
>   
> +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
> +        return 1;
> +    }
> +
>       /* FIXME: this access check should not take precedence over UNDEF
>        * for invalid encodings; we will generate incorrect syndrome information
>        * for attempts to execute invalid vfp/neon encodings with FP disabled.
> @@ -10948,10 +10956,6 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
>   
>           if (((insn >> 25) & 7) == 1) {
>               /* NEON Data processing.  */
> -            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
> -                goto illegal_op;
> -            }
> -
>               if (disas_neon_data_insn(s, insn)) {
>                   goto illegal_op;
>               }
> @@ -10959,10 +10963,6 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
>           }
>           if ((insn & 0x0f100000) == 0x04000000) {
>               /* NEON load/store.  */
> -            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
> -                goto illegal_op;
> -            }
> -
>               if (disas_neon_ls_insn(s, insn)) {
>                   goto illegal_op;
>               }
> 

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
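The structural point of this patch is that a check placed in the shared callee covers every caller, whereas a check placed in one caller is easy to forget in the other. Below is a minimal sketch under simplifying assumptions. `DisasCtx`, `neon_data_insn()`, `arm_decode()` and `thumb_decode()` are hypothetical stand-ins, `has_neon` models `arm_dc_feature(s, ARM_FEATURE_NEON)`, and the real Arm/Thumb encoding differences are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for DisasContext: just the one feature bit. */
typedef struct {
    bool has_neon;
} DisasCtx;

/* Returns nonzero for UNDEF, like the disas_neon_*_insn() functions. */
static int neon_data_insn(DisasCtx *s, uint32_t insn)
{
    if (!s->has_neon) {
        return 1;           /* the feature check lives in the callee */
    }
    (void)insn;             /* real decoding would happen here */
    return 0;
}

/* Both top-level decoders just forward; neither repeats the check. */
static bool arm_decode(DisasCtx *s, uint32_t insn)
{
    return neon_data_insn(s, insn) == 0;
}

static bool thumb_decode(DisasCtx *s, uint32_t insn)
{
    return neon_data_insn(s, insn) == 0;
}
```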



* Re: [PATCH 26/36] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to decodetree
  2020-05-01  1:55   ` Richard Henderson
@ 2020-05-01 18:10     ` Peter Maydell
  0 siblings, 0 replies; 85+ messages in thread
From: Peter Maydell @ 2020-05-01 18:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Fri, 1 May 2020 at 02:55, Richard Henderson
<richard.henderson@linaro.org> wrote:
> I'm not 100% sure how best to handle the swapped operands issue.  I don't think
> we want to do it here in gen_gvec_srshl, because we don't have the same reverse
> operand problem in the aarch64 encoding, and I'm looking forward to re-using
> this generator function in aa64 and sve2.
>
> Maybe it would be better to have
>
> @3same     .... ... . . . size:2 .... .... .... . q:1 . . .... \
>            &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
> @3same_rev .... ... . . . size:2 .... .... .... . q:1 . . .... \
>            &3same vn=%vm_dp vm=%vn_dp vd=%vd_dp
>
> and swap the operands to "normal" during decode.

Yeah, I guess so. It's a little confusing because the operands
are going to appear with the "wrong" names in the trans_ functions,
but we can hopefully deflect some of that with a suitable comment
by the @3same_rev format definition.

I think that all the affected insns have asm formats like
 VSHL <Dd>, <Dm>, <Dn>
in contrast to eg
 VSUB <Dd>, <Dn>, <Dm>

so it's effectively just that the field names in the official
insn definition are backwards from what you'd expect.

thanks
-- PMM
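Below is a hand-written C model of what the suggested `@3same_rev` format would do at decode time. decodetree generates its own extraction code, so `decode_3same()`, `decode_3same_rev()` and the struct layout here are illustrative assumptions. The field positions follow the `%vd_dp`, `%vn_dp` and `%vm_dp` layout: D:Vd at bits 22 and 15:12, N:Vn at bits 7 and 19:16, M:Vm at bits 5 and 3:0. Size sits in bits 21:20 and Q in bit 6, as in the pattern quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative argument struct, loosely modelled on arg_3same. */
typedef struct {
    int vd, vn, vm, size, q;
} arg_3same;

static int field(uint32_t insn, int pos, int len)
{
    return (int)((insn >> pos) & ((1u << len) - 1));
}

/* 5-bit D register numbers: the single high bit goes on top. */
static int vd_dp(uint32_t insn) { return field(insn, 22, 1) << 4 | field(insn, 12, 4); }
static int vn_dp(uint32_t insn) { return field(insn, 7, 1) << 4 | field(insn, 16, 4); }
static int vm_dp(uint32_t insn) { return field(insn, 5, 1) << 4 | field(insn, 0, 4); }

/* @3same: fields land in the struct under their architectural names. */
static void decode_3same(uint32_t insn, arg_3same *a)
{
    a->vd = vd_dp(insn);
    a->vn = vn_dp(insn);
    a->vm = vm_dp(insn);
    a->size = field(insn, 20, 2);
    a->q = field(insn, 6, 1);
}

/*
 * @3same_rev: vn=%vm_dp vm=%vn_dp. For VSHL <Dd>, <Dm>, <Dn> this puts
 * the value being shifted (Dm) in a->vn and the shift count (Dn) in
 * a->vm, so trans_ functions see the same operand order as VSUB.
 */
static void decode_3same_rev(uint32_t insn, arg_3same *a)
{
    decode_3same(insn, a);
    a->vn = vm_dp(insn);
    a->vm = vn_dp(insn);
}
```

The swap costs nothing at runtime and keeps generic generators (such as ones shared with AArch64) free of operand-order special cases. The price is the naming mismatch in the trans_ functions, which a comment next to the format definition can explain.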



* Re: [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1)
  2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
                   ` (36 preceding siblings ...)
  2020-05-01  7:32 ` [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) no-reply
@ 2020-05-04 12:04 ` Peter Maydell
  37 siblings, 0 replies; 85+ messages in thread
From: Peter Maydell @ 2020-05-04 12:04 UTC (permalink / raw)
  To: qemu-arm, QEMU Developers; +Cc: Richard Henderson

On Thu, 30 Apr 2020 at 19:10, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> This patchseries starts in on the job of converting the Arm
> Neon decoder to decodetree.
>
> Neon insns come in three major parts:
>  * the 'v8.0-and-later' extensions
>  * the 'loads and stores' group
>  * the 'data processing' group
>
> This patchset converts all of the v8.0-and-later extensions
> and the loads-and-stores, plus the "3-registers-same" subgroup
> of the data-processing insns.
>
> I'm working on the rest of the dp insns, but this seems like
> a pretty large chunk of conversion patches to start with.

I'm going to apply patches 1-19 and 22 (that's up to
"3-reg-same VMUL, VMLA, VMLS, VSHL", plus the "move gen function
typedefs" patch) to target-arm.next as they've been reviewed.
That will leave the 3-reg-same in a partially converted state
but it is not too confusingly so (or at least not much more
so than having the rest of the neon-dp group unconverted) and
I think that it will be easier to deal with this series and
the rest of the conversion if we get the completed parts into
the tree sooner rather than later.

thanks
-- PMM




Thread overview: 85+ messages
2020-04-30 18:09 [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) Peter Maydell
2020-04-30 18:09 ` [PATCH 01/36] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check Peter Maydell
2020-04-30 18:21   ` Richard Henderson
2020-05-01 16:55   ` Philippe Mathieu-Daudé
2020-04-30 18:09 ` [PATCH 02/36] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON Peter Maydell
2020-04-30 18:22   ` Richard Henderson
2020-05-01 16:56   ` Philippe Mathieu-Daudé
2020-04-30 18:09 ` [PATCH 03/36] target/arm: Add stubs for AArch32 Neon decodetree Peter Maydell
2020-04-30 18:30   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 04/36] target/arm: Convert VCMLA (vector) to decodetree Peter Maydell
2020-04-30 18:34   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 05/36] target/arm: Convert VCADD " Peter Maydell
2020-04-30 18:35   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 06/36] target/arm: Convert V[US]DOT " Peter Maydell
2020-04-30 18:36   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 07/36] target/arm: Convert VFM[AS]L " Peter Maydell
2020-04-30 18:43   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 08/36] target/arm: Convert VCMLA (scalar) " Peter Maydell
2020-04-30 19:00   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 09/36] target/arm: Convert V[US]DOT " Peter Maydell
2020-04-30 19:01   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 10/36] target/arm: Convert VFM[AS]L " Peter Maydell
2020-04-30 19:06   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 11/36] target/arm: Convert Neon load/store multiple structures " Peter Maydell
2020-04-30 19:09   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 12/36] target/arm: Convert Neon 'load single structure to all lanes' " Peter Maydell
2020-04-30 19:17   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 13/36] target/arm: Convert Neon 'load/store single structure' " Peter Maydell
2020-04-30 19:32   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 14/36] target/arm: Convert Neon 3-reg-same VADD/VSUB " Peter Maydell
2020-04-30 19:36   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 15/36] target/arm: Convert Neon 3-reg-same logic ops " Peter Maydell
2020-04-30 19:39   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 16/36] target/arm: Convert Neon 3-reg-same VMAX/VMIN " Peter Maydell
2020-04-30 19:45   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 17/36] target/arm: Convert Neon 3-reg-same comparisons " Peter Maydell
2020-04-30 19:48   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 18/36] target/arm: Convert Neon 3-reg-same VQADD/VQSUB " Peter Maydell
2020-04-30 19:50   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 19/36] target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL " Peter Maydell
2020-04-30 19:58   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 20/36] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
2020-04-30 20:03   ` Richard Henderson
2020-04-30 20:28   ` Richard Henderson
2020-05-01 14:23     ` Peter Maydell
2020-04-30 18:09 ` [PATCH 21/36] target/arm: Convert Neon 3-reg-same SHA " Peter Maydell
2020-04-30 20:30   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 22/36] target/arm: Move gen_ function typedefs to translate.h Peter Maydell
2020-04-30 20:32   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 23/36] target/arm: Convert Neon 64-bit element 3-reg-same insns Peter Maydell
2020-04-30 20:54   ` Richard Henderson
2020-05-01 15:36     ` Peter Maydell
2020-05-01 15:50       ` Richard Henderson
2020-05-01 15:57         ` Peter Maydell
2020-05-01 16:12           ` Richard Henderson
2020-05-01 15:54     ` Peter Maydell
2020-05-01 16:13       ` Richard Henderson
2020-04-30 18:09 ` [PATCH 24/36] target/arm: Convert Neon VHADD " Peter Maydell
2020-04-30 20:59   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 25/36] target/arm: Convert Neon VRHADD, VHSUB, VABD 3-reg-same insns to decodetree Peter Maydell
2020-04-30 18:09 ` [PATCH 26/36] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL " Peter Maydell
2020-05-01  1:55   ` Richard Henderson
2020-05-01 18:10     ` Peter Maydell
2020-04-30 18:09 ` [PATCH 27/36] target/arm: Convert Neon VABA 3-reg-same " Peter Maydell
2020-05-01  2:29   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 28/36] target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns " Peter Maydell
2020-05-01  3:36   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 29/36] target/arm: Convert Neon VPADD " Peter Maydell
2020-05-01  3:39   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 30/36] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same " Peter Maydell
2020-05-01  3:47   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 31/36] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns " Peter Maydell
2020-05-01  3:57   ` Richard Henderson
2020-04-30 18:09 ` [PATCH 32/36] target/arm: Convert Neon VPMIN/VPMAX/VPADD float " Peter Maydell
2020-05-01  3:59   ` Richard Henderson
2020-04-30 18:10 ` [PATCH 33/36] target/arm: Convert Neon fp VMUL, VMLA, VMLS " Peter Maydell
2020-05-01  4:07   ` Richard Henderson
2020-04-30 18:10 ` [PATCH 34/36] target/arm: Convert Neon 3-reg-same compare " Peter Maydell
2020-05-01  4:09   ` Richard Henderson
2020-04-30 18:10 ` [PATCH 35/36] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS " Peter Maydell
2020-05-01  4:13   ` Richard Henderson
2020-04-30 18:10 ` [PATCH 36/36] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns " Peter Maydell
2020-05-01  4:14   ` Richard Henderson
2020-05-01  7:32 ` [PATCH 00/36] target/arm: Convert Neon to decodetree (part 1) no-reply
2020-05-04 12:04 ` Peter Maydell
