qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-devel@nongnu.org
Subject: [PULL 41/45] target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to decodetree
Date: Thu, 14 May 2020 15:21:34 +0100	[thread overview]
Message-ID: <20200514142138.20875-42-peter.maydell@linaro.org> (raw)
In-Reply-To: <20200514142138.20875-1-peter.maydell@linaro.org>

Convert the Neon integer VMUL, VMLA, and VMLS 3-reg-same inssn to
decodetree.

We don't have a gvec helper for multiply-accumulate, so VMLA and VMLS
need a loop function do_3same_fp().  This takes a reads_vd parameter
to do_3same_fp() which tells it to load the old value into vd before
calling the callback function, in the same way that the do_vfp_3op_sp()
and do_vfp_3op_dp() functions in translate-vfp.inc.c work. (The
only uses in this patch pass reads_vd == true, but later commits
will use reads_vd == false.)

This conversion fixes in passing an underdecoding for VMUL
(originally reported by Fredrik Strupe <fredrik@strupe.net>): bit 1
of the 'size' field must be 0.  The old decoder didn't enforce this,
but the decodetree pattern does.

The gen_VMLA_fp_reg() function performs the addition operation
with the operands in the opposite order to the old decoder:
since Neon sets 'default NaN mode' float32_add operations are
commutative so there is no behaviour difference, but putting
them this way around matches the Arm ARM pseudocode and the
required operation order for the subtraction in gen_VMLS_fp_reg().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200512163904.10918-14-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 81 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 17 +------
 3 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index d66c67ca585..4c2f8c770d1 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -180,5 +180,8 @@ VADD_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 0 .... @3same_fp
 VSUB_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
 VPADD_fp_3s      1111 001 1 0 . 0 . .... .... 1101 ... 0 .... @3same_fp_q0
 VABD_fp_3s       1111 001 1 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
+VMLA_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
+VMLS_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 1 .... @3same_fp
+VMUL_fp_3s       1111 001 1 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
 VPMAX_fp_3s      1111 001 1 0 . 0 . .... .... 1111 ... 0 .... @3same_fp_q0
 VPMIN_fp_3s      1111 001 1 0 . 1 . .... .... 1111 ... 0 .... @3same_fp_q0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 7bdf1e3fee8..18896598bb4 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1022,6 +1022,55 @@ DO_3SAME_PAIR(VPADD, padd_u)
 DO_3SAME_VQDMULH(VQDMULH, qdmulh)
 DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
 
+static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn,
+                        bool reads_vd)
+{
+    /*
+     * FP operations handled elementwise 32 bits at a time.
+     * If reads_vd is true then the old value of Vd will be
+     * loaded before calling the callback function. This is
+     * used for multiply-accumulate type operations.
+     */
+    TCGv_i32 tmp, tmp2;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    TCGv_ptr fpstatus = get_fpstatus_ptr(1);
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        tmp = neon_load_reg(a->vn, pass);
+        tmp2 = neon_load_reg(a->vm, pass);
+        if (reads_vd) {
+            TCGv_i32 tmp_rd = neon_load_reg(a->vd, pass);
+            fn(tmp_rd, tmp, tmp2, fpstatus);
+            neon_store_reg(a->vd, pass, tmp_rd);
+            tcg_temp_free_i32(tmp);
+        } else {
+            fn(tmp, tmp, tmp2, fpstatus);
+            neon_store_reg(a->vd, pass, tmp);
+        }
+        tcg_temp_free_i32(tmp2);
+    }
+    tcg_temp_free_ptr(fpstatus);
+    return true;
+}
+
 /*
  * For all the functions using this macro, size == 1 means fp16,
  * which is an architecture extension we don't implement yet.
@@ -1049,6 +1098,38 @@ DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
 DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s)
 DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s)
 DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s)
+DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s)
+
+/*
+ * For all the functions using this macro, size == 1 means fp16,
+ * which is an architecture extension we don't implement yet.
+ */
+#define DO_3S_FP(INSN,FUNC,READS_VD)                                \
+    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
+    {                                                               \
+        if (a->size != 0) {                                         \
+            /* TODO fp16 support */                                 \
+            return false;                                           \
+        }                                                           \
+        return do_3same_fp(s, a, FUNC, READS_VD);                   \
+    }
+
+static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_muls(vn, vn, vm, fpstatus);
+    gen_helper_vfp_adds(vd, vd, vn, fpstatus);
+}
+
+static void gen_VMLS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_muls(vn, vn, vm, fpstatus);
+    gen_helper_vfp_subs(vd, vd, vn, fpstatus);
+}
+
+DO_3S_FP(VMLA, gen_VMLA_fp_3s, true)
+DO_3S_FP(VMLS, gen_VMLS_fp_3s, true)
 
 static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ca6ed09ec34..06b6925d31e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5433,6 +5433,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VPADD_VQRDMLAH:
         case NEON_3R_VQDMULH_VQRDMULH:
         case NEON_3R_FLOAT_ARITH:
+        case NEON_3R_FLOAT_MULTIPLY:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5479,22 +5480,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         tmp = neon_load_reg(rn, pass);
         tmp2 = neon_load_reg(rm, pass);
         switch (op) {
-        case NEON_3R_FLOAT_MULTIPLY:
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
-            if (!u) {
-                tcg_temp_free_i32(tmp2);
-                tmp2 = neon_load_reg(rd, pass);
-                if (size == 0) {
-                    gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
-                } else {
-                    gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
-                }
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
         case NEON_3R_FLOAT_CMP:
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-- 
2.20.1



  parent reply	other threads:[~2020-05-14 14:55 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-14 14:20 [PULL 00/45] target-arm queue Peter Maydell
2020-05-14 14:20 ` [PULL 01/45] target/arm: Use correct GDB XML for M-profile cores Peter Maydell
2020-05-14 14:20 ` [PULL 02/45] target/arm: Create gen_gvec_[us]sra Peter Maydell
2020-05-14 14:20 ` [PULL 03/45] target/arm: Create gen_gvec_{u,s}{rshr,rsra} Peter Maydell
2020-05-14 14:20 ` [PULL 04/45] target/arm: Create gen_gvec_{sri,sli} Peter Maydell
2020-05-14 14:20 ` [PULL 05/45] target/arm: Remove unnecessary range check for VSHL Peter Maydell
2020-05-14 14:20 ` [PULL 06/45] target/arm: Tidy handle_vec_simd_shri Peter Maydell
2020-05-14 14:21 ` [PULL 07/45] target/arm: Create gen_gvec_{ceq,clt,cle,cgt,cge}0 Peter Maydell
2020-05-14 14:21 ` [PULL 08/45] target/arm: Create gen_gvec_{mla,mls} Peter Maydell
2020-05-14 14:21 ` [PULL 09/45] target/arm: Swap argument order for VSHL during decode Peter Maydell
2020-05-14 14:21 ` [PULL 10/45] target/arm: Create gen_gvec_{cmtst,ushl,sshl} Peter Maydell
2020-05-14 14:21 ` [PULL 11/45] target/arm: Create gen_gvec_{uqadd, sqadd, uqsub, sqsub} Peter Maydell
2020-05-14 14:21 ` [PULL 12/45] target/arm: Remove fp_status from helper_{recpe, rsqrte}_u32 Peter Maydell
2020-05-14 14:21 ` [PULL 13/45] target/arm: Create gen_gvec_{qrdmla,qrdmls} Peter Maydell
2020-05-14 14:21 ` [PULL 14/45] target/arm: Pass pointer to qc to qrdmla/qrdmls Peter Maydell
2020-05-14 14:21 ` [PULL 15/45] target/arm: Clear tail in gvec_fmul_idx_*, gvec_fmla_idx_* Peter Maydell
2020-05-14 14:21 ` [PULL 16/45] target/arm: Vectorize SABD/UABD Peter Maydell
2020-05-14 14:21 ` [PULL 17/45] target/arm: Vectorize SABA/UABA Peter Maydell
2020-05-21 13:11   ` Peter Maydell
2020-05-14 14:21 ` [PULL 18/45] aspeed: Add support for the sonorapass-bmc board Peter Maydell
2020-05-14 14:21 ` [PULL 19/45] acpi: nvdimm: change NVDIMM_UUID_LE to a common macro Peter Maydell
2020-05-14 14:21 ` [PULL 20/45] hw/arm/virt: Introduce a RAS machine option Peter Maydell
2020-05-14 14:21 ` [PULL 21/45] docs: APEI GHES generation and CPER record description Peter Maydell
2020-05-14 14:21 ` [PULL 22/45] ACPI: Build related register address fields via hardware error fw_cfg blob Peter Maydell
2020-05-14 14:21 ` [PULL 23/45] ACPI: Build Hardware Error Source Table Peter Maydell
2020-05-14 14:21 ` [PULL 24/45] ACPI: Record the Generic Error Status Block address Peter Maydell
2020-05-14 14:21 ` [PULL 25/45] KVM: Move hwpoison page related functions into kvm-all.c Peter Maydell
2020-05-14 14:21 ` [PULL 26/45] ACPI: Record Generic Error Status Block(GESB) table Peter Maydell
2020-05-21 13:03   ` Peter Maydell
2020-05-21 15:31     ` Michael S. Tsirkin
2020-06-19 17:21       ` Peter Maydell
2020-06-20  1:50         ` Dongjiu Geng
2020-05-14 14:21 ` [PULL 27/45] target-arm: kvm64: handle SIGBUS signal from kernel or KVM Peter Maydell
2020-05-14 14:21 ` [PULL 28/45] MAINTAINERS: Add ACPI/HEST/GHES entries Peter Maydell
2020-05-14 14:21 ` [PULL 29/45] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree Peter Maydell
2020-05-14 14:21 ` [PULL 30/45] target/arm: Convert Neon 3-reg-same SHA " Peter Maydell
2020-05-14 14:21 ` [PULL 31/45] target/arm: Convert Neon 64-bit element 3-reg-same insns Peter Maydell
2020-05-14 14:21 ` [PULL 32/45] target/arm: Convert Neon VHADD " Peter Maydell
2020-05-14 14:21 ` [PULL 33/45] target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree Peter Maydell
2020-05-14 14:21 ` [PULL 34/45] target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns " Peter Maydell
2020-05-14 14:21 ` [PULL 35/45] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL " Peter Maydell
2020-05-14 14:21 ` [PULL 36/45] target/arm: Convert Neon VPMAX/VPMIN " Peter Maydell
2020-05-14 14:21 ` [PULL 37/45] target/arm: Convert Neon VPADD " Peter Maydell
2020-05-14 14:21 ` [PULL 38/45] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same " Peter Maydell
2020-05-14 14:21 ` [PULL 39/45] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns " Peter Maydell
2020-05-14 14:21 ` [PULL 40/45] target/arm: Convert Neon VPMIN/VPMAX/VPADD float " Peter Maydell
2020-05-14 14:21 ` Peter Maydell [this message]
2020-05-14 14:21 ` [PULL 42/45] target/arm: Convert Neon 3-reg-same compare " Peter Maydell
2020-05-14 14:21 ` [PULL 43/45] target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to usual place Peter Maydell
2020-05-14 14:21 ` [PULL 44/45] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree Peter Maydell
2020-05-14 14:21 ` [PULL 45/45] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns " Peter Maydell
2020-05-14 16:42 ` [PULL 00/45] target-arm queue Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200514142138.20875-42-peter.maydell@linaro.org \
    --to=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).