From: Alex Bennée
Date: Thu, 09 Aug 2018 10:48:20 +0100
Subject: Re: [Qemu-devel] [PATCH 03/11] target/arm: Reorganize SVE WHILE
To: Richard Henderson
Cc: qemu-devel@nongnu.org, laurent.desnogues@gmail.com, peter.maydell@linaro.org, qemu-stable@nongnu.org
Message-ID: <87mutvq3a3.fsf@linaro.org>
In-reply-to: <20180809034033.10579-4-richard.henderson@linaro.org>
References: <20180809034033.10579-1-richard.henderson@linaro.org> <20180809034033.10579-4-richard.henderson@linaro.org>

Richard Henderson writes:

> The pseudocode for this operation is an increment + compare loop,
> so comparing <= the maximum integer produces an all-true predicate.
>
> Rather than bound in both the inline code and the helper, pass the
> helper the number of predicate bits to set instead of the number
> of predicate elements to set.
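
A minimal plain-C model (a sketch, not QEMU code) of the increment + compare loop described above, placed next to the closed-form count that the reorganized trans_WHILE computes. It assumes the unsigned 64-bit forms only (sf=1, u=1) and takes vsz to be the vector length in bytes, as returned by vec_full_reg_size(). The names while_loop_u64, while_pred_bits_u64 and check_while are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference model of the increment + compare loop (unsigned, 64-bit only).
 * Returns how many leading predicate elements are set. */
static unsigned while_loop_u64(uint64_t op0, uint64_t op1, bool eq,
                               unsigned elements)
{
    unsigned n = 0;

    for (unsigned e = 0; e < elements; e++) {
        bool cond = eq ? op0 <= op1 : op0 < op1;
        if (!cond) {
            break;              /* once false, every later element is false */
        }
        n++;
        op0++;                  /* wraps modulo 2^64, like the bit-vector add */
    }
    return n;
}

/* Closed form mirroring the reorganized TCG sequence (unsigned, 64-bit).
 * vsz is the vector length in bytes; returns the predicate bits to set. */
static unsigned while_pred_bits_u64(uint64_t op0, uint64_t op1, bool eq,
                                    unsigned vsz, unsigned esz)
{
    uint64_t tmax = vsz >> esz;         /* elements per vector */
    uint64_t t0 = op1 - op0;

    if (eq) {
        t0 += 1;                        /* equality means one more iteration */
        if (op1 == UINT64_MAX) {
            t0 = tmax;                  /* max integer always compares true */
        }
    }
    if (t0 > tmax) {
        t0 = tmax;                      /* bound to the maximum (umin) */
    }
    if (!(eq ? op0 <= op1 : op0 < op1)) {
        t0 = 0;                         /* condition false: no elements */
    }
    return (unsigned)t0 << esz;         /* scale elements to bits */
}

/* Cross-check: closed form equals the loop's element count scaled to bits. */
static void check_while(uint64_t op0, uint64_t op1, bool eq,
                        unsigned vsz, unsigned esz)
{
    assert(while_pred_bits_u64(op0, op1, eq, vsz, esz) ==
           while_loop_u64(op0, op1, eq, vsz >> esz) << esz);
}
```

With op0 == op1 == UINT64_MAX and eq set, the loop sets every element because op0 wraps to 0 and keeps comparing <= op1. A bare op1 - op0 + 1 gives 1 there, which is what the previous sequence produced and what the new movcond against the maximum integer avoids.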
>
> Cc: qemu-stable@nongnu.org (3.0.1)
> Tested-by: Laurent Desnogues
> Reviewed-by: Laurent Desnogues
> Reported-by: Laurent Desnogues
> Signed-off-by: Richard Henderson

Reviewed-by: Alex Bennée

> ---
>  target/arm/sve_helper.c    |  5 ----
>  target/arm/translate-sve.c | 49 +++++++++++++++++++++++++-------------
>  2 files changed, 32 insertions(+), 22 deletions(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 9bd0694d55..87594a8adb 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2846,11 +2846,6 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
>          return flags;
>      }
>
> -    /* Scale from predicate element count to bits. */
> -    count <<= esz;
> -    /* Bound to the bits in the predicate. */
> -    count = MIN(count, oprsz * 8);
> -
>      /* Set all of the requested bits. */
>      for (i = 0; i < count / 64; ++i) {
>          d->p[i] = esz_mask;
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 9dd4c38bab..89efc80ee7 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -3173,19 +3173,19 @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a, uint32_t insn)
>
>  static bool trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
>  {
> -    if (!sve_access_check(s)) {
> -        return true;
> -    }
> -
> -    TCGv_i64 op0 = read_cpu_reg(s, a->rn, 1);
> -    TCGv_i64 op1 = read_cpu_reg(s, a->rm, 1);
> -    TCGv_i64 t0 = tcg_temp_new_i64();
> -    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 op0, op1, t0, t1, tmax;
>      TCGv_i32 t2, t3;
>      TCGv_ptr ptr;
>      unsigned desc, vsz = vec_full_reg_size(s);
>      TCGCond cond;
>
> +    if (!sve_access_check(s)) {
> +        return true;
> +    }
> +
> +    op0 = read_cpu_reg(s, a->rn, 1);
> +    op1 = read_cpu_reg(s, a->rm, 1);
> +
>      if (!a->sf) {
>          if (a->u) {
>              tcg_gen_ext32u_i64(op0, op0);
> @@ -3198,32 +3198,47 @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
>
>      /* For the helper, compress the different conditions into a computation
>       * of how many iterations for which the condition is true.
> -     *
> -     * This is slightly complicated by 0 <= UINT64_MAX, which is nominally
> -     * 2**64 iterations, overflowing to 0. Of course, predicate registers
> -     * aren't that large, so any value >= predicate size is sufficient.
>       */
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
>      tcg_gen_sub_i64(t0, op1, op0);
>
> -    /* t0 = MIN(op1 - op0, vsz). */
> -    tcg_gen_movi_i64(t1, vsz);
> -    tcg_gen_umin_i64(t0, t0, t1);
> +    tmax = tcg_const_i64(vsz >> a->esz);
>      if (a->eq) {
>          /* Equality means one more iteration. */
>          tcg_gen_addi_i64(t0, t0, 1);
> +
> +        /* If op1 is max (un)signed integer (and the only time the addition
> +         * above could overflow), then we produce an all-true predicate by
> +         * setting the count to the vector length. This is because the
> +         * pseudocode is described as an increment + compare loop, and the
> +         * max integer would always compare true.
> +         */
> +        tcg_gen_movi_i64(t1, (a->sf
> +                              ? (a->u ? UINT64_MAX : INT64_MAX)
> +                              : (a->u ? UINT32_MAX : INT32_MAX)));
> +        tcg_gen_movcond_i64(TCG_COND_EQ, t0, op1, t1, tmax, t0);
>      }
>
> -    /* t0 = (condition true ? t0 : 0). */
> +    /* Bound to the maximum. */
> +    tcg_gen_umin_i64(t0, t0, tmax);
> +    tcg_temp_free_i64(tmax);
> +
> +    /* Set the count to zero if the condition is false. */
>      cond = (a->u
>              ? (a->eq ? TCG_COND_LEU : TCG_COND_LTU)
>              : (a->eq ? TCG_COND_LE : TCG_COND_LT));
>      tcg_gen_movi_i64(t1, 0);
>      tcg_gen_movcond_i64(cond, t0, op0, op1, t0, t1);
> +    tcg_temp_free_i64(t1);
>
> +    /* Since we're bounded, pass as a 32-bit type. */
>      t2 = tcg_temp_new_i32();
>      tcg_gen_extrl_i64_i32(t2, t0);
>      tcg_temp_free_i64(t0);
> -    tcg_temp_free_i64(t1);
> +
> +    /* Scale elements to bits. */
> +    tcg_gen_shli_i32(t2, t2, a->esz);
>
>      desc = (vsz / 8) - 2;
>      desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);

--
Alex Bennée