From: David Gibson
Date: Thu, 29 Sep 2016 11:38:41 +1000
To: Nikunj A Dadhania
Cc: qemu-ppc@nongnu.org, rth@twiddle.net, qemu-devel@nongnu.org, benh@kernel.crashing.org
Subject: Re: [Qemu-devel] [PATCH v4 4/9] target-ppc: improve lxvw4x implementation
Message-ID: <20160929013841.GB8390@umbus.fritz.box>
In-Reply-To: <1475040687-27523-5-git-send-email-nikunj@linux.vnet.ibm.com>
References: <1475040687-27523-1-git-send-email-nikunj@linux.vnet.ibm.com> <1475040687-27523-5-git-send-email-nikunj@linux.vnet.ibm.com>

On Wed, Sep 28, 2016 at 11:01:22AM +0530, Nikunj A Dadhania wrote:
> Load 8byte at a time and manipulate.
>
> Big-Endian Storage
> +-------------+-------------+-------------+-------------+
> | 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
> +-------------+-------------+-------------+-------------+
>
> Little-Endian Storage
> +-------------+-------------+-------------+-------------+
> | 33 22 11 00 | 77 66 55 44 | BB AA 99 88 | FF EE DD CC |
> +-------------+-------------+-------------+-------------+
>
> Vector load results in:
> +-------------+-------------+-------------+-------------+
> | 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
> +-------------+-------------+-------------+-------------+

Ok.  I'm guessing from this that implementing those GPR<->VSR
instructions showed that the earlier versions were endian-incorrect as
I suspected.

Have you verified that this new implementation is actually faster (or
at least no slower) on LE than the original implementation with
individual 32-bit loads?

> Signed-off-by: Nikunj A Dadhania
> ---
>  target-ppc/translate/vsx-impl.inc.c | 33 +++++++++++++++++++--------------
>  1 file changed, 19 insertions(+), 14 deletions(-)
>
> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> index 74d0533..1eca042 100644
> --- a/target-ppc/translate/vsx-impl.inc.c
> +++ b/target-ppc/translate/vsx-impl.inc.c
> @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
>  static void gen_lxvw4x(DisasContext *ctx)
>  {
>      TCGv EA;
> -    TCGv_i64 tmp;
>      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>      if (unlikely(!ctx->vsx_enabled)) {
> @@ -84,22 +83,28 @@ static void gen_lxvw4x(DisasContext *ctx)
>      }
>      gen_set_access_type(ctx, ACCESS_INT);
>      EA = tcg_temp_new();
> -    tmp = tcg_temp_new_i64();
>
>      gen_addr_reg_index(ctx, EA);
> -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_ld32u_i64(ctx, xth, EA);
> -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
> -
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_ld32u_i64(ctx, xtl, EA);
> -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
> -
> +    if (ctx->le_mode) {
> +        TCGv_i64 t0, t1;
> +
> +        t0 = tcg_temp_new_i64();
> +        t1 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
> +        tcg_gen_shri_i64(t1, t0, 32);
> +        tcg_gen_deposit_i64(xth, t1, t0, 32, 32);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
> +        tcg_gen_shri_i64(t1, t0, 32);
> +        tcg_gen_deposit_i64(xtl, t1, t0, 32, 32);
> +        tcg_temp_free_i64(t0);
> +        tcg_temp_free_i64(t1);
> +    } else {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
> +    }
>      tcg_temp_free(EA);
> -    tcg_temp_free_i64(tmp);
>  }
>
>  #define VSX_STORE_SCALAR(name, operation) \

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
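
For anyone who wants to sanity-check the LE path of the quoted patch outside of QEMU, here is a minimal sketch in plain C. It is not QEMU code: every function name below is hypothetical, and the helpers only model what the TCG ops in the patch appear to compute (a 64-bit little-endian load, then shri plus deposit to swap the two 32-bit words), assuming the removed 32-bit loads were little-endian in LE mode. It compares that against the removed sequence of two 32-bit loads, using the byte pattern from the "Little-Endian Storage" diagram.

```c
#include <stdint.h>

/* Hypothetical helper: read 4 bytes as a little-endian 32-bit value. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: read 8 bytes as a little-endian 64-bit value,
 * standing in for a 64-bit load with MO_LEQ. */
static uint64_t load_le64(const uint8_t *p)
{
    return (uint64_t)load_le32(p) | ((uint64_t)load_le32(p + 4) << 32);
}

/* Models the new LE path: t1 = t0 >> 32, then deposit the low 32 bits
 * of t0 into bits 32..63 of t1. The net effect swaps the two words. */
static uint64_t swap_words(uint64_t t0)
{
    uint64_t t1 = t0 >> 32;
    return (t1 & 0xffffffffULL) | (t0 << 32);
}

/* Models the removed code: two 32-bit loads, the first deposited into
 * the upper half of the second. */
static uint64_t old_doubleword(const uint8_t *p)
{
    uint64_t tmp = load_le32(p);
    uint64_t x = load_le32(p + 4);
    return (x & 0xffffffffULL) | (tmp << 32);
}

/* Returns 1 if both models agree with each other and with the
 * "Vector load results in" diagram, 0 otherwise. */
int lxvw4x_le_check(void)
{
    static const uint8_t mem[16] = {
        0x33, 0x22, 0x11, 0x00, 0x77, 0x66, 0x55, 0x44,
        0xBB, 0xAA, 0x99, 0x88, 0xFF, 0xEE, 0xDD, 0xCC,
    };
    uint64_t xth = swap_words(load_le64(mem));
    uint64_t xtl = swap_words(load_le64(mem + 8));

    return xth == old_doubleword(mem) &&
           xtl == old_doubleword(mem + 8) &&
           xth == 0x0011223344556677ULL &&
           xtl == 0x8899AABBCCDDEEFFULL;
}
```

This only checks that the two sequences produce the same register contents for one input. It says nothing about the performance question raised above, which would need measurement on real TCG output.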