From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44914) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bmOYs-0002eR-Kf for qemu-devel@nongnu.org; Tue, 20 Sep 2016 13:10:27 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bmOYl-0000IK-HZ for qemu-devel@nongnu.org; Tue, 20 Sep 2016 13:10:25 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:43944 helo=mx0a-001b2d01.pphosted.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bmOYl-0000I7-C3 for qemu-devel@nongnu.org; Tue, 20 Sep 2016 13:10:19 -0400 Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.17/8.16.0.17) with SMTP id u8KH8m0t016422 for ; Tue, 20 Sep 2016 13:10:19 -0400 Received: from e23smtp04.au.ibm.com (e23smtp04.au.ibm.com [202.81.31.146]) by mx0b-001b2d01.pphosted.com with ESMTP id 25jmr0xvn5-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 20 Sep 2016 13:10:18 -0400 Received: from localhost by e23smtp04.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 21 Sep 2016 03:10:15 +1000 From: Nikunj A Dadhania In-Reply-To: <20160920043429.GI20488@umbus> References: <1474023111-11992-1-git-send-email-nikunj@linux.vnet.ibm.com> <1474023111-11992-3-git-send-email-nikunj@linux.vnet.ibm.com> <20160919061934.GC20488@umbus> <20160919065037.GF20488@umbus> <87wpi8kwg7.fsf@abhimanyu.i-did-not-set--mail-host-address--so-tickle-me> <20160920043429.GI20488@umbus> Date: Tue, 20 Sep 2016 22:40:03 +0530 MIME-Version: 1.0 Content-Type: text/plain Message-Id: <87lgymqyz8.fsf@abhimanyu.i-did-not-set--mail-host-address--so-tickle-me> Subject: Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: qemu-ppc@nongnu.org, rth@twiddle.net, qemu-devel@nongnu.org, benh@kernel.crashing.org David Gibson writes: > [ Unknown signature status ] > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote: >> David Gibson writes: >> > [ Unknown signature status ] >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote: >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote: >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c >> >> > index eee6052..df278df 100644 >> >> > --- a/target-ppc/translate/vsx-impl.inc.c >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx) >> >> > static void gen_lxvw4x(DisasContext *ctx) >> >> > { >> >> > TCGv EA; >> >> > - TCGv_i64 tmp; >> >> > TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode)); >> >> > TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode)); >> >> > if (unlikely(!ctx->vsx_enabled)) { >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx) >> >> > } >> >> > gen_set_access_type(ctx, ACCESS_INT); >> >> > EA = tcg_temp_new(); >> >> > - tmp = tcg_temp_new_i64(); >> >> > >> >> > gen_addr_reg_index(ctx, EA); >> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA); >> >> > - tcg_gen_addi_tl(EA, EA, 4); >> >> > - gen_qemu_ld32u_i64(ctx, xth, EA); >> >> > - tcg_gen_deposit_i64(xth, xth, tmp, 32, 32); >> >> > - >> >> > - tcg_gen_addi_tl(EA, EA, 4); >> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA); >> >> > - tcg_gen_addi_tl(EA, EA, 4); >> >> > - gen_qemu_ld32u_i64(ctx, xtl, EA); >> >> > - tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32); >> >> > - >> >> > + tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ); >> >> > + gen_helper_deposit32x2(xth, xth); >> >> > + tcg_gen_addi_tl(EA, EA, 8); >> >> > + tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ); >> >> > + gen_helper_deposit32x2(xtl, xtl); >> > >> > ..and I think this is wrong for BE mode. The deposit32x2 will get the >> > words in the right order, but the bytes within each word will be wrong >> > because of the LE mode load on a BE setup. >> >> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test >> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware. >> The order within the word is not changed. Snippet of the test code at >> the end of email. Can share full code if needed (maybe will do it in >> kvm-unit-test) > > Ugh.. now I'm confused. I would not have expected the results you've > seen from these tests. But I still can't understand *how* the > emulation could be correct: IIUC MO_LEQ would mean it loads the 8 > bytes as a single 64-bit LE integer. For both the case LE/BE we do a LE read ... > Which should be the same as > loading one 32-bit LE integer into the low half of the target > register, then a 32-bit LE integer into the high half ot the target > register. .. The 64-bit integer read is not same in these cases. The input itself would be in the order of the format. Input rb32[]: 00010203 20212223 30313233 40414243 LE: helper_deposit32x2: 2021222300010203 helper_deposit32x2: 4041424330313233 BE helper_deposit32x2: 2322212003020100 helper_deposit32x2: 4342414033323130 > > As I said above, the deposit32x2 will swap the order of the two ints, > but it won't byteswap the individual int32s which should have been BE > in memory. > > Can you find the flaw in my reasoning? One anomaly that I see in BE code generation: it also generates a stxvw4x after lxvw4x. I am not sure why. >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>> Input rb32[]: 00010203 20212223 30313233 40414243 gen_lxvw4x: called helper_deposit32x2: 2322212003020100 helper_deposit32x2: 4342414033323130 gen_stxvw4x: called helper_deposit32x2: 0302010023222120 helper_deposit32x2: 3332313043424140 Output VRT32: 00010203 20212223 30313233 40414243 >> vsx.h: >> ====== >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t)) >> >> typedef union { >> __vector uint32_t v; >> uint32_t a[U32_SIZE]; >> } vuint32_t; > > I am a little suspicious that whatever the compiler does to convert > the vector to an array via this union might be undoing a byte reverse. > > I'd be more confident if you used VSX instructions to extract and > store separately one of the 32-bit subwords of the vector. I will try to figure those instructions. Regards Nikunj