From: Richard Henderson
To: Nikunj A Dadhania , qemu-ppc@nongnu.org, david@gibson.dropbear.id.au
Cc: qemu-devel@nongnu.org, benh@kernel.crashing.org
Subject: Re: [Qemu-devel] [PATCH v4 6/9] target-ppc: add lxvh8x instruction
Date: Wed, 28 Sep 2016 10:22:30 -0700
Message-ID: <42e5aa24-2b9f-daec-a2ba-e2f2d2e5e7d0@twiddle.net>
In-Reply-To: <87shskvtix.fsf@abhimanyu.i-did-not-set--mail-host-address--so-tickle-me>

On 09/28/2016 10:11 AM, Nikunj A Dadhania wrote:
> Richard Henderson writes:
>
>> On 09/27/2016 10:31 PM, Nikunj A Dadhania wrote:
>>> +DEF_HELPER_1(bswap16x4, i64, i64)
>>
>> DEF_HELPER_FLAGS_1(bswap16x4, TCG_CALL_NO_RWG_SE, i64, i64)
>>
>>> +    uint64_t m = 0x00ff00ff00ff00ffull;
>>> +    return ((x & m) << 8) | ((x >> 8) & m);
>>
>> ... although I suppose this is only 5 instructions, and could reasonably be
>> done inline too. Especially if you shared the one 64-bit constant across the
>> two bswaps.
>
> Something like this:
>
> static void gen_bswap16x4(TCGv_i64 val)
> {
>     TCGv_i64 mask = tcg_const_i64(0x00FF00FF00FF00FF);
>     TCGv_i64 t0 = tcg_temp_new_i64();
>     TCGv_i64 t1 = tcg_temp_new_i64();
>
>     /* val = ((val & mask) << 8) | ((val >> 8) & mask) */
>     tcg_gen_and_i64(t0, val, mask);
>     tcg_gen_shri_i64(t0, t0, 8);
>     tcg_gen_shli_i64(t1, val, 8);
>     tcg_gen_and_i64(t1, t1, mask);
>     tcg_gen_or_i64(val, t0, t1);
>
>     tcg_temp_free_i64(t0);
>     tcg_temp_free_i64(t1);
>     tcg_temp_free_i64(mask);
> }

Like that, except that since you always perform this twice, you should share
the expensive constant load. Recall also that you need temporaries for the
store, so

static void gen_bswap16x8(TCGv_i64 outh, TCGv_i64 outl,
                          TCGv_i64 inh, TCGv_i64 inl)


r~
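For checking the arithmetic outside QEMU, here is a plain-C model of the two computations discussed: the per-lane byte swap done by the quoted helper, and the shape suggested for gen_bswap16x8, where one mask constant is shared across both 64-bit halves and results go to separate outputs rather than overwriting the inputs. This is only a sketch under stated assumptions: it operates on ordinary uint64_t values instead of TCGv_i64 temporaries, it is not TCG code, and the names bswap16x4 and bswap16x8 here are illustrative. Note that the quoted gen_bswap16x4 emits its shifts the other way round from its own comment (its ops compute ((val & mask) >> 8) | ((val << 8) & mask)), so the model follows the comment's formula, which is the one the original helper uses.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model (not QEMU code): swap the two bytes inside each of the
 * four 16-bit lanes of x, e.g. lane 0xAABB becomes 0xBBAA.
 * Same formula as the quoted helper: ((x & m) << 8) | ((x >> 8) & m). */
static uint64_t bswap16x4(uint64_t x)
{
    const uint64_t m = 0x00ff00ff00ff00ffull;
    return ((x & m) << 8) | ((x >> 8) & m);
}

/* Illustrative model of the suggested gen_bswap16x8(): both 64-bit halves
 * are swapped using ONE shared mask constant, and the results are written to
 * separate outputs so the inputs stay intact until both halves are done. */
static void bswap16x8(uint64_t *outh, uint64_t *outl,
                      uint64_t inh, uint64_t inl)
{
    const uint64_t mask = 0x00ff00ff00ff00ffull; /* loaded once, used twice */
    uint64_t t0, t1;

    /* high half: low byte of each lane moves up, high byte moves down */
    t0 = (inh & mask) << 8;
    t1 = (inh >> 8) & mask;
    *outh = t0 | t1;

    /* low half: same sequence, reusing the same mask */
    t0 = (inl & mask) << 8;
    t1 = (inl >> 8) & mask;
    *outl = t0 | t1;
}
```

With inh = 0x0011223344556677, for example, outh comes out as 0x1100332255447766: each 16-bit lane has its two bytes exchanged, and applying the swap twice returns the original value.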