From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53573) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bWd5y-0005Uv-Uv for qemu-devel@nongnu.org; Mon, 08 Aug 2016 01:27:27 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bWd5v-0004Bd-Ld for qemu-devel@nongnu.org; Mon, 08 Aug 2016 01:27:26 -0400 Sender: Richard Henderson References: <1470591415-3268-1-git-send-email-nikunj@linux.vnet.ibm.com> <1470591415-3268-6-git-send-email-nikunj@linux.vnet.ibm.com> From: Richard Henderson Message-ID: Date: Mon, 8 Aug 2016 10:57:13 +0530 MIME-Version: 1.0 In-Reply-To: <1470591415-3268-6-git-send-email-nikunj@linux.vnet.ibm.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 5/6] target-ppc: add lxvb16x and lxvh8x List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Nikunj A Dadhania , qemu-ppc@nongnu.org, david@gibson.dropbear.id.au Cc: qemu-devel@nongnu.org, benh@kernel.crashing.org On 08/07/2016 11:06 PM, Nikunj A Dadhania wrote: > +#define LXV(name, access, swap, type, elems) \ > +uint64_t helper_##name(CPUPPCState *env, \ > + target_ulong addr) \ > +{ \ > + type r[elems] = {0}; \ > + int i, index, bound, step; \ > + if (msr_le) { \ > + index = elems - 1; \ > + bound = -1; \ > + step = -1; \ > + } else { \ > + index = 0; \ > + bound = elems; \ > + step = 1; \ > + } \ > + \ > + for (i = index; i != bound; i += step) { \ > + if (needs_byteswap(env)) { \ > + r[i] = swap(access(env, addr, GETPC())); \ > + } else { \ > + r[i] = access(env, addr, GETPC()); \ > + } \ > + addr = addr_add(env, addr, sizeof(type)); \ > + } \ > + return *((uint64_t *)r); \ > +} This looks more complicated than necessary. (1) In big-endian mode, surely this simplifies to two 64-bit big-endian loads. (2) In little-endian mode, the overhead of accessing memory surely dominates, and therefore we should perform two 64-bit loads and manipulate the data after. AFAICS, this is easiest done by requesting two 64-bit *big-endian* loads, and then swapping bytes. E.g. uint64_t helper_bswap16x4(uint64_t x) { uint64_t m = 0x00ff00ff00ff00ffull; return ((x & m) << 8) | ((x >> 8) & m); } uint64_t helper_bswap32x2(uint64_t x) { return deposit64(bswap32(x >> 32), 32, 32, bswap32(x)); } tcg_gen_qemu_ld_i64(dest, addr, MO_BEQ, s->mem_index); if (ctx->le_mode) { gen_helper_bswap16x4(dest, dest); } r~