From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:54253)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1bz4tZ-0001Ge-1S
 for qemu-devel@nongnu.org; Tue, 25 Oct 2016 12:48:13 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from )
 id 1bz4tV-00064Q-4a
 for qemu-devel@nongnu.org; Tue, 25 Oct 2016 12:48:13 -0400
Received: from mx1.redhat.com ([209.132.183.28]:60468)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from )
 id 1bz4tU-000644-VS
 for qemu-devel@nongnu.org; Tue, 25 Oct 2016 12:48:09 -0400
References: <1476803431-7208-1-git-send-email-rth@twiddle.net>
 <1476803431-7208-8-git-send-email-rth@twiddle.net>
 <9e61bace-5212-2e97-2c34-64c11c29c127@twiddle.net>
From: Paolo Bonzini
Message-ID: <0649ba28-15c1-dbbe-ce60-e93df67557e8@redhat.com>
Date: Tue, 25 Oct 2016 18:48:03 +0200
MIME-Version: 1.0
In-Reply-To: <9e61bace-5212-2e97-2c34-64c11c29c127@twiddle.net>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 07/18] tcg/i386: Implement field extraction opcodes
To: Richard Henderson, qemu-devel@nongnu.org

On 25/10/2016 18:46, Richard Henderson wrote:
> On 10/25/2016 05:46 AM, Paolo Bonzini wrote:
>>
>>
>> On 18/10/2016 17:10, Richard Henderson wrote:
>>> +    case INDEX_op_extract_i32:
>>> +        /* On the off-chance that we can use the high-byte registers.
>>> +           Otherwise we emit the same ext16 + shift pattern that we
>>> +           would have gotten from the normal tcg-op.c expansion.  */
>>> +        tcg_debug_assert(args[2] == 8 && args[3] == 8);
>>> +        if (args[1] < 4 && args[0] < 8) {
>>> +            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
>>> +        } else {
>>> +            tcg_out_ext16u(s, args[0], args[1]);
>>> +            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
>>> +        }
>>
>> Since the opcode is pretty rare, perhaps it's worth restricting the
>> constraints to, respectively, a new constraint for 0xff ("R"?) and "Q"?
>> It should generate slightly better code without constraining the
>> register allocator too much.
>
> I tried that, but since our allocator does nothing to look forward to future
> uses, it will only properly load a value into Q if this is the first use of the
> value within the TB.  Otherwise it'll generate an extra move to satisfy the
> constraint.
>
> Given that movzwl can operate on any source, and can copy to another
> destination at the same time, it's wasteful to force the register allocator to
> generate the extra move.
>
> This ext16u+shift form is what we'll generate without the special case here.
> So if you prefer I could drop the %[abcd]h special case entirely.

Nah, as you said there's always a chance of satisfying the constraint
(and of getting a better register allocator).

> The one that's particularly valuable is the 32-bit shift as extraction from a
> 64-bit input.  That turns out to happen lots for e.g. ppc64abi32 guest.

Sounds good, thanks!

Paolo
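
For anyone reading the thread later, here is a rough C sketch of the value-level equivalence being discussed: extracting the 8-bit field at offset 8 gives the same result as zero-extending the low 16 bits and shifting right by 8, and extracting the upper 32 bits of a 64-bit value is a plain right shift by 32. The helper names (field32, field64, ext16u_then_shr8, shr32) are made up for illustration and are not QEMU functions; this models only the arithmetic, not the code the TCG backend emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned extraction of bits [ofs, ofs + len)
   from a 32-bit value.  Requires 0 < len and ofs + len <= 32. */
static uint32_t field32(uint32_t x, unsigned ofs, unsigned len)
{
    assert(len > 0 && ofs + len <= 32);
    uint32_t mask = (len == 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    return (x >> ofs) & mask;
}

/* Models the fallback path from the patch: zero-extend the low
   16 bits (ext16u), then shift right by 8. */
static uint32_t ext16u_then_shr8(uint32_t x)
{
    uint32_t low16 = x & UINT32_C(0xffff);
    return low16 >> 8;
}

/* Hypothetical helper: the same extraction for a 64-bit value.
   Requires 0 < len and ofs + len <= 64. */
static uint64_t field64(uint64_t x, unsigned ofs, unsigned len)
{
    assert(len > 0 && ofs + len <= 64);
    uint64_t mask = (len == 64) ? UINT64_MAX : ((UINT64_C(1) << len) - 1);
    return (x >> ofs) & mask;
}

/* Extracting the upper half of a 64-bit value needs no mask at all:
   the shift already clears everything above bit 31. */
static uint64_t shr32(uint64_t x)
{
    return x >> 32;
}
```

With x = 0x12345678, both field32(x, 8, 8) and ext16u_then_shr8(x) give 0x56, which is also the byte a movzbl from %ah would read when the value sits in %eax. With x = 0x1122334455667788, both field64(x, 32, 32) and shr32(x) give 0x11223344.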