From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4BCC611C.3020202@twiddle.net>
Date: Mon, 19 Apr 2010 08:56:44 -0500
From: Richard Henderson
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations.
References: <20100418221302.GA26784@volta.aurel32.net>
In-Reply-To: <20100418221302.GA26784@volta.aurel32.net>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Aurelien Jarno
Cc: qemu-devel@nongnu.org

On 04/18/2010 05:13 PM, Aurelien Jarno wrote:
> On Tue, Apr 13, 2010 at 04:33:59PM -0700, Richard Henderson wrote:
>> Define OPC_BSWAP.  Factor opcode emission to separate functions.
>> Use bswap+shift to implement 16-bit swap instead of a rolw; this
>> gets the proper zero-extension required by INDEX_op_bswap16_i32.
>
> This is not required by INDEX_op_bswap16_i32. What is need is that the
> value in the input register has the 16 upper bits set to 0.

Ah.

> Considering
> that, the rolw instruction is faster than bswap + shift.

Well, no, it isn't.
static inline int test_rolw(unsigned short *s)
{
    int i, start, end;
    asm volatile("rdtsc\n\t"
                 "movl %%eax, %1\n\t"
                 "movzwl %3,%2\n\t"
                 "rolw $8, %w2\n\t"
                 "addl $1,%2\n\t"
                 "rdtsc"
                 : "=&a"(end), "=r"(start), "=r"(i)
                 : "m"(*s)
                 : "edx");
    return end - start;
}

static inline int test_bswap(unsigned short *s)
{
    int i, start, end;
    asm volatile("rdtsc\n\t"
                 "movl %%eax, %1\n\t"
                 "movzwl %3,%2\n\t"
                 "bswap %2\n\t"
                 "shl $16,%2\n\t"
                 "addl $1,%2\n\t"
                 "rdtsc"
                 : "=&a"(end), "=r"(start), "=r"(i)
                 : "m"(*s)
                 : "edx");
    return end - start;
}

model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
rolw   60 60 72 60 60 72 60 60 72 60
bswap  60 60 60 60 60 60 60 60 60 60

model name : Dual-Core AMD Opteron(tm) Processor 1210
rolw   9 10  9  9  8  8  8  8  8  8
bswap  9  9  8  8  8  8  8  8  8  8

The rolw sequence isn't ever faster, and it's more unstable, likely
due to the partial register stall I mentioned.

I will grant that the rolw sequence is smaller, and I can adjust this
patch to use that sequence if you wish.

r~