From: malc <av1474@comtv.ru>
To: Richard Henderson <rth@twiddle.net>
Cc: qemu-devel@nongnu.org, Aurelien Jarno <aurelien@aurel32.net>
Subject: Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations.
Date: Mon, 19 Apr 2010 20:05:53 +0400 (MSD)
Message-ID: <alpine.LNX.2.00.1004192004080.1477@linmac>
In-Reply-To: <4BCC611C.3020202@twiddle.net>

On Mon, 19 Apr 2010, Richard Henderson wrote:

> On 04/18/2010 05:13 PM, Aurelien Jarno wrote:
> > On Tue, Apr 13, 2010 at 04:33:59PM -0700, Richard Henderson wrote:
> >> Define OPC_BSWAP.  Factor opcode emission to separate functions.
> >> Use bswap+shift to implement 16-bit swap instead of a rolw; this
> >> gets the proper zero-extension required by INDEX_op_bswap16_i32.
> > 
> > This is not required by INDEX_op_bswap16_i32. What is needed is that the
> > value in the input register has the 16 upper bits set to 0.
> 
> Ah.

Apparently I'm not the only one who misinterpreted this bit of the bswap
documentation. How about:

diff --git a/tcg/README b/tcg/README
index 68d27ff..5b39a38 100644
--- a/tcg/README
+++ b/tcg/README
@@ -269,7 +269,7 @@ ext32u_i64 t0, t1
 * bswap16_i32/i64 t0, t1
 
 16 bit byte swap on a 32/64 bit value. It assumes that the two/six high order
-bytes are set to zero.
+bytes of t1 are set to zero.
 
 * bswap32_i32/i64 t0, t1
 

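To make the distinction concrete, here is a rough sketch in plain C of
the two candidate sequences. The helper names are made up for this mail
and are not TCG functions; swap16_rol models "rolw $8" on the low 16
bits and swap16_bswap_shr models "bswap" followed by a logical right
shift by 16. It only illustrates the semantics and says nothing about
speed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "rolw $8, %w0": rotate only the low 16 bits,
   leaving the upper 16 bits of the register untouched. */
static uint32_t swap16_rol(uint32_t x)
{
    uint32_t lo = x & 0xffffu;
    uint32_t rot = ((lo << 8) | (lo >> 8)) & 0xffffu;
    return (x & 0xffff0000u) | rot;
}

/* Hypothetical model of "bswap %0; shr $16, %0": reverse all four
   bytes, then shift the swapped low half back down, zero-filling. */
static uint32_t swap16_bswap_shr(uint32_t x)
{
    uint32_t swapped = ((x & 0x000000ffu) << 24) |
                       ((x & 0x0000ff00u) << 8)  |
                       ((x & 0x00ff0000u) >> 8)  |
                       ((x & 0xff000000u) >> 24);
    return swapped >> 16;
}

/* Self-check: the two agree when the high bytes of the input are zero
   and disagree when they are not. */
static void swap16_selfcheck(void)
{
    assert(swap16_rol(0x00001234u) == swap16_bswap_shr(0x00001234u));
    assert(swap16_rol(0xabcd1234u) != swap16_bswap_shr(0xabcd1234u));
}
```

With the input 0x00001234 both helpers return 0x00003412. With
0xabcd1234 the rotate keeps the stale high half (0xabcd3412), while
bswap+shift returns 0x00003412. So if the README wording is read as a
precondition on t1, either sequence is acceptable.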
> 
> > Considering
> > that, the rolw instruction is faster than bswap + shift.
> 
> Well, no, it isn't.
> 
>  static inline int test_rolw(unsigned short *s)
>  {
>    int i, start, end;
>    asm volatile("rdtsc\n\t"
>                 "movl %%eax, %1\n\t"
>                 "movzwl %3,%2\n\t"
>                 "rolw $8, %w2\n\t"
>                 "addl $1,%2\n\t"
>                 "rdtsc"
>                 : "=&a"(end), "=r"(start), "=r"(i) : "m"(*s) : "edx");
>    return end - start;
>  }
>  
>  static inline int test_bswap(unsigned short *s)
>  {
>    int i, start, end;
>    asm volatile("rdtsc\n\t"
>                 "movl %%eax, %1\n\t"
>                 "movzwl %3,%2\n\t"
>                 "bswap %2\n\t"
>                 "shrl $16,%2\n\t"
>                 "addl $1,%2\n\t"
>                 "rdtsc"
>                 : "=&a"(end), "=r"(start), "=r"(i) : "m"(*s) : "edx");
>    return end - start;
>  }
> 
> 
> model name	: Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
>  rolw	   60   60   72   60   60   72   60   60   72   60
>  bswap	   60   60   60   60   60   60   60   60   60   60
> 
> model name	: Dual-Core AMD Opteron(tm) Processor 1210
>  rolw	    9   10    9    9    8    8    8    8    8    8
>  bswap	    9    9    8    8    8    8    8    8    8    8
> 
> The rolw sequence isn't ever faster, and it's more unstable,
> likely due to the partial register stall I mentioned.
> 
> I will grant that the rolw sequence is smaller, and I can 
> adjust this patch to use that sequence if you wish.
> 
> 
> r~
> 
> 

-- 
mailto:av1474@comtv.ru