* [Qemu-devel] [PATCH 0/4] target/arm, cris, mips: optimize "swap bytes within words"
@ 2017-05-16 23:01 Aurelien Jarno
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 1/4] target/arm: optimize aarch32 rev16 Aurelien Jarno
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Aurelien Jarno @ 2017-05-16 23:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, Aurelien Jarno

This patchset optimizes the "swap bytes within words" instructions on the
arm, cris and mips targets. It started with Philippe Mathieu-Daudé's
patchset that optimizes TCG code by using the extract op. Looking at that
patchset, I found that the aarch64 rev16 function could be optimized
further. Richard Henderson then suggested an even more efficient version.

Aurelien Jarno (4):
  target/arm: optimize aarch32 rev16
  target/arm: simplify and optimize aarch64 rev16
  target/cris: optimize swap
  target/mips: optimize WSBH, DSBH and DSHD

 target/arm/translate-a64.c | 24 ++++++------------------
 target/arm/translate.c     |  6 ++++--
 target/cris/translate.c    | 15 +++++++--------
 target/mips/translate.c    | 18 ++++++++++++------
 4 files changed, 29 insertions(+), 34 deletions(-)

-- 
2.11.0

* [Qemu-devel] [PATCH 1/4] target/arm: optimize aarch32 rev16
  2017-05-16 23:01 [Qemu-devel] [PATCH 0/4] target/arm, cris, mips: optimize "swap bytes within words" Aurelien Jarno
@ 2017-05-16 23:01 ` Aurelien Jarno
  2017-05-17  0:56   ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
  2017-05-23  0:21   ` [Qemu-devel] " Richard Henderson
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 2/4] target/arm: simplify and optimize aarch64 rev16 Aurelien Jarno
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Aurelien Jarno @ 2017-05-16 23:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Aurelien Jarno, Peter Maydell, open list:ARM

Use the same mask to avoid having to load two different constants, as
suggested by Richard Henderson.
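
For illustration only, here is a minimal sketch in plain C (not TCG) of why
one constant is enough: masking before the left shift selects the same bytes
as masking after it with the complementary constant. The function names are
hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-constant form: a different mask for each shifted half. */
static inline uint32_t rev16_two_masks(uint32_t x)
{
    return ((x >> 8) & 0x00ff00ffu) | ((x << 8) & 0xff00ff00u);
}

/* One-constant form: mask first, then shift left, reusing the same mask. */
static inline uint32_t rev16_one_mask(uint32_t x)
{
    const uint32_t mask = 0x00ff00ffu;
    return ((x >> 8) & mask) | ((x & mask) << 8);
}

/* Check that both forms agree on a few sample values. */
static inline void rev16_selfcheck(void)
{
    const uint32_t samples[] = {
        0x00000000u, 0x11223344u, 0xdeadbeefu, 0xffffffffu
    };
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(rev16_one_mask(samples[i]) == rev16_two_masks(samples[i]));
    }
}
```

With the input 0x11223344, both forms return 0x22114433. In the generated
host code, the gain is that only one immediate has to be materialized.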

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target/arm/translate.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 0b5a0bca06..5becb2bb89 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -339,11 +339,13 @@ static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
 static void gen_rev16(TCGv_i32 var)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGv_i32 mask = tcg_const_i32(0x00ff00ff);
     tcg_gen_shri_i32(tmp, var, 8);
-    tcg_gen_andi_i32(tmp, tmp, 0x00ff00ff);
+    tcg_gen_and_i32(tmp, tmp, mask);
+    tcg_gen_and_i32(var, var, mask);
     tcg_gen_shli_i32(var, var, 8);
-    tcg_gen_andi_i32(var, var, 0xff00ff00);
     tcg_gen_or_i32(var, var, tmp);
+    tcg_temp_free_i32(mask);
     tcg_temp_free_i32(tmp);
 }
 
-- 
2.11.0

* [Qemu-devel] [PATCH 2/4] target/arm: simplify and optimize aarch64 rev16
  2017-05-16 23:01 [Qemu-devel] [PATCH 0/4] target/arm, cris, mips: optimize "swap bytes within words" Aurelien Jarno
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 1/4] target/arm: optimize aarch32 rev16 Aurelien Jarno
@ 2017-05-16 23:01 ` Aurelien Jarno
  2017-05-17  0:56   ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
  2017-05-23  0:21   ` [Qemu-devel] " Richard Henderson
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 3/4] target/cris: optimize swap Aurelien Jarno
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 4/4] target/mips: optimize WSBH, DSBH and DSHD Aurelien Jarno
  3 siblings, 2 replies; 12+ messages in thread
From: Aurelien Jarno @ 2017-05-16 23:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Aurelien Jarno, Peter Maydell, open list:ARM

Instead of byteswapping individual 16-bit words one by one, work on the
whole register at the same time using shifts and a mask. This is the same
strategy as the aarch32 version of rev16 and is much more efficient
in the case sf=1.
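
As a rough model in plain C (not TCG), the two strategies can be compared as
follows. The function names are hypothetical; `sf` selects the 64-bit form as
in the patch, and the sf=0 result is assumed to be the zero-extended 32-bit
value.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane approach: byte-swap each 16-bit halfword on its own. */
static inline uint64_t rev16_per_lane(uint64_t x, int sf)
{
    int lanes = sf ? 4 : 2;
    uint64_t out = 0;

    for (int i = 0; i < lanes; i++) {
        uint64_t h = (x >> (16 * i)) & 0xffff;   /* extract halfword i */
        h = ((h & 0xff) << 8) | (h >> 8);        /* swap its two bytes */
        out |= h << (16 * i);                    /* put it back in place */
    }
    return out;
}

/* Whole-register approach: two shifts, two ANDs with one mask, one OR. */
static inline uint64_t rev16_whole(uint64_t x, int sf)
{
    const uint64_t mask = sf ? 0x00ff00ff00ff00ffull : 0x00ff00ffull;
    return ((x >> 8) & mask) | ((x & mask) << 8);
}
```

The per-lane version needs a shift, a mask, a swap and a deposit for each
halfword, so its cost grows with the register width, while the whole-register
version uses the same five operations for both sf=0 and sf=1.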

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target/arm/translate-a64.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d92c..ed15d21655 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4035,24 +4035,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
     TCGv_i64 tcg_tmp = tcg_temp_new_i64();
     TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
 
-    tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
-    tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
-
-    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-    tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
-
-    if (sf) {
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-        tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
-
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
-    }
+    TCGv_i64 mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
+    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
+    tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
+    tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
+    tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
+    tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
 
     tcg_temp_free_i64(tcg_tmp);
 }
-- 
2.11.0

* [Qemu-devel] [PATCH 3/4] target/cris: optimize swap
  2017-05-16 23:01 [Qemu-devel] [PATCH 0/4] target/arm, cris, mips: optimize "swap bytes within words" Aurelien Jarno
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 1/4] target/arm: optimize aarch32 rev16 Aurelien Jarno
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 2/4] target/arm: simplify and optimize aarch64 rev16 Aurelien Jarno
@ 2017-05-16 23:01 ` Aurelien Jarno
  2017-05-17  0:59   ` Philippe Mathieu-Daudé
  2017-05-23  0:23   ` Richard Henderson
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 4/4] target/mips: optimize WSBH, DSBH and DSHD Aurelien Jarno
  3 siblings, 2 replies; 12+ messages in thread
From: Aurelien Jarno @ 2017-05-16 23:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, Aurelien Jarno, Edgar E. Iglesias

Use the same mask to avoid having to load two different constants, as
suggested by Richard Henderson. Also use one fewer temp.
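
The copy of the source can go away because the source is read for the last
time by the operation that first writes the destination. A minimal sketch in
plain C, with a hypothetical function name and pointers standing in for the
TCG values, mirrors the new operation order:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the reordered sequence; d and s may point to the same object. */
static inline void swapb_model(uint32_t *d, const uint32_t *s)
{
    const uint32_t m = 0x00ff00ffu;
    uint32_t t;

    t = *s >> 8;     /* shri t, s, 8 */
    t &= m;          /* and  t, t, m */
    *d = *s & m;     /* and  d, s, m: last read of s, first write of d */
    *d <<= 8;        /* shli d, d, 8 */
    *d |= t;         /* or   d, d, t */
}
```

Calling it with the same pointer for both arguments on the value 0xaabbccdd
leaves 0xbbaaddcc in place, the same result as with distinct objects.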

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target/cris/translate.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 0ee05ca02d..103b214233 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -433,20 +433,19 @@ static inline void t_gen_subx_carry(DisasContext *dc, TCGv d)
    T0 = ((T0 << 8) & 0xff00ff00) | ((T0 >> 8) & 0x00ff00ff)  */
 static inline void t_gen_swapb(TCGv d, TCGv s)
 {
-    TCGv t, org_s;
+    TCGv t, m;
 
     t = tcg_temp_new();
-    org_s = tcg_temp_new();
+    m = tcg_const_tl(0x00ff00ff);
 
     /* d and s may refer to the same object.  */
-    tcg_gen_mov_tl(org_s, s);
-    tcg_gen_shli_tl(t, org_s, 8);
-    tcg_gen_andi_tl(d, t, 0xff00ff00);
-    tcg_gen_shri_tl(t, org_s, 8);
-    tcg_gen_andi_tl(t, t, 0x00ff00ff);
+    tcg_gen_shri_tl(t, s, 8);
+    tcg_gen_and_tl(t, t, m);
+    tcg_gen_and_tl(d, s, m);
+    tcg_gen_shli_tl(d, d, 8);
     tcg_gen_or_tl(d, d, t);
+    tcg_temp_free(m);
     tcg_temp_free(t);
-    tcg_temp_free(org_s);
 }
 
 /* Swap the halfwords of the s operand.  */
-- 
2.11.0

* [Qemu-devel] [PATCH 4/4] target/mips: optimize WSBH, DSBH and DSHD
  2017-05-16 23:01 [Qemu-devel] [PATCH 0/4] target/arm, cris, mips: optimize "swap bytes within words" Aurelien Jarno
                   ` (2 preceding siblings ...)
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 3/4] target/cris: optimize swap Aurelien Jarno
@ 2017-05-16 23:01 ` Aurelien Jarno
  2017-05-23  1:07   ` Richard Henderson
  3 siblings, 1 reply; 12+ messages in thread
From: Aurelien Jarno @ 2017-05-16 23:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, Aurelien Jarno, Yongbok Kim

Use the same mask to avoid having to load two different constants, as
suggested by Richard Henderson.
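
As a rough plain C model (not TCG) of the 64-bit cases, with hypothetical
function names: DSBH swaps the bytes within each halfword, and DSHD reverses
the order of the four halfwords by swapping halfwords within each word and
then swapping the two words. WSBH follows the same pattern as DSBH with a
32-bit mask, plus a final sign extension that is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* DSBH model: swap the two bytes of every 16-bit halfword. */
static inline uint64_t dsbh_model(uint64_t x)
{
    const uint64_t m = 0x00ff00ff00ff00ffull;
    return ((x >> 8) & m) | ((x & m) << 8);
}

/* DSHD model: reverse the order of the four 16-bit halfwords. */
static inline uint64_t dshd_model(uint64_t x)
{
    const uint64_t m = 0x0000ffff0000ffffull;

    /* Step 1: swap the halfwords inside each 32-bit word, one shared mask. */
    x = ((x >> 16) & m) | ((x & m) << 16);
    /* Step 2: swap the two 32-bit words; no mask is needed for this step. */
    return (x << 32) | (x >> 32);
}
```

For the input 0x0011223344556677, the DSBH model returns 0x1100332255447766
and the DSHD model returns 0x6677445522330011.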

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target/mips/translate.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 3022f349cb..c71eed498c 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -4572,12 +4572,14 @@ static void gen_bshfl (DisasContext *ctx, uint32_t op2, int rt, int rd)
     case OPC_WSBH:
         {
             TCGv t1 = tcg_temp_new();
+            TCGv t2 = tcg_const_tl(0x00FF00FF);
 
             tcg_gen_shri_tl(t1, t0, 8);
-            tcg_gen_andi_tl(t1, t1, 0x00FF00FF);
+            tcg_gen_and_tl(t1, t1, t2);
+            tcg_gen_and_tl(t0, t0, t2);
             tcg_gen_shli_tl(t0, t0, 8);
-            tcg_gen_andi_tl(t0, t0, ~0x00FF00FF);
             tcg_gen_or_tl(t0, t0, t1);
+            tcg_temp_free(t2);
             tcg_temp_free(t1);
             tcg_gen_ext32s_tl(cpu_gpr[rd], t0);
         }
@@ -4592,27 +4594,31 @@ static void gen_bshfl (DisasContext *ctx, uint32_t op2, int rt, int rd)
     case OPC_DSBH:
         {
             TCGv t1 = tcg_temp_new();
+            TCGv t2 = tcg_const_tl(0x00FF00FF00FF00FFULL);
 
             tcg_gen_shri_tl(t1, t0, 8);
-            tcg_gen_andi_tl(t1, t1, 0x00FF00FF00FF00FFULL);
+            tcg_gen_and_tl(t1, t1, t2);
+            tcg_gen_and_tl(t0, t0, t2);
             tcg_gen_shli_tl(t0, t0, 8);
-            tcg_gen_andi_tl(t0, t0, ~0x00FF00FF00FF00FFULL);
             tcg_gen_or_tl(cpu_gpr[rd], t0, t1);
+            tcg_temp_free(t2);
             tcg_temp_free(t1);
         }
         break;
     case OPC_DSHD:
         {
             TCGv t1 = tcg_temp_new();
+            TCGv t2 = tcg_const_tl(0x0000FFFF0000FFFFULL);
 
             tcg_gen_shri_tl(t1, t0, 16);
-            tcg_gen_andi_tl(t1, t1, 0x0000FFFF0000FFFFULL);
+            tcg_gen_and_tl(t1, t1, t2);
+            tcg_gen_and_tl(t0, t0, t2);
             tcg_gen_shli_tl(t0, t0, 16);
-            tcg_gen_andi_tl(t0, t0, ~0x0000FFFF0000FFFFULL);
             tcg_gen_or_tl(t0, t0, t1);
             tcg_gen_shri_tl(t1, t0, 32);
             tcg_gen_shli_tl(t0, t0, 32);
             tcg_gen_or_tl(cpu_gpr[rd], t0, t1);
+            tcg_temp_free(t2);
             tcg_temp_free(t1);
         }
         break;
-- 
2.11.0

* Re: [Qemu-devel] [Qemu-arm] [PATCH 1/4] target/arm: optimize aarch32 rev16
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 1/4] target/arm: optimize aarch32 rev16 Aurelien Jarno
@ 2017-05-17  0:56   ` Philippe Mathieu-Daudé
  2017-05-23  0:21   ` [Qemu-devel] " Richard Henderson
  1 sibling, 0 replies; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-17  0:56 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel
  Cc: Peter Maydell, open list:ARM, Richard Henderson

Hi Aurelien,

On 05/16/2017 08:01 PM, Aurelien Jarno wrote:
> Use the same mask to avoid having to load two different constants, as
> suggested by Richard Henderson.

What about
Suggested-by: Richard Henderson <rth@twiddle.net> ?

> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
>  target/arm/translate.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 0b5a0bca06..5becb2bb89 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -339,11 +339,13 @@ static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
>  static void gen_rev16(TCGv_i32 var)
>  {
>      TCGv_i32 tmp = tcg_temp_new_i32();
> +    TCGv_i32 mask = tcg_const_i32(0x00ff00ff);
>      tcg_gen_shri_i32(tmp, var, 8);
> -    tcg_gen_andi_i32(tmp, tmp, 0x00ff00ff);
> +    tcg_gen_and_i32(tmp, tmp, mask);
> +    tcg_gen_and_i32(var, var, mask);
>      tcg_gen_shli_i32(var, var, 8);
> -    tcg_gen_andi_i32(var, var, 0xff00ff00);
>      tcg_gen_or_i32(var, var, tmp);
> +    tcg_temp_free_i32(mask);
>      tcg_temp_free_i32(tmp);
>  }
>
>

* Re: [Qemu-devel] [Qemu-arm] [PATCH 2/4] target/arm: simplify and optimize aarch64 rev16
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 2/4] target/arm: simplify and optimize aarch64 rev16 Aurelien Jarno
@ 2017-05-17  0:56   ` Philippe Mathieu-Daudé
  2017-05-23  0:21   ` [Qemu-devel] " Richard Henderson
  1 sibling, 0 replies; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-17  0:56 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel
  Cc: Peter Maydell, open list:ARM, Richard Henderson

On 05/16/2017 08:01 PM, Aurelien Jarno wrote:
> Instead of byteswapping individual 16-bit words one by one, work on the
> whole register at the same time using shifts and a mask. This is the same
> strategy as the aarch32 version of rev16 and is much more efficient
> in the case sf=1.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
>  target/arm/translate-a64.c | 24 ++++++------------------
>  1 file changed, 6 insertions(+), 18 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 24de30d92c..ed15d21655 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -4035,24 +4035,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
>      TCGv_i64 tcg_tmp = tcg_temp_new_i64();
>      TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
>
> -    tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
> -    tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
> -
> -    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
> -    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
> -    tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
> -    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
> -
> -    if (sf) {
> -        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
> -        tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
> -        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
> -        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
> -
> -        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
> -        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
> -        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
> -    }
> +    TCGv mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
> +    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
> +    tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
> +    tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
> +    tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
> +    tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
>
>      tcg_temp_free_i64(tcg_tmp);
>  }
>

* Re: [Qemu-devel] [PATCH 3/4] target/cris: optimize swap
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 3/4] target/cris: optimize swap Aurelien Jarno
@ 2017-05-17  0:59   ` Philippe Mathieu-Daudé
  2017-05-23  0:23   ` Richard Henderson
  1 sibling, 0 replies; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-17  0:59 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Edgar E. Iglesias, Richard Henderson

On 05/16/2017 08:01 PM, Aurelien Jarno wrote:
> Use the same mask to avoid having to load two different constants, as
> suggested by Richard Henderson. Also use one fewer temp.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  target/cris/translate.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/target/cris/translate.c b/target/cris/translate.c
> index 0ee05ca02d..103b214233 100644
> --- a/target/cris/translate.c
> +++ b/target/cris/translate.c
> @@ -433,20 +433,19 @@ static inline void t_gen_subx_carry(DisasContext *dc, TCGv d)
>     T0 = ((T0 << 8) & 0xff00ff00) | ((T0 >> 8) & 0x00ff00ff)  */
>  static inline void t_gen_swapb(TCGv d, TCGv s)
>  {
> -    TCGv t, org_s;
> +    TCGv t, m;
>
>      t = tcg_temp_new();
> -    org_s = tcg_temp_new();
> +    m = tcg_const_tl(0x00ff00ff);
>
>      /* d and s may refer to the same object.  */
> -    tcg_gen_mov_tl(org_s, s);
> -    tcg_gen_shli_tl(t, org_s, 8);
> -    tcg_gen_andi_tl(d, t, 0xff00ff00);
> -    tcg_gen_shri_tl(t, org_s, 8);
> -    tcg_gen_andi_tl(t, t, 0x00ff00ff);
> +    tcg_gen_shri_tl(t, s, 8);
> +    tcg_gen_and_tl(t, t, m);
> +    tcg_gen_and_tl(d, s, m);

Maybe add a comment /* set d 0xff00ff00 */

Anyway,
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> +    tcg_gen_shli_tl(d, d, 8);
>      tcg_gen_or_tl(d, d, t);
> +    tcg_temp_free(m);
>      tcg_temp_free(t);
> -    tcg_temp_free(org_s);
>  }
>
>  /* Swap the halfwords of the s operand.  */
>

* Re: [Qemu-devel] [PATCH 1/4] target/arm: optimize aarch32 rev16
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 1/4] target/arm: optimize aarch32 rev16 Aurelien Jarno
  2017-05-17  0:56   ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
@ 2017-05-23  0:21   ` Richard Henderson
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2017-05-23  0:21 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Peter Maydell, open list:ARM

On 05/16/2017 04:01 PM, Aurelien Jarno wrote:
> Use the same mask to avoid having to load two different constants, as
> suggested by Richard Henderson.
> 
> Signed-off-by: Aurelien Jarno<aurelien@aurel32.net>
> ---
>   target/arm/translate.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH 2/4] target/arm: simplify and optimize aarch64 rev16
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 2/4] target/arm: simplify and optimize aarch64 rev16 Aurelien Jarno
  2017-05-17  0:56   ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
@ 2017-05-23  0:21   ` Richard Henderson
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2017-05-23  0:21 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Peter Maydell, open list:ARM

On 05/16/2017 04:01 PM, Aurelien Jarno wrote:
> Instead of byteswapping individual 16-bit words one by one, work on the
> whole register at the same time using shifts and a mask. This is the same
> strategy as the aarch32 version of rev16 and is much more efficient
> in the case sf=1.
> 
> Signed-off-by: Aurelien Jarno<aurelien@aurel32.net>
> ---
>   target/arm/translate-a64.c | 24 ++++++------------------
>   1 file changed, 6 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH 3/4] target/cris: optimize swap
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 3/4] target/cris: optimize swap Aurelien Jarno
  2017-05-17  0:59   ` Philippe Mathieu-Daudé
@ 2017-05-23  0:23   ` Richard Henderson
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2017-05-23  0:23 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Edgar E. Iglesias

On 05/16/2017 04:01 PM, Aurelien Jarno wrote:
> Use the same mask to avoid having to load two different constants, as
> suggested by Richard Henderson. Also use one fewer temp.
> 
> Signed-off-by: Aurelien Jarno<aurelien@aurel32.net>
> ---
>   target/cris/translate.c | 15 +++++++--------
>   1 file changed, 7 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH 4/4] target/mips: optimize WSBH, DSBH and DSHD
  2017-05-16 23:01 ` [Qemu-devel] [PATCH 4/4] target/mips: optimize WSBH, DSBH and DSHD Aurelien Jarno
@ 2017-05-23  1:07   ` Richard Henderson
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2017-05-23  1:07 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Yongbok Kim

On 05/16/2017 04:01 PM, Aurelien Jarno wrote:
> Use the same mask to avoid having to load two different constants, as
> suggested by Richard Henderson.
> 
> Signed-off-by: Aurelien Jarno<aurelien@aurel32.net>
> ---
>   target/mips/translate.c | 18 ++++++++++++------
>   1 file changed, 12 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

