All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/5] optimize various tcg_gen() functions using extract op
@ 2017-05-12  3:35 Philippe Mathieu-Daudé
  2017-05-12  3:35 ` [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract() Philippe Mathieu-Daudé
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12  3:35 UTC (permalink / raw)
  To: qemu-devel, qemu-arm, qemu-ppc, Richard Henderson,
	Alexander Graf, Artyom Tarasenko, Aurelien Jarno, David Gibson,
	Eduardo Habkost, Eric Blake, Laurent Vivier, Laurent Vivier,
	Mark Cave-Ayland, Markus Armbruster, Michael Tokarev,
	Nikunj A Dadhania, Paolo Bonzini, Peter Maydell
  Cc: Philippe Mathieu-Daudé, Markus Elfring, Julia Lawall, Nicolas Palix

Changes from v1:

In my first attempt I misunderstood tcg_gen_extract() intrinsics, and Richard
Henderson pointed that out.
In this patchset the cocci script is corrected and clarified, it also print how
arguments are checked while running.
Also:
- incorrect patches have been removed. (Richard Henderson, Nikunj A Dadhania)
- Coccinelle script licensed GPLv2+ (Eric Blake)
- comment in each commit about how to apply the patch (Eric Blake)
- added Acked-by for m68k (Laurent Vivier)
- Cc: Coccinelle developers.

[v1]

While reviewing a commit from Aurelien Jarno where he optimized a TCG generator
for SH-4 [1] I found the same optimization done on PPC by Nikunj A Dadhania few
months ago [2].
After asking on the ML about a cocci script [3] I thought it would be easier to
learn about Coccinelle.

citing Aurelien Jarno:
    This doesn't change the generated code on x86, but optimizes it on most
    RISC architectures and makes the code simpler to read.

I actually applied the script using the following command:

$ docker run -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
    --sp-file scripts/coccinelle/tcg_gen_extract.cocci \
    --macro-file scripts/cocci-macro-file.h \
    --dir target \
    --in-place

Please review, thanks.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
[3] http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01499.html

Philippe Mathieu-Daudé (5):
  coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  target/arm: optimize rev16() using extract op
  target/m68k: optimize bcd_flags() using extract op
  target/ppc: using various functions using extract op
  target/sparc: optimize various functions using extract op

 scripts/coccinelle/tcg_gen_extract.cocci | 69 ++++++++++++++++++++++++++++++++
 target/arm/translate-a64.c               |  6 +--
 target/m68k/translate.c                  |  3 +-
 target/ppc/translate.c                   |  9 ++---
 target/ppc/translate/vsx-impl.inc.c      | 15 +++----
 target/sparc/translate.c                 | 15 +++----
 6 files changed, 85 insertions(+), 32 deletions(-)
 create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci

-- 
2.11.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  2017-05-12  3:35 [Qemu-devel] [PATCH v3 0/5] optimize various tcg_gen() functions using extract op Philippe Mathieu-Daudé
@ 2017-05-12  3:35 ` Philippe Mathieu-Daudé
  2017-05-12  3:41   ` Julia Lawall
                     ` (2 more replies)
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op Philippe Mathieu-Daudé
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12  3:35 UTC (permalink / raw)
  To: qemu-devel, Aurelien Jarno, Richard Henderson, Nikunj A Dadhania,
	Eric Blake, Markus Armbruster, Laurent Vivier, Michael Tokarev,
	Eduardo Habkost, Paolo Bonzini
  Cc: Philippe Mathieu-Daudé, Markus Elfring, Julia Lawall, Nicolas Palix

If you have coccinelle installed you can apply this script using:

    $ spatch \
        --macro-file scripts/cocci-macro-file.h \
        --dir target --in-place

You can also use directly Peter Senna Tschudin docker image (easier):

    $ docker run -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
        --sp-file scripts/coccinelle/tcg_gen_extract.cocci \
        --macro-file scripts/cocci-macro-file.h \
        --dir target --in-place

Then verified that no manual touchups are required.

The following thread was helpful while writing this script:

    https://github.com/coccinelle/coccinelle/issues/86

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 scripts/coccinelle/tcg_gen_extract.cocci | 71 ++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci

diff --git a/scripts/coccinelle/tcg_gen_extract.cocci b/scripts/coccinelle/tcg_gen_extract.cocci
new file mode 100644
index 0000000000..4823073005
--- /dev/null
+++ b/scripts/coccinelle/tcg_gen_extract.cocci
@@ -0,0 +1,71 @@
+// optimize TCG using extract op
+//
+// Copyright: (C) 2017 Philippe Mathieu-Daudé. GPLv2+.
+// Confidence: High
+// Options: --macro-file scripts/cocci-macro-file.h
+//
+// Nikunj A Dadhania optimization:
+// http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
+// Aurelien Jarno optimization:
+// http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
+// Coccinelle helpful issue:
+// https://github.com/coccinelle/coccinelle/issues/86
+
+@match@ // match shri*+andi* pattern, calls script verify_len
+identifier ret, arg;
+constant ofs, len;
+identifier shr_fn =~ "^tcg_gen_shri_";
+identifier and_fn =~ "^tcg_gen_andi_";
+position shr_p;
+position and_p;
+@@
+(
+shr_fn@shr_p(ret, arg, ofs);
+and_fn@and_p(ret, ret, len);
+)
+
+@script:python verify_len@
+ret_s << match.ret;
+len_s << match.len;
+shr_s << match.shr_fn;
+and_s << match.and_fn;
+shr_p << match.shr_p;
+extract_fn;
+@@
+print "candidate at %s:%s" % (shr_p[0].file, shr_p[0].line)
+len_fn=len("tcg_gen_shri_")
+shr_sz=shr_s[len_fn:]
+and_sz=and_s[len_fn:]
+# TODO: op_size shr<and allowed?
+is_same_op_size = shr_sz == and_sz
+print "  op_size: %s/%s (%s)" % (shr_sz, and_sz, "same" if is_same_op_size else "DIFFERENT")
+is_optimizable = False
+if is_same_op_size:
+    try: # only eval integer, no #define like 'SR_M' (cpp did this, else some headers are missing).
+        len_v = long(len_s.strip("UL"), 0)
+        low_bits = 0
+        while (len_v & (1 << low_bits)):
+            low_bits += 1
+        print "  low_bits:", low_bits, "(value: 0x%x)" % ((1 << low_bits) - 1)
+        print "  len: 0x%x" % len_v
+        is_optimizable = ((1 << low_bits) - 1) == len_v # check low_bits
+        print "  len_bits %s= low_bits" % ("=" if is_optimizable else "!")
+        print "  candidate", "IS" if is_optimizable else "is NOT", "optimizable"
+        coccinelle.extract_fn = "tcg_gen_extract_" + and_sz
+    except:
+        print "  ERROR (check included headers?)"
+    cocci.include_match(is_optimizable)
+print
+
+@replacement depends on verify_len@
+identifier match.ret, match.arg;
+constant match.ofs, match.len;
+identifier match.shr_fn;
+identifier match.and_fn;
+position match.shr_p;
+position match.and_p;
+identifier verify_len.extract_fn;
+@@
+-shr_fn@shr_p(ret, arg, ofs);
+-and_fn@and_p(ret, ret, len);
++extract_fn(ret, arg, ofs, len);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op
  2017-05-12  3:35 [Qemu-devel] [PATCH v3 0/5] optimize various tcg_gen() functions using extract op Philippe Mathieu-Daudé
  2017-05-12  3:35 ` [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract() Philippe Mathieu-Daudé
@ 2017-05-12  3:35 ` Philippe Mathieu-Daudé
  2017-05-12 16:50   ` Richard Henderson
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 3/5] target/m68k: optimize bcd_flags() " Philippe Mathieu-Daudé
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12  3:35 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Aurelien Jarno, Richard Henderson, qemu-arm
  Cc: Philippe Mathieu-Daudé

Patch created mechanically using Coccinelle script via:

    $ spatch --macro-file scripts/cocci-macro-file.h --in-place \
        --sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 target/arm/translate-a64.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d92c..7ea130107e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4038,14 +4038,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
     tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
     tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
 
-    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
+    tcg_gen_extract_i64(tcg_tmp, tcg_rn, 16, 0xffff);
     tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
     tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
 
     if (sf) {
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-        tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
+        tcg_gen_extract_i64(tcg_tmp, tcg_rn, 32, 0xffff);
         tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
         tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 3/5] target/m68k: optimize bcd_flags() using extract op
  2017-05-12  3:35 [Qemu-devel] [PATCH v3 0/5] optimize various tcg_gen() functions using extract op Philippe Mathieu-Daudé
  2017-05-12  3:35 ` [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract() Philippe Mathieu-Daudé
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op Philippe Mathieu-Daudé
@ 2017-05-12  3:35 ` Philippe Mathieu-Daudé
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 4/5] target/ppc: using various functions " Philippe Mathieu-Daudé
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 5/5] target/sparc: optimize " Philippe Mathieu-Daudé
  4 siblings, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12  3:35 UTC (permalink / raw)
  To: qemu-devel, Aurelien Jarno, Richard Henderson, Laurent Vivier
  Cc: Philippe Mathieu-Daudé

Patch created mechanically using Coccinelle script via:

    $ spatch --macro-file scripts/cocci-macro-file.h --in-place \
        --sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Laurent Vivier <laurent@vivier.eu>
---
 target/m68k/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 9f60fbc0db..babb9e2c5b 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -1463,8 +1463,7 @@ static void bcd_flags(TCGv val)
     tcg_gen_andi_i32(QREG_CC_C, val, 0x0ff);
     tcg_gen_or_i32(QREG_CC_Z, QREG_CC_Z, QREG_CC_C);
 
-    tcg_gen_shri_i32(QREG_CC_C, val, 8);
-    tcg_gen_andi_i32(QREG_CC_C, QREG_CC_C, 1);
+    tcg_gen_extract_i32(QREG_CC_C, val, 8, 1);
 
     tcg_gen_mov_i32(QREG_CC_X, QREG_CC_C);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 4/5] target/ppc: using various functions using extract op
  2017-05-12  3:35 [Qemu-devel] [PATCH v3 0/5] optimize various tcg_gen() functions using extract op Philippe Mathieu-Daudé
                   ` (2 preceding siblings ...)
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 3/5] target/m68k: optimize bcd_flags() " Philippe Mathieu-Daudé
@ 2017-05-12  3:35 ` Philippe Mathieu-Daudé
  2017-05-12 23:48   ` Philippe Mathieu-Daudé
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 5/5] target/sparc: optimize " Philippe Mathieu-Daudé
  4 siblings, 1 reply; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12  3:35 UTC (permalink / raw)
  To: qemu-devel, Aurelien Jarno, Richard Henderson, David Gibson,
	Alexander Graf, qemu-ppc
  Cc: Philippe Mathieu-Daudé

Patch created mechanically using Coccinelle script via:

    $ spatch --macro-file scripts/cocci-macro-file.h --in-place \
        --sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---

David I did not add your Reviewed-by as suggested by Laurent Vivier after
Nikunj A Dadhania review.

 target/ppc/translate.c              |  9 +++------
 target/ppc/translate/vsx-impl.inc.c | 15 +++++----------
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index f40b5a1abf..64ab412bf3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -868,8 +868,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1,
             }
             tcg_gen_xor_tl(cpu_ca, t0, t1);        /* bits changed w/ carry */
             tcg_temp_free(t1);
-            tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);   /* extract bit 32 */
-            tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+            tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
             if (is_isa300(ctx)) {
                 tcg_gen_mov_tl(cpu_ca32, cpu_ca);
             }
@@ -1399,8 +1398,7 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1,
             tcg_temp_free(inv1);
             tcg_gen_xor_tl(cpu_ca, t0, t1);         /* bits changes w/ carry */
             tcg_temp_free(t1);
-            tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);    /* extract bit 32 */
-            tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+            tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
             if (is_isa300(ctx)) {
                 tcg_gen_mov_tl(cpu_ca32, cpu_ca);
             }
@@ -5383,8 +5381,7 @@ static void gen_mfsri(DisasContext *ctx)
     CHK_SV;
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    tcg_gen_shri_tl(t0, t0, 28);
-    tcg_gen_andi_tl(t0, t0, 0xF);
+    tcg_gen_extract_tl(t0, t0, 28, 0xF);
     gen_helper_load_sr(cpu_gpr[rd], cpu_env, t0);
     tcg_temp_free(t0);
     if (ra != 0 && ra != rd)
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 7f12908029..9faffd2ddc 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1262,8 +1262,7 @@ static void gen_xsxexpqp(DisasContext *ctx)
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
-    tcg_gen_shri_i64(xth, xbh, 48);
-    tcg_gen_andi_i64(xth, xth, 0x7FFF);
+    tcg_gen_extract_i64(xth, xbh, 48, 0x7FFF);
     tcg_gen_movi_i64(xtl, 0);
 }
 
@@ -1448,10 +1447,8 @@ static void gen_xvxexpdp(DisasContext *ctx)
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
-    tcg_gen_shri_i64(xth, xbh, 52);
-    tcg_gen_andi_i64(xth, xth, 0x7FF);
-    tcg_gen_shri_i64(xtl, xbl, 52);
-    tcg_gen_andi_i64(xtl, xtl, 0x7FF);
+    tcg_gen_extract_i64(xth, xbh, 52, 0x7FF);
+    tcg_gen_extract_i64(xtl, xbl, 52, 0x7FF);
 }
 
 GEN_VSX_HELPER_2(xvxsigsp, 0x00, 0x04, 0, PPC2_ISA300)
@@ -1474,16 +1471,14 @@ static void gen_xvxsigdp(DisasContext *ctx)
     zr = tcg_const_i64(0);
     nan = tcg_const_i64(2047);
 
-    tcg_gen_shri_i64(exp, xbh, 52);
-    tcg_gen_andi_i64(exp, exp, 0x7FF);
+    tcg_gen_extract_i64(exp, xbh, 52, 0x7FF);
     tcg_gen_movi_i64(t0, 0x0010000000000000);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
     tcg_gen_andi_i64(xth, xbh, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(xth, xth, t0);
 
-    tcg_gen_shri_i64(exp, xbl, 52);
-    tcg_gen_andi_i64(exp, exp, 0x7FF);
+    tcg_gen_extract_i64(exp, xbl, 52, 0x7FF);
     tcg_gen_movi_i64(t0, 0x0010000000000000);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v3 5/5] target/sparc: optimize various functions using extract op
  2017-05-12  3:35 [Qemu-devel] [PATCH v3 0/5] optimize various tcg_gen() functions using extract op Philippe Mathieu-Daudé
                   ` (3 preceding siblings ...)
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 4/5] target/ppc: using various functions " Philippe Mathieu-Daudé
@ 2017-05-12  3:35 ` Philippe Mathieu-Daudé
  2017-05-12 23:49   ` Philippe Mathieu-Daudé
  4 siblings, 1 reply; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12  3:35 UTC (permalink / raw)
  To: qemu-devel, Aurelien Jarno, Richard Henderson, Mark Cave-Ayland,
	Artyom Tarasenko
  Cc: Philippe Mathieu-Daudé

Patch created mechanically using Coccinelle script via:

    $ spatch --macro-file scripts/cocci-macro-file.h --in-place \
        --sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 target/sparc/translate.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index aa6734d54e..a92b5c425c 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -380,29 +380,25 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num,
 static inline void gen_mov_reg_N(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
-    tcg_gen_shri_tl(reg, reg, PSR_NEG_SHIFT);
-    tcg_gen_andi_tl(reg, reg, 0x1);
+    tcg_gen_extract_tl(reg, reg, PSR_NEG_SHIFT, 0x1);
 }
 
 static inline void gen_mov_reg_Z(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
-    tcg_gen_shri_tl(reg, reg, PSR_ZERO_SHIFT);
-    tcg_gen_andi_tl(reg, reg, 0x1);
+    tcg_gen_extract_tl(reg, reg, PSR_ZERO_SHIFT, 0x1);
 }
 
 static inline void gen_mov_reg_V(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
-    tcg_gen_shri_tl(reg, reg, PSR_OVF_SHIFT);
-    tcg_gen_andi_tl(reg, reg, 0x1);
+    tcg_gen_extract_tl(reg, reg, PSR_OVF_SHIFT, 0x1);
 }
 
 static inline void gen_mov_reg_C(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
-    tcg_gen_shri_tl(reg, reg, PSR_CARRY_SHIFT);
-    tcg_gen_andi_tl(reg, reg, 0x1);
+    tcg_gen_extract_tl(reg, reg, PSR_CARRY_SHIFT, 0x1);
 }
 
 static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
@@ -638,8 +634,7 @@ static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
     // env->y = (b2 << 31) | (env->y >> 1);
     tcg_gen_andi_tl(r_temp, cpu_cc_src, 0x1);
     tcg_gen_shli_tl(r_temp, r_temp, 31);
-    tcg_gen_shri_tl(t0, cpu_y, 1);
-    tcg_gen_andi_tl(t0, t0, 0x7fffffff);
+    tcg_gen_extract_tl(t0, cpu_y, 1, 0x7fffffff);
     tcg_gen_or_tl(t0, t0, r_temp);
     tcg_gen_andi_tl(cpu_y, t0, 0xffffffff);
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  2017-05-12  3:35 ` [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract() Philippe Mathieu-Daudé
@ 2017-05-12  3:41   ` Julia Lawall
  2017-05-12  5:36     ` Philippe Mathieu-Daudé
  2017-05-12  5:48   ` SF Markus Elfring
  2017-05-12 11:00   ` SF Markus Elfring
  2 siblings, 1 reply; 21+ messages in thread
From: Julia Lawall @ 2017-05-12  3:41 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, Aurelien Jarno, Richard Henderson, Nikunj A Dadhania,
	Eric Blake, Markus Armbruster, Laurent Vivier, Michael Tokarev,
	Eduardo Habkost, Paolo Bonzini, Markus Elfring, Julia Lawall,
	Nicolas Palix

Hello,

I don't think I have seen earlier versions of this script.  Are you
proposing it to be added to the kernel?  If so, it should be put in an
appropriate subdirectory of Coccinelle.

Overall, could you explain at a high level what it is intended to do?  It
uses rather heavily regular expressions and python code, so I wonder if
this is the best way to do it.

thanks,
julia

On Fri, 12 May 2017, Philippe Mathieu-Daudé wrote:

> If you have coccinelle installed you can apply this script using:
>
>     $ spatch \
>         --macro-file scripts/cocci-macro-file.h \
>         --dir target --in-place
>
> You can also use directly Peter Senna Tschudin docker image (easier):
>
>     $ docker run -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
>         --sp-file scripts/coccinelle/tcg_gen_extract.cocci \
>         --macro-file scripts/cocci-macro-file.h \
>         --dir target --in-place
>
> Then verified that no manual touchups are required.
>
> The following thread was helpful while writing this script:
>
>     https://github.com/coccinelle/coccinelle/issues/86
>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>  scripts/coccinelle/tcg_gen_extract.cocci | 71 ++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>  create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci
>
> diff --git a/scripts/coccinelle/tcg_gen_extract.cocci b/scripts/coccinelle/tcg_gen_extract.cocci
> new file mode 100644
> index 0000000000..4823073005
> --- /dev/null
> +++ b/scripts/coccinelle/tcg_gen_extract.cocci
> @@ -0,0 +1,71 @@
> +// optimize TCG using extract op
> +//
> +// Copyright: (C) 2017 Philippe Mathieu-Daudé. GPLv2+.
> +// Confidence: High
> +// Options: --macro-file scripts/cocci-macro-file.h
> +//
> +// Nikunj A Dadhania optimization:
> +// http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
> +// Aurelien Jarno optimization:
> +// http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
> +// Coccinelle helpful issue:
> +// https://github.com/coccinelle/coccinelle/issues/86
> +
> +@match@ // match shri*+andi* pattern, calls script verify_len
> +identifier ret, arg;
> +constant ofs, len;
> +identifier shr_fn =~ "^tcg_gen_shri_";
> +identifier and_fn =~ "^tcg_gen_andi_";
> +position shr_p;
> +position and_p;
> +@@
> +(
> +shr_fn@shr_p(ret, arg, ofs);
> +and_fn@and_p(ret, ret, len);
> +)
> +
> +@script:python verify_len@
> +ret_s << match.ret;
> +len_s << match.len;
> +shr_s << match.shr_fn;
> +and_s << match.and_fn;
> +shr_p << match.shr_p;
> +extract_fn;
> +@@
> +print "candidate at %s:%s" % (shr_p[0].file, shr_p[0].line)
> +len_fn=len("tcg_gen_shri_")
> +shr_sz=shr_s[len_fn:]
> +and_sz=and_s[len_fn:]
> +# TODO: op_size shr<and allowed?
> +is_same_op_size = shr_sz == and_sz
> +print "  op_size: %s/%s (%s)" % (shr_sz, and_sz, "same" if is_same_op_size else "DIFFERENT")
> +is_optimizable = False
> +if is_same_op_size:
> +    try: # only eval integer, no #define like 'SR_M' (cpp did this, else some headers are missing).
> +        len_v = long(len_s.strip("UL"), 0)
> +        low_bits = 0
> +        while (len_v & (1 << low_bits)):
> +            low_bits += 1
> +        print "  low_bits:", low_bits, "(value: 0x%x)" % ((1 << low_bits) - 1)
> +        print "  len: 0x%x" % len_v
> +        is_optimizable = ((1 << low_bits) - 1) == len_v # check low_bits
> +        print "  len_bits %s= low_bits" % ("=" if is_optimizable else "!")
> +        print "  candidate", "IS" if is_optimizable else "is NOT", "optimizable"
> +        coccinelle.extract_fn = "tcg_gen_extract_" + and_sz
> +    except:
> +        print "  ERROR (check included headers?)"
> +    cocci.include_match(is_optimizable)
> +print
> +
> +@replacement depends on verify_len@
> +identifier match.ret, match.arg;
> +constant match.ofs, match.len;
> +identifier match.shr_fn;
> +identifier match.and_fn;
> +position match.shr_p;
> +position match.and_p;
> +identifier verify_len.extract_fn;
> +@@
> +-shr_fn@shr_p(ret, arg, ofs);
> +-and_fn@and_p(ret, ret, len);
> ++extract_fn(ret, arg, ofs, len);
> --
> 2.11.0
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  2017-05-12  3:41   ` Julia Lawall
@ 2017-05-12  5:36     ` Philippe Mathieu-Daudé
  2017-05-12  7:16       ` Julia Lawall
  2017-05-12 17:49       ` Eric Blake
  0 siblings, 2 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12  5:36 UTC (permalink / raw)
  To: Julia Lawall, qemu-devel, Richard Henderson, Markus Armbruster,
	Nicolas Palix
  Cc: Aurelien Jarno, Nikunj A Dadhania, Eric Blake, Laurent Vivier,
	Michael Tokarev, Eduardo Habkost, Paolo Bonzini, Markus Elfring

Hi Julia,

Sorry I planed to send you another mail but sent this mail to QEMU list 
first.

> I don't think I have seen earlier versions of this script.  Are you
> proposing it to be added to the kernel?  If so, it should be put in an
> appropriate subdirectory of Coccinelle.

This script is specific to QEMU codebase and wont benefit the Linux kernel.

> Overall, could you explain at a high level what it is intended to do?  It
> uses rather heavily regular expressions and python code, so I wonder if
> this is the best way to do it.

In this patch 
http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html 
Aurelien does:

-    tcg_gen_shri_i32(cpu_sr_q, src, SR_Q);
-    tcg_gen_andi_i32(cpu_sr_q, cpu_sr_q, 1);
+    tcg_gen_extract_i32(cpu_sr_q, src, SR_Q, 1);

having:

#define SR_Q  8

I wanted to write a Coccinelle script to check for this pattern.
My first version was wrong, as Richard Henderson reminded me this 
pattern can be applied as long as the len argument (here "1") is a 
Mersenne prime (all least significant bits as "1").

The codebase also defines:

#if TARGET_LONG_BITS == 64
# define tcg_gen_andi_tl tcg_gen_andi_i64
# define tcg_gen_shri_tl tcg_gen_shri_i64
#else
# define tcg_gen_andi_tl tcg_gen_andi_i32
# define tcg_gen_shri_tl tcg_gen_shri_i32
#endif

The same pattern can be applied for i32/i64/tl uses.

>> The following thread was helpful while writing this script:
>>
>>     https://github.com/coccinelle/coccinelle/issues/86
>>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> ---
>>  scripts/coccinelle/tcg_gen_extract.cocci | 71 ++++++++++++++++++++++++++++++++
>>  1 file changed, 71 insertions(+)
>>  create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci
>>
>> diff --git a/scripts/coccinelle/tcg_gen_extract.cocci b/scripts/coccinelle/tcg_gen_extract.cocci
>> new file mode 100644
>> index 0000000000..4823073005
>> --- /dev/null
>> +++ b/scripts/coccinelle/tcg_gen_extract.cocci
>> @@ -0,0 +1,71 @@
>> +// optimize TCG using extract op
>> +//
>> +// Copyright: (C) 2017 Philippe Mathieu-Daudé. GPLv2+.
>> +// Confidence: High
>> +// Options: --macro-file scripts/cocci-macro-file.h
>> +//
>> +// Nikunj A Dadhania optimization:
>> +// http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
>> +// Aurelien Jarno optimization:
>> +// http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
>> +// Coccinelle helpful issue:
>> +// https://github.com/coccinelle/coccinelle/issues/86
>> +
>> +@match@ // match shri*+andi* pattern, calls script verify_len
>> +identifier ret, arg;
>> +constant ofs, len;
>> +identifier shr_fn =~ "^tcg_gen_shri_";
>> +identifier and_fn =~ "^tcg_gen_andi_";
>> +position shr_p;
>> +position and_p;
>> +@@
>> +(
>> +shr_fn@shr_p(ret, arg, ofs);
>> +and_fn@and_p(ret, ret, len);
>> +)

First I want to match any of:
- tcg_gen_andi_i32/tcg_gen_shri_i32
- tcg_gen_andi_i64/tcg_gen_shri_i64
- tcg_gen_andi_tl/tcg_gen_shri_tl

Now I want to verify "len" is Mersenne prime.

>> +@script:python verify_len@
>> +ret_s << match.ret;
>> +len_s << match.len;
>> +shr_s << match.shr_fn;
>> +and_s << match.and_fn;
>> +shr_p << match.shr_p;
>> +extract_fn;
>> +@@
>> +print "candidate at %s:%s" % (shr_p[0].file, shr_p[0].line)
>> +len_fn=len("tcg_gen_shri_")
>> +shr_sz=shr_s[len_fn:]
>> +and_sz=and_s[len_fn:]
>> +# TODO: op_size shr<and allowed?
>> +is_same_op_size = shr_sz == and_sz

shr_s/and_s are strings containing function name.
check we matched a combination i32/i32 or i64/i64 or tl/tl.

(I think having i32/i64 and i32/tl is also valid but expect Richard's 
confirmation, anyway I doubt those combinations are used).

>> +print "  op_size: %s/%s (%s)" % (shr_sz, and_sz, "same" if is_same_op_size else "DIFFERENT")
>> +is_optimizable = False
>> +if is_same_op_size:
>> +    try: # only eval integer, no #define like 'SR_M' (cpp did this, else some headers are missing).
>> +        len_v = long(len_s.strip("UL"), 0)

Here len_s is also a string.

Some "len" encountered:
[1, 0xffff, 0x1, 0x00FF00FF, 0x0000FFFF0000FFFFULL]

Now len_v is the value of len_s.

>> +        low_bits = 0
>> +        while (len_v & (1 << low_bits)):
>> +            low_bits += 1

Dumbly count least significant bits.

>> +        print "  low_bits:", low_bits, "(value: 0x%x)" % ((1 << low_bits) - 1)
>> +        print "  len: 0x%x" % len_v
>> +        is_optimizable = ((1 << low_bits) - 1) == len_v # check low_bits

Check if Mersenne prime of "low_bits" least significant bits is the same 
number than len_v, the function argument.

If Yes: len_v is a Mersenne prime and we can optimize.

>> +        print "  len_bits %s= low_bits" % ("=" if is_optimizable else "!")
>> +        print "  candidate", "IS" if is_optimizable else "is NOT", "optimizable"
>> +        coccinelle.extract_fn = "tcg_gen_extract_" + and_sz

Add the "tcg_gen_extract()" function name as identifier in coccinelle 
namespace, appending if we are handling i32, i64 or tl.

>> +    except:
>> +        print "  ERROR (check included headers?)"
>> +    cocci.include_match(is_optimizable)

If we can not optimize, then discard this environment.

>> +print
>> +
>> +@replacement depends on verify_len@
>> +identifier match.ret, match.arg;
>> +constant match.ofs, match.len;
>> +identifier match.shr_fn;
>> +identifier match.and_fn;
>> +position match.shr_p;
>> +position match.and_p;
>> +identifier verify_len.extract_fn;
>> +@@
>> +-shr_fn@shr_p(ret, arg, ofs);
>> +-and_fn@and_p(ret, ret, len);
>> ++extract_fn(ret, arg, ofs, len);

Emit patch with function name from verify_len rule, other from the match 
rule.

>> --
>> 2.11.0

Thank you for your interest and fast reply!

Phil.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  2017-05-12  3:35 ` [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract() Philippe Mathieu-Daudé
  2017-05-12  3:41   ` Julia Lawall
@ 2017-05-12  5:48   ` SF Markus Elfring
  2017-05-12 11:00   ` SF Markus Elfring
  2 siblings, 0 replies; 21+ messages in thread
From: SF Markus Elfring @ 2017-05-12  5:48 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Aurelien Jarno, Richard Henderson, Nikunj A Dadhania, Eric Blake,
	Markus Armbruster, Laurent Vivier, Michael Tokarev,
	Eduardo Habkost, Paolo Bonzini, Julia Lawall, Nicolas Palix

>  create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci

Will an other subdirectory be more appropriate for this SmPL script?


> +// Coccinelle helpful issue:
> +// https://github.com/coccinelle/coccinelle/issues/86

I am curious if such an information source will trigger further
software evolution.
How do you think about to mention also the corresponding topic
“Propagating values back from Python script to SmPL rule with other metavariable
type than “identifier”” just for the case that the issue number can be fragile?


> +@match@ // match shri*+andi* pattern, calls script verify_len
> +identifier ret, arg;
> +constant ofs, len;
> +identifier shr_fn =~ "^tcg_gen_shri_";
> +identifier and_fn =~ "^tcg_gen_andi_";
> +position shr_p;
> +position and_p;
> +@@
> +(
> +shr_fn@shr_p(ret, arg, ofs);
> +and_fn@and_p(ret, ret, len);
> +)

My software development attention was caught also a bit by this specification.
How much do you care for coding style there?

* Two repeated SmPL key words while using the variable list functionality before.

* I wonder about the relevance for the parentheses.
  Did you try to express a disjunction for the semantic patch language
  besides the usage of two function (or macro) calls?


> +        print "  candidate", "IS" if is_optimizable else "is NOT", "optimizable"

Would you like to move this information display into a separate function?

Do you care if the “print” is the usage of a function call or a statement?
https://docs.python.org/3.0/whatsnew/3.0.html#print-is-a-function


> +-shr_fn@shr_p(ret, arg, ofs);
> +-and_fn@and_p(ret, ret, len);
> ++extract_fn(ret, arg, ofs, len);

Are there any more cases to consider for the sown function call replacement?

Regards,
Markus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  2017-05-12  5:36     ` Philippe Mathieu-Daudé
@ 2017-05-12  7:16       ` Julia Lawall
  2017-05-12  7:39         ` SF Markus Elfring
  2017-05-12 17:49       ` Eric Blake
  1 sibling, 1 reply; 21+ messages in thread
From: Julia Lawall @ 2017-05-12  7:16 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Julia Lawall, qemu-devel, Richard Henderson, Markus Armbruster,
	Nicolas Palix, Aurelien Jarno, Nikunj A Dadhania, Eric Blake,
	Laurent Vivier, Michael Tokarev, Eduardo Habkost, Paolo Bonzini,
	Markus Elfring



On Fri, 12 May 2017, Philippe Mathieu-Daudé wrote:

> Hi Julia,
>
> Sorry I planed to send you another mail but sent this mail to QEMU list first.
>
> > I don't think I have seen earlier versions of this script.  Are you
> > proposing it to be added to the kernel?  If so, it should be put in an
> > appropriate subdirectory of Coccinelle.
>
> This script is specific to QEMU codebase and wont benefit the Linux kernel.

OK.  Thanks for the explanation, which gives a better idea of what you are
trying to do.

Even for 6 functions, I would suggest to write out the function names in
the pattern matching code rather than using regular expressions.  If the
names are explicit, then Coccinelle can do some filtering, either based on
an index made with idutils or glimpse (see the coccinelle scripts
directory for tools for making these indices) or based on grep, to drop
without parsing files that are not relevant.  It doesn't do this for
regular expressions.  So you will get a faster result if the names are
explicit in the pattern code.

julia

>
> > Overall, could you explain at a high level what it is intended to do?  It
> > uses rather heavily regular expressions and python code, so I wonder if
> > this is the best way to do it.
>
> In this patch
> http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html Aurelien
> does:
>
> -    tcg_gen_shri_i32(cpu_sr_q, src, SR_Q);
> -    tcg_gen_andi_i32(cpu_sr_q, cpu_sr_q, 1);
> +    tcg_gen_extract_i32(cpu_sr_q, src, SR_Q, 1);
>
> having:
>
> #define SR_Q  8
>
> I wanted to write a Coccinelle script to check for this pattern.
> My first version was wrong, as Richard Henderson reminded me this pattern can
> be applied as long as the len argument (here "1") is a Mersenne prime (all
> least significant bits as "1").
>
> The codebase also defines:
>
> #if TARGET_LONG_BITS == 64
> # define tcg_gen_andi_tl tcg_gen_andi_i64
> # define tcg_gen_shri_tl tcg_gen_shri_i64
> #else
> # define tcg_gen_andi_tl tcg_gen_andi_i32
> # define tcg_gen_shri_tl tcg_gen_shri_i32
> #endif
>
> The same pattern can be applied for i32/i64/tl uses.
>
> > > The following thread was helpful while writing this script:
> > >
> > >     https://github.com/coccinelle/coccinelle/issues/86
> > >
> > > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> > > ---
> > >  scripts/coccinelle/tcg_gen_extract.cocci | 71
> > > ++++++++++++++++++++++++++++++++
> > >  1 file changed, 71 insertions(+)
> > >  create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci
> > >
> > > diff --git a/scripts/coccinelle/tcg_gen_extract.cocci
> > > b/scripts/coccinelle/tcg_gen_extract.cocci
> > > new file mode 100644
> > > index 0000000000..4823073005
> > > --- /dev/null
> > > +++ b/scripts/coccinelle/tcg_gen_extract.cocci
> > > @@ -0,0 +1,71 @@
> > > +// optimize TCG using extract op
> > > +//
> > > +// Copyright: (C) 2017 Philippe Mathieu-Daudé. GPLv2+.
> > > +// Confidence: High
> > > +// Options: --macro-file scripts/cocci-macro-file.h
> > > +//
> > > +// Nikunj A Dadhania optimization:
> > > +// http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
> > > +// Aurelien Jarno optimization:
> > > +// http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
> > > +// Coccinelle helpful issue:
> > > +// https://github.com/coccinelle/coccinelle/issues/86
> > > +
> > > +@match@ // match shri*+andi* pattern, calls script verify_len
> > > +identifier ret, arg;
> > > +constant ofs, len;
> > > +identifier shr_fn =~ "^tcg_gen_shri_";
> > > +identifier and_fn =~ "^tcg_gen_andi_";
> > > +position shr_p;
> > > +position and_p;
> > > +@@
> > > +(
> > > +shr_fn@shr_p(ret, arg, ofs);
> > > +and_fn@and_p(ret, ret, len);
> > > +)
>
> First I want to match any of:
> - tcg_gen_andi_i32/tcg_gen_shri_i32
> - tcg_gen_andi_i64/tcg_gen_shri_i64
> - tcg_gen_andi_tl/tcg_gen_shri_tl
>
> Now I want to verify "len" is Mersenne prime.
>
> > > +@script:python verify_len@
> > > +ret_s << match.ret;
> > > +len_s << match.len;
> > > +shr_s << match.shr_fn;
> > > +and_s << match.and_fn;
> > > +shr_p << match.shr_p;
> > > +extract_fn;
> > > +@@
> > > +print "candidate at %s:%s" % (shr_p[0].file, shr_p[0].line)
> > > +len_fn=len("tcg_gen_shri_")
> > > +shr_sz=shr_s[len_fn:]
> > > +and_sz=and_s[len_fn:]
> > > +# TODO: op_size shr<and allowed?
> > > +is_same_op_size = shr_sz == and_sz
>
> shr_s/and_s are strings containing function name.
> check we matched a combination i32/i32 or i64/i64 or tl/tl.
>
> (I think having i32/i64 and i32/tl is also valid but expect Richard's
> confirmation, anyway I doubt those combinations are used).
>
> > > +print "  op_size: %s/%s (%s)" % (shr_sz, and_sz, "same" if
> > > is_same_op_size else "DIFFERENT")
> > > +is_optimizable = False
> > > +if is_same_op_size:
> > > +    try: # only eval integer, no #define like 'SR_M' (cpp did this, else
> > > some headers are missing).
> > > +        len_v = long(len_s.strip("UL"), 0)
>
> Here len_s is also a string.
>
> Some "len" encountered:
> [1, 0xffff, 0x1, 0x00FF00FF, 0x0000FFFF0000FFFFULL]
>
> Now len_v is the value of len_s.
>
> > > +        low_bits = 0
> > > +        while (len_v & (1 << low_bits)):
> > > +            low_bits += 1
>
> Dumbly count least significant bits.
>
> > > +        print "  low_bits:", low_bits, "(value: 0x%x)" % ((1 << low_bits)
> > > - 1)
> > > +        print "  len: 0x%x" % len_v
> > > +        is_optimizable = ((1 << low_bits) - 1) == len_v # check low_bits
>
> Check if Mersenne prime of "low_bits" least significant bits is the same
> number than len_v, the function argument.
>
> If Yes: len_v is a Mersenne prime and we can optimize.
>
> > > +        print "  len_bits %s= low_bits" % ("=" if is_optimizable else
> > > "!")
> > > +        print "  candidate", "IS" if is_optimizable else "is NOT",
> > > "optimizable"
> > > +        coccinelle.extract_fn = "tcg_gen_extract_" + and_sz
>
> Add the "tcg_gen_extract()" function name as identifier in coccinelle
> namespace, appending if we are handling i32, i64 or tl.
>
> > > +    except:
> > > +        print "  ERROR (check included headers?)"
> > > +    cocci.include_match(is_optimizable)
>
> If we can not optimize, then discard this environment.
>
> > > +print
> > > +
> > > +@replacement depends on verify_len@
> > > +identifier match.ret, match.arg;
> > > +constant match.ofs, match.len;
> > > +identifier match.shr_fn;
> > > +identifier match.and_fn;
> > > +position match.shr_p;
> > > +position match.and_p;
> > > +identifier verify_len.extract_fn;
> > > +@@
> > > +-shr_fn@shr_p(ret, arg, ofs);
> > > +-and_fn@and_p(ret, ret, len);
> > > ++extract_fn(ret, arg, ofs, len);
>
> Emit patch with function name from verify_len rule, other from the match rule.
>
> > > --
> > > 2.11.0
>
> Thank you for your interest and fast reply!
>
> Phil.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  2017-05-12  7:16       ` Julia Lawall
@ 2017-05-12  7:39         ` SF Markus Elfring
  0 siblings, 0 replies; 21+ messages in thread
From: SF Markus Elfring @ 2017-05-12  7:39 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Philippe Mathieu-Daudé,
	qemu-devel, Richard Henderson, Markus Armbruster, Nicolas Palix,
	Aurelien Jarno, Nikunj A Dadhania, Eric Blake, Laurent Vivier,
	Michael Tokarev, Eduardo Habkost, Paolo Bonzini

> Even for 6 functions, I would suggest to write out the function names in
> the pattern matching code rather than using regular expressions.  If the
> names are explicit, then Coccinelle can do some filtering, either based on
> an index made with idutils or glimpse (see the coccinelle scripts
> directory for tools for making these indices) or based on grep, to drop
> without parsing files that are not relevant.  It doesn't do this for
> regular expressions.  So you will get a faster result if the names are
> explicit in the pattern code.

Is such filtering performed for source file names?

I would find it more convenient to specify function prefixes as
regular expressions in constraints for the shown SmPL metavariables.

Regards,
Markus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  2017-05-12  3:35 ` [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract() Philippe Mathieu-Daudé
  2017-05-12  3:41   ` Julia Lawall
  2017-05-12  5:48   ` SF Markus Elfring
@ 2017-05-12 11:00   ` SF Markus Elfring
  2 siblings, 0 replies; 21+ messages in thread
From: SF Markus Elfring @ 2017-05-12 11:00 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Aurelien Jarno, Richard Henderson, Nikunj A Dadhania, Eric Blake,
	Markus Armbruster, Laurent Vivier, Michael Tokarev,
	Eduardo Habkost, Paolo Bonzini, Julia Lawall, Nicolas Palix

> +        print "  candidate", "IS" if is_optimizable else "is NOT", "optimizable"

I suggest to increase your software development attention also for
another detail here.

This information display is using the channel “sys.stdout”.
How do you think about to use the function “sys.stderr.write” (or an other
separate file) instead?

Regards,
Markus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op Philippe Mathieu-Daudé
@ 2017-05-12 16:50   ` Richard Henderson
  2017-05-12 18:21     ` Aurelien Jarno
  2017-05-12 23:50     ` Philippe Mathieu-Daudé
  0 siblings, 2 replies; 21+ messages in thread
From: Richard Henderson @ 2017-05-12 16:50 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	qemu-devel, Peter Maydell, Aurelien Jarno, qemu-arm

On 05/11/2017 08:35 PM, Philippe Mathieu-Daudé wrote:
> -    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
> -    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
> +    tcg_gen_extract_i64(tcg_tmp, tcg_rn, 16, 0xffff);

So your new script didn't work then?  This should be "..., 16, 16);".


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  2017-05-12  5:36     ` Philippe Mathieu-Daudé
  2017-05-12  7:16       ` Julia Lawall
@ 2017-05-12 17:49       ` Eric Blake
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Blake @ 2017-05-12 17:49 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Julia Lawall, qemu-devel, Richard Henderson, Markus Armbruster,
	Nicolas Palix
  Cc: Aurelien Jarno, Nikunj A Dadhania, Laurent Vivier,
	Michael Tokarev, Eduardo Habkost, Paolo Bonzini, Markus Elfring

[-- Attachment #1: Type: text/plain, Size: 1112 bytes --]

On 05/12/2017 12:36 AM, Philippe Mathieu-Daudé wrote:

> In this patch
> http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
> Aurelien does:
> 
> -    tcg_gen_shri_i32(cpu_sr_q, src, SR_Q);
> -    tcg_gen_andi_i32(cpu_sr_q, cpu_sr_q, 1);
> +    tcg_gen_extract_i32(cpu_sr_q, src, SR_Q, 1);
> 
> having:
> 
> #define SR_Q  8
> 
> I wanted to write a Coccinelle script to check for this pattern.
> My first version was wrong, as Richard Henderson reminded me this
> pattern can be applied as long as the len argument (here "1") is a
> Mersenne prime (all least significant bits as "1").

Side note: while you are correct that a Mersenne prime is one less than
a power of 2, your use of the term here is incorrect. You are looking
for ALL instances of numbers that are one less than a power of two, and
not just the Mersenne primes.  (For instance, 0xf is NOT a Mersenne
prime, but IS a candidate for an optimization using a length of 4.)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op
  2017-05-12 16:50   ` Richard Henderson
@ 2017-05-12 18:21     ` Aurelien Jarno
  2017-05-12 19:05       ` Richard Henderson
  2017-05-12 23:50     ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2017-05-12 18:21 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Philippe Mathieu-Daudé, qemu-devel, Peter Maydell, qemu-arm

On 2017-05-12 09:50, Richard Henderson wrote:
> On 05/11/2017 08:35 PM, Philippe Mathieu-Daudé wrote:
> > -    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
> > -    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
> > +    tcg_gen_extract_i64(tcg_tmp, tcg_rn, 16, 0xffff);
> 
> So your new script didn't work then?  This should be "..., 16, 16);".

Indeed that should be ..., 16, 16). That said looking a bit at the 
actual code, it looks like rev16 is not implemented efficiently. Instead
of byteswapping individual 16-bit words one by one, it would be better
to work on the whole register at the same time using shifts and mask.
This is actually how rev16 is implemented for aarch32 (and a few other
targets). Something like that (i can send a proper patch later):

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d92c..ccb276417b 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4034,25 +4034,14 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
     TCGv_i64 tcg_rd = cpu_reg(s, rd);
     TCGv_i64 tcg_tmp = tcg_temp_new_i64();
     TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
-
-    tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
-    tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
-
-    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-    tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
-
-    if (sf) {
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-        tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
-
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
-    }
+    uint64_t mask1 = sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff;
+    uint64_t mask2 = sf ? 0xff00ff00ff00ff00ull : 0xff00ff00;
+
+    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
+    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, mask1);
+    tcg_gen_shli_i64(tcg_rd, tcg_rn, 8);
+    tcg_gen_andi_i64(tcg_rd, tcg_rd, mask2);
+    tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
 
     tcg_temp_free_i64(tcg_tmp);
 }

This makes the generated x86-64 code much shorter, especially with sf=1:


* rev16 with sf = 0

before:
    0x5631ecfda582:  movzwl %bx,%r12d
    0x5631ecfda586:  rol    $0x8,%r12w
    0x5631ecfda58b:  shr    $0x10,%rbx
    0x5631ecfda58f:  rol    $0x8,%bx
    0x5631ecfda593:  movzwl %bx,%ebx
    0x5631ecfda596:  shl    $0x10,%rbx
    0x5631ecfda59a:  mov    $0xffffffff0000ffff,%r13
    0x5631ecfda5a4:  and    %r13,%r12
    0x5631ecfda5a7:  or     %rbx,%r12
    
after:
    0x559f7aeae5f2:  mov    %rbx,%r12
    0x559f7aeae5f5:  shr    $0x8,%r12
    0x559f7aeae5f9:  and    $0xff00ff,%r12d
    0x559f7aeae600:  shl    $0x8,%rbx
    0x559f7aeae604:  and    $0xff00ff00,%ebx
    0x559f7aeae60a:  or     %r12,%rbx
    
    
* rev16 with sf = 1
    
before:
    0x5631ecfe5380:  mov    %rbx,%r12
    0x5631ecfe5383:  movzwl %bx,%ebx
    0x5631ecfe5386:  rol    $0x8,%bx
    0x5631ecfe538a:  mov    %r12,%r13
    0x5631ecfe538d:  shr    $0x10,%r13
    0x5631ecfe5391:  movzwl %r13w,%r13d
    0x5631ecfe5395:  rol    $0x8,%r13w
    0x5631ecfe539a:  movzwl %r13w,%r13d
    0x5631ecfe539e:  shl    $0x10,%r13
    0x5631ecfe53a2:  mov    $0xffffffff0000ffff,%r15
    0x5631ecfe53ac:  and    %r15,%rbx
    0x5631ecfe53af:  or     %r13,%rbx
    0x5631ecfe53b2:  mov    %r12,%r13
    0x5631ecfe53b5:  shr    $0x20,%r13
    0x5631ecfe53b9:  movzwl %r13w,%r13d
    0x5631ecfe53bd:  rol    $0x8,%r13w
    0x5631ecfe53c2:  movzwl %r13w,%r13d
    0x5631ecfe53c6:  shl    $0x20,%r13
    0x5631ecfe53ca:  mov    $0xffff0000ffffffff,%r15
    0x5631ecfe53d4:  and    %r15,%rbx
    0x5631ecfe53d7:  or     %r13,%rbx
    0x5631ecfe53da:  shr    $0x30,%r12
    0x5631ecfe53de:  rol    $0x8,%r12w
    0x5631ecfe53e3:  shl    $0x30,%r12
    0x5631ecfe53e7:  mov    $0xffffffffffff,%r13
    0x5631ecfe53f1:  and    %r13,%rbx
    0x5631ecfe53f4:  or     %r12,%rbx
    
after:
    0x559f7aeb93e0:  mov    %rbx,%r12
    0x559f7aeb93e3:  shr    $0x8,%r12
    0x559f7aeb93e7:  mov    $0xff00ff00ff00ff,%r13
    0x559f7aeb93f1:  and    %r13,%r12
    0x559f7aeb93f4:  shl    $0x8,%rbx
    0x559f7aeb93f8:  mov    $0xff00ff00ff00ff00,%r13
    0x559f7aeb9402:  and    %r13,%rbx
    0x559f7aeb9405:  or     %r12,%rbx

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op
  2017-05-12 18:21     ` Aurelien Jarno
@ 2017-05-12 19:05       ` Richard Henderson
  2017-05-12 19:22         ` Aurelien Jarno
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2017-05-12 19:05 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: Philippe Mathieu-Daudé, qemu-devel, Peter Maydell, qemu-arm

On 05/12/2017 11:21 AM, Aurelien Jarno wrote:
> +    uint64_t mask1 = sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff;
> +    uint64_t mask2 = sf ? 0xff00ff00ff00ff00ull : 0xff00ff00;
> +
> +    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
> +    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, mask1);
> +    tcg_gen_shli_i64(tcg_rd, tcg_rn, 8);
> +    tcg_gen_andi_i64(tcg_rd, tcg_rd, mask2);

It would probably be better to use a single mask, since they're not free to 
instantiate in a register.  So e.g.

   TCGv mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
   tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
   tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
   tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
   tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op
  2017-05-12 19:05       ` Richard Henderson
@ 2017-05-12 19:22         ` Aurelien Jarno
  2017-05-12 19:38           ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2017-05-12 19:22 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Philippe Mathieu-Daudé, qemu-devel, Peter Maydell, qemu-arm

On 2017-05-12 12:05, Richard Henderson wrote:
> On 05/12/2017 11:21 AM, Aurelien Jarno wrote:
> > +    uint64_t mask1 = sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff;
> > +    uint64_t mask2 = sf ? 0xff00ff00ff00ff00ull : 0xff00ff00;
> > +
> > +    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
> > +    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, mask1);
> > +    tcg_gen_shli_i64(tcg_rd, tcg_rn, 8);
> > +    tcg_gen_andi_i64(tcg_rd, tcg_rd, mask2);
> 
> It would probably be better to use a single mask, since they're not free to
> instantiate in a register.  So e.g.
> 
>   TCGv mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
>   tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
>   tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
>   tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
>   tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);

Indeed that improves things a bit for sf=1. For sf=0 though the
constant is never loaded into a register, it is passed to the and
instructions as an immediate.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op
  2017-05-12 19:22         ` Aurelien Jarno
@ 2017-05-12 19:38           ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2017-05-12 19:38 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: Philippe Mathieu-Daudé, qemu-devel, Peter Maydell, qemu-arm

On 05/12/2017 12:22 PM, Aurelien Jarno wrote:
> On 2017-05-12 12:05, Richard Henderson wrote:
>> On 05/12/2017 11:21 AM, Aurelien Jarno wrote:
>>> +    uint64_t mask1 = sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff;
>>> +    uint64_t mask2 = sf ? 0xff00ff00ff00ff00ull : 0xff00ff00;
>>> +
>>> +    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
>>> +    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, mask1);
>>> +    tcg_gen_shli_i64(tcg_rd, tcg_rn, 8);
>>> +    tcg_gen_andi_i64(tcg_rd, tcg_rd, mask2);
>>
>> It would probably be better to use a single mask, since they're not free to
>> instantiate in a register.  So e.g.
>>
>>    TCGv mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
>>    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
>>    tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
>>    tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
>>    tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
> 
> Indeed that improves things a bit for sf=1. For sf=0 though the
> constant is never loaded into a register, it is passed to the and
> instructions as an immediate.
> 
For x86 (and sometimes s390) it isn't, but it certainly would be for all other 
hosts.


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/5] target/ppc: using various functions using extract op
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 4/5] target/ppc: using various functions " Philippe Mathieu-Daudé
@ 2017-05-12 23:48   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12 23:48 UTC (permalink / raw)
  To: qemu-devel, Aurelien Jarno, Richard Henderson, David Gibson,
	Alexander Graf, qemu-ppc

This patch is also incorrect, please see v4.

On 05/12/2017 12:35 AM, Philippe Mathieu-Daudé wrote:
> Patch created mechanically using Coccinelle script via:
>
>     $ spatch --macro-file scripts/cocci-macro-file.h --in-place \
>         --sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target
>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>
> David I did not add your Reviewed-by as suggested by Laurent Vivier after
> Nikunj A Dadhania review.
>
>  target/ppc/translate.c              |  9 +++------
>  target/ppc/translate/vsx-impl.inc.c | 15 +++++----------
>  2 files changed, 8 insertions(+), 16 deletions(-)
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index f40b5a1abf..64ab412bf3 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -868,8 +868,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1,
>              }
>              tcg_gen_xor_tl(cpu_ca, t0, t1);        /* bits changed w/ carry */
>              tcg_temp_free(t1);
> -            tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);   /* extract bit 32 */
> -            tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
> +            tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
>              if (is_isa300(ctx)) {
>                  tcg_gen_mov_tl(cpu_ca32, cpu_ca);
>              }
> @@ -1399,8 +1398,7 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1,
>              tcg_temp_free(inv1);
>              tcg_gen_xor_tl(cpu_ca, t0, t1);         /* bits changes w/ carry */
>              tcg_temp_free(t1);
> -            tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);    /* extract bit 32 */
> -            tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
> +            tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
>              if (is_isa300(ctx)) {
>                  tcg_gen_mov_tl(cpu_ca32, cpu_ca);
>              }
> @@ -5383,8 +5381,7 @@ static void gen_mfsri(DisasContext *ctx)
>      CHK_SV;
>      t0 = tcg_temp_new();
>      gen_addr_reg_index(ctx, t0);
> -    tcg_gen_shri_tl(t0, t0, 28);
> -    tcg_gen_andi_tl(t0, t0, 0xF);
> +    tcg_gen_extract_tl(t0, t0, 28, 0xF);

WRONG

>      gen_helper_load_sr(cpu_gpr[rd], cpu_env, t0);
>      tcg_temp_free(t0);
>      if (ra != 0 && ra != rd)
> diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
> index 7f12908029..9faffd2ddc 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -1262,8 +1262,7 @@ static void gen_xsxexpqp(DisasContext *ctx)
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
> -    tcg_gen_shri_i64(xth, xbh, 48);
> -    tcg_gen_andi_i64(xth, xth, 0x7FFF);
> +    tcg_gen_extract_i64(xth, xbh, 48, 0x7FFF);

WRONG

>      tcg_gen_movi_i64(xtl, 0);
>  }
>
> @@ -1448,10 +1447,8 @@ static void gen_xvxexpdp(DisasContext *ctx)
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
> -    tcg_gen_shri_i64(xth, xbh, 52);
> -    tcg_gen_andi_i64(xth, xth, 0x7FF);
> -    tcg_gen_shri_i64(xtl, xbl, 52);
> -    tcg_gen_andi_i64(xtl, xtl, 0x7FF);
> +    tcg_gen_extract_i64(xth, xbh, 52, 0x7FF);
> +    tcg_gen_extract_i64(xtl, xbl, 52, 0x7FF);

WRONG

>  }
>
>  GEN_VSX_HELPER_2(xvxsigsp, 0x00, 0x04, 0, PPC2_ISA300)
> @@ -1474,16 +1471,14 @@ static void gen_xvxsigdp(DisasContext *ctx)
>      zr = tcg_const_i64(0);
>      nan = tcg_const_i64(2047);
>
> -    tcg_gen_shri_i64(exp, xbh, 52);
> -    tcg_gen_andi_i64(exp, exp, 0x7FF);
> +    tcg_gen_extract_i64(exp, xbh, 52, 0x7FF);

WRONG

>      tcg_gen_movi_i64(t0, 0x0010000000000000);
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
>      tcg_gen_andi_i64(xth, xbh, 0x000FFFFFFFFFFFFF);
>      tcg_gen_or_i64(xth, xth, t0);
>
> -    tcg_gen_shri_i64(exp, xbl, 52);
> -    tcg_gen_andi_i64(exp, exp, 0x7FF);
> +    tcg_gen_extract_i64(exp, xbl, 52, 0x7FF);

WRONG

>      tcg_gen_movi_i64(t0, 0x0010000000000000);
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/5] target/sparc: optimize various functions using extract op
  2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 5/5] target/sparc: optimize " Philippe Mathieu-Daudé
@ 2017-05-12 23:49   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12 23:49 UTC (permalink / raw)
  To: qemu-devel, Aurelien Jarno, Richard Henderson, Mark Cave-Ayland,
	Artyom Tarasenko

This patch is incorrect, please see v4.

On 05/12/2017 12:35 AM, Philippe Mathieu-Daudé wrote:
> Patch created mechanically using Coccinelle script via:
>
>     $ spatch --macro-file scripts/cocci-macro-file.h --in-place \
>         --sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target
>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>  target/sparc/translate.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/target/sparc/translate.c b/target/sparc/translate.c
> index aa6734d54e..a92b5c425c 100644
> --- a/target/sparc/translate.c
> +++ b/target/sparc/translate.c
> @@ -380,29 +380,25 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num,
>  static inline void gen_mov_reg_N(TCGv reg, TCGv_i32 src)
>  {
>      tcg_gen_extu_i32_tl(reg, src);
> -    tcg_gen_shri_tl(reg, reg, PSR_NEG_SHIFT);
> -    tcg_gen_andi_tl(reg, reg, 0x1);
> +    tcg_gen_extract_tl(reg, reg, PSR_NEG_SHIFT, 0x1);
>  }
>
>  static inline void gen_mov_reg_Z(TCGv reg, TCGv_i32 src)
>  {
>      tcg_gen_extu_i32_tl(reg, src);
> -    tcg_gen_shri_tl(reg, reg, PSR_ZERO_SHIFT);
> -    tcg_gen_andi_tl(reg, reg, 0x1);
> +    tcg_gen_extract_tl(reg, reg, PSR_ZERO_SHIFT, 0x1);
>  }
>
>  static inline void gen_mov_reg_V(TCGv reg, TCGv_i32 src)
>  {
>      tcg_gen_extu_i32_tl(reg, src);
> -    tcg_gen_shri_tl(reg, reg, PSR_OVF_SHIFT);
> -    tcg_gen_andi_tl(reg, reg, 0x1);
> +    tcg_gen_extract_tl(reg, reg, PSR_OVF_SHIFT, 0x1);
>  }
>
>  static inline void gen_mov_reg_C(TCGv reg, TCGv_i32 src)
>  {
>      tcg_gen_extu_i32_tl(reg, src);
> -    tcg_gen_shri_tl(reg, reg, PSR_CARRY_SHIFT);
> -    tcg_gen_andi_tl(reg, reg, 0x1);
> +    tcg_gen_extract_tl(reg, reg, PSR_CARRY_SHIFT, 0x1);
>  }
>
>  static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
> @@ -638,8 +634,7 @@ static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
>      // env->y = (b2 << 31) | (env->y >> 1);
>      tcg_gen_andi_tl(r_temp, cpu_cc_src, 0x1);
>      tcg_gen_shli_tl(r_temp, r_temp, 31);
> -    tcg_gen_shri_tl(t0, cpu_y, 1);
> -    tcg_gen_andi_tl(t0, t0, 0x7fffffff);
> +    tcg_gen_extract_tl(t0, cpu_y, 1, 0x7fffffff);

WRONG

>      tcg_gen_or_tl(t0, t0, r_temp);
>      tcg_gen_andi_tl(cpu_y, t0, 0xffffffff);
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op
  2017-05-12 16:50   ` Richard Henderson
  2017-05-12 18:21     ` Aurelien Jarno
@ 2017-05-12 23:50     ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-12 23:50 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, Peter Maydell, Aurelien Jarno, qemu-arm

On 05/12/2017 01:50 PM, Richard Henderson wrote:
> On 05/11/2017 08:35 PM, Philippe Mathieu-Daudé wrote:
>> -    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
>> -    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
>> +    tcg_gen_extract_i64(tcg_tmp, tcg_rn, 16, 0xffff);
>
> So your new script didn't work then?  This should be "..., 16, 16);".

Yeah this is wrong :(

I hope I got it in the last patchset (v4).

Thank for the review,

Phil.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-05-12 23:51 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-12  3:35 [Qemu-devel] [PATCH v3 0/5] optimize various tcg_gen() functions using extract op Philippe Mathieu-Daudé
2017-05-12  3:35 ` [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract() Philippe Mathieu-Daudé
2017-05-12  3:41   ` Julia Lawall
2017-05-12  5:36     ` Philippe Mathieu-Daudé
2017-05-12  7:16       ` Julia Lawall
2017-05-12  7:39         ` SF Markus Elfring
2017-05-12 17:49       ` Eric Blake
2017-05-12  5:48   ` SF Markus Elfring
2017-05-12 11:00   ` SF Markus Elfring
2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op Philippe Mathieu-Daudé
2017-05-12 16:50   ` Richard Henderson
2017-05-12 18:21     ` Aurelien Jarno
2017-05-12 19:05       ` Richard Henderson
2017-05-12 19:22         ` Aurelien Jarno
2017-05-12 19:38           ` Richard Henderson
2017-05-12 23:50     ` Philippe Mathieu-Daudé
2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 3/5] target/m68k: optimize bcd_flags() " Philippe Mathieu-Daudé
2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 4/5] target/ppc: using various functions " Philippe Mathieu-Daudé
2017-05-12 23:48   ` Philippe Mathieu-Daudé
2017-05-12  3:35 ` [Qemu-devel] [PATCH v3 5/5] target/sparc: optimize " Philippe Mathieu-Daudé
2017-05-12 23:49   ` Philippe Mathieu-Daudé

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.