All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] sparc64 lazy conditional codes evaluation
@ 2010-05-03  7:17 Igor Kovalenko
  2010-05-03 19:24 ` [Qemu-devel] " Blue Swirl
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Kovalenko @ 2010-05-03  7:17 UTC (permalink / raw)
  To: qemu-devel, Blue Swirl

Hi!

There is an issue with lazy conditional codes evaluation where
we return from trap handler with mismatching conditionals.

I seldom reproduce it here when dragging qemu window while
machine is working through silo initialization. I use gentoo minimal cd
install-sparc64-minimal-20100322.iso but I think anything with silo boot
would experience the same. Once in a while it would report crc error,
unable to open cd partition or it would fail to decompress image.

Pattern that fails appears to require a sequence of compare insn
possibly followed by a few instructions which do not touch conditionals,
then conditional branch insn. If it happens that we trap while processing
conditional branch insn so it is restarted after return from trap then
seldom conditional codes are calculated incorrectly.

I cannot point to exact cause but it appears that after trap return
we may have CC_OP and CC_SRC* mismatch somewhere,
since adding more cond evaluation flushes over the code helps.

We already tried doing flush more frequently and it is still not
complete, so the question is how to finally do this once and right :)

Obviously I do not get the design of lazy evaluation right, but
the following list appears to be good start. Plan is to prepare
a change to qemu and find a way to test it.

1. Since SPARC* is a RISC CPU it seems to be not profitable to
   use DisasContext->cc_op to predict if flags should be not evaluated
   due to overriding insn. Instead we can drop cc_op from disassembler
   context and simplify code to only use cc_op from env.
   Another point is that we always write to env->cc_op when
translating *cc insns
   This should solve any issue with dc->cc_op prediction going
   out of sync with env->cc_op and cpu_cc_src*

2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
   a. conditional code is required by insn (like addc, cond branch etc.)
      - here we can optimize by evaluating specific bits (carry?)
      - not sure if it works in case we have two cond consuming insns,
        where first needs carry another needs the rest of flags
   b. CCR is read by rdccr (helper_rdccr)
      - have to compute all flags
   c. trap occurs and we prepare trap level context (saving pstate)
      - have to compute all flags
   d. control goes out of tcg runtime (so gdbstub reads correct value from env)
      - have to compute all flags
   ...

-- 
Kind regards,
Igor V. Kovalenko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-03  7:17 [Qemu-devel] sparc64 lazy conditional codes evaluation Igor Kovalenko
@ 2010-05-03 19:24 ` Blue Swirl
  2010-05-03 19:46   ` Igor Kovalenko
  0 siblings, 1 reply; 13+ messages in thread
From: Blue Swirl @ 2010-05-03 19:24 UTC (permalink / raw)
  To: Igor Kovalenko; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2990 bytes --]

On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
> Hi!
>
>  There is an issue with lazy conditional codes evaluation where
>  we return from trap handler with mismatching conditionals.
>
>  I seldom reproduce it here when dragging qemu window while
>  machine is working through silo initialization. I use gentoo minimal cd
>  install-sparc64-minimal-20100322.iso but I think anything with silo boot
>  would experience the same. Once in a while it would report crc error,
>  unable to open cd partition or it would fail to decompress image.

I think I've also seen this.

>  Pattern that fails appears to require a sequence of compare insn
>  possibly followed by a few instructions which do not touch conditionals,
>  then conditional branch insn. If it happens that we trap while processing
>  conditional branch insn so it is restarted after return from trap then
>  seldom conditional codes are calculated incorrectly.
>
>  I cannot point to exact cause but it appears that after trap return
>  we may have CC_OP and CC_SRC* mismatch somewhere,
>  since adding more cond evaluation flushes over the code helps.
>
>  We already tried doing flush more frequently and it is still not
>  complete, so the question is how to finally do this once and right :)
>
>  Obviously I do not get the design of lazy evaluation right, but
>  the following list appears to be good start. Plan is to prepare
>  a change to qemu and find a way to test it.
>
>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>    use DisasContext->cc_op to predict if flags should be not evaluated
>    due to overriding insn. Instead we can drop cc_op from disassembler
>    context and simplify code to only use cc_op from env.

Not currently, but in the future we may use that to do even lazier
flags computation. For example the sequence 'cmp x, y; bne target'
could be much more optimal by changing the branch to do the
comparison. Here's an old unfinished patch to do some of this.

>    Another point is that we always write to env->cc_op when
>  translating *cc insns
>    This should solve any issue with dc->cc_op prediction going
>    out of sync with env->cc_op and cpu_cc_src*

I think this is what is happening now.

>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
>    a. conditional code is required by insn (like addc, cond branch etc.)
>       - here we can optimize by evaluating specific bits (carry?)
>       - not sure if it works in case we have two cond consuming insns,
>         where first needs carry another needs the rest of flags

Here's another patch to optimize C flag handling. It doesn't pass my
tests though.

>    b. CCR is read by rdccr (helper_rdccr)
>       - have to compute all flags
>    c. trap occurs and we prepare trap level context (saving pstate)
>       - have to compute all flags
>    d. control goes out of tcg runtime (so gdbstub reads correct value from env)
>       - have to compute all flags

Fully agree.

[-- Attachment #2: 0001-Convert-C-flag-input-BROKEN.patch --]
[-- Type: text/x-diff, Size: 5475 bytes --]

From b0863c213ce487e9c1034674668d1b64a43b7266 Mon Sep 17 00:00:00 2001
From: Blue Swirl <blauwirbel@gmail.com>
Date: Mon, 3 May 2010 19:11:37 +0000
Subject: [PATCH] Convert C flag input BROKEN

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
---
 target-sparc/translate.c |   24 ++++++++----------------
 1 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index be2a116..94c343d 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -336,7 +336,7 @@ static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_movi_tl(cpu_cc_src2, src2);
-    gen_mov_reg_C(cpu_tmp0, cpu_psr);
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -346,7 +346,7 @@ static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_mov_tl(cpu_cc_src2, src2);
-    gen_mov_reg_C(cpu_tmp0, cpu_psr);
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -419,7 +419,7 @@ static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_movi_tl(cpu_cc_src2, src2);
-    gen_mov_reg_C(cpu_tmp0, cpu_psr);
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -429,7 +429,7 @@ static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_mov_tl(cpu_cc_src2, src2);
-    gen_mov_reg_C(cpu_tmp0, cpu_psr);
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -2953,25 +2953,21 @@ static void disas_sparc_insn(DisasContext * dc)
                         if (IS_IMM) {
                             simm = GET_FIELDs(insn, 19, 31);
                             if (xop & 0x10) {
-                                gen_helper_compute_psr();
                                 gen_op_addxi_cc(cpu_dst, cpu_src1, simm);
                                 tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
                                 dc->cc_op = CC_OP_ADDX;
                             } else {
-                                gen_helper_compute_psr();
-                                gen_mov_reg_C(cpu_tmp0, cpu_psr);
+                                gen_helper_compute_C_icc(cpu_tmp0);
                                 tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
                                 tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
                             }
                         } else {
                             if (xop & 0x10) {
-                                gen_helper_compute_psr();
                                 gen_op_addx_cc(cpu_dst, cpu_src1, cpu_src2);
                                 tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
                                 dc->cc_op = CC_OP_ADDX;
                             } else {
-                                gen_helper_compute_psr();
-                                gen_mov_reg_C(cpu_tmp0, cpu_psr);
+                                gen_helper_compute_C_icc(cpu_tmp0);
                                 tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
                                 tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
                             }
@@ -3009,25 +3005,21 @@ static void disas_sparc_insn(DisasContext * dc)
                         if (IS_IMM) {
                             simm = GET_FIELDs(insn, 19, 31);
                             if (xop & 0x10) {
-                                gen_helper_compute_psr();
                                 gen_op_subxi_cc(cpu_dst, cpu_src1, simm);
                                 tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
                                 dc->cc_op = CC_OP_SUBX;
                             } else {
-                                gen_helper_compute_psr();
-                                gen_mov_reg_C(cpu_tmp0, cpu_psr);
+                                gen_helper_compute_C_icc(cpu_tmp0);
                                 tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
                                 tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
                             }
                         } else {
                             if (xop & 0x10) {
-                                gen_helper_compute_psr();
                                 gen_op_subx_cc(cpu_dst, cpu_src1, cpu_src2);
                                 tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
                                 dc->cc_op = CC_OP_SUBX;
                             } else {
-                                gen_helper_compute_psr();
-                                gen_mov_reg_C(cpu_tmp0, cpu_psr);
+                                gen_helper_compute_C_icc(cpu_tmp0);
                                 tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
                                 tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
                             }
-- 
1.5.6.5


[-- Attachment #3: 0002-Branch-optimization-BROKEN.patch --]
[-- Type: text/x-diff, Size: 4402 bytes --]

From 93cce43be043ca25770165b8c06546eafc320716 Mon Sep 17 00:00:00 2001
From: Blue Swirl <blauwirbel@gmail.com>
Date: Mon, 3 May 2010 19:21:59 +0000
Subject: [PATCH] Branch optimization BROKEN

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
---
 target-sparc/translate.c |  108 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 106 insertions(+), 2 deletions(-)

diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 94c343d..57bda12 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -1115,6 +1115,104 @@ static inline void gen_cond_reg(TCGv r_dst, int cond, TCGv r_src)
 }
 #endif
 
+// Inverted logic
+static const int gen_tcg_cond[16] = {
+    -1,
+    TCG_COND_NE,
+    TCG_COND_GT,
+    TCG_COND_GE,
+    TCG_COND_GTU,
+    TCG_COND_GEU,
+    -1,
+    -1,
+    -1,
+    TCG_COND_EQ,
+    TCG_COND_LE,
+    TCG_COND_LT,
+    TCG_COND_LEU,
+    TCG_COND_LTU,
+    -1,
+    -1,
+};
+
+/* generate a conditional jump to label 'l1' according to jump opcode
+   value 'b'. In the fast case, T0 is guaranted not to be used. */
+static inline void gen_brcond(DisasContext *dc, int cond, int l1, int cc, TCGv r_cond)
+{
+    //printf("gen_brcond: cc_op %d\n", dc->cc_op);
+    switch (dc->cc_op) {
+        /* we optimize the cmp/br case */
+    case CC_OP_SUB:
+        // Inverted logic
+        switch (cond) {
+        case 0x0: // n
+            tcg_gen_br(l1);
+            break;
+        case 0x1: // e
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_NE, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_NE, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0x2: // le
+        case 0x3: // l
+        case 0x4: // leu
+        case 0x5: // cs/lu
+        case 0xa: // g
+        case 0xb: // ge
+        case 0xc: // gu
+        case 0xd: // cc/geu
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(gen_tcg_cond[cond], cpu_cc_src, cpu_cc_src2, l1);
+            } else {
+                tcg_gen_brcondi_i32(gen_tcg_cond[cond], cpu_cc_src, cpu_cc_src2, l1);
+            }
+            break;
+        case 0x6: // neg
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_GE, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_GE, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0x7: // vs
+            gen_helper_compute_psr();
+            dc->cc_op = CC_OP_FLAGS;
+            gen_op_eval_bvs(cpu_cc_dst, cpu_cc_src);
+            break;
+        case 0x8: // a
+            // nop
+            break;
+        case 0x9: // ne
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_EQ, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0xe: // pos
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_LT, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_LT, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0xf: // vc
+            gen_helper_compute_psr();
+            dc->cc_op = CC_OP_FLAGS;
+            gen_op_eval_bvc(cpu_cc_dst, cpu_cc_src);
+            break;
+        }
+        break;
+    case CC_OP_FLAGS:
+    default:
+        gen_cond(r_cond, cc, cond, dc);
+        tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
+        break;
+    }
+}
+
 /* XXX: potentially incorrect if dynamic npc */
 static void do_branch(DisasContext *dc, int32_t offset, uint32_t insn, int cc,
                       TCGv r_cond)
@@ -1143,11 +1241,17 @@ static void do_branch(DisasContext *dc, int32_t offset, uint32_t insn, int cc,
         }
     } else {
         flush_cond(dc, r_cond);
-        gen_cond(r_cond, cc, cond, dc);
         if (a) {
-            gen_branch_a(dc, target, dc->npc, r_cond);
+            int l1 = gen_new_label();
+
+            gen_brcond(dc, cond, l1, cc, r_cond);
+            gen_goto_tb(dc, 0, dc->npc, target);
+
+            gen_set_label(l1);
+            gen_goto_tb(dc, 1, dc->npc + 4, dc->npc + 8);
             dc->is_br = 1;
         } else {
+            gen_cond(r_cond, cc, cond, dc);
             dc->pc = dc->npc;
             dc->jump_pc[0] = target;
             dc->jump_pc[1] = dc->npc + 4;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-03 19:24 ` [Qemu-devel] " Blue Swirl
@ 2010-05-03 19:46   ` Igor Kovalenko
  2010-05-03 19:54     ` Blue Swirl
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Kovalenko @ 2010-05-03 19:46 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>> Hi!
>>
>>  There is an issue with lazy conditional codes evaluation where
>>  we return from trap handler with mismatching conditionals.
>>
>>  I seldom reproduce it here when dragging qemu window while
>>  machine is working through silo initialization. I use gentoo minimal cd
>>  install-sparc64-minimal-20100322.iso but I think anything with silo boot
>>  would experience the same. Once in a while it would report crc error,
>>  unable to open cd partition or it would fail to decompress image.
>
> I think I've also seen this.
>
>>  Pattern that fails appears to require a sequence of compare insn
>>  possibly followed by a few instructions which do not touch conditionals,
>>  then conditional branch insn. If it happens that we trap while processing
>>  conditional branch insn so it is restarted after return from trap then
>>  seldom conditional codes are calculated incorrectly.
>>
>>  I cannot point to exact cause but it appears that after trap return
>>  we may have CC_OP and CC_SRC* mismatch somewhere,
>>  since adding more cond evaluation flushes over the code helps.
>>
>>  We already tried doing flush more frequently and it is still not
>>  complete, so the question is how to finally do this once and right :)
>>
>>  Obviously I do not get the design of lazy evaluation right, but
>>  the following list appears to be good start. Plan is to prepare
>>  a change to qemu and find a way to test it.
>>
>>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>>    use DisasContext->cc_op to predict if flags should be not evaluated
>>    due to overriding insn. Instead we can drop cc_op from disassembler
>>    context and simplify code to only use cc_op from env.
>
> Not currently, but in the future we may use that to do even lazier
> flags computation. For example the sequence 'cmp x, y; bne target'
> could be much more optimal by changing the branch to do the
> comparison. Here's an old unfinished patch to do some of this.
>
>>    Another point is that we always write to env->cc_op when
>>  translating *cc insns
>>    This should solve any issue with dc->cc_op prediction going
>>    out of sync with env->cc_op and cpu_cc_src*
>
> I think this is what is happening now.
>
>>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
>>    a. conditional code is required by insn (like addc, cond branch etc.)
>>       - here we can optimize by evaluating specific bits (carry?)
>>       - not sure if it works in case we have two cond consuming insns,
>>         where first needs carry another needs the rest of flags
>
> Here's another patch to optimize C flag handling. It doesn't pass my
> tests though.
>
>>    b. CCR is read by rdccr (helper_rdccr)
>>       - have to compute all flags
>>    c. trap occurs and we prepare trap level context (saving pstate)
>>       - have to compute all flags
>>    d. control goes out of tcg runtime (so gdbstub reads correct value from env)
>>       - have to compute all flags
>
> Fully agree.

Cool

Still I'd propose to kill dc->cc_op, find a reliable way to test it
and then add it back possibly with more optimizations.
I'm lost in the code up to the point where I believe we need to
save/restore cc_op and cpu_cc* while switching trap levels.

-- 
Kind regards,
Igor V. Kovalenko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-03 19:46   ` Igor Kovalenko
@ 2010-05-03 19:54     ` Blue Swirl
  2010-05-03 20:03       ` Igor Kovalenko
  0 siblings, 1 reply; 13+ messages in thread
From: Blue Swirl @ 2010-05-03 19:54 UTC (permalink / raw)
  To: Igor Kovalenko; +Cc: qemu-devel

On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >> Hi!
>  >>
>  >>  There is an issue with lazy conditional codes evaluation where
>  >>  we return from trap handler with mismatching conditionals.
>  >>
>  >>  I seldom reproduce it here when dragging qemu window while
>  >>  machine is working through silo initialization. I use gentoo minimal cd
>  >>  install-sparc64-minimal-20100322.iso but I think anything with silo boot
>  >>  would experience the same. Once in a while it would report crc error,
>  >>  unable to open cd partition or it would fail to decompress image.
>  >
>  > I think I've also seen this.
>  >
>  >>  Pattern that fails appears to require a sequence of compare insn
>  >>  possibly followed by a few instructions which do not touch conditionals,
>  >>  then conditional branch insn. If it happens that we trap while processing
>  >>  conditional branch insn so it is restarted after return from trap then
>  >>  seldom conditional codes are calculated incorrectly.
>  >>
>  >>  I cannot point to exact cause but it appears that after trap return
>  >>  we may have CC_OP and CC_SRC* mismatch somewhere,
>  >>  since adding more cond evaluation flushes over the code helps.
>  >>
>  >>  We already tried doing flush more frequently and it is still not
>  >>  complete, so the question is how to finally do this once and right :)
>  >>
>  >>  Obviously I do not get the design of lazy evaluation right, but
>  >>  the following list appears to be good start. Plan is to prepare
>  >>  a change to qemu and find a way to test it.
>  >>
>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>  >>    use DisasContext->cc_op to predict if flags should be not evaluated
>  >>    due to overriding insn. Instead we can drop cc_op from disassembler
>  >>    context and simplify code to only use cc_op from env.
>  >
>  > Not currently, but in the future we may use that to do even lazier
>  > flags computation. For example the sequence 'cmp x, y; bne target'
>  > could be much more optimal by changing the branch to do the
>  > comparison. Here's an old unfinished patch to do some of this.
>  >
>  >>    Another point is that we always write to env->cc_op when
>  >>  translating *cc insns
>  >>    This should solve any issue with dc->cc_op prediction going
>  >>    out of sync with env->cc_op and cpu_cc_src*
>  >
>  > I think this is what is happening now.
>  >
>  >>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
>  >>    a. conditional code is required by insn (like addc, cond branch etc.)
>  >>       - here we can optimize by evaluating specific bits (carry?)
>  >>       - not sure if it works in case we have two cond consuming insns,
>  >>         where first needs carry another needs the rest of flags
>  >
>  > Here's another patch to optimize C flag handling. It doesn't pass my
>  > tests though.
>  >
>  >>    b. CCR is read by rdccr (helper_rdccr)
>  >>       - have to compute all flags
>  >>    c. trap occurs and we prepare trap level context (saving pstate)
>  >>       - have to compute all flags
>  >>    d. control goes out of tcg runtime (so gdbstub reads correct value from env)
>  >>       - have to compute all flags
>  >
>  > Fully agree.
>
>
> Cool
>
>  Still I'd propose to kill dc->cc_op, find a reliable way to test it
>  and then add it back possibly with more optimizations.
>  I'm lost in the code up to the point where I believe we need to
>  save/restore cc_op and cpu_cc* while switching trap levels.

I'd think this should do the trick:

diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index b27778b..94921cd 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env)
     }
     tsptr = cpu_tsptr(env);

+    helper_compute_psr();
+
     tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) |
         ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
         GET_CWP64(env);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-03 19:54     ` Blue Swirl
@ 2010-05-03 20:03       ` Igor Kovalenko
  2010-05-04 20:21         ` Blue Swirl
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Kovalenko @ 2010-05-03 20:03 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On Mon, May 3, 2010 at 11:54 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>>  >> Hi!
>>  >>
>>  >>  There is an issue with lazy conditional codes evaluation where
>>  >>  we return from trap handler with mismatching conditionals.
>>  >>
>>  >>  I seldom reproduce it here when dragging qemu window while
>>  >>  machine is working through silo initialization. I use gentoo minimal cd
>>  >>  install-sparc64-minimal-20100322.iso but I think anything with silo boot
>>  >>  would experience the same. Once in a while it would report crc error,
>>  >>  unable to open cd partition or it would fail to decompress image.
>>  >
>>  > I think I've also seen this.
>>  >
>>  >>  Pattern that fails appears to require a sequence of compare insn
>>  >>  possibly followed by a few instructions which do not touch conditionals,
>>  >>  then conditional branch insn. If it happens that we trap while processing
>>  >>  conditional branch insn so it is restarted after return from trap then
>>  >>  seldom conditional codes are calculated incorrectly.
>>  >>
>>  >>  I cannot point to exact cause but it appears that after trap return
>>  >>  we may have CC_OP and CC_SRC* mismatch somewhere,
>>  >>  since adding more cond evaluation flushes over the code helps.
>>  >>
>>  >>  We already tried doing flush more frequently and it is still not
>>  >>  complete, so the question is how to finally do this once and right :)
>>  >>
>>  >>  Obviously I do not get the design of lazy evaluation right, but
>>  >>  the following list appears to be good start. Plan is to prepare
>>  >>  a change to qemu and find a way to test it.
>>  >>
>>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>>  >>    use DisasContext->cc_op to predict if flags should be not evaluated
>>  >>    due to overriding insn. Instead we can drop cc_op from disassembler
>>  >>    context and simplify code to only use cc_op from env.
>>  >
>>  > Not currently, but in the future we may use that to do even lazier
>>  > flags computation. For example the sequence 'cmp x, y; bne target'
>>  > could be much more optimal by changing the branch to do the
>>  > comparison. Here's an old unfinished patch to do some of this.
>>  >
>>  >>    Another point is that we always write to env->cc_op when
>>  >>  translating *cc insns
>>  >>    This should solve any issue with dc->cc_op prediction going
>>  >>    out of sync with env->cc_op and cpu_cc_src*
>>  >
>>  > I think this is what is happening now.
>>  >
>>  >>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
>>  >>    a. conditional code is required by insn (like addc, cond branch etc.)
>>  >>       - here we can optimize by evaluating specific bits (carry?)
>>  >>       - not sure if it works in case we have two cond consuming insns,
>>  >>         where first needs carry another needs the rest of flags
>>  >
>>  > Here's another patch to optimize C flag handling. It doesn't pass my
>>  > tests though.
>>  >
>>  >>    b. CCR is read by rdccr (helper_rdccr)
>>  >>       - have to compute all flags
>>  >>    c. trap occurs and we prepare trap level context (saving pstate)
>>  >>       - have to compute all flags
>>  >>    d. control goes out of tcg runtime (so gdbstub reads correct value from env)
>>  >>       - have to compute all flags
>>  >
>>  > Fully agree.
>>
>>
>> Cool
>>
>>  Still I'd propose to kill dc->cc_op, find a reliable way to test it
>>  and then add it back possibly with more optimizations.
>>  I'm lost in the code up to the point where I believe we need to
>>  save/restore cc_op and cpu_cc* while switching trap levels.
>
> I'd think this should do the trick:
>
> diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
> index b27778b..94921cd 100644
> --- a/target-sparc/op_helper.c
> +++ b/target-sparc/op_helper.c
> @@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env)
>     }
>     tsptr = cpu_tsptr(env);
>
> +    helper_compute_psr();
> +
>     tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) |
>         ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
>         GET_CWP64(env);
>

Thanks, this change seems to work here for silo issue.

Another change would be to flush for gdbstub use of GET_CCR and for
helper_rdccr.
I tried to embed flush into GET_CCR but the code looks ugly since we
need to proxy a call to helper_compute_psr from gdbstub passing
available env pointer.

Not really tested with your changes, but still what is the breakage you see?

-- 
Kind regards,
Igor V. Kovalenko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-03 20:03       ` Igor Kovalenko
@ 2010-05-04 20:21         ` Blue Swirl
  2010-05-05 20:24           ` Igor Kovalenko
  0 siblings, 1 reply; 13+ messages in thread
From: Blue Swirl @ 2010-05-04 20:21 UTC (permalink / raw)
  To: Igor Kovalenko; +Cc: qemu-devel

On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
> On Mon, May 3, 2010 at 11:54 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  >>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >>  >> Hi!
>  >>  >>
>  >>  >>  There is an issue with lazy conditional codes evaluation where
>  >>  >>  we return from trap handler with mismatching conditionals.
>  >>  >>
>  >>  >>  I seldom reproduce it here when dragging qemu window while
>  >>  >>  machine is working through silo initialization. I use gentoo minimal cd
>  >>  >>  install-sparc64-minimal-20100322.iso but I think anything with silo boot
>  >>  >>  would experience the same. Once in a while it would report crc error,
>  >>  >>  unable to open cd partition or it would fail to decompress image.
>  >>  >
>  >>  > I think I've also seen this.
>  >>  >
>  >>  >>  Pattern that fails appears to require a sequence of compare insn
>  >>  >>  possibly followed by a few instructions which do not touch conditionals,
>  >>  >>  then conditional branch insn. If it happens that we trap while processing
>  >>  >>  conditional branch insn so it is restarted after return from trap then
>  >>  >>  seldom conditional codes are calculated incorrectly.
>  >>  >>
>  >>  >>  I cannot point to exact cause but it appears that after trap return
>  >>  >>  we may have CC_OP and CC_SRC* mismatch somewhere,
>  >>  >>  since adding more cond evaluation flushes over the code helps.
>  >>  >>
>  >>  >>  We already tried doing flush more frequently and it is still not
>  >>  >>  complete, so the question is how to finally do this once and right :)
>  >>  >>
>  >>  >>  Obviously I do not get the design of lazy evaluation right, but
>  >>  >>  the following list appears to be good start. Plan is to prepare
>  >>  >>  a change to qemu and find a way to test it.
>  >>  >>
>  >>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>  >>  >>    use DisasContext->cc_op to predict if flags should be not evaluated
>  >>  >>    due to overriding insn. Instead we can drop cc_op from disassembler
>  >>  >>    context and simplify code to only use cc_op from env.
>  >>  >
>  >>  > Not currently, but in the future we may use that to do even lazier
>  >>  > flags computation. For example the sequence 'cmp x, y; bne target'
>  >>  > could be much more optimal by changing the branch to do the
>  >>  > comparison. Here's an old unfinished patch to do some of this.
>  >>  >
>  >>  >>    Another point is that we always write to env->cc_op when
>  >>  >>  translating *cc insns
>  >>  >>    This should solve any issue with dc->cc_op prediction going
>  >>  >>    out of sync with env->cc_op and cpu_cc_src*
>  >>  >
>  >>  > I think this is what is happening now.
>  >>  >
>  >>  >>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
>  >>  >>    a. conditional code is required by insn (like addc, cond branch etc.)
>  >>  >>       - here we can optimize by evaluating specific bits (carry?)
>  >>  >>       - not sure if it works in case we have two cond consuming insns,
>  >>  >>         where first needs carry another needs the rest of flags
>  >>  >
>  >>  > Here's another patch to optimize C flag handling. It doesn't pass my
>  >>  > tests though.
>  >>  >
>  >>  >>    b. CCR is read by rdccr (helper_rdccr)
>  >>  >>       - have to compute all flags
>  >>  >>    c. trap occurs and we prepare trap level context (saving pstate)
>  >>  >>       - have to compute all flags
>  >>  >>    d. control goes out of tcg runtime (so gdbstub reads correct value from env)
>  >>  >>       - have to compute all flags
>  >>  >
>  >>  > Fully agree.
>  >>
>  >>
>  >> Cool
>  >>
>  >>  Still I'd propose to kill dc->cc_op, find a reliable way to test it
>  >>  and then add it back possibly with more optimizations.
>  >>  I'm lost in the code up to the point where I believe we need to
>  >>  save/restore cc_op and cpu_cc* while switching trap levels.
>  >
>  > I'd think this should do the trick:
>  >
>  > diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
>  > index b27778b..94921cd 100644
>  > --- a/target-sparc/op_helper.c
>  > +++ b/target-sparc/op_helper.c
>  > @@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env)
>  >     }
>  >     tsptr = cpu_tsptr(env);
>  >
>  > +    helper_compute_psr();
>  > +
>  >     tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) |
>  >         ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
>  >         GET_CWP64(env);
>  >
>
>
> Thanks, this change seems to work here for silo issue.
>
>  Another change would be to flush for gdbstub use of GET_CCR and for
>  helper_rdccr.
>  I tried to embed flush into GET_CCR but the code looks ugly since we
>  need to proxy a call to helper_compute_psr from gdbstub passing
>  available env pointer.
>
>  Not really tested with your changes, but still what is the breakage you see?

Aurora 2.0 (http://distro.ibiblio.org/pub/linux/distributions/aurora/build-2.0/sparc/iso/)
breaks.

This is what I get with git HEAD, having pressed enter key twice:
Welcome to Aurora SPARC Linux




                   +--------------+ CD Found +--------------+
                   |                                        |
                   | To begin testing the CD media before   |
                   | installation press OK.                 |
                   |                                        |
                   | Choose Skip to skip the media test     |
                   | and start the installation.            |
                   |                                        |
                   |      +----+             +------+       |
                   |      | OK |             | Skip |       |
                   |      +----+             +------+       |
                   |                                        |
                   |                                        |
                   +----------------------------------------+




  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen

This is what I get with the C flag patch applied:
Welcome to Aurora SPARC Linux





                    +--------------+ Error +---------------+
                    |                                      |
                    | failed to read keymap information:   |
                    | Success                              |
                    |                                      |
                    |               +----+                 |
                    |               | OK |                 |
                    |               +----+                 |
                    |                                      |
                    |                                      |
                    +--------------------------------------+






  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-04 20:21         ` Blue Swirl
@ 2010-05-05 20:24           ` Igor Kovalenko
  2010-05-06 18:51             ` Blue Swirl
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Kovalenko @ 2010-05-05 20:24 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On Wed, May 5, 2010 at 12:21 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>> On Mon, May 3, 2010 at 11:54 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>>  >> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>  >>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>>  >>  >> Hi!
>>  >>  >>
>>  >>  >>  There is an issue with lazy conditional codes evaluation where
>>  >>  >>  we return from trap handler with mismatching conditionals.
>>  >>  >>
>>  >>  >>  I seldom reproduce it here when dragging qemu window while
>>  >>  >>  machine is working through silo initialization. I use gentoo minimal cd
>>  >>  >>  install-sparc64-minimal-20100322.iso but I think anything with silo boot
>>  >>  >>  would experience the same. Once in a while it would report crc error,
>>  >>  >>  unable to open cd partition or it would fail to decompress image.
>>  >>  >
>>  >>  > I think I've also seen this.
>>  >>  >
>>  >>  >>  Pattern that fails appears to require a sequence of compare insn
>>  >>  >>  possibly followed by a few instructions which do not touch conditionals,
>>  >>  >>  then conditional branch insn. If it happens that we trap while processing
>>  >>  >>  conditional branch insn so it is restarted after return from trap then
>>  >>  >>  seldom conditional codes are calculated incorrectly.
>>  >>  >>
>>  >>  >>  I cannot point to exact cause but it appears that after trap return
>>  >>  >>  we may have CC_OP and CC_SRC* mismatch somewhere,
>>  >>  >>  since adding more cond evaluation flushes over the code helps.
>>  >>  >>
>>  >>  >>  We already tried doing flush more frequently and it is still not
>>  >>  >>  complete, so the question is how to finally do this once and right :)
>>  >>  >>
>>  >>  >>  Obviously I do not get the design of lazy evaluation right, but
>>  >>  >>  the following list appears to be good start. Plan is to prepare
>>  >>  >>  a change to qemu and find a way to test it.
>>  >>  >>
>>  >>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>>  >>  >>    use DisasContext->cc_op to predict if flags should be not evaluated
>>  >>  >>    due to overriding insn. Instead we can drop cc_op from disassembler
>>  >>  >>    context and simplify code to only use cc_op from env.
>>  >>  >
>>  >>  > Not currently, but in the future we may use that to do even lazier
>>  >>  > flags computation. For example the sequence 'cmp x, y; bne target'
>>  >>  > could be much more optimal by changing the branch to do the
>>  >>  > comparison. Here's an old unfinished patch to do some of this.

I wonder if it buys anything. Sparc RISC architecture means optimizing compiler
would prevent any extra flags computation, right? So it is basically 1-to-1
conditional computation and use. Or even worse, if we delay computation
until there are two or more consumers, correct?

>>  >>  >
>>  >>  >>    Another point is that we always write to env->cc_op when
>>  >>  >>  translating *cc insns
>>  >>  >>    This should solve any issue with dc->cc_op prediction going
>>  >>  >>    out of sync with env->cc_op and cpu_cc_src*
>>  >>  >
>>  >>  > I think this is what is happening now.
>>  >>  >
>>  >>  >>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
>>  >>  >>    a. conditional code is required by insn (like addc, cond branch etc.)
>>  >>  >>       - here we can optimize by evaluating specific bits (carry?)
>>  >>  >>       - not sure if it works in case we have two cond consuming insns,
>>  >>  >>         where first needs carry another needs the rest of flags
>>  >>  >
>>  >>  > Here's another patch to optimize C flag handling. It doesn't pass my
>>  >>  > tests though.
>>  >>  >
>>  >>  >>    b. CCR is read by rdccr (helper_rdccr)
>>  >>  >>       - have to compute all flags
>>  >>  >>    c. trap occurs and we prepare trap level context (saving pstate)
>>  >>  >>       - have to compute all flags
>>  >>  >>    d. control goes out of tcg runtime (so gdbstub reads correct value from env)
>>  >>  >>       - have to compute all flags
>>  >>  >
>>  >>  > Fully agree.
>>  >>
>>  >>
>>  >> Cool
>>  >>
>>  >>  Still I'd propose to kill dc->cc_op, find a reliable way to test it
>>  >>  and then add it back possibly with more optimizations.
>>  >>  I'm lost in the code up to the point where I believe we need to
>>  >>  save/restore cc_op and cpu_cc* while switching trap levels.
>>  >
>>  > I'd think this should do the trick:
>>  >
>>  > diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
>>  > index b27778b..94921cd 100644
>>  > --- a/target-sparc/op_helper.c
>>  > +++ b/target-sparc/op_helper.c
>>  > @@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env)
>>  >     }
>>  >     tsptr = cpu_tsptr(env);
>>  >
>>  > +    helper_compute_psr();
>>  > +
>>  >     tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) |
>>  >         ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
>>  >         GET_CWP64(env);
>>  >
>>
>>
>> Thanks, this change seems to work here for silo issue.
>>
>>  Another change would be to flush for gdbstub use of GET_CCR and for
>>  helper_rdccr.
>>  I tried to embed flush into GET_CCR but the code looks ugly since we
>>  need to proxy a call to helper_compute_psr from gdbstub passing
>>  available env pointer.
>>
>>  Not really tested with your changes, but still what is the breakage you see?
>
> Aurora 2.0 (http://distro.ibiblio.org/pub/linux/distributions/aurora/build-2.0/sparc/iso/)
> breaks.
>
> This is what I get with git HEAD, having pressed enter key twice:
> Welcome to Aurora SPARC Linux
>
>
>
>
>                   +--------------+ CD Found +--------------+
>                   |                                        |
>                   | To begin testing the CD media before   |
>                   | installation press OK.                 |
>                   |                                        |
>                   | Choose Skip to skip the media test     |
>                   | and start the installation.            |
>                   |                                        |
>                   |      +----+             +------+       |
>                   |      | OK |             | Skip |       |
>                   |      +----+             +------+       |
>                   |                                        |
>                   |                                        |
>                   +----------------------------------------+
>
>
>
>
>  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>
> This is what I get with the C flag patch applied:
> Welcome to Aurora SPARC Linux
>
>
>
>
>
>                    +--------------+ Error +---------------+
>                    |                                      |
>                    | failed to read keymap information:   |
>                    | Success                              |
>                    |                                      |
>                    |               +----+                 |
>                    |               | OK |                 |
>                    |               +----+                 |
>                    |                                      |
>                    |                                      |
>                    +--------------------------------------+
>
>
>
>
>
>
>  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>

I do reproduce the issue here with 0001-Convert-C-flag-input-BROKEN.patch

-- 
Kind regards,
Igor V. Kovalenko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-05 20:24           ` Igor Kovalenko
@ 2010-05-06 18:51             ` Blue Swirl
  2010-05-08  6:41               ` Igor Kovalenko
  0 siblings, 1 reply; 13+ messages in thread
From: Blue Swirl @ 2010-05-06 18:51 UTC (permalink / raw)
  To: Igor Kovalenko; +Cc: qemu-devel

On 5/5/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
> On Wed, May 5, 2010 at 12:21 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >> On Mon, May 3, 2010 at 11:54 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  >>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >>  >> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  >>  >>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >>  >>  >> Hi!
>  >>  >>  >>
>  >>  >>  >>  There is an issue with lazy conditional codes evaluation where
>  >>  >>  >>  we return from trap handler with mismatching conditionals.
>  >>  >>  >>
>  >>  >>  >>  I seldom reproduce it here when dragging qemu window while
>  >>  >>  >>  machine is working through silo initialization. I use gentoo minimal cd
>  >>  >>  >>  install-sparc64-minimal-20100322.iso but I think anything with silo boot
>  >>  >>  >>  would experience the same. Once in a while it would report crc error,
>  >>  >>  >>  unable to open cd partition or it would fail to decompress image.
>  >>  >>  >
>  >>  >>  > I think I've also seen this.
>  >>  >>  >
>  >>  >>  >>  Pattern that fails appears to require a sequence of compare insn
>  >>  >>  >>  possibly followed by a few instructions which do not touch conditionals,
>  >>  >>  >>  then conditional branch insn. If it happens that we trap while processing
>  >>  >>  >>  conditional branch insn so it is restarted after return from trap then
>  >>  >>  >>  seldom conditional codes are calculated incorrectly.
>  >>  >>  >>
>  >>  >>  >>  I cannot point to exact cause but it appears that after trap return
>  >>  >>  >>  we may have CC_OP and CC_SRC* mismatch somewhere,
>  >>  >>  >>  since adding more cond evaluation flushes over the code helps.
>  >>  >>  >>
>  >>  >>  >>  We already tried doing flush more frequently and it is still not
>  >>  >>  >>  complete, so the question is how to finally do this once and right :)
>  >>  >>  >>
>  >>  >>  >>  Obviously I do not get the design of lazy evaluation right, but
>  >>  >>  >>  the following list appears to be good start. Plan is to prepare
>  >>  >>  >>  a change to qemu and find a way to test it.
>  >>  >>  >>
>  >>  >>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>  >>  >>  >>    use DisasContext->cc_op to predict if flags should be not evaluated
>  >>  >>  >>    due to overriding insn. Instead we can drop cc_op from disassembler
>  >>  >>  >>    context and simplify code to only use cc_op from env.
>  >>  >>  >
>  >>  >>  > Not currently, but in the future we may use that to do even lazier
>  >>  >>  > flags computation. For example the sequence 'cmp x, y; bne target'
>  >>  >>  > could be much more optimal by changing the branch to do the
>  >>  >>  > comparison. Here's an old unfinished patch to do some of this.
>
>
> I wonder if it buys anything. Sparc RISC architecture means optimizing compiler
>  would prevent any extra flags computation, right? So it is basically 1-to-1
>  conditional computation and use. Or even worse, if we delay computation
>  until there are two or more consumers, correct?

For x86 target, that is the other part of savings from using lazy
condition computation. It's true that it will not benefit RISC targets
much.

But I think you are missing the other part, where the actual flags are
not calculated but instead the original comparison can be used.

For example, consider this Sparc64 code:
IN:
0x000001fff000be58:  cmp  %g2, 0x51

OUT: [size=82]
0x40df16b0:  mov    0x10(%r14),%rbp
0x40df16b4:  mov    %rbp,%rbx
0x40df16b7:  sub    $0x51,%rbx
0x40df16bb:  mov    %rbx,%r12
0x40df16be:  mov    %r12,0x10a60(%r14)
0x40df16c5:  mov    %rbp,0x60(%r14)
0x40df16c9:  mov    $0x51,%ebp
0x40df16ce:  mov    %rbp,0x68(%r14)
0x40df16d2:  mov    %rbx,0x70(%r14)
0x40df16d6:  mov    $0x7,%ebp
0x40df16db:  mov    %ebp,0x78(%r14)
0x40df16df:  mov    $0x1fff000be5c,%rbp
0x40df16e9:  mov    %rbp,0x48(%r14)
0x40df16ed:  mov    $0x1fff000be60,%rbp
0x40df16f7:  mov    %rbp,0x50(%r14)
0x40df16fb:  xor    %eax,%eax
0x40df16fd:  jmpq   0xbfface

--------------
IN:
0x000001fff000be5c:  bne  0x1fff000c268

OUT: [size=95]
0x40df1710:  callq  0x518260
0x40df1715:  mov    0x98(%r14),%ebp
0x40df171c:  mov    %ebp,%ebx
0x40df171e:  shr    $0x16,%rbx
0x40df1722:  and    $0x1,%ebx
0x40df1725:  xor    $0x1,%rbx
0x40df1729:  mov    %rbx,0x90(%r14)
0x40df1730:  mov    $0x1fff000be60,%rbp
0x40df173a:  mov    %rbp,0x48(%r14)
0x40df173e:  test   %rbx,%rbx
0x40df1741:  je     0x40df175a
0x40df1747:  mov    $0x1fff000c268,%rbp
0x40df1751:  mov    %rbp,0x50(%r14)
0x40df1755:  jmpq   0x40df1768
0x40df175a:  mov    $0x1fff000be64,%rbp
0x40df1764:  mov    %rbp,0x50(%r14)
0x40df1768:  xor    %eax,%eax
0x40df176a:  jmpq   0xbfface

Instead of calculating any flags, we should generate for combined
'cmp/btest + branch' sequences something like:
mov    0x10(%r14),%rbp
mov    %rbp,%rbx
cmp    $0x51,%rbx
je     0x40ac275a
mov    $0x1fff000c268,%rbp
mov    %rbp,0x50(%r14)
jmpq   0x40ac2768
mov    $0x1fff000be64,%rbp
mov    %rbp,0x50(%r14)
xor    %eax,%eax
jmpq   0xbfface

But I fully agree that correct code generation is the most important issue.

>  >>  >>  >
>  >>  >>  >>    Another point is that we always write to env->cc_op when
>  >>  >>  >>  translating *cc insns
>  >>  >>  >>    This should solve any issue with dc->cc_op prediction going
>  >>  >>  >>    out of sync with env->cc_op and cpu_cc_src*
>  >>  >>  >
>  >>  >>  > I think this is what is happening now.
>  >>  >>  >
>  >>  >>  >>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
>  >>  >>  >>    a. conditional code is required by insn (like addc, cond branch etc.)
>  >>  >>  >>       - here we can optimize by evaluating specific bits (carry?)
>  >>  >>  >>       - not sure if it works in case we have two cond consuming insns,
>  >>  >>  >>         where first needs carry another needs the rest of flags
>  >>  >>  >
>  >>  >>  > Here's another patch to optimize C flag handling. It doesn't pass my
>  >>  >>  > tests though.
>  >>  >>  >
>  >>  >>  >>    b. CCR is read by rdccr (helper_rdccr)
>  >>  >>  >>       - have to compute all flags
>  >>  >>  >>    c. trap occurs and we prepare trap level context (saving pstate)
>  >>  >>  >>       - have to compute all flags
>  >>  >>  >>    d. control goes out of tcg runtime (so gdbstub reads correct value from env)
>  >>  >>  >>       - have to compute all flags
>  >>  >>  >
>  >>  >>  > Fully agree.
>  >>  >>
>  >>  >>
>  >>  >> Cool
>  >>  >>
>  >>  >>  Still I'd propose to kill dc->cc_op, find a reliable way to test it
>  >>  >>  and then add it back possibly with more optimizations.
>  >>  >>  I'm lost in the code up to the point where I believe we need to
>  >>  >>  save/restore cc_op and cpu_cc* while switching trap levels.
>  >>  >
>  >>  > I'd think this should do the trick:
>  >>  >
>  >>  > diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
>  >>  > index b27778b..94921cd 100644
>  >>  > --- a/target-sparc/op_helper.c
>  >>  > +++ b/target-sparc/op_helper.c
>  >>  > @@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env)
>  >>  >     }
>  >>  >     tsptr = cpu_tsptr(env);
>  >>  >
>  >>  > +    helper_compute_psr();
>  >>  > +
>  >>  >     tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) |
>  >>  >         ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
>  >>  >         GET_CWP64(env);
>  >>  >
>  >>
>  >>
>  >> Thanks, this change seems to work here for silo issue.
>  >>
>  >>  Another change would be to flush for gdbstub use of GET_CCR and for
>  >>  helper_rdccr.
>  >>  I tried to embed flush into GET_CCR but the code looks ugly since we
>  >>  need to proxy a call to helper_compute_psr from gdbstub passing
>  >>  available env pointer.
>  >>
>  >>  Not really tested with your changes, but still what is the breakage you see?
>  >
>  > Aurora 2.0 (http://distro.ibiblio.org/pub/linux/distributions/aurora/build-2.0/sparc/iso/)
>  > breaks.
>  >
>  > This is what I get with git HEAD, having pressed enter key twice:
>  > Welcome to Aurora SPARC Linux
>  >
>  >
>  >
>  >
>  >                   +--------------+ CD Found +--------------+
>  >                   |                                        |
>  >                   | To begin testing the CD media before   |
>  >                   | installation press OK.                 |
>  >                   |                                        |
>  >                   | Choose Skip to skip the media test     |
>  >                   | and start the installation.            |
>  >                   |                                        |
>  >                   |      +----+             +------+       |
>  >                   |      | OK |             | Skip |       |
>  >                   |      +----+             +------+       |
>  >                   |                                        |
>  >                   |                                        |
>  >                   +----------------------------------------+
>  >
>  >
>  >
>  >
>  >  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>  >
>  > This is what I get with the C flag patch applied:
>  > Welcome to Aurora SPARC Linux
>  >
>  >
>  >
>  >
>  >
>  >                    +--------------+ Error +---------------+
>  >                    |                                      |
>  >                    | failed to read keymap information:   |
>  >                    | Success                              |
>  >                    |                                      |
>  >                    |               +----+                 |
>  >                    |               | OK |                 |
>  >                    |               +----+                 |
>  >                    |                                      |
>  >                    |                                      |
>  >                    +--------------------------------------+
>  >
>  >
>  >
>  >
>  >
>  >
>  >  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>  >
>
>
> I do reproduce the issue here with 0001-Convert-C-flag-input-BROKEN.patch

The patch seems harmless, so maybe it uncovers some hideous bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-06 18:51             ` Blue Swirl
@ 2010-05-08  6:41               ` Igor Kovalenko
  2010-05-09 20:22                 ` Blue Swirl
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Kovalenko @ 2010-05-08  6:41 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On Thu, May 6, 2010 at 10:51 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On 5/5/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>> On Wed, May 5, 2010 at 12:21 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>>  >> On Mon, May 3, 2010 at 11:54 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>  >>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>>  >>  >> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>  >>  >>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>>  >>  >>  >> Hi!
>>  >>  >>  >>
>>  >>  >>  >>  There is an issue with lazy conditional codes evaluation where
>>  >>  >>  >>  we return from trap handler with mismatching conditionals.
>>  >>  >>  >>
>>  >>  >>  >>  I seldom reproduce it here when dragging qemu window while
>>  >>  >>  >>  machine is working through silo initialization. I use gentoo minimal cd
>>  >>  >>  >>  install-sparc64-minimal-20100322.iso but I think anything with silo boot
>>  >>  >>  >>  would experience the same. Once in a while it would report crc error,
>>  >>  >>  >>  unable to open cd partition or it would fail to decompress image.
>>  >>  >>  >
>>  >>  >>  > I think I've also seen this.
>>  >>  >>  >
>>  >>  >>  >>  Pattern that fails appears to require a sequence of compare insn
>>  >>  >>  >>  possibly followed by a few instructions which do not touch conditionals,
>>  >>  >>  >>  then conditional branch insn. If it happens that we trap while processing
>>  >>  >>  >>  conditional branch insn so it is restarted after return from trap then
>>  >>  >>  >>  seldom conditional codes are calculated incorrectly.
>>  >>  >>  >>
>>  >>  >>  >>  I cannot point to exact cause but it appears that after trap return
>>  >>  >>  >>  we may have CC_OP and CC_SRC* mismatch somewhere,
>>  >>  >>  >>  since adding more cond evaluation flushes over the code helps.
>>  >>  >>  >>
>>  >>  >>  >>  We already tried doing flush more frequently and it is still not
>>  >>  >>  >>  complete, so the question is how to finally do this once and right :)
>>  >>  >>  >>
>>  >>  >>  >>  Obviously I do not get the design of lazy evaluation right, but
>>  >>  >>  >>  the following list appears to be good start. Plan is to prepare
>>  >>  >>  >>  a change to qemu and find a way to test it.
>>  >>  >>  >>
>>  >>  >>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>>  >>  >>  >>    use DisasContext->cc_op to predict if flags should be not evaluated
>>  >>  >>  >>    due to overriding insn. Instead we can drop cc_op from disassembler
>>  >>  >>  >>    context and simplify code to only use cc_op from env.
>>  >>  >>  >
>>  >>  >>  > Not currently, but in the future we may use that to do even lazier
>>  >>  >>  > flags computation. For example the sequence 'cmp x, y; bne target'
>>  >>  >>  > could be much more optimal by changing the branch to do the
>>  >>  >>  > comparison. Here's an old unfinished patch to do some of this.
>>
>>
>> I wonder if it buys anything. Sparc RISC architecture means optimizing compiler
>>  would prevent any extra flags computation, right? So it is basically 1-to-1
>>  conditional computation and use. Or even worse, if we delay computation
>>  until there are two or more consumers, correct?
>
> For x86 target, that is the other part of savings from using lazy
> condition computation. It's true that it will not benefit RISC targets
> much.
>
> But I think you are missing the other part, where the actual flags are
> not calculated but instead the original comparison can be used.
>
> For example, consider this Sparc64 code:
> IN:
> 0x000001fff000be58:  cmp  %g2, 0x51
>
> OUT: [size=82]
> 0x40df16b0:  mov    0x10(%r14),%rbp
> 0x40df16b4:  mov    %rbp,%rbx
> 0x40df16b7:  sub    $0x51,%rbx
> 0x40df16bb:  mov    %rbx,%r12
> 0x40df16be:  mov    %r12,0x10a60(%r14)
> 0x40df16c5:  mov    %rbp,0x60(%r14)
> 0x40df16c9:  mov    $0x51,%ebp
> 0x40df16ce:  mov    %rbp,0x68(%r14)
> 0x40df16d2:  mov    %rbx,0x70(%r14)
> 0x40df16d6:  mov    $0x7,%ebp
> 0x40df16db:  mov    %ebp,0x78(%r14)
> 0x40df16df:  mov    $0x1fff000be5c,%rbp
> 0x40df16e9:  mov    %rbp,0x48(%r14)
> 0x40df16ed:  mov    $0x1fff000be60,%rbp
> 0x40df16f7:  mov    %rbp,0x50(%r14)
> 0x40df16fb:  xor    %eax,%eax
> 0x40df16fd:  jmpq   0xbfface
>
> --------------
> IN:
> 0x000001fff000be5c:  bne  0x1fff000c268
>
> OUT: [size=95]
> 0x40df1710:  callq  0x518260
> 0x40df1715:  mov    0x98(%r14),%ebp
> 0x40df171c:  mov    %ebp,%ebx
> 0x40df171e:  shr    $0x16,%rbx
> 0x40df1722:  and    $0x1,%ebx
> 0x40df1725:  xor    $0x1,%rbx
> 0x40df1729:  mov    %rbx,0x90(%r14)
> 0x40df1730:  mov    $0x1fff000be60,%rbp
> 0x40df173a:  mov    %rbp,0x48(%r14)
> 0x40df173e:  test   %rbx,%rbx
> 0x40df1741:  je     0x40df175a
> 0x40df1747:  mov    $0x1fff000c268,%rbp
> 0x40df1751:  mov    %rbp,0x50(%r14)
> 0x40df1755:  jmpq   0x40df1768
> 0x40df175a:  mov    $0x1fff000be64,%rbp
> 0x40df1764:  mov    %rbp,0x50(%r14)
> 0x40df1768:  xor    %eax,%eax
> 0x40df176a:  jmpq   0xbfface
>
> Instead of calculating any flags, we should generate for combined
> 'cmp/btest + branch' sequences something like:
> mov    0x10(%r14),%rbp
> mov    %rbp,%rbx
> cmp    $0x51,%rbx
> je     0x40ac275a
> mov    $0x1fff000c268,%rbp
> mov    %rbp,0x50(%r14)
> jmpq   0x40ac2768
> mov    $0x1fff000be64,%rbp
> mov    %rbp,0x50(%r14)
> xor    %eax,%eax
> jmpq   0xbfface
>
> But I fully agree that correct code generation is the most important issue.
>
>>  >>  >>  >
>>  >>  >>  >>    Another point is that we always write to env->cc_op when
>>  >>  >>  >>  translating *cc insns
>>  >>  >>  >>    This should solve any issue with dc->cc_op prediction going
>>  >>  >>  >>    out of sync with env->cc_op and cpu_cc_src*
>>  >>  >>  >
>>  >>  >>  > I think this is what is happening now.
>>  >>  >>  >
>>  >>  >>  >>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
>>  >>  >>  >>    a. conditional code is required by insn (like addc, cond branch etc.)
>>  >>  >>  >>       - here we can optimize by evaluating specific bits (carry?)
>>  >>  >>  >>       - not sure if it works in case we have two cond consuming insns,
>>  >>  >>  >>         where first needs carry another needs the rest of flags
>>  >>  >>  >
>>  >>  >>  > Here's another patch to optimize C flag handling. It doesn't pass my
>>  >>  >>  > tests though.
>>  >>  >>  >
>>  >>  >>  >>    b. CCR is read by rdccr (helper_rdccr)
>>  >>  >>  >>       - have to compute all flags
>>  >>  >>  >>    c. trap occurs and we prepare trap level context (saving pstate)
>>  >>  >>  >>       - have to compute all flags
>>  >>  >>  >>    d. control goes out of tcg runtime (so gdbstub reads correct value from env)
>>  >>  >>  >>       - have to compute all flags
>>  >>  >>  >
>>  >>  >>  > Fully agree.
>>  >>  >>
>>  >>  >>
>>  >>  >> Cool
>>  >>  >>
>>  >>  >>  Still I'd propose to kill dc->cc_op, find a reliable way to test it
>>  >>  >>  and then add it back possibly with more optimizations.
>>  >>  >>  I'm lost in the code up to the point where I believe we need to
>>  >>  >>  save/restore cc_op and cpu_cc* while switching trap levels.
>>  >>  >
>>  >>  > I'd think this should do the trick:
>>  >>  >
>>  >>  > diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
>>  >>  > index b27778b..94921cd 100644
>>  >>  > --- a/target-sparc/op_helper.c
>>  >>  > +++ b/target-sparc/op_helper.c
>>  >>  > @@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env)
>>  >>  >     }
>>  >>  >     tsptr = cpu_tsptr(env);
>>  >>  >
>>  >>  > +    helper_compute_psr();
>>  >>  > +
>>  >>  >     tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) |
>>  >>  >         ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
>>  >>  >         GET_CWP64(env);
>>  >>  >
>>  >>
>>  >>
>>  >> Thanks, this change seems to work here for silo issue.
>>  >>
>>  >>  Another change would be to flush for gdbstub use of GET_CCR and for
>>  >>  helper_rdccr.
>>  >>  I tried to embed flush into GET_CCR but the code looks ugly since we
>>  >>  need to proxy a call to helper_compute_psr from gdbstub passing
>>  >>  available env pointer.
>>  >>
>>  >>  Not really tested with your changes, but still what is the breakage you see?
>>  >
>>  > Aurora 2.0 (http://distro.ibiblio.org/pub/linux/distributions/aurora/build-2.0/sparc/iso/)
>>  > breaks.
>>  >
>>  > This is what I get with git HEAD, having pressed enter key twice:
>>  > Welcome to Aurora SPARC Linux
>>  >
>>  >
>>  >
>>  >
>>  >                   +--------------+ CD Found +--------------+
>>  >                   |                                        |
>>  >                   | To begin testing the CD media before   |
>>  >                   | installation press OK.                 |
>>  >                   |                                        |
>>  >                   | Choose Skip to skip the media test     |
>>  >                   | and start the installation.            |
>>  >                   |                                        |
>>  >                   |      +----+             +------+       |
>>  >                   |      | OK |             | Skip |       |
>>  >                   |      +----+             +------+       |
>>  >                   |                                        |
>>  >                   |                                        |
>>  >                   +----------------------------------------+
>>  >
>>  >
>>  >
>>  >
>>  >  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>>  >
>>  > This is what I get with the C flag patch applied:
>>  > Welcome to Aurora SPARC Linux
>>  >
>>  >
>>  >
>>  >
>>  >
>>  >                    +--------------+ Error +---------------+
>>  >                    |                                      |
>>  >                    | failed to read keymap information:   |
>>  >                    | Success                              |
>>  >                    |                                      |
>>  >                    |               +----+                 |
>>  >                    |               | OK |                 |
>>  >                    |               +----+                 |
>>  >                    |                                      |
>>  >                    |                                      |
>>  >                    +--------------------------------------+
>>  >
>>  >
>>  >
>>  >
>>  >
>>  >
>>  >  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>>  >
>>
>>
>> I do reproduce the issue here with 0001-Convert-C-flag-input-BROKEN.patch
>
> The patch seems harmless, so maybe it uncovers some hideous bug.
>

Right, the bug is inside the gen_op_*cc helpers - we need to compute C
using current flag source, whereas current implementation clobbers source
then computes C.

I think I now get the lazy evaluation design right :)

Something along these changes to your 0001-Convert-C-flag-input-BROKEN.patch
appears to be good for aurora 2.0 test:

diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 94c343d..ea7c71b 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -334,9 +334,9 @@ static inline void gen_op_add_cc(TCGv dst, TCGv
src1, TCGv src2)

 static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
 {
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_movi_tl(cpu_cc_src2, src2);
-    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -344,9 +344,9 @@ static inline void gen_op_addxi_cc(TCGv dst, TCGv
src1, target_long src2)

 static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
 {
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_mov_tl(cpu_cc_src2, src2);
-    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -417,9 +417,9 @@ static inline void gen_op_sub_cc(TCGv dst, TCGv
src1, TCGv src2)

 static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
 {
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_movi_tl(cpu_cc_src2, src2);
-    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -427,9 +427,9 @@ static inline void gen_op_subxi_cc(TCGv dst, TCGv
src1, target_long src2)

 static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
 {
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_mov_tl(cpu_cc_src2, src2);
-    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);

-- 
Kind regards,
Igor V. Kovalenko

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-08  6:41               ` Igor Kovalenko
@ 2010-05-09 20:22                 ` Blue Swirl
  2010-05-10  9:40                   ` Mark Cave-Ayland
  0 siblings, 1 reply; 13+ messages in thread
From: Blue Swirl @ 2010-05-09 20:22 UTC (permalink / raw)
  To: Igor Kovalenko; +Cc: qemu-devel

On 5/8/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
> On Thu, May 6, 2010 at 10:51 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  > On 5/5/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >> On Wed, May 5, 2010 at 12:21 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  >>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >>  >> On Mon, May 3, 2010 at 11:54 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  >>  >>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >>  >>  >> On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>  >>  >>  >>  > On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>  >>  >>  >>  >> Hi!
>  >>  >>  >>  >>
>  >>  >>  >>  >>  There is an issue with lazy conditional codes evaluation where
>  >>  >>  >>  >>  we return from trap handler with mismatching conditionals.
>  >>  >>  >>  >>
>  >>  >>  >>  >>  I seldom reproduce it here when dragging qemu window while
>  >>  >>  >>  >>  machine is working through silo initialization. I use gentoo minimal cd
>  >>  >>  >>  >>  install-sparc64-minimal-20100322.iso but I think anything with silo boot
>  >>  >>  >>  >>  would experience the same. Once in a while it would report crc error,
>  >>  >>  >>  >>  unable to open cd partition or it would fail to decompress image.
>  >>  >>  >>  >
>  >>  >>  >>  > I think I've also seen this.
>  >>  >>  >>  >
>  >>  >>  >>  >>  Pattern that fails appears to require a sequence of compare insn
>  >>  >>  >>  >>  possibly followed by a few instructions which do not touch conditionals,
>  >>  >>  >>  >>  then conditional branch insn. If it happens that we trap while processing
>  >>  >>  >>  >>  conditional branch insn so it is restarted after return from trap then
>  >>  >>  >>  >>  seldom conditional codes are calculated incorrectly.
>  >>  >>  >>  >>
>  >>  >>  >>  >>  I cannot point to exact cause but it appears that after trap return
>  >>  >>  >>  >>  we may have CC_OP and CC_SRC* mismatch somewhere,
>  >>  >>  >>  >>  since adding more cond evaluation flushes over the code helps.
>  >>  >>  >>  >>
>  >>  >>  >>  >>  We already tried doing flush more frequently and it is still not
>  >>  >>  >>  >>  complete, so the question is how to finally do this once and right :)
>  >>  >>  >>  >>
>  >>  >>  >>  >>  Obviously I do not get the design of lazy evaluation right, but
>  >>  >>  >>  >>  the following list appears to be good start. Plan is to prepare
>  >>  >>  >>  >>  a change to qemu and find a way to test it.
>  >>  >>  >>  >>
>  >>  >>  >>  >>  1. Since SPARC* is a RISC CPU it seems to be not profitable to
>  >>  >>  >>  >>    use DisasContext->cc_op to predict if flags should be not evaluated
>  >>  >>  >>  >>    due to overriding insn. Instead we can drop cc_op from disassembler
>  >>  >>  >>  >>    context and simplify code to only use cc_op from env.
>  >>  >>  >>  >
>  >>  >>  >>  > Not currently, but in the future we may use that to do even lazier
>  >>  >>  >>  > flags computation. For example the sequence 'cmp x, y; bne target'
>  >>  >>  >>  > could be much more optimal by changing the branch to do the
>  >>  >>  >>  > comparison. Here's an old unfinished patch to do some of this.
>  >>
>  >>
>  >> I wonder if it buys anything. Sparc RISC architecture means optimizing compiler
>  >>  would prevent any extra flags computation, right? So it is basically 1-to-1
>  >>  conditional computation and use. Or even worse, if we delay computation
>  >>  until there are two or more consumers, correct?
>  >
>  > For x86 target, that is the other part of savings from using lazy
>  > condition computation. It's true that it will not benefit RISC targets
>  > much.
>  >
>  > But I think you are missing the other part, where the actual flags are
>  > not calculated but instead the original comparison can be used.
>  >
>  > For example, consider this Sparc64 code:
>  > IN:
>  > 0x000001fff000be58:  cmp  %g2, 0x51
>  >
>  > OUT: [size=82]
>  > 0x40df16b0:  mov    0x10(%r14),%rbp
>  > 0x40df16b4:  mov    %rbp,%rbx
>  > 0x40df16b7:  sub    $0x51,%rbx
>  > 0x40df16bb:  mov    %rbx,%r12
>  > 0x40df16be:  mov    %r12,0x10a60(%r14)
>  > 0x40df16c5:  mov    %rbp,0x60(%r14)
>  > 0x40df16c9:  mov    $0x51,%ebp
>  > 0x40df16ce:  mov    %rbp,0x68(%r14)
>  > 0x40df16d2:  mov    %rbx,0x70(%r14)
>  > 0x40df16d6:  mov    $0x7,%ebp
>  > 0x40df16db:  mov    %ebp,0x78(%r14)
>  > 0x40df16df:  mov    $0x1fff000be5c,%rbp
>  > 0x40df16e9:  mov    %rbp,0x48(%r14)
>  > 0x40df16ed:  mov    $0x1fff000be60,%rbp
>  > 0x40df16f7:  mov    %rbp,0x50(%r14)
>  > 0x40df16fb:  xor    %eax,%eax
>  > 0x40df16fd:  jmpq   0xbfface
>  >
>  > --------------
>  > IN:
>  > 0x000001fff000be5c:  bne  0x1fff000c268
>  >
>  > OUT: [size=95]
>  > 0x40df1710:  callq  0x518260
>  > 0x40df1715:  mov    0x98(%r14),%ebp
>  > 0x40df171c:  mov    %ebp,%ebx
>  > 0x40df171e:  shr    $0x16,%rbx
>  > 0x40df1722:  and    $0x1,%ebx
>  > 0x40df1725:  xor    $0x1,%rbx
>  > 0x40df1729:  mov    %rbx,0x90(%r14)
>  > 0x40df1730:  mov    $0x1fff000be60,%rbp
>  > 0x40df173a:  mov    %rbp,0x48(%r14)
>  > 0x40df173e:  test   %rbx,%rbx
>  > 0x40df1741:  je     0x40df175a
>  > 0x40df1747:  mov    $0x1fff000c268,%rbp
>  > 0x40df1751:  mov    %rbp,0x50(%r14)
>  > 0x40df1755:  jmpq   0x40df1768
>  > 0x40df175a:  mov    $0x1fff000be64,%rbp
>  > 0x40df1764:  mov    %rbp,0x50(%r14)
>  > 0x40df1768:  xor    %eax,%eax
>  > 0x40df176a:  jmpq   0xbfface
>  >
>  > Instead of calculating any flags, we should generate for combined
>  > 'cmp/btest + branch' sequences something like:
>  > mov    0x10(%r14),%rbp
>  > mov    %rbp,%rbx
>  > cmp    $0x51,%rbx
>  > je     0x40ac275a
>  > mov    $0x1fff000c268,%rbp
>  > mov    %rbp,0x50(%r14)
>  > jmpq   0x40ac2768
>  > mov    $0x1fff000be64,%rbp
>  > mov    %rbp,0x50(%r14)
>  > xor    %eax,%eax
>  > jmpq   0xbfface
>  >
>  > But I fully agree that correct code generation is the most important issue.
>  >
>  >>  >>  >>  >
>  >>  >>  >>  >>    Another point is that we always write to env->cc_op when
>  >>  >>  >>  >>  translating *cc insns
>  >>  >>  >>  >>    This should solve any issue with dc->cc_op prediction going
>  >>  >>  >>  >>    out of sync with env->cc_op and cpu_cc_src*
>  >>  >>  >>  >
>  >>  >>  >>  > I think this is what is happening now.
>  >>  >>  >>  >
>  >>  >>  >>  >>  2. We must flush lazy evaluation back to CC_OP_FLAGS in a few cases when
>  >>  >>  >>  >>    a. conditional code is required by insn (like addc, cond branch etc.)
>  >>  >>  >>  >>       - here we can optimize by evaluating specific bits (carry?)
>  >>  >>  >>  >>       - not sure if it works in case we have two cond consuming insns,
>  >>  >>  >>  >>         where first needs carry another needs the rest of flags
>  >>  >>  >>  >
>  >>  >>  >>  > Here's another patch to optimize C flag handling. It doesn't pass my
>  >>  >>  >>  > tests though.
>  >>  >>  >>  >
>  >>  >>  >>  >>    b. CCR is read by rdccr (helper_rdccr)
>  >>  >>  >>  >>       - have to compute all flags
>  >>  >>  >>  >>    c. trap occurs and we prepare trap level context (saving pstate)
>  >>  >>  >>  >>       - have to compute all flags
>  >>  >>  >>  >>    d. control goes out of tcg runtime (so gdbstub reads correct value from env)
>  >>  >>  >>  >>       - have to compute all flags
>  >>  >>  >>  >
>  >>  >>  >>  > Fully agree.
>  >>  >>  >>
>  >>  >>  >>
>  >>  >>  >> Cool
>  >>  >>  >>
>  >>  >>  >>  Still I'd propose to kill dc->cc_op, find a reliable way to test it
>  >>  >>  >>  and then add it back possibly with more optimizations.
>  >>  >>  >>  I'm lost in the code up to the point where I believe we need to
>  >>  >>  >>  save/restore cc_op and cpu_cc* while switching trap levels.
>  >>  >>  >
>  >>  >>  > I'd think this should do the trick:
>  >>  >>  >
>  >>  >>  > diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
>  >>  >>  > index b27778b..94921cd 100644
>  >>  >>  > --- a/target-sparc/op_helper.c
>  >>  >>  > +++ b/target-sparc/op_helper.c
>  >>  >>  > @@ -3506,6 +3506,8 @@ void do_interrupt(CPUState *env)
>  >>  >>  >     }
>  >>  >>  >     tsptr = cpu_tsptr(env);
>  >>  >>  >
>  >>  >>  > +    helper_compute_psr();
>  >>  >>  > +
>  >>  >>  >     tsptr->tstate = ((uint64_t)GET_CCR(env) << 32) |
>  >>  >>  >         ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
>  >>  >>  >         GET_CWP64(env);
>  >>  >>  >
>  >>  >>
>  >>  >>
>  >>  >> Thanks, this change seems to work here for silo issue.
>  >>  >>
>  >>  >>  Another change would be to flush for gdbstub use of GET_CCR and for
>  >>  >>  helper_rdccr.
>  >>  >>  I tried to embed flush into GET_CCR but the code looks ugly since we
>  >>  >>  need to proxy a call to helper_compute_psr from gdbstub passing
>  >>  >>  available env pointer.
>  >>  >>
>  >>  >>  Not really tested with your changes, but still what is the breakage you see?
>  >>  >
>  >>  > Aurora 2.0 (http://distro.ibiblio.org/pub/linux/distributions/aurora/build-2.0/sparc/iso/)
>  >>  > breaks.
>  >>  >
>  >>  > This is what I get with git HEAD, having pressed enter key twice:
>  >>  > Welcome to Aurora SPARC Linux
>  >>  >
>  >>  >
>  >>  >
>  >>  >
>  >>  >                   +--------------+ CD Found +--------------+
>  >>  >                   |                                        |
>  >>  >                   | To begin testing the CD media before   |
>  >>  >                   | installation press OK.                 |
>  >>  >                   |                                        |
>  >>  >                   | Choose Skip to skip the media test     |
>  >>  >                   | and start the installation.            |
>  >>  >                   |                                        |
>  >>  >                   |      +----+             +------+       |
>  >>  >                   |      | OK |             | Skip |       |
>  >>  >                   |      +----+             +------+       |
>  >>  >                   |                                        |
>  >>  >                   |                                        |
>  >>  >                   +----------------------------------------+
>  >>  >
>  >>  >
>  >>  >
>  >>  >
>  >>  >  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>  >>  >
>  >>  > This is what I get with the C flag patch applied:
>  >>  > Welcome to Aurora SPARC Linux
>  >>  >
>  >>  >
>  >>  >
>  >>  >
>  >>  >
>  >>  >                    +--------------+ Error +---------------+
>  >>  >                    |                                      |
>  >>  >                    | failed to read keymap information:   |
>  >>  >                    | Success                              |
>  >>  >                    |                                      |
>  >>  >                    |               +----+                 |
>  >>  >                    |               | OK |                 |
>  >>  >                    |               +----+                 |
>  >>  >                    |                                      |
>  >>  >                    |                                      |
>  >>  >                    +--------------------------------------+
>  >>  >
>  >>  >
>  >>  >
>  >>  >
>  >>  >
>  >>  >
>  >>  >  <Tab>/<Alt-Tab> between elements  | <Space> selects | <F12> next screen
>  >>  >
>  >>
>  >>
>  >> I do reproduce the issue here with 0001-Convert-C-flag-input-BROKEN.patch
>  >
>  > The patch seems harmless, so maybe it uncovers some hideous bug.
>  >
>
>
> Right, the bug is inside the gen_op_*cc helpers - we need to compute C
>  using current flag source, whereas current implementation clobbers source
>  then computes C.
>
>  I think I now get the lazy evaluation design right :)
>
>  Something along these changes to your 0001-Convert-C-flag-input-BROKEN.patch
>  appears to be good for aurora 2.0 test:
>
>  diff --git a/target-sparc/translate.c b/target-sparc/translate.c
>  index 94c343d..ea7c71b 100644
>  --- a/target-sparc/translate.c
>  +++ b/target-sparc/translate.c
>  @@ -334,9 +334,9 @@ static inline void gen_op_add_cc(TCGv dst, TCGv
>  src1, TCGv src2)
>
>   static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
>   {
>  +    gen_helper_compute_C_icc(cpu_tmp0);
>      tcg_gen_mov_tl(cpu_cc_src, src1);
>      tcg_gen_movi_tl(cpu_cc_src2, src2);
>  -    gen_helper_compute_C_icc(cpu_tmp0);
>      tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
>      tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
>      tcg_gen_mov_tl(dst, cpu_cc_dst);
>  @@ -344,9 +344,9 @@ static inline void gen_op_addxi_cc(TCGv dst, TCGv
>  src1, target_long src2)
>
>   static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
>   {
>  +    gen_helper_compute_C_icc(cpu_tmp0);
>      tcg_gen_mov_tl(cpu_cc_src, src1);
>      tcg_gen_mov_tl(cpu_cc_src2, src2);
>  -    gen_helper_compute_C_icc(cpu_tmp0);
>      tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
>      tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
>      tcg_gen_mov_tl(dst, cpu_cc_dst);
>  @@ -417,9 +417,9 @@ static inline void gen_op_sub_cc(TCGv dst, TCGv
>  src1, TCGv src2)
>
>   static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
>   {
>  +    gen_helper_compute_C_icc(cpu_tmp0);
>      tcg_gen_mov_tl(cpu_cc_src, src1);
>      tcg_gen_movi_tl(cpu_cc_src2, src2);
>  -    gen_helper_compute_C_icc(cpu_tmp0);
>      tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
>      tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
>      tcg_gen_mov_tl(dst, cpu_cc_dst);
>  @@ -427,9 +427,9 @@ static inline void gen_op_subxi_cc(TCGv dst, TCGv
>  src1, target_long src2)
>
>   static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
>   {
>  +    gen_helper_compute_C_icc(cpu_tmp0);
>      tcg_gen_mov_tl(cpu_cc_src, src1);
>      tcg_gen_mov_tl(cpu_cc_src2, src2);
>  -    gen_helper_compute_C_icc(cpu_tmp0);
>      tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
>      tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
>      tcg_gen_mov_tl(dst, cpu_cc_dst);

Thanks a lot, with this patch my tests passed! I applied the combined patch.

I also did a bit of refactoring to get the original Sparc64 issue fixed.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-09 20:22                 ` Blue Swirl
@ 2010-05-10  9:40                   ` Mark Cave-Ayland
  2010-05-10 18:33                     ` Blue Swirl
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Cave-Ayland @ 2010-05-10  9:40 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

Blue Swirl wrote:

> Thanks a lot, with this patch my tests passed! I applied the combined patch.

Yes, I definitely see an improvement with this patch - at least my 
Debian lenny SPARC boot cd doesn't randomly kernel panic any more. It 
looks as if it now just can't find /init which could just be due to an 
incorrect device mapping somewhere.

> I also did a bit of refactoring to get the original Sparc64 issue fixed.

However, one thing I did notice is that this does introduce a noticeable 
performance penalty. With OpenBIOS SVN head I see the following:

With commit 72139e83a98eba2bfed2dbc2db2818fb19e47ca0 (just before the 
changes):

[   59.225406] Failed to execute /init
[   59.304088] Kernel panic - not syncing: No init found.  Try passing 
init= option to kernel.
[   59.450313] Press Stop-A (L1-A) to return to the boot prom

With commit 5a834bb47c373e887de5210b7ceae96e1ef413f7 (just after the 
changes):

[   70.384466] Failed to execute /init
[   70.474804] Kernel panic - not syncing: No init found.  Try passing 
init= option to kernel.


So while it's technically correct, it seems to have added ~15% overhead 
to the emulation :(


ATB,

Mark.

-- 
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-10  9:40                   ` Mark Cave-Ayland
@ 2010-05-10 18:33                     ` Blue Swirl
  2010-05-15 12:56                       ` Mark Cave-Ayland
  0 siblings, 1 reply; 13+ messages in thread
From: Blue Swirl @ 2010-05-10 18:33 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: qemu-devel

On 5/10/10, Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk> wrote:
> Blue Swirl wrote:
>
>
> > Thanks a lot, with this patch my tests passed! I applied the combined
> patch.
> >
>
>  Yes, I definitely see an improvement with this patch - at least my Debian
> lenny SPARC boot cd doesn't randomly kernel panic any more. It looks as if
> it now just can't find /init which could just be due to an incorrect device
> mapping somewhere.
>
>
> > I also did a bit of refactoring to get the original Sparc64 issue fixed.
> >
>
>  However, one thing I did notice is that this does introduce a noticeable
> performance penalty. With OpenBIOS SVN head I see the following:
>
>  With commit 72139e83a98eba2bfed2dbc2db2818fb19e47ca0 (just
> before the changes):
>
>  [   59.225406] Failed to execute /init
>  [   59.304088] Kernel panic - not syncing: No init found.  Try passing
> init= option to kernel.
>  [   59.450313] Press Stop-A (L1-A) to return to the boot prom
>
>  With commit 5a834bb47c373e887de5210b7ceae96e1ef413f7 (just
> after the changes):
>
>  [   70.384466] Failed to execute /init
>  [   70.474804] Kernel panic - not syncing: No init found.  Try passing
> init= option to kernel.
>
>
>  So while it's technically correct, it seems to have added ~15% overhead to
> the emulation :(

Guest time can be unreliable, it could also indicate that Linux
executes a lot more timer interrupts. Could you retest and measure the
wall clock time?

I think the C flag change should only increase performance. The next
commit may have negative effects because more work is done every
interrupt, but it's also more correct now.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
  2010-05-10 18:33                     ` Blue Swirl
@ 2010-05-15 12:56                       ` Mark Cave-Ayland
  0 siblings, 0 replies; 13+ messages in thread
From: Mark Cave-Ayland @ 2010-05-15 12:56 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

Blue Swirl wrote:

> Guest time can be unreliable, it could also indicate that Linux
> executes a lot more timer interrupts. Could you retest and measure the
> wall clock time?
> 
> I think the C flag change should only increase performance. The next
> commit may have negative effects because more work is done every
> interrupt, but it's also more correct now.

Hmmm looks like you're right. I did a few more tests measuring wall 
clock time and averaging across a set of runs gives roughly the same 
time (although there does seem to be quite a bit of variation in 
resulting times on my system here). So nothing to worry about here.


ATB,

Mark.

-- 
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-05-15 12:56 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-03  7:17 [Qemu-devel] sparc64 lazy conditional codes evaluation Igor Kovalenko
2010-05-03 19:24 ` [Qemu-devel] " Blue Swirl
2010-05-03 19:46   ` Igor Kovalenko
2010-05-03 19:54     ` Blue Swirl
2010-05-03 20:03       ` Igor Kovalenko
2010-05-04 20:21         ` Blue Swirl
2010-05-05 20:24           ` Igor Kovalenko
2010-05-06 18:51             ` Blue Swirl
2010-05-08  6:41               ` Igor Kovalenko
2010-05-09 20:22                 ` Blue Swirl
2010-05-10  9:40                   ` Mark Cave-Ayland
2010-05-10 18:33                     ` Blue Swirl
2010-05-15 12:56                       ` Mark Cave-Ayland

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.