All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections
@ 2013-10-29 19:04 Sebastian Macke
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 01/13] target-openrisc: Implement translation block chaining Sebastian Macke
                   ` (14 more replies)
  0 siblings, 15 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

Hi,

This is the second part of the patches to make the openrisc target faster 
and more reliable.

The first four patches are increasing the speed to a level comparable to the
i386 emulation by implementing block chaining and further small optimizations.

Two patches change the handling of the TLB flushing and increase 
especially the tlb refill procedure for the softmmu emulation. 
In some way it does very slightly break the specification but 
increases the compatbility with the specification of QEMU instead.

One patch introduces a new CPU which neglects
the carry and overflow flags. I hope this one gets accepted. 
It increases the speed and does not harm anything, but violates the 
specification. But since you have to activate it explicitly I don't 
see a problem.

The other patches correct small stuff or increase the readability 
One patch solves a problem I introduced in gdbstub.c in my previous 
patchset.

Further optimizations would require instruction fusion and the resorting 
of instructions (delayed slot). I have not seen any of the other 
targets doing this. But it would be interesting :)

The nbench results are quite remarkable and show a speed increase up to 
a factor of ten for the user space emulation.

                                    old          +chaining+rw     +no flags      +sync          +jmp_pc as flag
                      i386 user     or1k user      or1k user      or1k user      or1k user      or1k user      
                      gcc 4.8.1     gcc 4.8.1      gcc 4.8.1      gcc 4.8.1      gcc 4.8.1      gcc 4.8.1    
TEST                : New Index   : New Index   :  New Index   :  New Index   :  New Index   :  New Index   :
                    : AMD K6/233* : AMD K6/233* :  AMD K6/233* :  AMD K6/233* :  AMD K6/233* :  AMD K6/233* :
--------------------:-------------:-------------:--------------: -------------: -------------: -------------:
NUMERIC SORT        :       3.16  :       0.38  :        2.52  :        3.63  :        3.90  :        4.25  : 
STRING SORT         :       3.52  :       0.43  :        1.95  :        2.71  :        2.81  :        2.96  : 
BITFIELD            :       8.67  :       0.91  :        3.91  :        6.22  :        7.23  :        7.44  : 
FP EMULATION        :      14.12  :       1.25  :        7.27  :        9.82  :       10.20  :       11.09  :
FOURIER             :       1.46  :       0.01  :        0.06  :        0.08  :        0.08  :        0.09  :
ASSIGNMENT          :      15.34  :       0.67  :        5.05  :        7.88  :        8.50  :        9.89  :
IDEA                :      10.43  :       0.82  :        4.34  :        5.87  :        6.33  :        6.57  :
HUFFMAN             :       7.58  :       0.65  :        3.48  :        5.14  :        5.54  :        5.70  :
NEURAL NET          :       0.37  :       0.02  :        0.08  :        0.10  :        0.11  :        0.11  :
LU DECOMPOSITION    :       0.71  :       0.04  :        0.15  :        0.21  :        0.22  :        0.23  :


Keep in mind, that the toolchain is missing floating point support 
and that the emulator has to convert between little and big endian.

The patches where tested by the following programs
- tcg testsuite 
- booting Linux with busybox (+singlestep)
- running nbench in softmmu and user mode (+singlestep)
- configure && make && make install and testing of two applications (cmatrix and Frotz)
- running a few binutils tools like objdump
- qemu user chroot using proot-x86 and running busybox, nano editor and gcc
- gcc C-torture test with static and shared library.

The test.div of the tcg testsuite fails, but this is expected if you want to divide signed 0x80000000 by -1.

At the moment there is only one real bug left which I cannot find.
The compiled application "make" does not work in qemu-user mode.
It failes a few times in the uClibc library because of a segmentation fault. 
E.g. when setting the locale. But this failure existed also before my patches 
and is maybe not related to the target.

Best Regards
Sebastian



Sebastian Macke (13):
  target-openrisc: Implement translation block chaining
  target-openrisc: Separate Delayed slot handling from main loop
  target-openrisc: Separate of load/store instructions
  target-openrisc: sync flags only when necessary
  target-openrisc: Remove TLB flush on exception
  target-openrisc: Remove TLB flush from l.rfe instruction
  target-openrisc: Correct l.cmov conditional check
  target-openrisc: Test for Overflow exception statically
  target-openrisc: Add CPU which neglects Carry and Overflow Flag
  target-openrisc: Correct target number for 64 bit llseek
  target-openrisc: use jmp_pc as flag variable for branches
  target-openrisc: Add correct gdb information for the pc value
  target-openrisc: Add In-circuit emulator support

 linux-user/openrisc/syscall_nr.h   |   2 +-
 target-openrisc/cpu.c              |  15 +-
 target-openrisc/cpu.h              |  10 +-
 target-openrisc/gdbstub.c          |  14 +-
 target-openrisc/interrupt.c        |   4 -
 target-openrisc/interrupt_helper.c |   8 -
 target-openrisc/sys_helper.c       |   4 +
 target-openrisc/translate.c        | 317 +++++++++++++++++++++++++------------
 8 files changed, 250 insertions(+), 124 deletions(-)

-- 
1.8.4.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 01/13] target-openrisc: Implement translation block chaining
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 02/13] target-openrisc: Separate Delayed slot handling from main loop Sebastian Macke
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

Currently the translation blocks are searched via a hash table
every time.
But QEMU supports direct block chaining for the pc values which
are known. This is true for the instructions l.bf, l.bnf, l.j and l.jal.

Because of the delayed slot we have to save several variables
to correctly jump after the delayed slot is executed.

btaken: temporary flag if the branch is taken or not
j_target: jump pc value
j_type: Save the type of jump. Static, dynamic or branch

The speed is increased by around a factor 2-3.

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/translate.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index 8fc679b..1047661 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -39,11 +39,21 @@
 #  define LOG_DIS(...) do { } while (0)
 #endif
 
+enum {
+    JUMP_DYNAMIC, /* The address is not known during compile time*/
+    JUMP_STATIC,  /* The address is known during compile time */
+    JUMP_BRANCH   /* The two possible destinations are known */
+};
+
+
 typedef struct DisasContext {
     TranslationBlock *tb;
     target_ulong pc;
     uint32_t tb_flags, synced_flags, flags;
     uint32_t is_jmp;
+    uint32_t j_state; /* specifies the jump type*/
+    target_ulong j_target; /* target address for jump */
+    TCGv btaken;  /* Temporary variable */
     uint32_t mem_idx;
     int singlestep_enabled;
     uint32_t delayed_branch;
@@ -193,24 +203,31 @@ static void gen_jump(DisasContext *dc, uint32_t imm, uint32_t reg, uint32_t op0)
     target_ulong tmp_pc;
     /* N26, 26bits imm */
     tmp_pc = sign_extend((imm<<2), 26) + dc->pc;
+    dc->j_target = tmp_pc;
 
     switch (op0) {
     case 0x00:     /* l.j */
         tcg_gen_movi_tl(jmp_pc, tmp_pc);
+        dc->j_state = JUMP_STATIC;
         break;
     case 0x01:     /* l.jal */
         tcg_gen_movi_tl(cpu_R[9], (dc->pc + 8));
         tcg_gen_movi_tl(jmp_pc, tmp_pc);
-        break;
+        dc->j_state = JUMP_STATIC;
+    break;
     case 0x03:     /* l.bnf */
     case 0x04:     /* l.bf  */
         {
             int lab = gen_new_label();
+            dc->btaken = tcg_temp_local_new();
             tcg_gen_movi_tl(jmp_pc, dc->pc+8);
+            tcg_gen_movi_tl(dc->btaken, 0);
             tcg_gen_brcondi_i32(op0 == 0x03 ? TCG_COND_NE : TCG_COND_EQ,
                                 cpu_srf, 0, lab);
+            tcg_gen_movi_tl(dc->btaken, 1);
             tcg_gen_movi_tl(jmp_pc, tmp_pc);
             gen_set_label(lab);
+            dc->j_state = JUMP_BRANCH;
         }
         break;
     case 0x11:     /* l.jr */
@@ -1657,6 +1674,8 @@ static inline void gen_intermediate_code_internal(OpenRISCCPU *cpu,
     gen_opc_end = tcg_ctx.gen_opc_buf + OPC_MAX_SIZE;
     dc->is_jmp = DISAS_NEXT;
     dc->pc = pc_start;
+    dc->j_state = JUMP_DYNAMIC;
+    dc->j_target = 0;
     dc->flags = cpu->env.cpucfgr;
     dc->mem_idx = cpu_mmu_index(&cpu->env);
     dc->synced_flags = dc->tb_flags = tb->flags;
@@ -1709,8 +1728,19 @@ static inline void gen_intermediate_code_internal(OpenRISCCPU *cpu,
             if (!dc->delayed_branch) {
                 dc->tb_flags &= ~D_FLAG;
                 gen_sync_flags(dc);
-                tcg_gen_mov_tl(cpu_pc, jmp_pc);
-                tcg_gen_exit_tb(0);
+                if (dc->j_state == JUMP_BRANCH) {
+                    int l1 = gen_new_label();
+                    tcg_gen_brcondi_tl(TCG_COND_NE, dc->btaken, 0, l1);
+                    gen_goto_tb(dc, 1, dc->pc);
+                    gen_set_label(l1);
+                    gen_goto_tb(dc, 0, dc->j_target);
+                    tcg_temp_free(dc->btaken);
+                } else if (dc->j_state == JUMP_STATIC) {
+                    gen_goto_tb(dc, 0, dc->j_target);
+                } else {
+                    tcg_gen_mov_tl(cpu_pc, jmp_pc);
+                    tcg_gen_exit_tb(0);
+                }
                 dc->is_jmp = DISAS_JUMP;
                 break;
             }
@@ -1725,10 +1755,6 @@ static inline void gen_intermediate_code_internal(OpenRISCCPU *cpu,
     if (tb->cflags & CF_LAST_IO) {
         gen_io_end();
     }
-    if (dc->is_jmp == DISAS_NEXT) {
-        dc->is_jmp = DISAS_UPDATE;
-        tcg_gen_movi_tl(cpu_pc, dc->pc);
-    }
     if (unlikely(cs->singlestep_enabled)) {
         if (dc->is_jmp == DISAS_NEXT) {
             tcg_gen_movi_tl(cpu_pc, dc->pc);
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 02/13] target-openrisc: Separate Delayed slot handling from main loop
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 01/13] target-openrisc: Implement translation block chaining Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions Sebastian Macke
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

To increase the readability the delayed slot handling is separated to
a function

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/translate.c | 46 +++++++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index 1047661..31f8717 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -1655,6 +1655,35 @@ static void check_breakpoint(OpenRISCCPU *cpu, DisasContext *dc)
     }
 }
 
+static void handle_delay_slot(DisasContext *dc)
+{
+    dc->tb_flags &= ~D_FLAG;
+    gen_sync_flags(dc);
+
+    switch (dc->j_state) {
+    case JUMP_BRANCH:
+        {
+            int l1 = gen_new_label();
+            tcg_gen_brcondi_tl(TCG_COND_NE, dc->btaken, 0, l1);
+            gen_goto_tb(dc, 1, dc->pc);
+            gen_set_label(l1);
+            tcg_temp_free(dc->btaken);
+            gen_goto_tb(dc, 0, dc->j_target);
+            break;
+        }
+    case JUMP_STATIC:
+        gen_goto_tb(dc, 0, dc->j_target);
+        break;
+    case JUMP_DYNAMIC:
+    default:
+        tcg_gen_mov_tl(cpu_pc, jmp_pc);
+        tcg_gen_exit_tb(0);
+        break;
+    }
+    dc->is_jmp = DISAS_JUMP;
+}
+
+
 static inline void gen_intermediate_code_internal(OpenRISCCPU *cpu,
                                                   TranslationBlock *tb,
                                                   int search_pc)
@@ -1726,22 +1755,7 @@ static inline void gen_intermediate_code_internal(OpenRISCCPU *cpu,
         if (dc->delayed_branch) {
             dc->delayed_branch--;
             if (!dc->delayed_branch) {
-                dc->tb_flags &= ~D_FLAG;
-                gen_sync_flags(dc);
-                if (dc->j_state == JUMP_BRANCH) {
-                    int l1 = gen_new_label();
-                    tcg_gen_brcondi_tl(TCG_COND_NE, dc->btaken, 0, l1);
-                    gen_goto_tb(dc, 1, dc->pc);
-                    gen_set_label(l1);
-                    gen_goto_tb(dc, 0, dc->j_target);
-                    tcg_temp_free(dc->btaken);
-                } else if (dc->j_state == JUMP_STATIC) {
-                    gen_goto_tb(dc, 0, dc->j_target);
-                } else {
-                    tcg_gen_mov_tl(cpu_pc, jmp_pc);
-                    tcg_gen_exit_tb(0);
-                }
-                dc->is_jmp = DISAS_JUMP;
+                handle_delay_slot(dc);
                 break;
             }
         }
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 01/13] target-openrisc: Implement translation block chaining Sebastian Macke
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 02/13] target-openrisc: Separate Delayed slot handling from main loop Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 20:05   ` Max Filippov
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 04/13] target-openrisc: sync flags only when necessary Sebastian Macke
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

This patch separates the load and store instruction to a
separate function.
The repetition of the source code can be reduced and further
optimizations can be implemented.
In this case it checks for a zero offset and optimizes it.

Additional this patch solves a severe bug for the softmmu emulation.
The pc has to be saved as these instructions can fail and lead
to a tlb miss exception.

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/translate.c | 130 ++++++++++++++++++++++++++------------------
 1 file changed, 76 insertions(+), 54 deletions(-)

diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index 31f8717..1bb686c 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -692,6 +692,73 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
     }
 }
 
+static void gen_loadstore(DisasContext *dc, uint32 op0,
+                          uint32_t ra, uint32_t rb, uint32_t rd,
+                          uint32_t offset)
+{
+
+/* The load and store instructions can fail and lead to a
+ *  tlb miss exception. The correct pc has to be stored for
+ *  this case.
+ */
+#if !defined(CONFIG_USER_ONLY)
+    tcg_gen_movi_tl(cpu_pc, dc->pc);
+#endif
+
+    TCGv t0 = cpu_R[ra];
+    if (offset != 0) {
+        t0 = tcg_temp_new();
+        tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(offset, 16));
+    }
+
+    switch (op0) {
+    case 0x21:    /* l.lwz */
+        tcg_gen_qemu_ld32u(cpu_R[rd], t0, dc->mem_idx);
+        break;
+
+    case 0x22:    /* l.lws */
+        tcg_gen_qemu_ld32s(cpu_R[rd], t0, dc->mem_idx);
+        break;
+
+    case 0x23:    /* l.lbz */
+        tcg_gen_qemu_ld8u(cpu_R[rd], t0, dc->mem_idx);
+        break;
+
+    case 0x24:    /* l.lbs */
+        tcg_gen_qemu_ld8s(cpu_R[rd], t0, dc->mem_idx);
+        break;
+
+    case 0x25:    /* l.lhz */
+        tcg_gen_qemu_ld16u(cpu_R[rd], t0, dc->mem_idx);
+        break;
+
+    case 0x26:    /* l.lhs */
+        tcg_gen_qemu_ld16s(cpu_R[rd], t0, dc->mem_idx);
+        break;
+
+    case 0x35:    /* l.sw */
+        tcg_gen_qemu_st32(cpu_R[rb], t0, dc->mem_idx);
+        break;
+
+    case 0x36:    /* l.sb */
+        tcg_gen_qemu_st8(cpu_R[rb], t0, dc->mem_idx);
+        break;
+
+    case 0x37:    /* l.sh */
+        tcg_gen_qemu_st16(cpu_R[rb], t0, dc->mem_idx);
+        break;
+
+    default:
+    break;
+    }
+
+    if (offset != 0) {
+        tcg_temp_free(t0);
+    }
+
+}
+
+
 static void dec_misc(DisasContext *dc, uint32_t insn)
 {
     uint32_t op0, op1;
@@ -843,62 +910,32 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
 
     case 0x21:    /* l.lwz */
         LOG_DIS("l.lwz r%d, r%d, %d\n", rd, ra, I16);
-        {
-            TCGv t0 = tcg_temp_new();
-            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
-            tcg_gen_qemu_ld32u(cpu_R[rd], t0, dc->mem_idx);
-            tcg_temp_free(t0);
-        }
+        gen_loadstore(dc, op0, ra, rb, rd, I16);
         break;
 
     case 0x22:    /* l.lws */
         LOG_DIS("l.lws r%d, r%d, %d\n", rd, ra, I16);
-        {
-            TCGv t0 = tcg_temp_new();
-            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
-            tcg_gen_qemu_ld32s(cpu_R[rd], t0, dc->mem_idx);
-            tcg_temp_free(t0);
-        }
+        gen_loadstore(dc, op0, ra, rb, rd, I16);
         break;
 
     case 0x23:    /* l.lbz */
         LOG_DIS("l.lbz r%d, r%d, %d\n", rd, ra, I16);
-        {
-            TCGv t0 = tcg_temp_new();
-            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
-            tcg_gen_qemu_ld8u(cpu_R[rd], t0, dc->mem_idx);
-            tcg_temp_free(t0);
-        }
+        gen_loadstore(dc, op0, ra, rb, rd, I16);
         break;
 
     case 0x24:    /* l.lbs */
         LOG_DIS("l.lbs r%d, r%d, %d\n", rd, ra, I16);
-        {
-            TCGv t0 = tcg_temp_new();
-            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
-            tcg_gen_qemu_ld8s(cpu_R[rd], t0, dc->mem_idx);
-            tcg_temp_free(t0);
-        }
+        gen_loadstore(dc, op0, ra, rb, rd, I16);
         break;
 
     case 0x25:    /* l.lhz */
         LOG_DIS("l.lhz r%d, r%d, %d\n", rd, ra, I16);
-        {
-            TCGv t0 = tcg_temp_new();
-            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
-            tcg_gen_qemu_ld16u(cpu_R[rd], t0, dc->mem_idx);
-            tcg_temp_free(t0);
-        }
+        gen_loadstore(dc, op0, ra, rb, rd, I16);
         break;
 
     case 0x26:    /* l.lhs */
         LOG_DIS("l.lhs r%d, r%d, %d\n", rd, ra, I16);
-        {
-            TCGv t0 = tcg_temp_new();
-            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
-            tcg_gen_qemu_ld16s(cpu_R[rd], t0, dc->mem_idx);
-            tcg_temp_free(t0);
-        }
+        gen_loadstore(dc, op0, ra, rb, rd, I16);
         break;
 
     case 0x27:    /* l.addi */
@@ -1047,32 +1084,17 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
 
     case 0x35:    /* l.sw */
         LOG_DIS("l.sw %d, r%d, r%d, %d\n", I5, ra, rb, I11);
-        {
-            TCGv t0 = tcg_temp_new();
-            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(tmp, 16));
-            tcg_gen_qemu_st32(cpu_R[rb], t0, dc->mem_idx);
-            tcg_temp_free(t0);
-        }
+        gen_loadstore(dc, op0, ra, rb, rd, tmp);
         break;
 
     case 0x36:    /* l.sb */
         LOG_DIS("l.sb %d, r%d, r%d, %d\n", I5, ra, rb, I11);
-        {
-            TCGv t0 = tcg_temp_new();
-            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(tmp, 16));
-            tcg_gen_qemu_st8(cpu_R[rb], t0, dc->mem_idx);
-            tcg_temp_free(t0);
-        }
+        gen_loadstore(dc, op0, ra, rb, rd, tmp);
         break;
 
     case 0x37:    /* l.sh */
+        gen_loadstore(dc, op0, ra, rb, rd, tmp);
         LOG_DIS("l.sh %d, r%d, r%d, %d\n", I5, ra, rb, I11);
-        {
-            TCGv t0 = tcg_temp_new();
-            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(tmp, 16));
-            tcg_gen_qemu_st16(cpu_R[rb], t0, dc->mem_idx);
-            tcg_temp_free(t0);
-        }
         break;
 
     default:
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 04/13] target-openrisc: sync flags only when necessary
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (2 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 21:51   ` Richard Henderson
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception Sebastian Macke
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

Very often the delayed slot flag is set only to be removed
one instruction later. This patch sets this flag
only on instructions which could fail and at the end
of a translation block if necessary.

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/translate.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index 1bb686c..378ff1b 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -145,12 +145,14 @@ static inline void gen_sync_flags(DisasContext *dc)
 static void gen_exception(DisasContext *dc, unsigned int excp)
 {
     TCGv_i32 tmp = tcg_const_i32(excp);
+    gen_sync_flags(dc);
     gen_helper_exception(cpu_env, tmp);
     tcg_temp_free_i32(tmp);
 }
 
 static void gen_illegal_exception(DisasContext *dc)
 {
+    gen_sync_flags(dc);
     tcg_gen_movi_tl(cpu_pc, dc->pc);
     gen_exception(dc, EXCP_ILLEGAL);
     dc->is_jmp = DISAS_UPDATE;
@@ -244,7 +246,6 @@ static void gen_jump(DisasContext *dc, uint32_t imm, uint32_t reg, uint32_t op0)
 
     dc->delayed_branch = 2;
     dc->tb_flags |= D_FLAG;
-    gen_sync_flags(dc);
 }
 
 
@@ -703,6 +704,7 @@ static void gen_loadstore(DisasContext *dc, uint32 op0,
  */
 #if !defined(CONFIG_USER_ONLY)
     tcg_gen_movi_tl(cpu_pc, dc->pc);
+    gen_sync_flags(dc);
 #endif
 
     TCGv t0 = cpu_R[ra];
@@ -1788,6 +1790,8 @@ static inline void gen_intermediate_code_internal(OpenRISCCPU *cpu,
              && (dc->pc < next_page_start)
              && num_insns < max_insns);
 
+    gen_sync_flags(dc);
+
     if (tb->cflags & CF_LAST_IO) {
         gen_io_end();
     }
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (3 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 04/13] target-openrisc: sync flags only when necessary Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 19:47   ` Peter Maydell
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 06/13] target-openrisc: Remove TLB flush from l.rfe instruction Sebastian Macke
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

The TLB flush is not necessary as the mmu_index field
already takes care of correct memory locations.
Instead the tb flag field must be expanded that
the exception takes the correct translation block.

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/cpu.h       | 4 ++--
 target-openrisc/interrupt.c | 4 ----
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
index 24afe6f..057821d 100644
--- a/target-openrisc/cpu.h
+++ b/target-openrisc/cpu.h
@@ -85,7 +85,7 @@ enum {
 #define SPR_VR 0xFFFF003F
 
 /* Internal flags, delay slot flag */
-#define D_FLAG    1
+#define D_FLAG    2
 
 /* Interrupt */
 #define NR_IRQS  32
@@ -412,7 +412,7 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
     *pc = env->pc;
     *cs_base = 0;
     /* D_FLAG -- branch instruction exception */
-    *flags = (env->flags & D_FLAG);
+    *flags = (env->flags & D_FLAG) | (env->sr & SR_SM);
 }
 
 static inline int cpu_mmu_index(CPUOpenRISCState *env)
diff --git a/target-openrisc/interrupt.c b/target-openrisc/interrupt.c
index d1d6ae2..ee98ed3 100644
--- a/target-openrisc/interrupt.c
+++ b/target-openrisc/interrupt.c
@@ -41,10 +41,6 @@ void openrisc_cpu_do_interrupt(CPUState *cs)
         env->epcr += 4;
     }
 
-    /* For machine-state changed between user-mode and supervisor mode,
-       we need flush TLB when we enter&exit EXCP.  */
-    tlb_flush(env, 1);
-
     env->esr = ENV_GET_SR(env);
     env->sr &= ~SR_DME;
     env->sr &= ~SR_IME;
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 06/13] target-openrisc: Remove TLB flush from l.rfe instruction
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (4 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 21:01   ` Max Filippov
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 07/13] target-openrisc: Correct l.cmov conditional check Sebastian Macke
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

At the moment there are two TLBs. The OpenRISC TLB followed
by the QEMU's own TLB.
At the end of the TLB miss handler a tlb_flush of QEMUs TLB
is executed which is exactly what we want to avoid.
As long as there is no context switch we don't have to flush the TLB.

There are two options:
1. If l.rfe is executed and the MMU is off we don't flush the buffer.
This would work under Linux

2. We flush the TLB only if the correct DTLB and ITLB registers are
cleared. Unfortunately the OpenRISC TLB is smaller and there is no
register which commands to flush the whole buffer.
In this case we have to reckognize when the whole TLB is flushed.
This would have the benefit that syscalls which do not change
the context would not flush the TLB.

Both solutions would break weakly the specification but the speed benefit
is more than a factor of two for memory intense programs.
In this patch, I choose solution no. 2.
If the first TLB register is cleared the whole TLB is cleared.

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/cpu.h              | 2 +-
 target-openrisc/interrupt_helper.c | 8 --------
 target-openrisc/sys_helper.c       | 4 ++++
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
index 057821d..bac61e5 100644
--- a/target-openrisc/cpu.h
+++ b/target-openrisc/cpu.h
@@ -412,7 +412,7 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
     *pc = env->pc;
     *cs_base = 0;
     /* D_FLAG -- branch instruction exception */
-    *flags = (env->flags & D_FLAG) | (env->sr & SR_SM);
+    *flags = (env->flags & D_FLAG) | (env->sr & (SR_SM | SR_DME | SR_IME));
 }
 
 static inline int cpu_mmu_index(CPUOpenRISCState *env)
diff --git a/target-openrisc/interrupt_helper.c b/target-openrisc/interrupt_helper.c
index ae187f5..1ad3a78 100644
--- a/target-openrisc/interrupt_helper.c
+++ b/target-openrisc/interrupt_helper.c
@@ -25,10 +25,6 @@ void HELPER(rfe)(CPUOpenRISCState *env)
 {
     OpenRISCCPU *cpu = openrisc_env_get_cpu(env);
     CPUState *cs = CPU(cpu);
-#ifndef CONFIG_USER_ONLY
-    int need_flush_tlb = (cpu->env.sr & (SR_SM | SR_IME | SR_DME)) ^
-                         (cpu->env.esr & (SR_SM | SR_IME | SR_DME));
-#endif
     cpu->env.pc = cpu->env.epcr;
     ENV_SET_SR(&(cpu->env), cpu->env.esr);
 
@@ -48,10 +44,6 @@ void HELPER(rfe)(CPUOpenRISCState *env)
         cpu->env.tlb->cpu_openrisc_map_address_code =
             &cpu_openrisc_get_phys_nommu;
     }
-
-    if (need_flush_tlb) {
-        tlb_flush(&cpu->env, 1);
-    }
 #endif
     cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
 }
diff --git a/target-openrisc/sys_helper.c b/target-openrisc/sys_helper.c
index 3eda0e1..b7c987d 100644
--- a/target-openrisc/sys_helper.c
+++ b/target-openrisc/sys_helper.c
@@ -77,6 +77,10 @@ void HELPER(mtspr)(CPUOpenRISCState *env,
         idx = spr - TO_SPR(1, 512);
         if (!(rb & 1)) {
             tlb_flush_page(env, env->tlb->dtlb[0][idx].mr & TARGET_PAGE_MASK);
+            if (idx == 0) {
+                /* Assume that the whole tlb buffer must be flushed */
+                tlb_flush(&cpu->env, 1);
+            }
         }
         env->tlb->dtlb[0][idx].mr = rb;
         break;
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 07/13] target-openrisc: Correct l.cmov conditional check
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (5 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 06/13] target-openrisc: Remove TLB flush from l.rfe instruction Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 21:15   ` Max Filippov
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 08/13] target-openrisc: Test for Overflow exception statically Sebastian Macke
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

srf is a boolean variable.
Therefore the instruction should check for != 0 and not for != SR_F

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index 378ff1b..9fd1126 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -565,7 +565,7 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                 int lab = gen_new_label();
                 TCGv res = tcg_temp_local_new();
                 tcg_gen_mov_tl(res, cpu_R[rb]);
-                tcg_gen_brcondi_tl(TCG_COND_NE, cpu_srf, SR_F, lab);
+                tcg_gen_brcondi_tl(TCG_COND_NE, cpu_srf, 0, lab);
                 tcg_gen_mov_tl(res, cpu_R[ra]);
                 gen_set_label(lab);
                 tcg_gen_mov_tl(cpu_R[rd], res);
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 08/13] target-openrisc: Test for Overflow exception statically
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (6 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 07/13] target-openrisc: Correct l.cmov conditional check Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 21:25   ` Max Filippov
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 09/13] target-openrisc: Add CPU which neglects Carry and Overflow Flag Sebastian Macke
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

Instead of testing the overflow exception dynamically every time
The flag will be reckognized by the tcg as changed code and
will recompile the code with the correct checks.

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/cpu.h       |  3 +-
 target-openrisc/translate.c | 78 ++++++++++++++++++++++++++-------------------
 2 files changed, 48 insertions(+), 33 deletions(-)

diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
index bac61e5..94bbb17 100644
--- a/target-openrisc/cpu.h
+++ b/target-openrisc/cpu.h
@@ -412,7 +412,8 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
     *pc = env->pc;
     *cs_base = 0;
     /* D_FLAG -- branch instruction exception */
-    *flags = (env->flags & D_FLAG) | (env->sr & (SR_SM | SR_DME | SR_IME));
+    *flags = (env->flags & D_FLAG) |
+             (env->sr & (SR_SM | SR_DME | SR_IME | SR_OVE));
 }
 
 static inline int cpu_mmu_index(CPUOpenRISCState *env)
diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index 9fd1126..b1f73c4 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -271,7 +271,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                 TCGv_i64 tb = tcg_temp_new_i64();
                 TCGv_i64 td = tcg_temp_local_new_i64();
                 TCGv_i32 res = tcg_temp_local_new_i32();
-                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
                 tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
                 tcg_gen_extu_i32_i64(tb, cpu_R[rb]);
                 tcg_gen_add_i64(td, ta, tb);
@@ -282,16 +281,19 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                 tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
                 tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
                 tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
-                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
-                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
-                gen_exception(dc, EXCP_RANGE);
+                if (dc->tb_flags & SR_OVE) {
+                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
+                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
+                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
+                    gen_exception(dc, EXCP_RANGE);
+                    tcg_temp_free_i32(sr_ove);
+                }
                 gen_set_label(lab);
                 tcg_gen_mov_i32(cpu_R[rd], res);
                 tcg_temp_free_i64(ta);
                 tcg_temp_free_i64(tb);
                 tcg_temp_free_i64(td);
                 tcg_temp_free_i32(res);
-                tcg_temp_free_i32(sr_ove);
             }
             break;
         default:
@@ -312,7 +314,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                 TCGv_i64 td = tcg_temp_local_new_i64();
                 TCGv_i32 res = tcg_temp_local_new_i32();
                 TCGv_i32 sr_cy = tcg_temp_local_new_i32();
-                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
                 tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
                 tcg_gen_extu_i32_i64(tb, cpu_R[rb]);
                 tcg_gen_andi_i32(sr_cy, cpu_sr, SR_CY);
@@ -327,9 +328,13 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                 tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
                 tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
                 tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
-                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
-                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
-                gen_exception(dc, EXCP_RANGE);
+                if (dc->tb_flags & SR_OVE) {
+                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
+                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
+                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
+                    gen_exception(dc, EXCP_RANGE);
+                    tcg_temp_free_i32(sr_ove);
+                }
                 gen_set_label(lab);
                 tcg_gen_mov_i32(cpu_R[rd], res);
                 tcg_temp_free_i64(ta);
@@ -338,7 +343,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                 tcg_temp_free_i64(td);
                 tcg_temp_free_i32(res);
                 tcg_temp_free_i32(sr_cy);
-                tcg_temp_free_i32(sr_ove);
             }
             break;
         default:
@@ -357,7 +361,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                 TCGv_i64 tb = tcg_temp_new_i64();
                 TCGv_i64 td = tcg_temp_local_new_i64();
                 TCGv_i32 res = tcg_temp_local_new_i32();
-                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
 
                 tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
                 tcg_gen_extu_i32_i64(tb, cpu_R[rb]);
@@ -369,16 +372,19 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                 tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
                 tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
                 tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
-                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
-                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
-                gen_exception(dc, EXCP_RANGE);
+                if (dc->tb_flags & SR_OVE) {
+                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
+                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
+                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
+                    gen_exception(dc, EXCP_RANGE);
+                    tcg_temp_free_i32(sr_ove);
+                }
                 gen_set_label(lab);
                 tcg_gen_mov_i32(cpu_R[rd], res);
                 tcg_temp_free_i64(ta);
                 tcg_temp_free_i64(tb);
                 tcg_temp_free_i64(td);
                 tcg_temp_free_i32(res);
-                tcg_temp_free_i32(sr_ove);
             }
             break;
         default:
@@ -451,10 +457,12 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                 TCGv_i32 sr_ove = tcg_temp_local_new_i32();
                 if (rb == 0) {
                     tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY));
-                    tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
-                    tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab0);
-                    gen_exception(dc, EXCP_RANGE);
-                    gen_set_label(lab0);
+                    if (dc->tb_flags & SR_OVE) {
+                        tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
+                        tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab0);
+                        gen_exception(dc, EXCP_RANGE);
+                        gen_set_label(lab0);
+                    }
                 } else {
                     tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_R[rb],
                                        0x00000000, lab1);
@@ -464,9 +472,11 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
                                        0xffffffff, lab2);
                     gen_set_label(lab1);
                     tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY));
-                    tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
-                    tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab3);
-                    gen_exception(dc, EXCP_RANGE);
+                    if (dc->tb_flags & SR_OVE) {
+                        tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
+                        tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab3);
+                        gen_exception(dc, EXCP_RANGE);
+                    }
                     gen_set_label(lab2);
                     tcg_gen_div_tl(cpu_R[rd], cpu_R[ra], cpu_R[rb]);
                     gen_set_label(lab3);
@@ -950,7 +960,6 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
                 TCGv_i64 ta = tcg_temp_new_i64();
                 TCGv_i64 td = tcg_temp_local_new_i64();
                 TCGv_i32 res = tcg_temp_local_new_i32();
-                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
                 tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
                 tcg_gen_addi_i64(td, ta, sign_extend(I16, 16));
                 tcg_gen_trunc_i64_i32(res, td);
@@ -960,15 +969,18 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
                 tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
                 tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
                 tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
-                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
-                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
-                gen_exception(dc, EXCP_RANGE);
+                if (dc->tb_flags & SR_OVE) {
+                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
+                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
+                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
+                    gen_exception(dc, EXCP_RANGE);
+                    tcg_temp_free_i32(sr_ove);
+                }
                 gen_set_label(lab);
                 tcg_gen_mov_i32(cpu_R[rd], res);
                 tcg_temp_free_i64(ta);
                 tcg_temp_free_i64(td);
                 tcg_temp_free_i32(res);
-                tcg_temp_free_i32(sr_ove);
             }
         }
         break;
@@ -982,7 +994,6 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
             TCGv_i64 tcy = tcg_temp_local_new_i64();
             TCGv_i32 res = tcg_temp_local_new_i32();
             TCGv_i32 sr_cy = tcg_temp_local_new_i32();
-            TCGv_i32 sr_ove = tcg_temp_local_new_i32();
             tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
             tcg_gen_andi_i32(sr_cy, cpu_sr, SR_CY);
             tcg_gen_shri_i32(sr_cy, sr_cy, 10);
@@ -996,9 +1007,13 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
             tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
             tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
             tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
-            tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
-            tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
-            gen_exception(dc, EXCP_RANGE);
+            if (dc->tb_flags & SR_OVE) {
+                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
+                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
+                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
+                gen_exception(dc, EXCP_RANGE);
+                tcg_temp_free_i32(sr_ove);
+            }
             gen_set_label(lab);
             tcg_gen_mov_i32(cpu_R[rd], res);
             tcg_temp_free_i64(ta);
@@ -1006,7 +1021,6 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
             tcg_temp_free_i64(tcy);
             tcg_temp_free_i32(res);
             tcg_temp_free_i32(sr_cy);
-            tcg_temp_free_i32(sr_ove);
         }
         break;
 
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 09/13] target-openrisc: Add CPU which neglects Carry and Overflow Flag
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (7 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 08/13] target-openrisc: Test for Overflow exception statically Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-30 18:14   ` Richard Henderson
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 10/13] target-openrisc: Correct target number for 64 bit llseek Sebastian Macke
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

The carry and overflag and the instructions l.addc and l.addic
are never used in the toolchain. Linux and gcc compiled software
don't need them. To speed up the emulation a cpu was added which
neglects the flags for l.addi, l.add, l.sub and
generates an illegal instruction error for l.addic and l.addc

This cpu violate the specification and will not run through the
test cases but increases the emulation speed between 20% and 200%.
The cpu is activated by the option "-cpu or1200-noflags"

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/cpu.c       | 15 +++++++++++++--
 target-openrisc/cpu.h       |  1 +
 target-openrisc/translate.c | 21 +++++++++++++++++----
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/target-openrisc/cpu.c b/target-openrisc/cpu.c
index 09ba728..c80a30c 100644
--- a/target-openrisc/cpu.c
+++ b/target-openrisc/cpu.c
@@ -124,6 +124,16 @@ static void or1200_initfn(Object *obj)
     set_feature(cpu, OPENRISC_FEATURE_OF32S);
 }
 
+static void or1200_noflags_initfn(Object *obj)
+{
+    OpenRISCCPU *cpu = OPENRISC_CPU(obj);
+
+    set_feature(cpu, OPENRISC_FEATURE_OB32S);
+    set_feature(cpu, OPENRISC_FEATURE_OF32S);
+    set_feature(cpu, OPENRISC_FEATURE_NOFLAGS);
+}
+
+
 static void openrisc_any_initfn(Object *obj)
 {
     OpenRISCCPU *cpu = OPENRISC_CPU(obj);
@@ -137,8 +147,9 @@ typedef struct OpenRISCCPUInfo {
 } OpenRISCCPUInfo;
 
 static const OpenRISCCPUInfo openrisc_cpus[] = {
-    { .name = "or1200",      .initfn = or1200_initfn },
-    { .name = "any",         .initfn = openrisc_any_initfn },
+    { .name = "or1200",         .initfn = or1200_initfn },
+    { .name = "or1200-noflags", .initfn = or1200_noflags_initfn },
+    { .name = "any",            .initfn = openrisc_any_initfn },
 };
 
 static void openrisc_cpu_class_init(ObjectClass *oc, void *data)
diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
index 94bbb17..2423fad 100644
--- a/target-openrisc/cpu.h
+++ b/target-openrisc/cpu.h
@@ -204,6 +204,7 @@ enum {
     OPENRISC_FEATURE_OF32S = (1 << 7),
     OPENRISC_FEATURE_OF64S = (1 << 8),
     OPENRISC_FEATURE_OV64S = (1 << 9),
+    OPENRISC_FEATURE_NOFLAGS = (1 << 10),
 };
 
 /* Tick Timer Mode Register */
diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index b1f73c4..d926995 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -50,6 +50,7 @@ typedef struct DisasContext {
     TranslationBlock *tb;
     target_ulong pc;
     uint32_t tb_flags, synced_flags, flags;
+    uint32_t cpufeature;
     uint32_t is_jmp;
     uint32_t j_state; /* specifies the jump type*/
     target_ulong j_target; /* target address for jump */
@@ -265,7 +266,9 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
         switch (op1) {
         case 0x00:    /* l.add */
             LOG_DIS("l.add r%d, r%d, r%d\n", rd, ra, rb);
-            {
+            if (dc->cpufeature & OPENRISC_FEATURE_NOFLAGS) {
+                tcg_gen_add_tl(cpu_R[rd], cpu_R[ra], cpu_R[rb]);
+            } else {
                 int lab = gen_new_label();
                 TCGv_i64 ta = tcg_temp_new_i64();
                 TCGv_i64 tb = tcg_temp_new_i64();
@@ -306,7 +309,9 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
         switch (op1) {
         case 0x00:
             LOG_DIS("l.addc r%d, r%d, r%d\n", rd, ra, rb);
-            {
+            if (dc->cpufeature & OPENRISC_FEATURE_NOFLAGS) {
+                gen_illegal_exception(dc);
+            } else {
                 int lab = gen_new_label();
                 TCGv_i64 ta = tcg_temp_new_i64();
                 TCGv_i64 tb = tcg_temp_new_i64();
@@ -355,7 +360,9 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
         switch (op1) {
         case 0x00:
             LOG_DIS("l.sub r%d, r%d, r%d\n", rd, ra, rb);
-            {
+            if (dc->cpufeature & OPENRISC_FEATURE_NOFLAGS) {
+                tcg_gen_sub_tl(cpu_R[rd], cpu_R[ra], cpu_R[rb]);
+            } else {
                 int lab = gen_new_label();
                 TCGv_i64 ta = tcg_temp_new_i64();
                 TCGv_i64 tb = tcg_temp_new_i64();
@@ -952,7 +959,9 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
 
     case 0x27:    /* l.addi */
         LOG_DIS("l.addi r%d, r%d, %d\n", rd, ra, I16);
-        {
+        if (dc->cpufeature & OPENRISC_FEATURE_NOFLAGS) {
+            tcg_gen_addi_tl(cpu_R[rd], cpu_R[ra], sign_extend(I16, 16));
+        } else {
             if (I16 == 0) {
                 tcg_gen_mov_tl(cpu_R[rd], cpu_R[ra]);
             } else {
@@ -987,6 +996,9 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
 
     case 0x28:    /* l.addic */
         LOG_DIS("l.addic r%d, r%d, %d\n", rd, ra, I16);
+        if (dc->cpufeature & OPENRISC_FEATURE_NOFLAGS) {
+            gen_illegal_exception(dc);
+        }
         {
             int lab = gen_new_label();
             TCGv_i64 ta = tcg_temp_new_i64();
@@ -1744,6 +1756,7 @@ static inline void gen_intermediate_code_internal(OpenRISCCPU *cpu,
     dc->j_state = JUMP_DYNAMIC;
     dc->j_target = 0;
     dc->flags = cpu->env.cpucfgr;
+    dc->cpufeature = cpu->feature;
     dc->mem_idx = cpu_mmu_index(&cpu->env);
     dc->synced_flags = dc->tb_flags = tb->flags;
     dc->delayed_branch = !!(dc->tb_flags & D_FLAG);
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 10/13] target-openrisc: Correct target number for 64 bit llseek
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (8 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 09/13] target-openrisc: Add CPU which neglects Carry and Overflow Flag Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches Sebastian Macke
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 linux-user/openrisc/syscall_nr.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/openrisc/syscall_nr.h b/linux-user/openrisc/syscall_nr.h
index f4ac91e..72cbcf8 100644
--- a/linux-user/openrisc/syscall_nr.h
+++ b/linux-user/openrisc/syscall_nr.h
@@ -493,7 +493,7 @@
 #define TARGET_NR_fstatfs64 TARGET_NR_3264_fstatfs
 #define TARGET_NR_truncate64 TARGET_NR_3264_truncate
 #define TARGET_NR_ftruncate64 TARGET_NR_3264_ftruncate
-#define TARGET_NR_llseek TARGET_NR_3264_lseek
+#define TARGET_NR__llseek TARGET_NR_3264_lseek
 #define TARGET_NR_sendfile64 TARGET_NR_3264_sendfile
 #define TARGET_NR_fstatat64 TARGET_NR_3264_fstatat
 #define TARGET_NR_fstat64 TARGET_NR_3264_fstat
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (9 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 10/13] target-openrisc: Correct target number for 64 bit llseek Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-30 18:33   ` Richard Henderson
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 12/13] target-openrisc: Add correct gdb information for the pc value Sebastian Macke
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

At the moment a branch l.bf and l.bnf requires an additional flag variable
"btaken" for the jump to decide after the delayed slot
whether the jump should be taken or not.

With this patch the jmp_pc variable is used as a flag.
If jmp_pc is zero the branch is not taken.
This requires that together with the delayed slot flag
an additional branch flag is saved to differ between slot types.

In average the speed increases by around 5%.

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/cpu.h       |  7 ++++---
 target-openrisc/translate.c | 34 +++++++++++++++++++++++++---------
 2 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
index 2423fad..ea007c7 100644
--- a/target-openrisc/cpu.h
+++ b/target-openrisc/cpu.h
@@ -84,8 +84,9 @@ enum {
 /* Version Register */
 #define SPR_VR 0xFFFF003F
 
-/* Internal flags, delay slot flag */
-#define D_FLAG    2
+/* Internal flags */
+#define D_FLAG    2 /* delay slot flag */
+#define B_FLAG    4 /* branch flag */
 
 /* Interrupt */
 #define NR_IRQS  32
@@ -413,7 +414,7 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
     *pc = env->pc;
     *cs_base = 0;
     /* D_FLAG -- branch instruction exception */
-    *flags = (env->flags & D_FLAG) |
+    *flags = (env->flags & (D_FLAG | B_FLAG)) |
              (env->sr & (SR_SM | SR_DME | SR_IME | SR_OVE));
 }
 
diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index d926995..274d38e 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -42,7 +42,11 @@
 enum {
     JUMP_DYNAMIC, /* The address is not known during compile time*/
     JUMP_STATIC,  /* The address is known during compile time */
-    JUMP_BRANCH   /* The two possible destinations are known */
+    JUMP_BRANCH,  /* The two possible destinations are known */
+    JUMP_BRANCH_DELAYED  /* Indicates that the first instruction in
+                            the tb is a delayed slot of a previous
+                            branch instruction. */
+
 };
 
 
@@ -54,7 +58,6 @@ typedef struct DisasContext {
     uint32_t is_jmp;
     uint32_t j_state; /* specifies the jump type*/
     target_ulong j_target; /* target address for jump */
-    TCGv btaken;  /* Temporary variable */
     uint32_t mem_idx;
     int singlestep_enabled;
     uint32_t delayed_branch;
@@ -222,15 +225,13 @@ static void gen_jump(DisasContext *dc, uint32_t imm, uint32_t reg, uint32_t op0)
     case 0x04:     /* l.bf  */
         {
             int lab = gen_new_label();
-            dc->btaken = tcg_temp_local_new();
-            tcg_gen_movi_tl(jmp_pc, dc->pc+8);
-            tcg_gen_movi_tl(dc->btaken, 0);
+            tcg_gen_movi_tl(jmp_pc, 0);
             tcg_gen_brcondi_i32(op0 == 0x03 ? TCG_COND_NE : TCG_COND_EQ,
                                 cpu_srf, 0, lab);
-            tcg_gen_movi_tl(dc->btaken, 1);
             tcg_gen_movi_tl(jmp_pc, tmp_pc);
             gen_set_label(lab);
             dc->j_state = JUMP_BRANCH;
+            dc->tb_flags |= B_FLAG;
         }
         break;
     case 0x11:     /* l.jr */
@@ -1708,19 +1709,29 @@ static void check_breakpoint(OpenRISCCPU *cpu, DisasContext *dc)
 static void handle_delay_slot(DisasContext *dc)
 {
     dc->tb_flags &= ~D_FLAG;
+    dc->tb_flags &= ~B_FLAG;
     gen_sync_flags(dc);
 
     switch (dc->j_state) {
     case JUMP_BRANCH:
         {
             int l1 = gen_new_label();
-            tcg_gen_brcondi_tl(TCG_COND_NE, dc->btaken, 0, l1);
+            tcg_gen_brcondi_tl(TCG_COND_NE, jmp_pc, 0, l1);
             gen_goto_tb(dc, 1, dc->pc);
             gen_set_label(l1);
-            tcg_temp_free(dc->btaken);
             gen_goto_tb(dc, 0, dc->j_target);
             break;
         }
+    case JUMP_BRANCH_DELAYED:
+        {
+            int l1 = gen_new_label();
+            tcg_gen_brcondi_tl(TCG_COND_NE, jmp_pc, 0, l1);
+            gen_goto_tb(dc, 1, dc->pc);
+            gen_set_label(l1);
+            tcg_gen_mov_tl(cpu_pc, jmp_pc);
+            tcg_gen_exit_tb(0);
+            break;
+        }
     case JUMP_STATIC:
         gen_goto_tb(dc, 0, dc->j_target);
         break;
@@ -1759,8 +1770,13 @@ static inline void gen_intermediate_code_internal(OpenRISCCPU *cpu,
     dc->cpufeature = cpu->feature;
     dc->mem_idx = cpu_mmu_index(&cpu->env);
     dc->synced_flags = dc->tb_flags = tb->flags;
-    dc->delayed_branch = !!(dc->tb_flags & D_FLAG);
     dc->singlestep_enabled = cs->singlestep_enabled;
+
+    dc->delayed_branch = !!(dc->tb_flags & D_FLAG);
+    if ((dc->delayed_branch) && (dc->tb_flags&B_FLAG)) {
+        dc->j_state = JUMP_BRANCH_DELAYED;
+    }
+
     if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
         qemu_log("-----------------------------------------\n");
         log_cpu_state(CPU(cpu), 0);
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 12/13] target-openrisc: Add correct gdb information for the pc value
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (10 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 13/13] target-openrisc: Add In-circuit emulator support Sebastian Macke
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

The former patch which removed npc and ppc also removed the part in
which the registers were send to gdb. But the npc parameter
is necessary and the numbering of registers is fixed within gdb.
The correct npc value is the current pc value.

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/gdbstub.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/target-openrisc/gdbstub.c b/target-openrisc/gdbstub.c
index c1f9561..81f0f43 100644
--- a/target-openrisc/gdbstub.c
+++ b/target-openrisc/gdbstub.c
@@ -31,7 +31,13 @@ int openrisc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
     } else {
         switch (n) {
 
-        case 32:    /* SR */
+        case 32:    /* PPC */
+            return gdb_get_reg32(mem_buf, env->pc-4);
+
+        case 33:    /* NPC */
+            return gdb_get_reg32(mem_buf, env->pc);
+
+        case 34:    /* SR */
             return gdb_get_reg32(mem_buf, ENV_GET_SR(env));
 
         default:
@@ -59,10 +65,14 @@ int openrisc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
     } else {
         switch (n) {
 
-        case 32: /* SR */
+        case 33: /* NPC */
+            env->pc = tmp;
+
+        case 34: /* SR */
             ENV_SET_SR(env, tmp);
             break;
 
+        case 32: /* PPC is not allowed to write */
         default:
             break;
         }
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 13/13] target-openrisc: Add In-circuit emulator support
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (11 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 12/13] target-openrisc: Add correct gdb information for the pc value Sebastian Macke
@ 2013-10-29 19:04 ` Sebastian Macke
  2013-10-29 19:53 ` [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Peter Maydell
  2013-10-29 21:15 ` Max Filippov
  14 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 19:04 UTC (permalink / raw)
  To: qemu-devel, proljc; +Cc: Sebastian Macke, openrisc, openrisc

This patch enables single step debugging in gdb

Signed-off-by: Sebastian Macke <sebastian@macke.de>
---
 target-openrisc/cpu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
index ea007c7..d0d410a 100644
--- a/target-openrisc/cpu.h
+++ b/target-openrisc/cpu.h
@@ -20,6 +20,7 @@
 #ifndef CPU_OPENRISC_H
 #define CPU_OPENRISC_H
 
+#define TARGET_HAS_ICE 1
 #define TARGET_LONG_BITS 32
 #define ELF_MACHINE    EM_OPENRISC
 
-- 
1.8.4.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception Sebastian Macke
@ 2013-10-29 19:47   ` Peter Maydell
  2013-10-29 22:41     ` Sebastian Macke
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Maydell @ 2013-10-29 19:47 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, QEMU Developers, Ethan Hunt

On 29 October 2013 19:04, Sebastian Macke <sebastian@macke.de> wrote:
> The TLB flush is not necessary as the mmu_index field
> already takes care of correct memory locations.
> Instead the tb flag field must be expanded that
> the exception takes the correct translation block.
>
> Signed-off-by: Sebastian Macke <sebastian@macke.de>
> ---
>  target-openrisc/cpu.h       | 4 ++--
>  target-openrisc/interrupt.c | 4 ----
>  2 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
> index 24afe6f..057821d 100644
> --- a/target-openrisc/cpu.h
> +++ b/target-openrisc/cpu.h
> @@ -85,7 +85,7 @@ enum {
>  #define SPR_VR 0xFFFF003F
>
>  /* Internal flags, delay slot flag */
> -#define D_FLAG    1
> +#define D_FLAG    2

Since this set of #defines effectively is the documentation for
what the tb_flags usage is, can you update it to include the
new flag you've added, please?

>  /* Interrupt */
>  #define NR_IRQS  32
> @@ -412,7 +412,7 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
>      *pc = env->pc;
>      *cs_base = 0;
>      /* D_FLAG -- branch instruction exception */
> -    *flags = (env->flags & D_FLAG);
> +    *flags = (env->flags & D_FLAG) | (env->sr & SR_SM);
>  }
>
>  static inline int cpu_mmu_index(CPUOpenRISCState *env)
> diff --git a/target-openrisc/interrupt.c b/target-openrisc/interrupt.c
> index d1d6ae2..ee98ed3 100644
> --- a/target-openrisc/interrupt.c
> +++ b/target-openrisc/interrupt.c
> @@ -41,10 +41,6 @@ void openrisc_cpu_do_interrupt(CPUState *cs)
>          env->epcr += 4;
>      }
>
> -    /* For machine-state changed between user-mode and supervisor mode,
> -       we need flush TLB when we enter&exit EXCP.  */
> -    tlb_flush(env, 1);
> -
>      env->esr = ENV_GET_SR(env);
>      env->sr &= ~SR_DME;
>      env->sr &= ~SR_IME;

It looks suspicious that this patch doesn't include any change to
translate.c which reads the tb flag you've just added. Either:
 (a) the translated code doesn't actually build in any dependencies
  on the SR_SM flag, in which case it doesn't need to be a tb_flag at all
 (b) the translated code is still referring directly to env->sr somewhere,
  in which case it needs to be changed to use the tb_flags version instead

Also, are you sure that tlb_flush() is needed purely for the change to the
SR_SM flags and not for any of the other CPU state changes that
openrisc_cpu_do_interrupt() is making when it does the user->supervisor
state change?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (12 preceding siblings ...)
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 13/13] target-openrisc: Add In-circuit emulator support Sebastian Macke
@ 2013-10-29 19:53 ` Peter Maydell
  2013-10-29 21:15 ` Max Filippov
  14 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2013-10-29 19:53 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, QEMU Developers, Ethan Hunt

On 29 October 2013 19:04, Sebastian Macke <sebastian@macke.de> wrote:
> Hi,
>
> This is the second part of the patches to make the openrisc target faster
> and more reliable.

Hi. Please could you not cc qemu-devel posts to mailing lists which
don't accept postings from non members? (in this case,
openrisc@lists.openrisc.net.) Otherwise people replying to your posts
get bounce messages from that mailing list.

Thanks
-- PMM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions Sebastian Macke
@ 2013-10-29 20:05   ` Max Filippov
  2013-10-29 21:36     ` Sebastian Macke
  0 siblings, 1 reply; 48+ messages in thread
From: Max Filippov @ 2013-10-29 20:05 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
> This patch separates the load and store instruction to a
> separate function.
> The repetition of the source code can be reduced and further
> optimizations can be implemented.
> In this case it checks for a zero offset and optimizes it.
>
> Additional this patch solves a severe bug for the softmmu emulation.
> The pc has to be saved as these instructions can fail and lead
> to a tlb miss exception.

In case of an exception we re-translate the TB to find the PC where
the exception happened, see cpu_restore_state call from the tlb_fill
function. Also this applies to both user and system emulation, but
you only handle the system emulation case.

> Signed-off-by: Sebastian Macke <sebastian@macke.de>
> ---
>  target-openrisc/translate.c | 130 ++++++++++++++++++++++++++------------------
>  1 file changed, 76 insertions(+), 54 deletions(-)
>
> diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
> index 31f8717..1bb686c 100644
> --- a/target-openrisc/translate.c
> +++ b/target-openrisc/translate.c
> @@ -692,6 +692,73 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>      }
>  }
>
> +static void gen_loadstore(DisasContext *dc, uint32 op0,
> +                          uint32_t ra, uint32_t rb, uint32_t rd,
> +                          uint32_t offset)
> +{
> +
> +/* The load and store instructions can fail and lead to a
> + *  tlb miss exception. The correct pc has to be stored for
> + *  this case.
> + */
> +#if !defined(CONFIG_USER_ONLY)
> +    tcg_gen_movi_tl(cpu_pc, dc->pc);
> +#endif
> +
> +    TCGv t0 = cpu_R[ra];
> +    if (offset != 0) {
> +        t0 = tcg_temp_new();
> +        tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(offset, 16));
> +    }
> +
> +    switch (op0) {
> +    case 0x21:    /* l.lwz */
> +        tcg_gen_qemu_ld32u(cpu_R[rd], t0, dc->mem_idx);
> +        break;
> +
> +    case 0x22:    /* l.lws */
> +        tcg_gen_qemu_ld32s(cpu_R[rd], t0, dc->mem_idx);
> +        break;
> +
> +    case 0x23:    /* l.lbz */
> +        tcg_gen_qemu_ld8u(cpu_R[rd], t0, dc->mem_idx);
> +        break;
> +
> +    case 0x24:    /* l.lbs */
> +        tcg_gen_qemu_ld8s(cpu_R[rd], t0, dc->mem_idx);
> +        break;
> +
> +    case 0x25:    /* l.lhz */
> +        tcg_gen_qemu_ld16u(cpu_R[rd], t0, dc->mem_idx);
> +        break;
> +
> +    case 0x26:    /* l.lhs */
> +        tcg_gen_qemu_ld16s(cpu_R[rd], t0, dc->mem_idx);
> +        break;
> +
> +    case 0x35:    /* l.sw */
> +        tcg_gen_qemu_st32(cpu_R[rb], t0, dc->mem_idx);
> +        break;
> +
> +    case 0x36:    /* l.sb */
> +        tcg_gen_qemu_st8(cpu_R[rb], t0, dc->mem_idx);
> +        break;
> +
> +    case 0x37:    /* l.sh */
> +        tcg_gen_qemu_st16(cpu_R[rb], t0, dc->mem_idx);
> +        break;
> +
> +    default:
> +    break;

Broken indentation.

> +    }
> +
> +    if (offset != 0) {
> +        tcg_temp_free(t0);
> +    }
> +
> +}
> +
> +
>  static void dec_misc(DisasContext *dc, uint32_t insn)
>  {
>      uint32_t op0, op1;
> @@ -843,62 +910,32 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>
>      case 0x21:    /* l.lwz */
>          LOG_DIS("l.lwz r%d, r%d, %d\n", rd, ra, I16);
> -        {
> -            TCGv t0 = tcg_temp_new();
> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
> -            tcg_gen_qemu_ld32u(cpu_R[rd], t0, dc->mem_idx);
> -            tcg_temp_free(t0);
> -        }
> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>          break;
>
>      case 0x22:    /* l.lws */
>          LOG_DIS("l.lws r%d, r%d, %d\n", rd, ra, I16);
> -        {
> -            TCGv t0 = tcg_temp_new();
> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
> -            tcg_gen_qemu_ld32s(cpu_R[rd], t0, dc->mem_idx);
> -            tcg_temp_free(t0);
> -        }
> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>          break;
>
>      case 0x23:    /* l.lbz */
>          LOG_DIS("l.lbz r%d, r%d, %d\n", rd, ra, I16);
> -        {
> -            TCGv t0 = tcg_temp_new();
> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
> -            tcg_gen_qemu_ld8u(cpu_R[rd], t0, dc->mem_idx);
> -            tcg_temp_free(t0);
> -        }
> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>          break;
>
>      case 0x24:    /* l.lbs */
>          LOG_DIS("l.lbs r%d, r%d, %d\n", rd, ra, I16);
> -        {
> -            TCGv t0 = tcg_temp_new();
> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
> -            tcg_gen_qemu_ld8s(cpu_R[rd], t0, dc->mem_idx);
> -            tcg_temp_free(t0);
> -        }
> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>          break;
>
>      case 0x25:    /* l.lhz */
>          LOG_DIS("l.lhz r%d, r%d, %d\n", rd, ra, I16);
> -        {
> -            TCGv t0 = tcg_temp_new();
> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
> -            tcg_gen_qemu_ld16u(cpu_R[rd], t0, dc->mem_idx);
> -            tcg_temp_free(t0);
> -        }
> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>          break;
>
>      case 0x26:    /* l.lhs */
>          LOG_DIS("l.lhs r%d, r%d, %d\n", rd, ra, I16);
> -        {
> -            TCGv t0 = tcg_temp_new();
> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
> -            tcg_gen_qemu_ld16s(cpu_R[rd], t0, dc->mem_idx);
> -            tcg_temp_free(t0);
> -        }
> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>          break;
>
>      case 0x27:    /* l.addi */
> @@ -1047,32 +1084,17 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>
>      case 0x35:    /* l.sw */
>          LOG_DIS("l.sw %d, r%d, r%d, %d\n", I5, ra, rb, I11);
> -        {
> -            TCGv t0 = tcg_temp_new();
> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(tmp, 16));
> -            tcg_gen_qemu_st32(cpu_R[rb], t0, dc->mem_idx);
> -            tcg_temp_free(t0);
> -        }
> +        gen_loadstore(dc, op0, ra, rb, rd, tmp);
>          break;
>
>      case 0x36:    /* l.sb */
>          LOG_DIS("l.sb %d, r%d, r%d, %d\n", I5, ra, rb, I11);
> -        {
> -            TCGv t0 = tcg_temp_new();
> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(tmp, 16));
> -            tcg_gen_qemu_st8(cpu_R[rb], t0, dc->mem_idx);
> -            tcg_temp_free(t0);
> -        }
> +        gen_loadstore(dc, op0, ra, rb, rd, tmp);
>          break;
>
>      case 0x37:    /* l.sh */
> +        gen_loadstore(dc, op0, ra, rb, rd, tmp);
>          LOG_DIS("l.sh %d, r%d, r%d, %d\n", I5, ra, rb, I11);

In other cases you do it in the reverse order.
Looks like all these cases can be further consolidated into
a pair of LOG_DIS(); gen_loadstore(); with a small table for
LOG_DIS format string each.

> -        {
> -            TCGv t0 = tcg_temp_new();
> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(tmp, 16));
> -            tcg_gen_qemu_st16(cpu_R[rb], t0, dc->mem_idx);
> -            tcg_temp_free(t0);
> -        }
>          break;
>
>      default:

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 06/13] target-openrisc: Remove TLB flush from l.rfe instruction
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 06/13] target-openrisc: Remove TLB flush from l.rfe instruction Sebastian Macke
@ 2013-10-29 21:01   ` Max Filippov
  2013-10-29 21:53     ` Sebastian Macke
  0 siblings, 1 reply; 48+ messages in thread
From: Max Filippov @ 2013-10-29 21:01 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
> At the moment there are two TLBs. The OpenRISC TLB followed
> by the QEMU's own TLB.
> At the end of the TLB miss handler a tlb_flush of QEMUs TLB
> is executed which is exactly what we want to avoid.
> As long as there is no context switch we don't have to flush the TLB.

So this flush was needed in order to clean QEMU TLB in case
DTLB/ITLB translation was enabled/disabled, right? But since you
already have mmu index for nommu operation, wouldn't it be easier
to indicate mmu index correctly for data and code access and drop
this flush?

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections
  2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
                   ` (13 preceding siblings ...)
  2013-10-29 19:53 ` [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Peter Maydell
@ 2013-10-29 21:15 ` Max Filippov
  2013-10-29 21:22   ` Sebastian Macke
  14 siblings, 1 reply; 48+ messages in thread
From: Max Filippov @ 2013-10-29 21:15 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
> Hi,
>
> This is the second part of the patches to make the openrisc target faster
> and more reliable.

Hi Sebastian,

this series doesn't apply cleanly to the current qemu git head,
what tree is it based on?

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 07/13] target-openrisc: Correct l.cmov conditional check
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 07/13] target-openrisc: Correct l.cmov conditional check Sebastian Macke
@ 2013-10-29 21:15   ` Max Filippov
  2013-10-29 21:23     ` Sebastian Macke
  0 siblings, 1 reply; 48+ messages in thread
From: Max Filippov @ 2013-10-29 21:15 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
> srf is a boolean variable.
> Therefore the instruction should check for != 0 and not for != SR_F
>
> Signed-off-by: Sebastian Macke <sebastian@macke.de>
> ---
>  target-openrisc/translate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
> index 378ff1b..9fd1126 100644
> --- a/target-openrisc/translate.c
> +++ b/target-openrisc/translate.c
> @@ -565,7 +565,7 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                  int lab = gen_new_label();
>                  TCGv res = tcg_temp_local_new();
>                  tcg_gen_mov_tl(res, cpu_R[rb]);
> -                tcg_gen_brcondi_tl(TCG_COND_NE, cpu_srf, SR_F, lab);
> +                tcg_gen_brcondi_tl(TCG_COND_NE, cpu_srf, 0, lab);
>                  tcg_gen_mov_tl(res, cpu_R[ra]);
>                  gen_set_label(lab);
>                  tcg_gen_mov_tl(cpu_R[rd], res);

Looks like this implementation may be rewritten as

TCGv zero = tcg_const_tl(0);
tcg_gen_movcond_tl(cpu_R[rd], cpu_srf, zero, cpu_R[ra], cpu_R[rb], TCG_COND_EQ);
tcg_temp_free(zero);

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections
  2013-10-29 21:15 ` Max Filippov
@ 2013-10-29 21:22   ` Sebastian Macke
  2013-10-31 11:47     ` Jia Liu
  0 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 21:22 UTC (permalink / raw)
  To: Max Filippov; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On 29/10/2013 2:15 PM, Max Filippov wrote:
> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
>> Hi,
>>
>> This is the second part of the patches to make the openrisc target faster
>> and more reliable.
> Hi Sebastian,
>
> this series doesn't apply cleanly to the current qemu git head,
> what tree is it based on?
>

It is based on the current master tree but depends on the patches I send 
on the 21st October.
They got accepted by the maintainer but are not in the master tree right 
now.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 07/13] target-openrisc: Correct l.cmov conditional check
  2013-10-29 21:15   ` Max Filippov
@ 2013-10-29 21:23     ` Sebastian Macke
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 21:23 UTC (permalink / raw)
  To: Max Filippov; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On 29/10/2013 2:15 PM, Max Filippov wrote:
> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
>> srf is a boolean variable.
>> Therefore the instruction should check for != 0 and not for != SR_F
>>
>> Signed-off-by: Sebastian Macke <sebastian@macke.de>
>> ---
>>   target-openrisc/translate.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
>> index 378ff1b..9fd1126 100644
>> --- a/target-openrisc/translate.c
>> +++ b/target-openrisc/translate.c
>> @@ -565,7 +565,7 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                   int lab = gen_new_label();
>>                   TCGv res = tcg_temp_local_new();
>>                   tcg_gen_mov_tl(res, cpu_R[rb]);
>> -                tcg_gen_brcondi_tl(TCG_COND_NE, cpu_srf, SR_F, lab);
>> +                tcg_gen_brcondi_tl(TCG_COND_NE, cpu_srf, 0, lab);
>>                   tcg_gen_mov_tl(res, cpu_R[ra]);
>>                   gen_set_label(lab);
>>                   tcg_gen_mov_tl(cpu_R[rd], res);
> Looks like this implementation may be rewritten as
>
> TCGv zero = tcg_const_tl(0);
> tcg_gen_movcond_tl(cpu_R[rd], cpu_srf, zero, cpu_R[ra], cpu_R[rb], TCG_COND_EQ);
> tcg_temp_free(zero);
>
You are right. I will change that.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 08/13] target-openrisc: Test for Overflow exception statically
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 08/13] target-openrisc: Test for Overflow exception statically Sebastian Macke
@ 2013-10-29 21:25   ` Max Filippov
  2013-10-29 22:06     ` Sebastian Macke
  0 siblings, 1 reply; 48+ messages in thread
From: Max Filippov @ 2013-10-29 21:25 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
> Instead of testing the overflow exception dynamically every time
> The flag will be reckognized by the tcg as changed code and
> will recompile the code with the correct checks.
>
> Signed-off-by: Sebastian Macke <sebastian@macke.de>
> ---
>  target-openrisc/cpu.h       |  3 +-
>  target-openrisc/translate.c | 78 ++++++++++++++++++++++++++-------------------
>  2 files changed, 48 insertions(+), 33 deletions(-)
>
> diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
> index bac61e5..94bbb17 100644
> --- a/target-openrisc/cpu.h
> +++ b/target-openrisc/cpu.h
> @@ -412,7 +412,8 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
>      *pc = env->pc;
>      *cs_base = 0;
>      /* D_FLAG -- branch instruction exception */
> -    *flags = (env->flags & D_FLAG) | (env->sr & (SR_SM | SR_DME | SR_IME));
> +    *flags = (env->flags & D_FLAG) |
> +             (env->sr & (SR_SM | SR_DME | SR_IME | SR_OVE));
>  }
>
>  static inline int cpu_mmu_index(CPUOpenRISCState *env)
> diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
> index 9fd1126..b1f73c4 100644
> --- a/target-openrisc/translate.c
> +++ b/target-openrisc/translate.c
> @@ -271,7 +271,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                  TCGv_i64 tb = tcg_temp_new_i64();
>                  TCGv_i64 td = tcg_temp_local_new_i64();
>                  TCGv_i32 res = tcg_temp_local_new_i32();
> -                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>                  tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>                  tcg_gen_extu_i32_i64(tb, cpu_R[rb]);
>                  tcg_gen_add_i64(td, ta, tb);
> @@ -282,16 +281,19 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                  tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>                  tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>                  tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
> -                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> -                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> -                gen_exception(dc, EXCP_RANGE);
> +                if (dc->tb_flags & SR_OVE) {
> +                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();

This temp doesn't need to be local.

> +                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> +                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> +                    gen_exception(dc, EXCP_RANGE);
> +                    tcg_temp_free_i32(sr_ove);
> +                }

Repetitions of this code call for making it a nice separate function.

>                  gen_set_label(lab);
>                  tcg_gen_mov_i32(cpu_R[rd], res);
>                  tcg_temp_free_i64(ta);
>                  tcg_temp_free_i64(tb);
>                  tcg_temp_free_i64(td);
>                  tcg_temp_free_i32(res);
> -                tcg_temp_free_i32(sr_ove);
>              }
>              break;
>          default:
> @@ -312,7 +314,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                  TCGv_i64 td = tcg_temp_local_new_i64();
>                  TCGv_i32 res = tcg_temp_local_new_i32();
>                  TCGv_i32 sr_cy = tcg_temp_local_new_i32();
> -                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>                  tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>                  tcg_gen_extu_i32_i64(tb, cpu_R[rb]);
>                  tcg_gen_andi_i32(sr_cy, cpu_sr, SR_CY);
> @@ -327,9 +328,13 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                  tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>                  tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>                  tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
> -                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> -                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> -                gen_exception(dc, EXCP_RANGE);
> +                if (dc->tb_flags & SR_OVE) {
> +                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
> +                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> +                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> +                    gen_exception(dc, EXCP_RANGE);
> +                    tcg_temp_free_i32(sr_ove);
> +                }
>                  gen_set_label(lab);
>                  tcg_gen_mov_i32(cpu_R[rd], res);
>                  tcg_temp_free_i64(ta);
> @@ -338,7 +343,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                  tcg_temp_free_i64(td);
>                  tcg_temp_free_i32(res);
>                  tcg_temp_free_i32(sr_cy);
> -                tcg_temp_free_i32(sr_ove);
>              }
>              break;
>          default:
> @@ -357,7 +361,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                  TCGv_i64 tb = tcg_temp_new_i64();
>                  TCGv_i64 td = tcg_temp_local_new_i64();
>                  TCGv_i32 res = tcg_temp_local_new_i32();
> -                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>
>                  tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>                  tcg_gen_extu_i32_i64(tb, cpu_R[rb]);
> @@ -369,16 +372,19 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                  tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>                  tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>                  tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
> -                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> -                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> -                gen_exception(dc, EXCP_RANGE);
> +                if (dc->tb_flags & SR_OVE) {
> +                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
> +                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> +                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> +                    gen_exception(dc, EXCP_RANGE);
> +                    tcg_temp_free_i32(sr_ove);
> +                }
>                  gen_set_label(lab);
>                  tcg_gen_mov_i32(cpu_R[rd], res);
>                  tcg_temp_free_i64(ta);
>                  tcg_temp_free_i64(tb);
>                  tcg_temp_free_i64(td);
>                  tcg_temp_free_i32(res);
> -                tcg_temp_free_i32(sr_ove);
>              }
>              break;
>          default:
> @@ -451,10 +457,12 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                  TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>                  if (rb == 0) {
>                      tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY));
> -                    tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
> -                    tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab0);
> -                    gen_exception(dc, EXCP_RANGE);
> -                    gen_set_label(lab0);
> +                    if (dc->tb_flags & SR_OVE) {
> +                        tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
> +                        tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab0);
> +                        gen_exception(dc, EXCP_RANGE);
> +                        gen_set_label(lab0);
> +                    }
>                  } else {
>                      tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_R[rb],
>                                         0x00000000, lab1);
> @@ -464,9 +472,11 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>                                         0xffffffff, lab2);
>                      gen_set_label(lab1);
>                      tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY));
> -                    tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
> -                    tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab3);
> -                    gen_exception(dc, EXCP_RANGE);
> +                    if (dc->tb_flags & SR_OVE) {
> +                        tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
> +                        tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab3);
> +                        gen_exception(dc, EXCP_RANGE);
> +                    }
>                      gen_set_label(lab2);
>                      tcg_gen_div_tl(cpu_R[rd], cpu_R[ra], cpu_R[rb]);
>                      gen_set_label(lab3);
> @@ -950,7 +960,6 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>                  TCGv_i64 ta = tcg_temp_new_i64();
>                  TCGv_i64 td = tcg_temp_local_new_i64();
>                  TCGv_i32 res = tcg_temp_local_new_i32();
> -                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>                  tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>                  tcg_gen_addi_i64(td, ta, sign_extend(I16, 16));
>                  tcg_gen_trunc_i64_i32(res, td);
> @@ -960,15 +969,18 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>                  tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>                  tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>                  tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
> -                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> -                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> -                gen_exception(dc, EXCP_RANGE);
> +                if (dc->tb_flags & SR_OVE) {
> +                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
> +                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> +                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> +                    gen_exception(dc, EXCP_RANGE);
> +                    tcg_temp_free_i32(sr_ove);
> +                }
>                  gen_set_label(lab);
>                  tcg_gen_mov_i32(cpu_R[rd], res);
>                  tcg_temp_free_i64(ta);
>                  tcg_temp_free_i64(td);
>                  tcg_temp_free_i32(res);
> -                tcg_temp_free_i32(sr_ove);
>              }
>          }
>          break;
> @@ -982,7 +994,6 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>              TCGv_i64 tcy = tcg_temp_local_new_i64();
>              TCGv_i32 res = tcg_temp_local_new_i32();
>              TCGv_i32 sr_cy = tcg_temp_local_new_i32();
> -            TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>              tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>              tcg_gen_andi_i32(sr_cy, cpu_sr, SR_CY);
>              tcg_gen_shri_i32(sr_cy, sr_cy, 10);
> @@ -996,9 +1007,13 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>              tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>              tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>              tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
> -            tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> -            tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> -            gen_exception(dc, EXCP_RANGE);
> +            if (dc->tb_flags & SR_OVE) {
> +                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
> +                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
> +                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
> +                gen_exception(dc, EXCP_RANGE);
> +                tcg_temp_free_i32(sr_ove);
> +            }
>              gen_set_label(lab);
>              tcg_gen_mov_i32(cpu_R[rd], res);
>              tcg_temp_free_i64(ta);
> @@ -1006,7 +1021,6 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>              tcg_temp_free_i64(tcy);
>              tcg_temp_free_i32(res);
>              tcg_temp_free_i32(sr_cy);
> -            tcg_temp_free_i32(sr_ove);
>          }
>          break;
>

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions
  2013-10-29 20:05   ` Max Filippov
@ 2013-10-29 21:36     ` Sebastian Macke
  2013-10-29 21:49       ` Richard Henderson
  2013-10-29 22:55       ` Max Filippov
  0 siblings, 2 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 21:36 UTC (permalink / raw)
  To: Max Filippov; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On 29/10/2013 1:05 PM, Max Filippov wrote:
> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
>> This patch separates the load and store instruction to a
>> separate function.
>> The repetition of the source code can be reduced and further
>> optimizations can be implemented.
>> In this case it checks for a zero offset and optimizes it.
>>
>> Additional this patch solves a severe bug for the softmmu emulation.
>> The pc has to be saved as these instructions can fail and lead
>> to a tlb miss exception.
> In case of an exception we re-translate the TB to find the PC where
> the exception happened, see cpu_restore_state call from the tlb_fill
> function. Also this applies to both user and system emulation, but
> you only handle the system emulation case.


The problem is the epcr register in the interrupt routine in which the 
current pc must be saved.
Of course in the user emulation case the interrupt handler is never 
executed.

When is the pc of the fault determined? Before or after the interrupt 
handler?
Finding this problem gave me a long headache. But it would be nice if 
there is a better solution.


>
>> Signed-off-by: Sebastian Macke <sebastian@macke.de>
>> ---
>>   target-openrisc/translate.c | 130 ++++++++++++++++++++++++++------------------
>>   1 file changed, 76 insertions(+), 54 deletions(-)
>>
>> diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
>> index 31f8717..1bb686c 100644
>> --- a/target-openrisc/translate.c
>> +++ b/target-openrisc/translate.c
>> @@ -692,6 +692,73 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>       }
>>   }
>>
>> +static void gen_loadstore(DisasContext *dc, uint32 op0,
>> +                          uint32_t ra, uint32_t rb, uint32_t rd,
>> +                          uint32_t offset)
>> +{
>> +
>> +/* The load and store instructions can fail and lead to a
>> + *  tlb miss exception. The correct pc has to be stored for
>> + *  this case.
>> + */
>> +#if !defined(CONFIG_USER_ONLY)
>> +    tcg_gen_movi_tl(cpu_pc, dc->pc);
>> +#endif
>> +
>> +    TCGv t0 = cpu_R[ra];
>> +    if (offset != 0) {
>> +        t0 = tcg_temp_new();
>> +        tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(offset, 16));
>> +    }
>> +
>> +    switch (op0) {
>> +    case 0x21:    /* l.lwz */
>> +        tcg_gen_qemu_ld32u(cpu_R[rd], t0, dc->mem_idx);
>> +        break;
>> +
>> +    case 0x22:    /* l.lws */
>> +        tcg_gen_qemu_ld32s(cpu_R[rd], t0, dc->mem_idx);
>> +        break;
>> +
>> +    case 0x23:    /* l.lbz */
>> +        tcg_gen_qemu_ld8u(cpu_R[rd], t0, dc->mem_idx);
>> +        break;
>> +
>> +    case 0x24:    /* l.lbs */
>> +        tcg_gen_qemu_ld8s(cpu_R[rd], t0, dc->mem_idx);
>> +        break;
>> +
>> +    case 0x25:    /* l.lhz */
>> +        tcg_gen_qemu_ld16u(cpu_R[rd], t0, dc->mem_idx);
>> +        break;
>> +
>> +    case 0x26:    /* l.lhs */
>> +        tcg_gen_qemu_ld16s(cpu_R[rd], t0, dc->mem_idx);
>> +        break;
>> +
>> +    case 0x35:    /* l.sw */
>> +        tcg_gen_qemu_st32(cpu_R[rb], t0, dc->mem_idx);
>> +        break;
>> +
>> +    case 0x36:    /* l.sb */
>> +        tcg_gen_qemu_st8(cpu_R[rb], t0, dc->mem_idx);
>> +        break;
>> +
>> +    case 0x37:    /* l.sh */
>> +        tcg_gen_qemu_st16(cpu_R[rb], t0, dc->mem_idx);
>> +        break;
>> +
>> +    default:
>> +    break;
> Broken indentation.
>
>> +    }
>> +
>> +    if (offset != 0) {
>> +        tcg_temp_free(t0);
>> +    }
>> +
>> +}
>> +
>> +
>>   static void dec_misc(DisasContext *dc, uint32_t insn)
>>   {
>>       uint32_t op0, op1;
>> @@ -843,62 +910,32 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>>
>>       case 0x21:    /* l.lwz */
>>           LOG_DIS("l.lwz r%d, r%d, %d\n", rd, ra, I16);
>> -        {
>> -            TCGv t0 = tcg_temp_new();
>> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
>> -            tcg_gen_qemu_ld32u(cpu_R[rd], t0, dc->mem_idx);
>> -            tcg_temp_free(t0);
>> -        }
>> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>>           break;
>>
>>       case 0x22:    /* l.lws */
>>           LOG_DIS("l.lws r%d, r%d, %d\n", rd, ra, I16);
>> -        {
>> -            TCGv t0 = tcg_temp_new();
>> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
>> -            tcg_gen_qemu_ld32s(cpu_R[rd], t0, dc->mem_idx);
>> -            tcg_temp_free(t0);
>> -        }
>> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>>           break;
>>
>>       case 0x23:    /* l.lbz */
>>           LOG_DIS("l.lbz r%d, r%d, %d\n", rd, ra, I16);
>> -        {
>> -            TCGv t0 = tcg_temp_new();
>> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
>> -            tcg_gen_qemu_ld8u(cpu_R[rd], t0, dc->mem_idx);
>> -            tcg_temp_free(t0);
>> -        }
>> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>>           break;
>>
>>       case 0x24:    /* l.lbs */
>>           LOG_DIS("l.lbs r%d, r%d, %d\n", rd, ra, I16);
>> -        {
>> -            TCGv t0 = tcg_temp_new();
>> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
>> -            tcg_gen_qemu_ld8s(cpu_R[rd], t0, dc->mem_idx);
>> -            tcg_temp_free(t0);
>> -        }
>> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>>           break;
>>
>>       case 0x25:    /* l.lhz */
>>           LOG_DIS("l.lhz r%d, r%d, %d\n", rd, ra, I16);
>> -        {
>> -            TCGv t0 = tcg_temp_new();
>> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
>> -            tcg_gen_qemu_ld16u(cpu_R[rd], t0, dc->mem_idx);
>> -            tcg_temp_free(t0);
>> -        }
>> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>>           break;
>>
>>       case 0x26:    /* l.lhs */
>>           LOG_DIS("l.lhs r%d, r%d, %d\n", rd, ra, I16);
>> -        {
>> -            TCGv t0 = tcg_temp_new();
>> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(I16, 16));
>> -            tcg_gen_qemu_ld16s(cpu_R[rd], t0, dc->mem_idx);
>> -            tcg_temp_free(t0);
>> -        }
>> +        gen_loadstore(dc, op0, ra, rb, rd, I16);
>>           break;
>>
>>       case 0x27:    /* l.addi */
>> @@ -1047,32 +1084,17 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>>
>>       case 0x35:    /* l.sw */
>>           LOG_DIS("l.sw %d, r%d, r%d, %d\n", I5, ra, rb, I11);
>> -        {
>> -            TCGv t0 = tcg_temp_new();
>> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(tmp, 16));
>> -            tcg_gen_qemu_st32(cpu_R[rb], t0, dc->mem_idx);
>> -            tcg_temp_free(t0);
>> -        }
>> +        gen_loadstore(dc, op0, ra, rb, rd, tmp);
>>           break;
>>
>>       case 0x36:    /* l.sb */
>>           LOG_DIS("l.sb %d, r%d, r%d, %d\n", I5, ra, rb, I11);
>> -        {
>> -            TCGv t0 = tcg_temp_new();
>> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(tmp, 16));
>> -            tcg_gen_qemu_st8(cpu_R[rb], t0, dc->mem_idx);
>> -            tcg_temp_free(t0);
>> -        }
>> +        gen_loadstore(dc, op0, ra, rb, rd, tmp);
>>           break;
>>
>>       case 0x37:    /* l.sh */
>> +        gen_loadstore(dc, op0, ra, rb, rd, tmp);
>>           LOG_DIS("l.sh %d, r%d, r%d, %d\n", I5, ra, rb, I11);
> In other cases you do it in the reverse order.
> Looks like all these cases can be further consolidated into
> a pair of LOG_DIS(); gen_loadstore(); with a small table for
> LOG_DIS format string each.

You are right. This is not optimal. But before it was worse. I will try 
to I optimize it in the V2 patchset.


>
>> -        {
>> -            TCGv t0 = tcg_temp_new();
>> -            tcg_gen_addi_tl(t0, cpu_R[ra], sign_extend(tmp, 16));
>> -            tcg_gen_qemu_st16(cpu_R[rb], t0, dc->mem_idx);
>> -            tcg_temp_free(t0);
>> -        }
>>           break;
>>
>>       default:

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions
  2013-10-29 21:36     ` Sebastian Macke
@ 2013-10-29 21:49       ` Richard Henderson
  2013-10-29 22:55       ` Max Filippov
  1 sibling, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2013-10-29 21:49 UTC (permalink / raw)
  To: Sebastian Macke, Max Filippov; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On 10/29/2013 02:36 PM, Sebastian Macke wrote:
> 
> The problem is the epcr register in the interrupt routine in which the current
> pc must be saved.

I assume the epcr register is quite predictable based on the insn stream.
One can restore more than just the PC during re-translation.

C.f. the s390 port which also restores CC_OP, and the sparc port which
also restores NPC (next pc).


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 04/13] target-openrisc: sync flags only when necessary
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 04/13] target-openrisc: sync flags only when necessary Sebastian Macke
@ 2013-10-29 21:51   ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2013-10-29 21:51 UTC (permalink / raw)
  To: Sebastian Macke, qemu-devel, proljc; +Cc: openrisc, openrisc

On 10/29/2013 12:04 PM, Sebastian Macke wrote:
> Very often the delayed slot flag is set only to be removed
> one instruction later. This patch sets this flag
> only on instructions which could fail and at the end
> of a translation block if necessary.

This is something else you'd probably be better off
determining during restore_state_to_opc, particularly
for the load/store instructions.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 06/13] target-openrisc: Remove TLB flush from l.rfe instruction
  2013-10-29 21:01   ` Max Filippov
@ 2013-10-29 21:53     ` Sebastian Macke
  2013-10-29 22:20       ` Max Filippov
  0 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 21:53 UTC (permalink / raw)
  To: Max Filippov; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On 29/10/2013 2:01 PM, Max Filippov wrote:
> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
>> At the moment there are two TLBs. The OpenRISC TLB followed
>> by the QEMU's own TLB.
>> At the end of the TLB miss handler a tlb_flush of QEMUs TLB
>> is executed which is exactly what we want to avoid.
>> As long as there is no context switch we don't have to flush the TLB.
> So this flush was needed in order to clean QEMU TLB in case
> DTLB/ITLB translation was enabled/disabled, right? But since you
> already have mmu index for nommu operation, wouldn't it be easier
> to indicate mmu index correctly for data and code access and drop
> this flush?
>

The mmu index is already set correctly and this patch removes the flush.
1. Problem
The problem is if there is a context switch.  OpenRISC clears its own 
small tlb page by page. But this does mean it flushes all pages in the 
big QEMU TLB.  This is the reason why l.rfe instruction clears the tlb 
which is the instruction used to return to user mode
But according to the specification this is wrong.

2. Problem which is the case you mentioned.
Your are right, this is one solution and its written in the patchnotes 
as point 1.
But this would not solve the problem No 1. I mentioned in this email.

Confused? I am :)

Easy: l.rfe is not supposed to clear the tlb. It can but it shouldn't.
With this patch I remove the flush and solve all problems by assuming a 
global tlb flush if you invalidate the first entry of the small OpenRISC 
TLB.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 08/13] target-openrisc: Test for Overflow exception statically
  2013-10-29 21:25   ` Max Filippov
@ 2013-10-29 22:06     ` Sebastian Macke
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 22:06 UTC (permalink / raw)
  To: Max Filippov; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On 29/10/2013 2:25 PM, Max Filippov wrote:
> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de> wrote:
>> Instead of testing the overflow exception dynamically every time
>> The flag will be reckognized by the tcg as changed code and
>> will recompile the code with the correct checks.
>>
>> Signed-off-by: Sebastian Macke <sebastian@macke.de>
>> ---
>>   target-openrisc/cpu.h       |  3 +-
>>   target-openrisc/translate.c | 78 ++++++++++++++++++++++++++-------------------
>>   2 files changed, 48 insertions(+), 33 deletions(-)
>>
>> diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
>> index bac61e5..94bbb17 100644
>> --- a/target-openrisc/cpu.h
>> +++ b/target-openrisc/cpu.h
>> @@ -412,7 +412,8 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
>>       *pc = env->pc;
>>       *cs_base = 0;
>>       /* D_FLAG -- branch instruction exception */
>> -    *flags = (env->flags & D_FLAG) | (env->sr & (SR_SM | SR_DME | SR_IME));
>> +    *flags = (env->flags & D_FLAG) |
>> +             (env->sr & (SR_SM | SR_DME | SR_IME | SR_OVE));
>>   }
>>
>>   static inline int cpu_mmu_index(CPUOpenRISCState *env)
>> diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
>> index 9fd1126..b1f73c4 100644
>> --- a/target-openrisc/translate.c
>> +++ b/target-openrisc/translate.c
>> @@ -271,7 +271,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                   TCGv_i64 tb = tcg_temp_new_i64();
>>                   TCGv_i64 td = tcg_temp_local_new_i64();
>>                   TCGv_i32 res = tcg_temp_local_new_i32();
>> -                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>>                   tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>>                   tcg_gen_extu_i32_i64(tb, cpu_R[rb]);
>>                   tcg_gen_add_i64(td, ta, tb);
>> @@ -282,16 +281,19 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                   tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>>                   tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>>                   tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
>> -                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> -                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> -                gen_exception(dc, EXCP_RANGE);
>> +                if (dc->tb_flags & SR_OVE) {
>> +                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
> This temp doesn't need to be local.


>> +                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> +                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> +                    gen_exception(dc, EXCP_RANGE);
>> +                    tcg_temp_free_i32(sr_ove);
>> +                }
> Repetitions of this code call for making it a nice separate function.

You are right with both suggestions. Thanks for your comments. Looks 
like there is still some optimization possible :)
>
>>                   gen_set_label(lab);
>>                   tcg_gen_mov_i32(cpu_R[rd], res);
>>                   tcg_temp_free_i64(ta);
>>                   tcg_temp_free_i64(tb);
>>                   tcg_temp_free_i64(td);
>>                   tcg_temp_free_i32(res);
>> -                tcg_temp_free_i32(sr_ove);
>>               }
>>               break;
>>           default:
>> @@ -312,7 +314,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                   TCGv_i64 td = tcg_temp_local_new_i64();
>>                   TCGv_i32 res = tcg_temp_local_new_i32();
>>                   TCGv_i32 sr_cy = tcg_temp_local_new_i32();
>> -                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>>                   tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>>                   tcg_gen_extu_i32_i64(tb, cpu_R[rb]);
>>                   tcg_gen_andi_i32(sr_cy, cpu_sr, SR_CY);
>> @@ -327,9 +328,13 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                   tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>>                   tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>>                   tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
>> -                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> -                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> -                gen_exception(dc, EXCP_RANGE);
>> +                if (dc->tb_flags & SR_OVE) {
>> +                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>> +                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> +                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> +                    gen_exception(dc, EXCP_RANGE);
>> +                    tcg_temp_free_i32(sr_ove);
>> +                }
>>                   gen_set_label(lab);
>>                   tcg_gen_mov_i32(cpu_R[rd], res);
>>                   tcg_temp_free_i64(ta);
>> @@ -338,7 +343,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                   tcg_temp_free_i64(td);
>>                   tcg_temp_free_i32(res);
>>                   tcg_temp_free_i32(sr_cy);
>> -                tcg_temp_free_i32(sr_ove);
>>               }
>>               break;
>>           default:
>> @@ -357,7 +361,6 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                   TCGv_i64 tb = tcg_temp_new_i64();
>>                   TCGv_i64 td = tcg_temp_local_new_i64();
>>                   TCGv_i32 res = tcg_temp_local_new_i32();
>> -                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>>
>>                   tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>>                   tcg_gen_extu_i32_i64(tb, cpu_R[rb]);
>> @@ -369,16 +372,19 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                   tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>>                   tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>>                   tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
>> -                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> -                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> -                gen_exception(dc, EXCP_RANGE);
>> +                if (dc->tb_flags & SR_OVE) {
>> +                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>> +                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> +                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> +                    gen_exception(dc, EXCP_RANGE);
>> +                    tcg_temp_free_i32(sr_ove);
>> +                }
>>                   gen_set_label(lab);
>>                   tcg_gen_mov_i32(cpu_R[rd], res);
>>                   tcg_temp_free_i64(ta);
>>                   tcg_temp_free_i64(tb);
>>                   tcg_temp_free_i64(td);
>>                   tcg_temp_free_i32(res);
>> -                tcg_temp_free_i32(sr_ove);
>>               }
>>               break;
>>           default:
>> @@ -451,10 +457,12 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                   TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>>                   if (rb == 0) {
>>                       tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY));
>> -                    tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
>> -                    tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab0);
>> -                    gen_exception(dc, EXCP_RANGE);
>> -                    gen_set_label(lab0);
>> +                    if (dc->tb_flags & SR_OVE) {
>> +                        tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
>> +                        tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab0);
>> +                        gen_exception(dc, EXCP_RANGE);
>> +                        gen_set_label(lab0);
>> +                    }
>>                   } else {
>>                       tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_R[rb],
>>                                          0x00000000, lab1);
>> @@ -464,9 +472,11 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
>>                                          0xffffffff, lab2);
>>                       gen_set_label(lab1);
>>                       tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY));
>> -                    tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
>> -                    tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab3);
>> -                    gen_exception(dc, EXCP_RANGE);
>> +                    if (dc->tb_flags & SR_OVE) {
>> +                        tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
>> +                        tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab3);
>> +                        gen_exception(dc, EXCP_RANGE);
>> +                    }
>>                       gen_set_label(lab2);
>>                       tcg_gen_div_tl(cpu_R[rd], cpu_R[ra], cpu_R[rb]);
>>                       gen_set_label(lab3);
>> @@ -950,7 +960,6 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>>                   TCGv_i64 ta = tcg_temp_new_i64();
>>                   TCGv_i64 td = tcg_temp_local_new_i64();
>>                   TCGv_i32 res = tcg_temp_local_new_i32();
>> -                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>>                   tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>>                   tcg_gen_addi_i64(td, ta, sign_extend(I16, 16));
>>                   tcg_gen_trunc_i64_i32(res, td);
>> @@ -960,15 +969,18 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>>                   tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>>                   tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>>                   tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
>> -                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> -                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> -                gen_exception(dc, EXCP_RANGE);
>> +                if (dc->tb_flags & SR_OVE) {
>> +                    TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>> +                    tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> +                    tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> +                    gen_exception(dc, EXCP_RANGE);
>> +                    tcg_temp_free_i32(sr_ove);
>> +                }
>>                   gen_set_label(lab);
>>                   tcg_gen_mov_i32(cpu_R[rd], res);
>>                   tcg_temp_free_i64(ta);
>>                   tcg_temp_free_i64(td);
>>                   tcg_temp_free_i32(res);
>> -                tcg_temp_free_i32(sr_ove);
>>               }
>>           }
>>           break;
>> @@ -982,7 +994,6 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>>               TCGv_i64 tcy = tcg_temp_local_new_i64();
>>               TCGv_i32 res = tcg_temp_local_new_i32();
>>               TCGv_i32 sr_cy = tcg_temp_local_new_i32();
>> -            TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>>               tcg_gen_extu_i32_i64(ta, cpu_R[ra]);
>>               tcg_gen_andi_i32(sr_cy, cpu_sr, SR_CY);
>>               tcg_gen_shri_i32(sr_cy, sr_cy, 10);
>> @@ -996,9 +1007,13 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>>               tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x0, lab);
>>               tcg_gen_brcondi_i64(TCG_COND_EQ, td, 0x3, lab);
>>               tcg_gen_ori_i32(cpu_sr, cpu_sr, (SR_OV | SR_CY));
>> -            tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> -            tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> -            gen_exception(dc, EXCP_RANGE);
>> +            if (dc->tb_flags & SR_OVE) {
>> +                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
>> +                tcg_gen_andi_i32(sr_ove, cpu_sr, SR_OVE);
>> +                tcg_gen_brcondi_i32(TCG_COND_NE, sr_ove, SR_OVE, lab);
>> +                gen_exception(dc, EXCP_RANGE);
>> +                tcg_temp_free_i32(sr_ove);
>> +            }
>>               gen_set_label(lab);
>>               tcg_gen_mov_i32(cpu_R[rd], res);
>>               tcg_temp_free_i64(ta);
>> @@ -1006,7 +1021,6 @@ static void dec_misc(DisasContext *dc, uint32_t insn)
>>               tcg_temp_free_i64(tcy);
>>               tcg_temp_free_i32(res);
>>               tcg_temp_free_i32(sr_cy);
>> -            tcg_temp_free_i32(sr_ove);
>>           }
>>           break;
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 06/13] target-openrisc: Remove TLB flush from l.rfe instruction
  2013-10-29 21:53     ` Sebastian Macke
@ 2013-10-29 22:20       ` Max Filippov
  2013-10-29 23:14         ` Sebastian Macke
  0 siblings, 1 reply; 48+ messages in thread
From: Max Filippov @ 2013-10-29 22:20 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On Wed, Oct 30, 2013 at 1:53 AM, Sebastian Macke <sebastian@macke.de> wrote:
> On 29/10/2013 2:01 PM, Max Filippov wrote:
>>
>> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de>
>> wrote:
>>>
>>> At the moment there are two TLBs. The OpenRISC TLB followed
>>> by the QEMU's own TLB.
>>> At the end of the TLB miss handler a tlb_flush of QEMUs TLB
>>> is executed which is exactly what we want to avoid.
>>> As long as there is no context switch we don't have to flush the TLB.
>>
>> So this flush was needed in order to clean QEMU TLB in case
>> DTLB/ITLB translation was enabled/disabled, right? But since you
>> already have mmu index for nommu operation, wouldn't it be easier
>> to indicate mmu index correctly for data and code access and drop
>> this flush?
>>
>
> The mmu index is already set correctly and this patch removes the flush.

I'm not sure: cpu_mmu_index only checks SR_IME, not SR_DME.

> 1. Problem
> The problem is if there is a context switch.  OpenRISC clears its own small
> tlb page by page. But this does mean it flushes all pages in the big QEMU

I think there shouldn't be any entries in QEMU TLB other than those in
OpenRISC TLB, otherwise it would be possible to access unmapped virtual
addresses without getting pagefaults.

> TLB.  This is the reason why l.rfe instruction clears the tlb which is the
> instruction used to return to user mode
> But according to the specification this is wrong.
>
> 2. Problem which is the case you mentioned.
> Your are right, this is one solution and its written in the patchnotes as
> point 1.
> But this would not solve the problem No 1. I mentioned in this email.
>
> Confused? I am :)
>
> Easy: l.rfe is not supposed to clear the tlb. It can but it shouldn't.
> With this patch I remove the flush and solve all problems by assuming a
> global tlb flush if you invalidate the first entry of the small OpenRISC
> TLB.

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception
  2013-10-29 19:47   ` Peter Maydell
@ 2013-10-29 22:41     ` Sebastian Macke
  2013-11-01 18:58       ` Peter Maydell
  2013-11-02  1:29       ` [Qemu-devel] " Richard Henderson
  0 siblings, 2 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 22:41 UTC (permalink / raw)
  To: Peter Maydell; +Cc: openrisc, openrisc, QEMU Developers, Ethan Hunt

On 29/10/2013 12:47 PM, Peter Maydell wrote:
> On 29 October 2013 19:04, Sebastian Macke <sebastian@macke.de> wrote:
>> The TLB flush is not necessary as the mmu_index field
>> already takes care of correct memory locations.
>> Instead the tb flag field must be expanded that
>> the exception takes the correct translation block.
>>
>> Signed-off-by: Sebastian Macke <sebastian@macke.de>
>> ---
>>   target-openrisc/cpu.h       | 4 ++--
>>   target-openrisc/interrupt.c | 4 ----
>>   2 files changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
>> index 24afe6f..057821d 100644
>> --- a/target-openrisc/cpu.h
>> +++ b/target-openrisc/cpu.h
>> @@ -85,7 +85,7 @@ enum {
>>   #define SPR_VR 0xFFFF003F
>>
>>   /* Internal flags, delay slot flag */
>> -#define D_FLAG    1
>> +#define D_FLAG    2
> Since this set of #defines effectively is the documentation for
> what the tb_flags usage is, can you update it to include the
> new flag you've added, please?

I will. I think I have done it in one of the later patches.
But the D_FLAG was there before. What I did was just changing it to 2 
because 1 is used by the new SR_SM
(supervisor mode) Flag.


>>   /* Interrupt */
>>   #define NR_IRQS  32
>> @@ -412,7 +412,7 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
>>       *pc = env->pc;
>>       *cs_base = 0;
>>       /* D_FLAG -- branch instruction exception */
>> -    *flags = (env->flags & D_FLAG);
>> +    *flags = (env->flags & D_FLAG) | (env->sr & SR_SM);
>>   }
>>
>>   static inline int cpu_mmu_index(CPUOpenRISCState *env)
>> diff --git a/target-openrisc/interrupt.c b/target-openrisc/interrupt.c
>> index d1d6ae2..ee98ed3 100644
>> --- a/target-openrisc/interrupt.c
>> +++ b/target-openrisc/interrupt.c
>> @@ -41,10 +41,6 @@ void openrisc_cpu_do_interrupt(CPUState *cs)
>>           env->epcr += 4;
>>       }
>>
>> -    /* For machine-state changed between user-mode and supervisor mode,
>> -       we need flush TLB when we enter&exit EXCP.  */
>> -    tlb_flush(env, 1);
>> -
>>       env->esr = ENV_GET_SR(env);
>>       env->sr &= ~SR_DME;
>>       env->sr &= ~SR_IME;
> It looks suspicious that this patch doesn't include any change to
> translate.c which reads the tb flag you've just added. Either:
>   (a) the translated code doesn't actually build in any dependencies
>    on the SR_SM flag, in which case it doesn't need to be a tb_flag at all
>   (b) the translated code is still referring directly to env->sr somewhere,
>    in which case it needs to be changed to use the tb_flags version instead
>
> Also, are you sure that tlb_flush() is needed purely for the change to the
> SR_SM flags and not for any of the other CPU state changes that
> openrisc_cpu_do_interrupt() is making when it does the user->supervisor
> state change?
>
> thanks
> -- PMM

The exception is going into supervisor mode and disables the mmu. The 
mmu_index is changed and it should work.
But then the emulated Linux crashes.
This does not happen when I add the supervisor mode flag to the tb_flags.

I was also a little bit confused when I implemented it. But I don't know 
the internals of QEMU as good as you. And some other targets
doing it the same way I think.

What is included in the tb hash? The virtual pc + physical page + the 
tb_flags? Not the mmu_index?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions
  2013-10-29 21:36     ` Sebastian Macke
  2013-10-29 21:49       ` Richard Henderson
@ 2013-10-29 22:55       ` Max Filippov
  2013-10-29 23:37         ` Sebastian Macke
  1 sibling, 1 reply; 48+ messages in thread
From: Max Filippov @ 2013-10-29 22:55 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On Wed, Oct 30, 2013 at 1:36 AM, Sebastian Macke <sebastian@macke.de> wrote:
> On 29/10/2013 1:05 PM, Max Filippov wrote:
>>
>> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de>
>> wrote:
>>> Additional this patch solves a severe bug for the softmmu emulation.
>>> The pc has to be saved as these instructions can fail and lead
>>> to a tlb miss exception.
>>
>> In case of an exception we re-translate the TB to find the PC where
>> the exception happened, see cpu_restore_state call from the tlb_fill
>> function. Also this applies to both user and system emulation, but
>> you only handle the system emulation case.
>
> The problem is the epcr register in the interrupt routine in which the
> current pc must be saved.
> Of course in the user emulation case the interrupt handler is never
> executed.
>
> When is the pc of the fault determined? Before or after the interrupt
> handler?

Before, in the tlb_fill: cpu_restore_state is called to restore context,
and after that cpu_loop_exit is called to handle the exception.

> Finding this problem gave me a long headache. But it would be nice if there
> is a better solution.

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 06/13] target-openrisc: Remove TLB flush from l.rfe instruction
  2013-10-29 22:20       ` Max Filippov
@ 2013-10-29 23:14         ` Sebastian Macke
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 23:14 UTC (permalink / raw)
  To: Max Filippov; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On 29/10/2013 3:20 PM, Max Filippov wrote:
> On Wed, Oct 30, 2013 at 1:53 AM, Sebastian Macke <sebastian@macke.de> wrote:
>> On 29/10/2013 2:01 PM, Max Filippov wrote:
>>> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de>
>>> wrote:
>>>> At the moment there are two TLBs. The OpenRISC TLB followed
>>>> by the QEMU's own TLB.
>>>> At the end of the TLB miss handler a tlb_flush of QEMUs TLB
>>>> is executed which is exactly what we want to avoid.
>>>> As long as there is no context switch we don't have to flush the TLB.
>>> So this flush was needed in order to clean QEMU TLB in case
>>> DTLB/ITLB translation was enabled/disabled, right? But since you
>>> already have mmu index for nommu operation, wouldn't it be easier
>>> to indicate mmu index correctly for data and code access and drop
>>> this flush?
>>>
>> The mmu index is already set correctly and this patch removes the flush.
> I'm not sure: cpu_mmu_index only checks SR_IME, not SR_DME.

And again, correct. I saw this problem too and wanted to correct it later.
But using Linux the IME and DME flags are always changed together and 
therefore distinguishing between them don't play a role.
So I forgot it in the end.

>> 1. Problem
>> The problem is if there is a context switch.  OpenRISC clears its own small
>> tlb page by page. But this does mean it flushes all pages in the big QEMU
> I think there shouldn't be any entries in QEMU TLB other than those in
> OpenRISC TLB, otherwise it would be possible to access unmapped virtual
> addresses without getting pagefaults.
Correct. And this was one dilemma I had. How to increase the whole speed 
of tlb misses by an order of magnitude
and not break any rules.

Without the two patches it works like this.
User mode OpenRISC TLB-Miss -> Exception -> QEMU TLB flush -> Set new 
page in the tlb handler -> return via l.rfe -> QEMU TLB flush

In the end we had always only one valid page in the QEMU TLB which is 
kind of crazy. ( Now I am unsure, because with my logic we would have 
never a valid page in the QEMU TLB and would run in an endless loop)

So with the patches QEMU's TLB could have indeed more entries than the 
OpenRISC TLB right now but which is 99% fine if you run it under Linux. 
Linux has the option to clear certain pages from the tlb which then of 
course will not reckognize those pages in QEMUs TLB. So there is a small 
chance for an error.

Three options:

1. Removing QEMUs TLBs.
2. Another option might be to make the flushing mmu_index dependend to 
increase the speed. But this is not implemented as far as I see.
3. I could also invalidate the previous page if the OpenRISC TLB entry 
is overwritten. But then I don't know the correct mmu_index. So this has 
to be done for all possible mmu indices.

Option three might be possible. I didn't think about this before.

Hopefully OpenRISC will get Hardware TLB refill next year. This would 
solve the problem.


>> TLB.  This is the reason why l.rfe instruction clears the tlb which is the
>> instruction used to return to user mode
>> But according to the specification this is wrong.
>>
>> 2. Problem which is the case you mentioned.
>> Your are right, this is one solution and its written in the patchnotes as
>> point 1.
>> But this would not solve the problem No 1. I mentioned in this email.
>>
>> Confused? I am :)
>>
>> Easy: l.rfe is not supposed to clear the tlb. It can but it shouldn't.
>> With this patch I remove the flush and solve all problems by assuming a
>> global tlb flush if you invalidate the first entry of the small OpenRISC
>> TLB.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions
  2013-10-29 22:55       ` Max Filippov
@ 2013-10-29 23:37         ` Sebastian Macke
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-29 23:37 UTC (permalink / raw)
  To: Max Filippov; +Cc: openrisc, openrisc, qemu-devel, Ethan Hunt

On 29/10/2013 3:55 PM, Max Filippov wrote:
> On Wed, Oct 30, 2013 at 1:36 AM, Sebastian Macke <sebastian@macke.de> wrote:
>> On 29/10/2013 1:05 PM, Max Filippov wrote:
>>> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de>
>>> wrote:
>>>> Additional this patch solves a severe bug for the softmmu emulation.
>>>> The pc has to be saved as these instructions can fail and lead
>>>> to a tlb miss exception.
>>> In case of an exception we re-translate the TB to find the PC where
>>> the exception happened, see cpu_restore_state call from the tlb_fill
>>> function. Also this applies to both user and system emulation, but
>>> you only handle the system emulation case.
>> The problem is the epcr register in the interrupt routine in which the
>> current pc must be saved.
>> Of course in the user emulation case the interrupt handler is never
>> executed.
>>
>> When is the pc of the fault determined? Before or after the interrupt
>> handler?
> Before, in the tlb_fill: cpu_restore_state is called to restore context,
> and after that cpu_loop_exit is called to handle the exception.

Ok, but then my line should be useless and the pc value should be 
immediately overwritten when the we get an OpenRISC TLB miss.
This is definitely not the case. If I remove it I get random kernel 
crashes. So there is anything else wrong.
Maybe the curent MMU code directly executes the exception handler. So 
QEMU has no chance in interfering  and can't give me the correct pc. 
Maybe I will have to restore the context myself.

>> Finding this problem gave me a long headache. But it would be nice if there
>> is a better solution.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 09/13] target-openrisc: Add CPU which neglects Carry and Overflow Flag
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 09/13] target-openrisc: Add CPU which neglects Carry and Overflow Flag Sebastian Macke
@ 2013-10-30 18:14   ` Richard Henderson
  2013-10-30 19:22     ` Sebastian Macke
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2013-10-30 18:14 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: qemu-devel, proljc

On 10/29/2013 12:04 PM, Sebastian Macke wrote:
> The carry and overflag and the instructions l.addc and l.addic
> are never used in the toolchain. Linux and gcc compiled software
> don't need them.

Really?  That's quite surprising.

> To speed up the emulation a cpu was added which
> neglects the flags for l.addi, l.add, l.sub and
> generates an illegal instruction error for l.addic and l.addc

I'm somewhat shocked that l.addc is never used.  To me that points
to a missed opportunity in the compiler.

It would be much better to simply improve handling of these bits.
In a previous patch set you broke out SR[F] to its own variable;
I suggest that you do the same for SR[CY] and SR[OV].

If you can implement add et al without branches, the TCG optimizer
will be able to do a good job eliminating shadowed computation.

A good example to follow here is the ARM implementation.  Have a
look at the gen_add_CC and gen_sub_CC functions especially.  Note
that the overflow bit is stored in bit 31 of cpu_VF and the other
bits of cpu_VF are ignored.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches
  2013-10-29 19:04 ` [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches Sebastian Macke
@ 2013-10-30 18:33   ` Richard Henderson
  2013-10-30 19:07     ` Sebastian Macke
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2013-10-30 18:33 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, qemu-devel, proljc

On 10/29/2013 12:04 PM, Sebastian Macke wrote:
>          {
>              int lab = gen_new_label();
> -            dc->btaken = tcg_temp_local_new();
> -            tcg_gen_movi_tl(jmp_pc, dc->pc+8);
> -            tcg_gen_movi_tl(dc->btaken, 0);
> +            tcg_gen_movi_tl(jmp_pc, 0);
>              tcg_gen_brcondi_i32(op0 == 0x03 ? TCG_COND_NE : TCG_COND_EQ,
>                                  cpu_srf, 0, lab);
> -            tcg_gen_movi_tl(dc->btaken, 1);
>              tcg_gen_movi_tl(jmp_pc, tmp_pc);
>              gen_set_label(lab);

You can now use movcond here:

  tcg_gen_movi_i32(jmp_pc, tmp_pc);
  tcg_gen_movcond_i32(jmp_pc, cpu_srf, zero, jmp_pc, zero,
                      op0 == 0x03 ? TCG_COND_NE : TCG_COND_EQ);

Although I'd wonder about just using setcond instead, since I think
the value stored in jmp_pc here is also stored in dc->j_target, leading
to the right behaviour...

>      case JUMP_BRANCH:
>          {
>              int l1 = gen_new_label();
> -            tcg_gen_brcondi_tl(TCG_COND_NE, dc->btaken, 0, l1);
> +            tcg_gen_brcondi_tl(TCG_COND_NE, jmp_pc, 0, l1);
>              gen_goto_tb(dc, 1, dc->pc);
>              gen_set_label(l1);
> -            tcg_temp_free(dc->btaken);
>              gen_goto_tb(dc, 0, dc->j_target);
>              break;

... here.

> +    case JUMP_BRANCH_DELAYED:
> +        {
> +            int l1 = gen_new_label();
> +            tcg_gen_brcondi_tl(TCG_COND_NE, jmp_pc, 0, l1);
> +            gen_goto_tb(dc, 1, dc->pc);
> +            gen_set_label(l1);
> +            tcg_gen_mov_tl(cpu_pc, jmp_pc);
> +            tcg_gen_exit_tb(0);
> +            break;
...
> +
> +    dc->delayed_branch = !!(dc->tb_flags & D_FLAG);
> +    if ((dc->delayed_branch) && (dc->tb_flags&B_FLAG)) {
> +        dc->j_state = JUMP_BRANCH_DELAYED;
> +    }
> +


And thus I can't see how these additions are actually useful?

If I've missed something, and the last hunk needs to be retained,
then please fix the coding style:

  if (dc->delayed_branch && (dc->tb_flags & B_FLAG)) {


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches
  2013-10-30 18:33   ` Richard Henderson
@ 2013-10-30 19:07     ` Sebastian Macke
  2013-10-30 19:32       ` Richard Henderson
  2013-10-30 19:47       ` Richard Henderson
  0 siblings, 2 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-30 19:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: openrisc, openrisc, qemu-devel, proljc

On 30/10/2013 11:33 AM, Richard Henderson wrote:
> On 10/29/2013 12:04 PM, Sebastian Macke wrote:
>>           {
>>               int lab = gen_new_label();
>> -            dc->btaken = tcg_temp_local_new();
>> -            tcg_gen_movi_tl(jmp_pc, dc->pc+8);
>> -            tcg_gen_movi_tl(dc->btaken, 0);
>> +            tcg_gen_movi_tl(jmp_pc, 0);
>>               tcg_gen_brcondi_i32(op0 == 0x03 ? TCG_COND_NE : TCG_COND_EQ,
>>                                   cpu_srf, 0, lab);
>> -            tcg_gen_movi_tl(dc->btaken, 1);
>>               tcg_gen_movi_tl(jmp_pc, tmp_pc);
>>               gen_set_label(lab);
> You can now use movcond here:
>
>    tcg_gen_movi_i32(jmp_pc, tmp_pc);
>    tcg_gen_movcond_i32(jmp_pc, cpu_srf, zero, jmp_pc, zero,
>                        op0 == 0x03 ? TCG_COND_NE : TCG_COND_EQ);
>
> Although I'd wonder about just using setcond instead, since I think
> the value stored in jmp_pc here is also stored in dc->j_target, leading
> to the right behaviour...

Correct. The movcond function can be used here. I will change that.
But I didn't know that it exists because it is not written in the 
document http://wiki.qemu.org/Documentation/TCG/frontend-ops


>
>>       case JUMP_BRANCH:
>>           {
>>               int l1 = gen_new_label();
>> -            tcg_gen_brcondi_tl(TCG_COND_NE, dc->btaken, 0, l1);
>> +            tcg_gen_brcondi_tl(TCG_COND_NE, jmp_pc, 0, l1);
>>               gen_goto_tb(dc, 1, dc->pc);
>>               gen_set_label(l1);
>> -            tcg_temp_free(dc->btaken);
>>               gen_goto_tb(dc, 0, dc->j_target);
>>               break;
> ... here.

But j_target is not known when the delayed slot is translated 
separately. (E.g. if the delayed slot is at a page boundary.)

>
>> +    case JUMP_BRANCH_DELAYED:
>> +        {
>> +            int l1 = gen_new_label();
>> +            tcg_gen_brcondi_tl(TCG_COND_NE, jmp_pc, 0, l1);
>> +            gen_goto_tb(dc, 1, dc->pc);
>> +            gen_set_label(l1);
>> +            tcg_gen_mov_tl(cpu_pc, jmp_pc);
>> +            tcg_gen_exit_tb(0);
>> +            break;
> ...
>> +
>> +    dc->delayed_branch = !!(dc->tb_flags & D_FLAG);
>> +    if ((dc->delayed_branch) && (dc->tb_flags&B_FLAG)) {
>> +        dc->j_state = JUMP_BRANCH_DELAYED;
>> +    }
>> +
>
> And thus I can't see how these additions are actually useful?
>
> If I've missed something, and the last hunk needs to be retained,
> then please fix the coding style:
>
>    if (dc->delayed_branch && (dc->tb_flags & B_FLAG)) {
The problem is the delayed slot.
The decision whether a jump is taken or not depends on the previous 
branch instruction l.bf and l.bnf instruction.
If only the delayed slot is translated then we have to know what was the 
previous instruction.
In this case we have to differ additionally if the delayed slot is a 
branch or a normal jump. Because for a branch the jmp_pc field is also a 
flag.

We could simply jump to jmp_pc like it was done before but then we miss 
the block chaining feature.
And I have not seen another way implementing it.




>
> r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 09/13] target-openrisc: Add CPU which neglects Carry and Overflow Flag
  2013-10-30 18:14   ` Richard Henderson
@ 2013-10-30 19:22     ` Sebastian Macke
  2013-10-30 19:31       ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-30 19:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, proljc

On 30/10/2013 11:14 AM, Richard Henderson wrote:
> On 10/29/2013 12:04 PM, Sebastian Macke wrote:
>> The carry and overflag and the instructions l.addc and l.addic
>> are never used in the toolchain. Linux and gcc compiled software
>> don't need them.
> Really?  That's quite surprising.

Yes, Really.  :)

>> To speed up the emulation a cpu was added which
>> neglects the flags for l.addi, l.add, l.sub and
>> generates an illegal instruction error for l.addic and l.addc
> I'm somewhat shocked that l.addc is never used.  To me that points
> to a missed opportunity in the compiler.
Yes

> It would be much better to simply improve handling of these bits.
> In a previous patch set you broke out SR[F] to its own variable;
> I suggest that you do the same for SR[CY] and SR[OV].
>
> If you can implement add et al without branches, the TCG optimizer
> will be able to do a good job eliminating shadowed computation.
>
> A good example to follow here is the ARM implementation.  Have a
> look at the gen_add_CC and gen_sub_CC functions especially.  Note
> that the overflow bit is stored in bit 31 of cpu_VF and the other
> bits of cpu_VF are ignored.
>
>
> r~

I would like to implement lazy flags or at least change the big bunch of 
code which is there right now.
But for this patchset I want to keep it that way and change the whole 
flag handling in one of the next patches.

There is almost nothing available to test the flags right now. And I 
have to keep an eye on the delayed slot. So this is more work than you 
might expect.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 09/13] target-openrisc: Add CPU which neglects Carry and Overflow Flag
  2013-10-30 19:22     ` Sebastian Macke
@ 2013-10-30 19:31       ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2013-10-30 19:31 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: qemu-devel, proljc

On 10/30/2013 12:22 PM, Sebastian Macke wrote:
> I would like to implement lazy flags or at least change the big bunch of code
> which is there right now.
> But for this patchset I want to keep it that way and change the whole flag
> handling in one of the next patches.

If you don't want to modernize the flags handling now, fine, but I think adding
this separate cpu definition is a step in the wrong direction.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches
  2013-10-30 19:07     ` Sebastian Macke
@ 2013-10-30 19:32       ` Richard Henderson
  2013-10-30 19:47       ` Richard Henderson
  1 sibling, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2013-10-30 19:32 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, openrisc, qemu-devel, proljc

On 10/30/2013 12:07 PM, Sebastian Macke wrote:
> 
> Correct. The movcond function can be used here. I will change that.
> But I didn't know that it exists because it is not written in the document
> http://wiki.qemu.org/Documentation/TCG/frontend-ops

I had no idea that existed.

The canonical documentation is in tcg/README.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches
  2013-10-30 19:07     ` Sebastian Macke
  2013-10-30 19:32       ` Richard Henderson
@ 2013-10-30 19:47       ` Richard Henderson
  2013-10-30 21:08         ` Sebastian Macke
  1 sibling, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2013-10-30 19:47 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, qemu-devel, proljc

On 10/30/2013 12:07 PM, Sebastian Macke wrote:
>>>       case JUMP_BRANCH:
>>>           {
>>>               int l1 = gen_new_label();
>>> -            tcg_gen_brcondi_tl(TCG_COND_NE, dc->btaken, 0, l1);
>>> +            tcg_gen_brcondi_tl(TCG_COND_NE, jmp_pc, 0, l1);
>>>               gen_goto_tb(dc, 1, dc->pc);
>>>               gen_set_label(l1);
>>> -            tcg_temp_free(dc->btaken);
>>>               gen_goto_tb(dc, 0, dc->j_target);
>>>               break;
>> ... here.
> 
> But j_target is not known when the delayed slot is translated separately. (E.g.
> if the delayed slot is at a page boundary.)

Hmm.  This was just guesswork on my part since I don't have a tree
with your previous patch set applied; j_target of course doesn't
exist in master.

Do you have a publicly accessible tree with all your patches applied?
I'd like to re-read the logic in the proper context.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches
  2013-10-30 19:47       ` Richard Henderson
@ 2013-10-30 21:08         ` Sebastian Macke
  2013-10-30 22:02           ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Sebastian Macke @ 2013-10-30 21:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: openrisc, qemu-devel, proljc

On 30/10/2013 12:47 PM, Richard Henderson wrote:
> On 10/30/2013 12:07 PM, Sebastian Macke wrote:
>>>>        case JUMP_BRANCH:
>>>>            {
>>>>                int l1 = gen_new_label();
>>>> -            tcg_gen_brcondi_tl(TCG_COND_NE, dc->btaken, 0, l1);
>>>> +            tcg_gen_brcondi_tl(TCG_COND_NE, jmp_pc, 0, l1);
>>>>                gen_goto_tb(dc, 1, dc->pc);
>>>>                gen_set_label(l1);
>>>> -            tcg_temp_free(dc->btaken);
>>>>                gen_goto_tb(dc, 0, dc->j_target);
>>>>                break;
>>> ... here.
>> But j_target is not known when the delayed slot is translated separately. (E.g.
>> if the delayed slot is at a page boundary.)
> Hmm.  This was just guesswork on my part since I don't have a tree
> with your previous patch set applied; j_target of course doesn't
> exist in master.
>
> Do you have a publicly accessible tree with all your patches applied?
> I'd like to re-read the logic in the proper context.
After you are the second who demanded it:
https://github.com/s-macke/qemu/tree/or32-optimize

>
> r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches
  2013-10-30 21:08         ` Sebastian Macke
@ 2013-10-30 22:02           ` Richard Henderson
  2013-10-31  0:29             ` Sebastian Macke
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2013-10-30 22:02 UTC (permalink / raw)
  To: Sebastian Macke; +Cc: openrisc, qemu-devel, proljc

On 10/30/2013 02:08 PM, Sebastian Macke wrote:
>> Do you have a publicly accessible tree with all your patches applied?
>> I'd like to re-read the logic in the proper context.
> After you are the second who demanded it:
> https://github.com/s-macke/qemu/tree/or32-optimize

Ok, the logic as written is correct as far as I can see.

It's a little convoluted though, and I think there may be a way to
streamline it.  But I'll have to think about that some more.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches
  2013-10-30 22:02           ` Richard Henderson
@ 2013-10-31  0:29             ` Sebastian Macke
  0 siblings, 0 replies; 48+ messages in thread
From: Sebastian Macke @ 2013-10-31  0:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: openrisc, qemu-devel, proljc

On 30/10/2013 3:02 PM, Richard Henderson wrote:
> On 10/30/2013 02:08 PM, Sebastian Macke wrote:
>>> Do you have a publicly accessible tree with all your patches applied?
>>> I'd like to re-read the logic in the proper context.
>> After you are the second who demanded it:
>> https://github.com/s-macke/qemu/tree/or32-optimize
> Ok, the logic as written is correct as far as I can see.
>
> It's a little convoluted though, and I think there may be a way to
> streamline it.  But I'll have to think about that some more.
>
>
> r~

Yeah it is. Because of the delayed slot and the way QEMU is doing its 
translation it is hard to see and distinguish all different code paths. 
When is which information available.

At the moment the biggest time eater and most complex code part is the 
branching part.

1. l.sf.....  <- set flag if condition is fullfiled (setcond instruction)
2. l.bf  <- branch if flag  (a brcond instruction or with the last 
suggestion you gave a movcond instruction)
3. delayed slot instruction which could fail.
4. Actual jump. We need a branch here for the two different slots for 
chaining. And we need all information about point 2)

So at the moment we put three branches/conditions in the translated 
code. One setcond, one movcond and one brcond.

In principle point 1 and 2 could be fused with some coding effort. But I 
am not sure if one brcond is faster then one setcond+movcond.

And maybe the branching in No 4. could be avoided by some internal code 
change in the way QEMU does its block chaining.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections
  2013-10-29 21:22   ` Sebastian Macke
@ 2013-10-31 11:47     ` Jia Liu
  0 siblings, 0 replies; 48+ messages in thread
From: Jia Liu @ 2013-10-31 11:47 UTC (permalink / raw)
  To: Sebastian Macke
  Cc: Peter Maydell, openrisc, qemu-devel, Max Filippov, openrisc,
	Richard Henderson

Hi Sebastian,

On Wed, Oct 30, 2013 at 5:22 AM, Sebastian Macke <sebastian@macke.de> wrote:
> On 29/10/2013 2:15 PM, Max Filippov wrote:
>>
>> On Tue, Oct 29, 2013 at 11:04 PM, Sebastian Macke <sebastian@macke.de>
>> wrote:
>>>
>>> Hi,
>>>
>>> This is the second part of the patches to make the openrisc target faster
>>> and more reliable.
>>
>> Hi Sebastian,
>>
>> this series doesn't apply cleanly to the current qemu git head,
>> what tree is it based on?
>>
>
> It is based on the current master tree but depends on the patches I send on
> the 21st October.
> They got accepted by the maintainer but are not in the master tree right
> now.

Your previous patch set separate SR is a little different from Arch
spec, you said
it will speed up some instructions, I think maybe it is a good reason
and acceptable.
But this one, going a little more too far, maybe you should consider on
the comment from Max, Peter and Richard.

More people comment your patch, make your code more better :)

Regards,
Jia

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception
  2013-10-29 22:41     ` Sebastian Macke
@ 2013-11-01 18:58       ` Peter Maydell
  2013-11-02  1:21         ` Richard Henderson
  2013-11-02  1:29       ` [Qemu-devel] " Richard Henderson
  1 sibling, 1 reply; 48+ messages in thread
From: Peter Maydell @ 2013-11-01 18:58 UTC (permalink / raw)
  To: Sebastian Macke
  Cc: openrisc, Richard Henderson, openrisc, QEMU Developers, Ethan Hunt

On 29 October 2013 22:41, Sebastian Macke <sebastian@macke.de> wrote:
> On 29/10/2013 12:47 PM, Peter Maydell wrote:
>>
>> On 29 October 2013 19:04, Sebastian Macke <sebastian@macke.de> wrote:
>>>   /* Internal flags, delay slot flag */
>>> -#define D_FLAG    1
>>> +#define D_FLAG    2
>>
>> Since this set of #defines effectively is the documentation for
>> what the tb_flags usage is, can you update it to include the
>> new flag you've added, please?
>
>
> I will. I think I have done it in one of the later patches.
> But the D_FLAG was there before. What I did was just changing it to 2
> because 1 is used by the new SR_SM
> (supervisor mode) Flag.

Yes, that's what I mean -- this patch is adding a new flag
(because it's changing cpu_get_tb_cpu_state) and so it should
also be updating this set of #defines.

>> It looks suspicious that this patch doesn't include any change to
>> translate.c which reads the tb flag you've just added. Either:
>>   (a) the translated code doesn't actually build in any dependencies
>>    on the SR_SM flag, in which case it doesn't need to be a tb_flag at all
>>   (b) the translated code is still referring directly to env->sr
>> somewhere,
>>    in which case it needs to be changed to use the tb_flags version
>> instead
>>
>> Also, are you sure that tlb_flush() is needed purely for the change to the
>> SR_SM flags and not for any of the other CPU state changes that
>> openrisc_cpu_do_interrupt() is making when it does the user->supervisor
>> state change?

> The exception is going into supervisor mode and disables the mmu. The
> mmu_index is changed and it should work.
> But then the emulated Linux crashes.
> This does not happen when I add the supervisor mode flag to the tb_flags.

> What is included in the tb hash? The virtual pc + physical page + the
> tb_flags? Not the mmu_index?

You're right that the mmu_index is not included in the tb hash.
Does that mean that the CPU state which determines the
mmu_index needs to be in the tb_flags? I'm not sure and it's
not something I'd thought about before. Richard -- do you know?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception
  2013-11-01 18:58       ` Peter Maydell
@ 2013-11-02  1:21         ` Richard Henderson
  2013-11-06 22:59           ` [Qemu-devel] [Openrisc] " Edgar E. Iglesias
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2013-11-02  1:21 UTC (permalink / raw)
  To: Peter Maydell, Sebastian Macke
  Cc: openrisc, openrisc, QEMU Developers, Ethan Hunt

On 11/01/2013 11:58 AM, Peter Maydell wrote:
>> > What is included in the tb hash? The virtual pc + physical page + the
>> > tb_flags? Not the mmu_index?
> You're right that the mmu_index is not included in the tb hash.
> Does that mean that the CPU state which determines the
> mmu_index needs to be in the tb_flags? I'm not sure and it's
> not something I'd thought about before. Richard -- do you know?

Normally the supervisor/hypervisor/whatever bit(s) is present in
the tb_flags, and the mmu_index ought to be computed from that.

That said, what normally happens is that we re-use cpu_mmu_index,
which looks at env, which is technically wrong.  But there's an
assumption that the bits we're examining in env have been copied
to tb_flags, so the data is actually in sync.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception
  2013-10-29 22:41     ` Sebastian Macke
  2013-11-01 18:58       ` Peter Maydell
@ 2013-11-02  1:29       ` Richard Henderson
  1 sibling, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2013-11-02  1:29 UTC (permalink / raw)
  To: Sebastian Macke, Peter Maydell
  Cc: openrisc, openrisc, QEMU Developers, Ethan Hunt

On 10/29/2013 03:41 PM, Sebastian Macke wrote:
> 
> What is included in the tb hash? The virtual pc + physical page + the tb_flags?
> Not the mmu_index?

What's included is everything you return from cpu_get_tb_cpu_state.

Note that cs_base is an interesting case.  On i386 real mode, it's
what the name implies -- the code segment base.  On sparc, we (ab)use
it to handle an insn beginning from a delay slot.

(On Sparc, pc is the current insn, and npc is the next insn.  For
straight-line code, npc = pc + 4.  After a branch, npc is the branch
target.  After every insn, the cpu copies pc = npc.)

See my response to Peter for info re mmu_index.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [Openrisc] [PATCH 05/13] target-openrisc: Remove TLB flush on exception
  2013-11-02  1:21         ` Richard Henderson
@ 2013-11-06 22:59           ` Edgar E. Iglesias
  0 siblings, 0 replies; 48+ messages in thread
From: Edgar E. Iglesias @ 2013-11-06 22:59 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Sebastian Macke, Peter Maydell, Ethan Hunt, QEMU Developers,
	openrisc, openrisc

On Fri, Nov 01, 2013 at 06:21:14PM -0700, Richard Henderson wrote:
> On 11/01/2013 11:58 AM, Peter Maydell wrote:
> >> > What is included in the tb hash? The virtual pc + physical page + the
> >> > tb_flags? Not the mmu_index?
> > You're right that the mmu_index is not included in the tb hash.
> > Does that mean that the CPU state which determines the
> > mmu_index needs to be in the tb_flags? I'm not sure and it's
> > not something I'd thought about before. Richard -- do you know?
> 
> Normally the supervisor/hypervisor/whatever bit(s) is present in
> the tb_flags, and the mmu_index ought to be computed from that.
> 
> That said, what normally happens is that we re-use cpu_mmu_index,
> which looks at env, which is technically wrong.  But there's an
> assumption that the bits we're examining in env have been copied
> to tb_flags, so the data is actually in sync.
> 

That's right, I tripped over this with the microblaze port some
years ago. It might be a good idea to add an --enable-debug enabled
check that asserts the consistency of tb_flags and current mmu_index.

Cheers,
Edgar

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2013-11-06 22:59 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-29 19:04 [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 01/13] target-openrisc: Implement translation block chaining Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 02/13] target-openrisc: Separate Delayed slot handling from main loop Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 03/13] target-openrisc: Separate of load/store instructions Sebastian Macke
2013-10-29 20:05   ` Max Filippov
2013-10-29 21:36     ` Sebastian Macke
2013-10-29 21:49       ` Richard Henderson
2013-10-29 22:55       ` Max Filippov
2013-10-29 23:37         ` Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 04/13] target-openrisc: sync flags only when necessary Sebastian Macke
2013-10-29 21:51   ` Richard Henderson
2013-10-29 19:04 ` [Qemu-devel] [PATCH 05/13] target-openrisc: Remove TLB flush on exception Sebastian Macke
2013-10-29 19:47   ` Peter Maydell
2013-10-29 22:41     ` Sebastian Macke
2013-11-01 18:58       ` Peter Maydell
2013-11-02  1:21         ` Richard Henderson
2013-11-06 22:59           ` [Qemu-devel] [Openrisc] " Edgar E. Iglesias
2013-11-02  1:29       ` [Qemu-devel] " Richard Henderson
2013-10-29 19:04 ` [Qemu-devel] [PATCH 06/13] target-openrisc: Remove TLB flush from l.rfe instruction Sebastian Macke
2013-10-29 21:01   ` Max Filippov
2013-10-29 21:53     ` Sebastian Macke
2013-10-29 22:20       ` Max Filippov
2013-10-29 23:14         ` Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 07/13] target-openrisc: Correct l.cmov conditional check Sebastian Macke
2013-10-29 21:15   ` Max Filippov
2013-10-29 21:23     ` Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 08/13] target-openrisc: Test for Overflow exception statically Sebastian Macke
2013-10-29 21:25   ` Max Filippov
2013-10-29 22:06     ` Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 09/13] target-openrisc: Add CPU which neglects Carry and Overflow Flag Sebastian Macke
2013-10-30 18:14   ` Richard Henderson
2013-10-30 19:22     ` Sebastian Macke
2013-10-30 19:31       ` Richard Henderson
2013-10-29 19:04 ` [Qemu-devel] [PATCH 10/13] target-openrisc: Correct target number for 64 bit llseek Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 11/13] target-openrisc: use jmp_pc as flag variable for branches Sebastian Macke
2013-10-30 18:33   ` Richard Henderson
2013-10-30 19:07     ` Sebastian Macke
2013-10-30 19:32       ` Richard Henderson
2013-10-30 19:47       ` Richard Henderson
2013-10-30 21:08         ` Sebastian Macke
2013-10-30 22:02           ` Richard Henderson
2013-10-31  0:29             ` Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 12/13] target-openrisc: Add correct gdb information for the pc value Sebastian Macke
2013-10-29 19:04 ` [Qemu-devel] [PATCH 13/13] target-openrisc: Add In-circuit emulator support Sebastian Macke
2013-10-29 19:53 ` [Qemu-devel] [PATCH 00/13] target-openrisc: More optimizations and corrections Peter Maydell
2013-10-29 21:15 ` Max Filippov
2013-10-29 21:22   ` Sebastian Macke
2013-10-31 11:47     ` Jia Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.