* [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements
@ 2017-07-18 20:02 Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 01/30] target/sh4: Use cmpxchg for movco Richard Henderson
                   ` (29 more replies)
  0 siblings, 30 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

The big-ticket item in this series is support for
user-space atomics, but a lot of other cleanup has
crept in as well.

Changes since v2 incorporate feedback from Aurelien.
I've tried to remember to add individual changelogs
to the patches, but I may have forgotten some.

This version also includes the linux-user reserved_va
changes that I posted after v2.

I believe only 4 patches have not yet received
a Reviewed-by.


r~


Richard Henderson (30):
  target/sh4: Use cmpxchg for movco
  target/sh4: Consolidate end-of-TB tests
  target/sh4: Introduce TB_FLAG_ENVFLAGS_MASK
  target/sh4: Keep env->flags clean
  target/sh4: Adjust TB_FLAG_PENDING_MOVCA
  target/sh4: Handle user-space atomics
  target/sh4: Recognize common gUSA sequences
  linux-user/sh4: Notice gUSA regions during signal delivery
  linux-user/sh4: Clean env->flags on signal boundaries
  target/sh4: Hoist register bank selection
  target/sh4: Unify cpu_fregs into FREG
  target/sh4: Pass DisasContext to fpr64 routines
  target/sh4: Hoist fp register bank selection
  target/sh4: Eliminate unused XREG macro
  target/sh4: Merge DREG into fpr64 routines
  target/sh4: Load/store Dr as 64-bit quantities
  target/sh4: Simplify 64-bit fp reg-reg move
  target/sh4: Unify code for CHECK_NOT_DELAY_SLOT
  target/sh4: Unify code for CHECK_PRIVILEGED
  target/sh4: Unify code for CHECK_FPU_ENABLED
  target/sh4: Tidy misc illegal insn checks
  target/sh4: Introduce CHECK_FPSCR_PR_*
  target/sh4: Introduce CHECK_SH4A
  target/sh4: Implement fpchg
  target/sh4: Add missing FPSCR.PR == 0 checks
  target/sh4: Implement fsrra
  target/sh4: Use tcg_gen_lookup_and_goto_ptr
  tcg: Fix off-by-one in assert in page_set_flags
  linux-user: Tidy and enforce reserved_va initialization
  linux-user/sh4: Reduce TARGET_VIRT_ADDR_SPACE_BITS to 31

 linux-user/arm/target_cpu.h |   4 +
 target/mips/mips-defs.h     |   6 +-
 target/nios2/cpu.h          |   6 +-
 target/sh4/cpu.h            |  33 +-
 target/sh4/helper.h         |   2 +
 accel/tcg/translate-all.c   |   2 +-
 linux-user/main.c           |  39 +-
 linux-user/signal.c         |  33 ++
 target/sh4/cpu.c            |   2 +-
 target/sh4/helper.c         |   2 +-
 target/sh4/op_helper.c      |  22 +
 target/sh4/translate.c      | 975 ++++++++++++++++++++++++++++++++------------
 12 files changed, 839 insertions(+), 287 deletions(-)

-- 
2.9.4


* [Qemu-devel] [PATCH v3 01/30] target/sh4: Use cmpxchg for movco
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:19   ` Aurelien Jarno
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 02/30] target/sh4: Consolidate end-of-TB tests Richard Henderson
                   ` (28 subsequent siblings)
  29 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

As with other targets, cmpxchg isn't quite right for ll/sc,
since it suffers from an ABA race, but it is sufficient to
implement portable atomic operations.

Signed-off-by: Richard Henderson <rth@twiddle.net>

---
V2: Clear lock_addr in rte, do_interrupt, syscall entry, & signal delivery.
    Fix movli to tolerate overlap between R0 and REG(B11_8).
---
 target/sh4/cpu.h       |  3 ++-
 linux-user/main.c      |  1 +
 linux-user/signal.c    |  2 ++
 target/sh4/helper.c    |  2 +-
 target/sh4/translate.c | 72 +++++++++++++++++++++++++++++---------------------
 5 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index ffb9168..b15116e 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -169,7 +169,8 @@ typedef struct CPUSH4State {
     tlb_t itlb[ITLB_SIZE];	/* instruction translation table */
     tlb_t utlb[UTLB_SIZE];	/* unified translation table */
 
-    uint32_t ldst;
+    uint32_t lock_addr;
+    uint32_t lock_value;
 
     /* Fields up to this point are cleared by a CPU reset */
     struct {} end_reset_fields;
diff --git a/linux-user/main.c b/linux-user/main.c
index ad03c9e..30f0ae1 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2673,6 +2673,7 @@ void cpu_loop(CPUSH4State *env)
         switch (trapnr) {
         case 0x160:
             env->pc += 2;
+            env->lock_addr = -1;
             ret = do_syscall(env,
                              env->gregs[3],
                              env->gregs[4],
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 3d18d1b..ddfd75c 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -3566,6 +3566,7 @@ static void setup_frame(int sig, struct target_sigaction *ka,
     regs->gregs[5] = 0;
     regs->gregs[6] = frame_addr += offsetof(typeof(*frame), sc);
     regs->pc = (unsigned long) ka->_sa_handler;
+    regs->lock_addr = -1;
 
     unlock_user_struct(frame, frame_addr, 1);
     return;
@@ -3626,6 +3627,7 @@ static void setup_rt_frame(int sig, struct target_sigaction *ka,
     regs->gregs[5] = frame_addr + offsetof(typeof(*frame), info);
     regs->gregs[6] = frame_addr + offsetof(typeof(*frame), uc);
     regs->pc = (unsigned long) ka->_sa_handler;
+    regs->lock_addr = -1;
 
     unlock_user_struct(frame, frame_addr, 1);
     return;
diff --git a/target/sh4/helper.c b/target/sh4/helper.c
index 28d93c2..df7c000 100644
--- a/target/sh4/helper.c
+++ b/target/sh4/helper.c
@@ -87,7 +87,6 @@ void superh_cpu_do_interrupt(CPUState *cs)
     int do_exp, irq_vector = cs->exception_index;
 
     /* prioritize exceptions over interrupts */
-
     do_exp = cs->exception_index != -1;
     do_irq = do_irq && (cs->exception_index == -1);
 
@@ -171,6 +170,7 @@ void superh_cpu_do_interrupt(CPUState *cs)
     env->spc = env->pc;
     env->sgr = env->gregs[15];
     env->sr |= (1u << SR_BL) | (1u << SR_MD) | (1u << SR_RB);
+    env->lock_addr = -1;
 
     if (env->flags & DELAY_SLOT_MASK) {
         /* Branch instruction should be executed again before delay slot. */
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 4c3512f..45f7661 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -68,7 +68,8 @@ static TCGv cpu_gregs[24];
 static TCGv cpu_sr, cpu_sr_m, cpu_sr_q, cpu_sr_t;
 static TCGv cpu_pc, cpu_ssr, cpu_spc, cpu_gbr;
 static TCGv cpu_vbr, cpu_sgr, cpu_dbr, cpu_mach, cpu_macl;
-static TCGv cpu_pr, cpu_fpscr, cpu_fpul, cpu_ldst;
+static TCGv cpu_pr, cpu_fpscr, cpu_fpul;
+static TCGv cpu_lock_addr, cpu_lock_value;
 static TCGv cpu_fregs[32];
 
 /* internal register indexes */
@@ -151,8 +152,12 @@ void sh4_translate_init(void)
                                               offsetof(CPUSH4State,
                                                        delayed_cond),
                                               "_delayed_cond_");
-    cpu_ldst = tcg_global_mem_new_i32(cpu_env,
-				      offsetof(CPUSH4State, ldst), "_ldst_");
+    cpu_lock_addr = tcg_global_mem_new_i32(cpu_env,
+				           offsetof(CPUSH4State, lock_addr),
+                                           "_lock_addr_");
+    cpu_lock_value = tcg_global_mem_new_i32(cpu_env,
+				            offsetof(CPUSH4State, lock_value),
+                                            "_lock_value_");
 
     for (i = 0; i < 32; i++)
         cpu_fregs[i] = tcg_global_mem_new_i32(cpu_env,
@@ -430,6 +435,7 @@ static void _decode_opc(DisasContext * ctx)
 	CHECK_NOT_DELAY_SLOT
         gen_write_sr(cpu_ssr);
 	tcg_gen_mov_i32(cpu_delayed_pc, cpu_spc);
+        tcg_gen_movi_i32(cpu_lock_addr, -1);
         ctx->envflags |= DELAY_SLOT_RTE;
 	ctx->delayed_pc = (uint32_t) - 1;
         ctx->bstate = BS_STOP;
@@ -1527,35 +1533,41 @@ static void _decode_opc(DisasContext * ctx)
         tcg_gen_mov_i32(REG(B11_8), cpu_sr_t);
 	return;
     case 0x0073:
-        /* MOVCO.L
-	       LDST -> T
-               If (T == 1) R0 -> (Rn)
-               0 -> LDST
-        */
+        /* MOVCO.L: if (lock still held) R0 -> (Rn), T=1; else T=0.
+           Approximate "lock still held" with a comparison of address
+           from the MOVLI insn and a cmpxchg with the value read.  */
         if (ctx->features & SH_FEATURE_SH4A) {
-            TCGLabel *label = gen_new_label();
-            tcg_gen_mov_i32(cpu_sr_t, cpu_ldst);
-	    tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_ldst, 0, label);
-            tcg_gen_qemu_st_i32(REG(0), REG(B11_8), ctx->memidx, MO_TEUL);
-	    gen_set_label(label);
-	    tcg_gen_movi_i32(cpu_ldst, 0);
-	    return;
-	} else
-	    break;
+            TCGLabel *fail = gen_new_label();
+            TCGLabel *done = gen_new_label();
+
+            tcg_gen_brcond_i32(TCG_COND_NE, REG(B11_8), cpu_lock_addr, fail);
+
+            tcg_gen_atomic_cmpxchg_i32(cpu_sr_t, REG(B11_8), cpu_lock_value,
+                                       REG(0), ctx->memidx, MO_TEUL);
+            tcg_gen_setcond_i32(TCG_COND_EQ, cpu_sr_t,
+                                cpu_sr_t, cpu_lock_value);
+            tcg_gen_br(done);
+
+            gen_set_label(fail);
+            tcg_gen_movi_i32(cpu_sr_t, 0);
+
+            gen_set_label(done);
+            tcg_gen_movi_i32(cpu_lock_addr, -1);
+            return;
+        } else {
+            break;
+        }
     case 0x0063:
-        /* MOVLI.L @Rm,R0
-               1 -> LDST
-               (Rm) -> R0
-               When interrupt/exception
-               occurred 0 -> LDST
-        */
-	if (ctx->features & SH_FEATURE_SH4A) {
-	    tcg_gen_movi_i32(cpu_ldst, 0);
-            tcg_gen_qemu_ld_i32(REG(0), REG(B11_8), ctx->memidx, MO_TESL);
-	    tcg_gen_movi_i32(cpu_ldst, 1);
-	    return;
-	} else
-	    break;
+        /* MOVLI.L @Rm -> R0, and remember the address and value loaded.  */
+        if (ctx->features & SH_FEATURE_SH4A) {
+            tcg_gen_qemu_ld_i32(cpu_lock_value, REG(B11_8),
+                                ctx->memidx, MO_TESL);
+            tcg_gen_mov_i32(cpu_lock_addr, REG(B11_8));
+            tcg_gen_mov_i32(REG(0), cpu_lock_value);
+            return;
+        } else {
+            break;
+        }
     case 0x0093:		/* ocbi @Rn */
 	{
             gen_helper_ocbi(cpu_env, REG(B11_8));
-- 
2.9.4


* [Qemu-devel] [PATCH v3 02/30] target/sh4: Consolidate end-of-TB tests
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 01/30] target/sh4: Use cmpxchg for movco Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 03/30] target/sh4: Introduce TB_FLAG_ENVFLAGS_MASK Richard Henderson
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

We can fold 3 different tests within the decode loop
into a more accurate computation of max_insns to start.
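
As a worked example of the page bound, here is a small standalone sketch of
the arithmetic. PAGE_BITS, PAGE_MASK and the function names are illustrative
assumptions (4 KiB pages, 2-byte instructions, plain unsigned 32-bit math),
not the QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12                                  /* assumed 4 KiB pages */
#define PAGE_MASK (~(((uint32_t)1 << PAGE_BITS) - 1)) /* 0xfffff000 */

/* Number of 2-byte instructions from pc to the end of its page. */
static uint32_t insns_left_on_page(uint32_t pc)
{
    assert((pc & 1) == 0);
    /* pc | PAGE_MASK equals (page offset - page size) modulo 2^32,
       so negating it yields the bytes remaining on the page.  */
    uint32_t bytes_left = (uint32_t)0 - (pc | PAGE_MASK);
    return bytes_left / 2;
}

/* Fold the page bound and single-stepping into one up-front limit. */
static uint32_t bound_max_insns(uint32_t max_insns, uint32_t pc,
                                bool single_step)
{
    uint32_t left = insns_left_on_page(pc);

    if (left < max_insns) {
        max_insns = left;
    }
    if (single_step) {
        max_insns = 1;
    }
    return max_insns;
}
```

With these assumptions, a pc of 0x8c000ffe leaves exactly one instruction on
the page, and a page-aligned pc leaves 2048.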

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 45f7661..8740ee3 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -1842,17 +1842,28 @@ void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
     ctx.features = env->features;
     ctx.has_movcal = (ctx.tbflags & TB_FLAG_PENDING_MOVCA);
 
-    num_insns = 0;
     max_insns = tb->cflags & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
-    if (max_insns > TCG_MAX_INSNS) {
-        max_insns = TCG_MAX_INSNS;
+    max_insns = MIN(max_insns, TCG_MAX_INSNS);
+
+    /* Since the ISA is fixed-width, we can bound by the number
+       of instructions remaining on the page.  */
+    num_insns = -(ctx.pc | TARGET_PAGE_MASK) / 2;
+    max_insns = MIN(max_insns, num_insns);
+
+    /* Single stepping means just that.  */
+    if (ctx.singlestep_enabled || singlestep) {
+        max_insns = 1;
     }
 
     gen_tb_start(tb);
-    while (ctx.bstate == BS_NONE && !tcg_op_buf_full()) {
+    num_insns = 0;
+
+    while (ctx.bstate == BS_NONE
+           && num_insns < max_insns
+           && !tcg_op_buf_full()) {
         tcg_gen_insn_start(ctx.pc, ctx.envflags);
         num_insns++;
 
@@ -1876,18 +1887,10 @@ void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
         ctx.opcode = cpu_lduw_code(env, ctx.pc);
 	decode_opc(&ctx);
 	ctx.pc += 2;
-	if ((ctx.pc & (TARGET_PAGE_SIZE - 1)) == 0)
-	    break;
-        if (cs->singlestep_enabled) {
-	    break;
-        }
-        if (num_insns >= max_insns)
-            break;
-        if (singlestep)
-            break;
     }
-    if (tb->cflags & CF_LAST_IO)
+    if (tb->cflags & CF_LAST_IO) {
         gen_io_end();
+    }
     if (cs->singlestep_enabled) {
         gen_save_cpu_state(&ctx, true);
         gen_helper_debug(cpu_env);
-- 
2.9.4


* [Qemu-devel] [PATCH v3 03/30] target/sh4: Introduce TB_FLAG_ENVFLAGS_MASK
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 01/30] target/sh4: Use cmpxchg for movco Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 02/30] target/sh4: Consolidate end-of-TB tests Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 04/30] target/sh4: Keep env->flags clean Richard Henderson
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

We'll be putting more things into this bitmask soon.
Let's have a name that covers all possible uses.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/cpu.h       | 4 +++-
 target/sh4/translate.c | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index b15116e..240ed36 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -96,6 +96,8 @@
 #define DELAY_SLOT_CONDITIONAL (1 << 1)
 #define DELAY_SLOT_RTE         (1 << 2)
 
+#define TB_FLAG_ENVFLAGS_MASK  DELAY_SLOT_MASK
+
 typedef struct tlb_t {
     uint32_t vpn;		/* virtual page number */
     uint32_t ppn;		/* physical page number */
@@ -389,7 +391,7 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
 {
     *pc = env->pc;
     *cs_base = 0;
-    *flags = (env->flags & DELAY_SLOT_MASK)                    /* Bits  0- 2 */
+    *flags = (env->flags & TB_FLAG_ENVFLAGS_MASK) /* Bits  0-2 */
             | (env->fpscr & (FPSCR_FR | FPSCR_SZ | FPSCR_PR))  /* Bits 19-21 */
             | (env->sr & ((1u << SR_MD) | (1u << SR_RB)))      /* Bits 29-30 */
             | (env->sr & (1u << SR_FD))                        /* Bit 15 */
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 8740ee3..998860a 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -225,7 +225,7 @@ static inline void gen_save_cpu_state(DisasContext *ctx, bool save_pc)
     if (ctx->delayed_pc != (uint32_t) -1) {
         tcg_gen_movi_i32(cpu_delayed_pc, ctx->delayed_pc);
     }
-    if ((ctx->tbflags & DELAY_SLOT_MASK) != ctx->envflags) {
+    if ((ctx->tbflags & TB_FLAG_ENVFLAGS_MASK) != ctx->envflags) {
         tcg_gen_movi_i32(cpu_flags, ctx->envflags);
     }
 }
@@ -1831,7 +1831,7 @@ void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
     pc_start = tb->pc;
     ctx.pc = pc_start;
     ctx.tbflags = (uint32_t)tb->flags;
-    ctx.envflags = tb->flags & DELAY_SLOT_MASK;
+    ctx.envflags = tb->flags & TB_FLAG_ENVFLAGS_MASK;
     ctx.bstate = BS_NONE;
     ctx.memidx = (ctx.tbflags & (1u << SR_MD)) == 0 ? 1 : 0;
     /* We don't know if the delayed pc came from a dynamic or static branch,
-- 
2.9.4


* [Qemu-devel] [PATCH v3 04/30] target/sh4: Keep env->flags clean
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 03/30] target/sh4: Introduce TB_FLAG_ENVFLAGS_MASK Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 05/30] target/sh4: Adjust TB_FLAG_PENDING_MOVCA Richard Henderson
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

If we mask off any out-of-band bits before we assign to the
variable, then we don't need to clean it up when reading.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/cpu.h | 2 +-
 target/sh4/cpu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index 240ed36..6d179a7 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -391,7 +391,7 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
 {
     *pc = env->pc;
     *cs_base = 0;
-    *flags = (env->flags & TB_FLAG_ENVFLAGS_MASK) /* Bits  0-2 */
+    *flags = env->flags                                        /* Bits  0-2 */
             | (env->fpscr & (FPSCR_FR | FPSCR_SZ | FPSCR_PR))  /* Bits 19-21 */
             | (env->sr & ((1u << SR_MD) | (1u << SR_RB)))      /* Bits 29-30 */
             | (env->sr & (1u << SR_FD))                        /* Bit 15 */
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index 9da7e1e..8536f6d 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -39,7 +39,7 @@ static void superh_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
     SuperHCPU *cpu = SUPERH_CPU(cs);
 
     cpu->env.pc = tb->pc;
-    cpu->env.flags = tb->flags;
+    cpu->env.flags = tb->flags & TB_FLAG_ENVFLAGS_MASK;
 }
 
 static bool superh_cpu_has_work(CPUState *cs)
-- 
2.9.4


* [Qemu-devel] [PATCH v3 05/30] target/sh4: Adjust TB_FLAG_PENDING_MOVCA
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 04/30] target/sh4: Keep env->flags clean Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 06/30] target/sh4: Handle user-space atomics Richard Henderson
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Don't leave an unused bit after DELAY_SLOT_MASK.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/cpu.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index 6d179a7..da31805 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -96,6 +96,8 @@
 #define DELAY_SLOT_CONDITIONAL (1 << 1)
 #define DELAY_SLOT_RTE         (1 << 2)
 
+#define TB_FLAG_PENDING_MOVCA  (1 << 3)
+
 #define TB_FLAG_ENVFLAGS_MASK  DELAY_SLOT_MASK
 
 typedef struct tlb_t {
@@ -369,8 +371,6 @@ static inline int cpu_ptel_pr (uint32_t ptel)
 #define PTEA_TC        (1 << 3)
 #define cpu_ptea_tc(ptea) (((ptea) & PTEA_TC) >> 3)
 
-#define TB_FLAG_PENDING_MOVCA  (1 << 4)
-
 static inline target_ulong cpu_read_sr(CPUSH4State *env)
 {
     return env->sr | (env->sr_m << SR_M) |
@@ -395,7 +395,7 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
             | (env->fpscr & (FPSCR_FR | FPSCR_SZ | FPSCR_PR))  /* Bits 19-21 */
             | (env->sr & ((1u << SR_MD) | (1u << SR_RB)))      /* Bits 29-30 */
             | (env->sr & (1u << SR_FD))                        /* Bit 15 */
-            | (env->movcal_backup ? TB_FLAG_PENDING_MOVCA : 0); /* Bit 4 */
+            | (env->movcal_backup ? TB_FLAG_PENDING_MOVCA : 0); /* Bit 3 */
 }
 
 #endif /* SH4_CPU_H */
-- 
2.9.4


* [Qemu-devel] [PATCH v3 06/30] target/sh4: Handle user-space atomics
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 05/30] target/sh4: Adjust TB_FLAG_PENDING_MOVCA Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 07/30] target/sh4: Recognize common gUSA sequences Richard Henderson
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

For uniprocessors, SH4 uses optimistic restartable atomic sequences.
Upon an interrupt, a real kernel would simply notice magic values in
the registers and reset the PC to the start of the sequence.

For QEMU, we cannot do this in quite the same way.  Instead, we notice
the normal start of such a sequence (mov #-x,r15), and start a new TB
that can be executed under cpu_exec_step_atomic.
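
The bookkeeping behind this can be sketched outside QEMU as follows. Only
GUSA_SHIFT is taken from the patch. deposit_bits and sextract_bits are
simplified stand-ins for QEMU's deposit32 and sextract32, and gusa_enter and
gusa_region_well_formed are hypothetical names for the two steps: recording
the negative immediate in the flags, then checking it against the region end
held in R0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GUSA_SHIFT 4   /* bit position of the 8-bit displacement field */

/* Simplified stand-in for deposit32(): insert 'field' at [start, start+length). */
static uint32_t deposit_bits(uint32_t value, int start, int length,
                             uint32_t field)
{
    assert(start >= 0 && length > 0 && length < 32 && start + length <= 32);
    uint32_t mask = ((1u << length) - 1) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Simplified stand-in for sextract32(): extract a sign-extended bit field. */
static int32_t sextract_bits(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length < 32 && start + length <= 32);
    uint32_t field = (value >> start) & ((1u << length) - 1);
    uint32_t sign = 1u << (length - 1);
    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* "mov #imm,r15" with imm < 0: remember the displacement in the flags. */
static uint32_t gusa_enter(uint32_t envflags, int8_t imm)
{
    assert(imm < 0);
    return deposit_bits(envflags, GUSA_SHIFT, 8, (uint32_t)(int32_t)imm);
}

/* The next TB starts at pc; region_end is the value the guest put in R0.
   The region is usable only if pc sits exactly 'backup' bytes before the
   end and at least two instructions remain.  */
static bool gusa_region_well_formed(uint32_t tbflags, uint32_t pc,
                                    uint32_t region_end)
{
    int32_t backup = sextract_bits(tbflags, GUSA_SHIFT, 8);
    uint32_t max_insns = (region_end - pc) / 2;

    return pc == region_end + (uint32_t)backup && max_insns >= 2;
}
```

For instance, a "mov #-6,r15" stores 0xfa in bits 4-11, and a TB starting 6
bytes before R0 then passes the check with three instructions in the region.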

Reported-by: Bruno Haible  <bruno@clisp.org>
LP: https://bugs.launchpad.net/bugs/1701971
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>

---
V2: Tidy the arguments to gen_conditional_jump.
---
 target/sh4/cpu.h       |  18 +++++--
 target/sh4/helper.h    |   1 +
 target/sh4/op_helper.c |   6 +++
 target/sh4/translate.c | 138 ++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 148 insertions(+), 15 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index da31805..e3abb6a 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -98,7 +98,18 @@
 
 #define TB_FLAG_PENDING_MOVCA  (1 << 3)
 
-#define TB_FLAG_ENVFLAGS_MASK  DELAY_SLOT_MASK
+#define GUSA_SHIFT             4
+#ifdef CONFIG_USER_ONLY
+#define GUSA_EXCLUSIVE         (1 << 12)
+#define GUSA_MASK              ((0xff << GUSA_SHIFT) | GUSA_EXCLUSIVE)
+#else
+/* Provide dummy versions of the above to allow tests against tbflags
+   to be elided while avoiding ifdefs.  */
+#define GUSA_EXCLUSIVE         0
+#define GUSA_MASK              0
+#endif
+
+#define TB_FLAG_ENVFLAGS_MASK  (DELAY_SLOT_MASK | GUSA_MASK)
 
 typedef struct tlb_t {
     uint32_t vpn;		/* virtual page number */
@@ -390,8 +401,9 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
                                         target_ulong *cs_base, uint32_t *flags)
 {
     *pc = env->pc;
-    *cs_base = 0;
-    *flags = env->flags                                        /* Bits  0-2 */
+    /* For a gUSA region, notice the end of the region.  */
+    *cs_base = env->flags & GUSA_MASK ? env->gregs[0] : 0;
+    *flags = env->flags /* TB_FLAG_ENVFLAGS_MASK: bits 0-2, 4-12 */
             | (env->fpscr & (FPSCR_FR | FPSCR_SZ | FPSCR_PR))  /* Bits 19-21 */
             | (env->sr & ((1u << SR_MD) | (1u << SR_RB)))      /* Bits 29-30 */
             | (env->sr & (1u << SR_FD))                        /* Bit 15 */
diff --git a/target/sh4/helper.h b/target/sh4/helper.h
index 767a6d5..6c6fa04 100644
--- a/target/sh4/helper.h
+++ b/target/sh4/helper.h
@@ -6,6 +6,7 @@ DEF_HELPER_1(raise_slot_fpu_disable, noreturn, env)
 DEF_HELPER_1(debug, noreturn, env)
 DEF_HELPER_1(sleep, noreturn, env)
 DEF_HELPER_2(trapa, noreturn, env, i32)
+DEF_HELPER_1(exclusive, noreturn, env)
 
 DEF_HELPER_3(movcal, void, env, i32, i32)
 DEF_HELPER_1(discard_movcal_backup, void, env)
diff --git a/target/sh4/op_helper.c b/target/sh4/op_helper.c
index c3d19b1..8513f38 100644
--- a/target/sh4/op_helper.c
+++ b/target/sh4/op_helper.c
@@ -115,6 +115,12 @@ void helper_trapa(CPUSH4State *env, uint32_t tra)
     raise_exception(env, 0x160, 0);
 }
 
+void helper_exclusive(CPUSH4State *env)
+{
+    /* We do not want cpu_restore_state to run.  */
+    cpu_loop_exit_atomic(ENV_GET_CPU(env), 0);
+}
+
 void helper_movcal(CPUSH4State *env, uint32_t address, uint32_t value)
 {
     if (cpu_sh4_is_cached (env, address))
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 998860a..c5786cb 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -235,7 +235,9 @@ static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
     if (unlikely(ctx->singlestep_enabled)) {
         return false;
     }
-
+    if (ctx->tbflags & GUSA_EXCLUSIVE) {
+        return false;
+    }
 #ifndef CONFIG_USER_ONLY
     return (ctx->tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK);
 #else
@@ -274,28 +276,56 @@ static void gen_jump(DisasContext * ctx)
 }
 
 /* Immediate conditional jump (bt or bf) */
-static void gen_conditional_jump(DisasContext * ctx,
-				 target_ulong ift, target_ulong ifnott)
+static void gen_conditional_jump(DisasContext * ctx, target_ulong dest,
+                                 bool jump_if_true)
 {
     TCGLabel *l1 = gen_new_label();
+    TCGCond cond_not_taken = jump_if_true ? TCG_COND_EQ : TCG_COND_NE;
+
+    if (ctx->tbflags & GUSA_EXCLUSIVE) {
+        /* When in an exclusive region, we must continue to the end.
+           Therefore, exit the region on a taken branch, but otherwise
+           fall through to the next instruction.  */
+        tcg_gen_brcondi_i32(cond_not_taken, cpu_sr_t, 0, l1);
+        tcg_gen_movi_i32(cpu_flags, ctx->envflags & ~GUSA_MASK);
+        /* Note that this won't actually use a goto_tb opcode because we
+           disallow it in use_goto_tb, but it handles exit + singlestep.  */
+        gen_goto_tb(ctx, 0, dest);
+        gen_set_label(l1);
+        return;
+    }
+
     gen_save_cpu_state(ctx, false);
-    tcg_gen_brcondi_i32(TCG_COND_NE, cpu_sr_t, 0, l1);
-    gen_goto_tb(ctx, 0, ifnott);
+    tcg_gen_brcondi_i32(cond_not_taken, cpu_sr_t, 0, l1);
+    gen_goto_tb(ctx, 0, dest);
     gen_set_label(l1);
-    gen_goto_tb(ctx, 1, ift);
+    gen_goto_tb(ctx, 1, ctx->pc + 2);
     ctx->bstate = BS_BRANCH;
 }
 
 /* Delayed conditional jump (bt or bf) */
 static void gen_delayed_conditional_jump(DisasContext * ctx)
 {
-    TCGLabel *l1;
-    TCGv ds;
+    TCGLabel *l1 = gen_new_label();
+    TCGv ds = tcg_temp_new();
 
-    l1 = gen_new_label();
-    ds = tcg_temp_new();
     tcg_gen_mov_i32(ds, cpu_delayed_cond);
     tcg_gen_discard_i32(cpu_delayed_cond);
+
+    if (ctx->tbflags & GUSA_EXCLUSIVE) {
+        /* When in an exclusive region, we must continue to the end.
+           Therefore, exit the region on a taken branch, but otherwise
+           fall through to the next instruction.  */
+        tcg_gen_brcondi_i32(TCG_COND_EQ, ds, 0, l1);
+
+        /* Leave the gUSA region.  */
+        tcg_gen_movi_i32(cpu_flags, ctx->envflags & ~GUSA_MASK);
+        gen_jump(ctx);
+
+        gen_set_label(l1);
+        return;
+    }
+
     tcg_gen_brcondi_i32(TCG_COND_NE, ds, 0, l1);
     gen_goto_tb(ctx, 1, ctx->pc + 2);
     gen_set_label(l1);
@@ -481,6 +511,15 @@ static void _decode_opc(DisasContext * ctx)
 	}
 	return;
     case 0xe000:		/* mov #imm,Rn */
+#ifdef CONFIG_USER_ONLY
+        /* Detect the start of a gUSA region.  If so, update envflags
+           and end the TB.  This will allow us to see the end of the
+           region (stored in R0) in the next TB.  */
+        if (B11_8 == 15 && B7_0s < 0 && parallel_cpus) {
+            ctx->envflags = deposit32(ctx->envflags, GUSA_SHIFT, 8, B7_0s);
+            ctx->bstate = BS_STOP;
+        }
+#endif
 	tcg_gen_movi_i32(REG(B11_8), B7_0s);
 	return;
     case 0x9000:		/* mov.w @(disp,PC),Rn */
@@ -1161,7 +1200,7 @@ static void _decode_opc(DisasContext * ctx)
 	return;
     case 0x8b00:		/* bf label */
 	CHECK_NOT_DELAY_SLOT
-        gen_conditional_jump(ctx, ctx->pc + 2, ctx->pc + 4 + B7_0s * 2);
+        gen_conditional_jump(ctx, ctx->pc + 4 + B7_0s * 2, false);
 	return;
     case 0x8f00:		/* bf/s label */
 	CHECK_NOT_DELAY_SLOT
@@ -1171,7 +1210,7 @@ static void _decode_opc(DisasContext * ctx)
 	return;
     case 0x8900:		/* bt label */
 	CHECK_NOT_DELAY_SLOT
-        gen_conditional_jump(ctx, ctx->pc + 4 + B7_0s * 2, ctx->pc + 2);
+        gen_conditional_jump(ctx, ctx->pc + 4 + B7_0s * 2, true);
 	return;
     case 0x8d00:		/* bt/s label */
 	CHECK_NOT_DELAY_SLOT
@@ -1808,6 +1847,18 @@ static void decode_opc(DisasContext * ctx)
     if (old_flags & DELAY_SLOT_MASK) {
         /* go out of the delay slot */
         ctx->envflags &= ~DELAY_SLOT_MASK;
+
+        /* When in an exclusive region, we must continue to the end
+           for conditional branches.  */
+        if (ctx->tbflags & GUSA_EXCLUSIVE
+            && old_flags & DELAY_SLOT_CONDITIONAL) {
+            gen_delayed_conditional_jump(ctx);
+            return;
+        }
+        /* Otherwise this is probably an invalid gUSA region.
+           Drop the GUSA bits so the next TB doesn't see them.  */
+        ctx->envflags &= ~GUSA_MASK;
+
         tcg_gen_movi_i32(cpu_flags, ctx->envflags);
         ctx->bstate = BS_BRANCH;
         if (old_flags & DELAY_SLOT_CONDITIONAL) {
@@ -1815,9 +1866,60 @@ static void decode_opc(DisasContext * ctx)
         } else {
             gen_jump(ctx);
 	}
+    }
+}
+
+#ifdef CONFIG_USER_ONLY
+/* For uniprocessors, SH4 uses optimistic restartable atomic sequences.
+   Upon an interrupt, a real kernel would simply notice magic values in
+   the registers and reset the PC to the start of the sequence.
+
+   For QEMU, we cannot do this in quite the same way.  Instead, we notice
+   the normal start of such a sequence (mov #-x,r15).  While we can handle
+   any sequence via cpu_exec_step_atomic, we can recognize the "normal"
+   sequences and transform them into atomic operations as seen by the host.
+*/
+static int decode_gusa(DisasContext *ctx, CPUSH4State *env, int *pmax_insns)
+{
+    uint32_t pc = ctx->pc;
+    uint32_t pc_end = ctx->tb->cs_base;
+    int backup = sextract32(ctx->tbflags, GUSA_SHIFT, 8);
+    int max_insns = (pc_end - pc) / 2;
+
+    if (pc != pc_end + backup || max_insns < 2) {
+        /* This is a malformed gUSA region.  Don't do anything special,
+           since the interpreter is likely to get confused.  */
+        ctx->envflags &= ~GUSA_MASK;
+        return 0;
+    }
 
+    if (ctx->tbflags & GUSA_EXCLUSIVE) {
+        /* Regardless of single-stepping or the end of the page,
+           we must complete execution of the gUSA region while
+           holding the exclusive lock.  */
+        *pmax_insns = max_insns;
+        return 0;
     }
+
+    qemu_log_mask(LOG_UNIMP, "Unrecognized gUSA sequence %08x-%08x\n",
+                  pc, pc_end);
+
+    /* Restart with the EXCLUSIVE bit set, within a TB run via
+       cpu_exec_step_atomic holding the exclusive lock.  */
+    tcg_gen_insn_start(pc, ctx->envflags);
+    ctx->envflags |= GUSA_EXCLUSIVE;
+    gen_save_cpu_state(ctx, false);
+    gen_helper_exclusive(cpu_env);
+    ctx->bstate = BS_EXCP;
+
+    /* We're not executing an instruction, but we must report one for the
+       purposes of accounting within the TB.  We might as well report the
+       entire region consumed via ctx->pc so that it's immediately available
+       in the disassembly dump.  */
+    ctx->pc = pc_end;
+    return 1;
 }
+#endif
 
 void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
 {
@@ -1861,6 +1963,12 @@ void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
     gen_tb_start(tb);
     num_insns = 0;
 
+#ifdef CONFIG_USER_ONLY
+    if (ctx.tbflags & GUSA_MASK) {
+        num_insns = decode_gusa(&ctx, env, &max_insns);
+    }
+#endif
+
     while (ctx.bstate == BS_NONE
            && num_insns < max_insns
            && !tcg_op_buf_full()) {
@@ -1891,6 +1999,12 @@ void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
     if (tb->cflags & CF_LAST_IO) {
         gen_io_end();
     }
+
+    if (ctx.tbflags & GUSA_EXCLUSIVE) {
+        /* Ending the region of exclusivity.  Clear the bits.  */
+        ctx.envflags &= ~GUSA_MASK;
+    }
+
     if (cs->singlestep_enabled) {
         gen_save_cpu_state(&ctx, true);
         gen_helper_debug(cpu_env);
-- 
2.9.4


* [Qemu-devel] [PATCH v3 07/30] target/sh4: Recognize common gUSA sequences
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 06/30] target/sh4: Handle user-space atomics Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:32   ` Aurelien Jarno
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 08/30] linux-user/sh4: Notice gUSA regions during signal delivery Richard Henderson
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Many of the gUSA sequences produced by gcc or glibc can be
translated directly into host atomic operations, which avoids
the need to acquire the exclusive lock.
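
As an illustration only (this is not code from the patch), the effect of
the translation can be sketched with C11 atomics.  The function names
are hypothetical, and the guest sequences quoted in the comments are
examples of the load/op/store and load/compare/branch/store shapes the
decoder looks for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Guest shape:  mov.l @Rm,Rn ; add Rs,Rn ; mov.l Rn,@Rm
   Host shape:   one atomic read-modify-write that yields the new value.
   (Hypothetical helper, named after the add_fetch flavour.)  */
static uint32_t host_add_fetch(_Atomic uint32_t *addr, uint32_t val)
{
    assert(addr);
    return atomic_fetch_add(addr, val) + val;
}

/* Guest shape:  mov.l @Rm,Rn ; cmp/eq Rn,Rc ; bf 1f ; mov.l Rs,@Rm
   Host shape:   one compare-and-swap.  *loaded receives the value that
   was in memory, and the return value plays the role of the T bit.  */
static int host_cmpxchg(_Atomic uint32_t *addr, uint32_t expected,
                        uint32_t desired, uint32_t *loaded)
{
    uint32_t old = expected;

    /* On failure, 'old' is overwritten with the current contents.  */
    atomic_compare_exchange_strong(addr, &old, desired);
    *loaded = old;
    return old == expected;
}
```

In both cases the guest-visible result is the same as if the region had
run uninterrupted, but no other thread needs to be stopped.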

Signed-off-by: Richard Henderson <rth@twiddle.net>

---
V2: Free constants loaded during the gUSA sequence.
---
 target/sh4/translate.c | 321 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 321 insertions(+)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index c5786cb..0ae2ca6 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -1881,10 +1881,17 @@ static void decode_opc(DisasContext * ctx)
 */
 static int decode_gusa(DisasContext *ctx, CPUSH4State *env, int *pmax_insns)
 {
+    uint16_t insns[5];
+    int ld_adr, ld_dst, ld_mop;
+    int op_dst, op_src, op_opc;
+    int mv_src, mt_dst, st_src, st_mop;
+    TCGv op_arg;
+
     uint32_t pc = ctx->pc;
     uint32_t pc_end = ctx->tb->cs_base;
     int backup = sextract32(ctx->tbflags, GUSA_SHIFT, 8);
     int max_insns = (pc_end - pc) / 2;
+    int i;
 
     if (pc != pc_end + backup || max_insns < 2) {
         /* This is a malformed gUSA region.  Don't do anything special,
@@ -1901,6 +1908,320 @@ static int decode_gusa(DisasContext *ctx, CPUSH4State *env, int *pmax_insns)
         return 0;
     }
 
+    /* The state machine below will consume only a few insns.
+       If there are more than that in a region, fail now.  */
+    if (max_insns > ARRAY_SIZE(insns)) {
+        goto fail;
+    }
+
+    /* Read all of the insns for the region.  */
+    for (i = 0; i < max_insns; ++i) {
+        insns[i] = cpu_lduw_code(env, pc + i * 2);
+    }
+
+    ld_adr = ld_dst = ld_mop = -1;
+    mv_src = -1;
+    op_dst = op_src = op_opc = -1;
+    mt_dst = -1;
+    st_src = st_mop = -1;
+    TCGV_UNUSED(op_arg);
+    i = 0;
+
+#define NEXT_INSN \
+    do { if (i >= max_insns) goto fail; ctx->opcode = insns[i++]; } while (0)
+
+    /*
+     * Expect a load to begin the region.
+     */
+    NEXT_INSN;
+    switch (ctx->opcode & 0xf00f) {
+    case 0x6000: /* mov.b @Rm,Rn */
+        ld_mop = MO_SB;
+        break;
+    case 0x6001: /* mov.w @Rm,Rn */
+        ld_mop = MO_TESW;
+        break;
+    case 0x6002: /* mov.l @Rm,Rn */
+        ld_mop = MO_TESL;
+        break;
+    default:
+        goto fail;
+    }
+    ld_adr = B7_4;
+    ld_dst = B11_8;
+    if (ld_adr == ld_dst) {
+        goto fail;
+    }
+    /* Unless we see a mov, any two-operand operation must use ld_dst.  */
+    op_dst = ld_dst;
+
+    /*
+     * Expect an optional register move.
+     */
+    NEXT_INSN;
+    switch (ctx->opcode & 0xf00f) {
+    case 0x6003: /* mov Rm,Rn */
+        /* Here we want to recognize ld_dst being saved for later consumption,
+           or for another input register being copied so that ld_dst need not
+           be clobbered during the operation.  */
+        op_dst = B11_8;
+        mv_src = B7_4;
+        if (op_dst == ld_dst) {
+            /* Overwriting the load output.  */
+            goto fail;
+        }
+        if (mv_src != ld_dst) {
+            /* Copying a new input; constrain op_src to match the load.  */
+            op_src = ld_dst;
+        }
+        break;
+
+    default:
+        /* Put back and re-examine as operation.  */
+        --i;
+    }
+
+    /*
+     * Expect the operation.
+     */
+    NEXT_INSN;
+    switch (ctx->opcode & 0xf00f) {
+    case 0x300c: /* add Rm,Rn */
+        op_opc = INDEX_op_add_i32;
+        goto do_reg_op;
+    case 0x2009: /* and Rm,Rn */
+        op_opc = INDEX_op_and_i32;
+        goto do_reg_op;
+    case 0x200a: /* xor Rm,Rn */
+        op_opc = INDEX_op_xor_i32;
+        goto do_reg_op;
+    case 0x200b: /* or Rm,Rn */
+        op_opc = INDEX_op_or_i32;
+    do_reg_op:
+        /* The operation register should be as expected, and the
+           other input cannot depend on the load.  */
+        if (op_dst != B11_8) {
+            goto fail;
+        }
+        if (op_src < 0) {
+            /* Unconstrained input.  */
+            op_src = B7_4;
+        } else if (op_src == B7_4) {
+            /* Constrained input matched load.  All operations are
+               commutative; "swap" them by "moving" the load output
+               to the (implicit) first argument and the move source
+               to the (explicit) second argument.  */
+            op_src = mv_src;
+        } else {
+            goto fail;
+        }
+        op_arg = REG(op_src);
+        break;
+
+    case 0x6007: /* not Rm,Rn */
+        if (ld_dst != B7_4 || mv_src >= 0) {
+            goto fail;
+        }
+        op_dst = B11_8;
+        op_opc = INDEX_op_xor_i32;
+        op_arg = tcg_const_i32(-1);
+        break;
+
+    case 0x7000 ... 0x700f: /* add #imm,Rn */
+        if (op_dst != B11_8 || mv_src >= 0) {
+            goto fail;
+        }
+        op_opc = INDEX_op_add_i32;
+        op_arg = tcg_const_i32(B7_0s);
+        break;
+
+    case 0x3000: /* cmp/eq Rm,Rn */
+        /* Looking for the middle of a compare-and-swap sequence,
+           beginning with the compare.  Operands can be either order,
+           but with only one overlapping the load.  */
+        if ((ld_dst == B11_8) + (ld_dst == B7_4) != 1 || mv_src >= 0) {
+            goto fail;
+        }
+        op_opc = INDEX_op_setcond_i32;  /* placeholder */
+        op_src = (ld_dst == B11_8 ? B7_4 : B11_8);
+        op_arg = REG(op_src);
+
+        NEXT_INSN;
+        switch (ctx->opcode & 0xff00) {
+        case 0x8b00: /* bf label */
+        case 0x8f00: /* bf/s label */
+            if (pc + (i + 1 + B7_0s) * 2 != pc_end) {
+                goto fail;
+            }
+            if ((ctx->opcode & 0xff00) == 0x8b00) { /* bf label */
+                break;
+            }
+            /* We're looking to unconditionally modify Rn with the
+               result of the comparison, within the delay slot of
+               the branch.  This is used by older gcc.  */
+            NEXT_INSN;
+            if ((ctx->opcode & 0xf0ff) == 0x0029) { /* movt Rn */
+                mt_dst = B11_8;
+            } else {
+                goto fail;
+            }
+            break;
+
+        default:
+            goto fail;
+        }
+        break;
+
+    case 0x2008: /* tst Rm,Rn */
+        /* Looking for a compare-and-swap against zero.  */
+        if (ld_dst != B11_8 || ld_dst != B7_4 || mv_src >= 0) {
+            goto fail;
+        }
+        op_opc = INDEX_op_setcond_i32;
+        op_arg = tcg_const_i32(0);
+
+        NEXT_INSN;
+        if ((ctx->opcode & 0xff00) != 0x8900 /* bt label */
+            || pc + (i + 1 + B7_0s) * 2 != pc_end) {
+            goto fail;
+        }
+        break;
+
+    default:
+        /* Put back and re-examine as store.  */
+        --i;
+    }
+
+    /*
+     * Expect the store.
+     */
+    /* The store must be the last insn.  */
+    if (i != max_insns - 1) {
+        goto fail;
+    }
+    NEXT_INSN;
+    switch (ctx->opcode & 0xf00f) {
+    case 0x2000: /* mov.b Rm,@Rn */
+        st_mop = MO_UB;
+        break;
+    case 0x2001: /* mov.w Rm,@Rn */
+        st_mop = MO_UW;
+        break;
+    case 0x2002: /* mov.l Rm,@Rn */
+        st_mop = MO_UL;
+        break;
+    default:
+        goto fail;
+    }
+    /* The store must match the load.  */
+    if (ld_adr != B11_8 || st_mop != (ld_mop & MO_SIZE)) {
+        goto fail;
+    }
+    st_src = B7_4;
+
+#undef NEXT_INSN
+
+    /*
+     * Emit the operation.
+     */
+    tcg_gen_insn_start(pc, ctx->envflags);
+    switch (op_opc) {
+    case -1:
+        /* No operation found.  Look for exchange pattern.  */
+        if (st_src == ld_dst || mv_src >= 0) {
+            goto fail;
+        }
+        tcg_gen_atomic_xchg_i32(REG(ld_dst), REG(ld_adr), REG(st_src),
+                                ctx->memidx, ld_mop);
+        break;
+
+    case INDEX_op_add_i32:
+        if (op_dst != st_src) {
+            goto fail;
+        }
+        if (op_dst == ld_dst && st_mop == MO_UL) {
+            tcg_gen_atomic_add_fetch_i32(REG(ld_dst), REG(ld_adr),
+                                         op_arg, ctx->memidx, ld_mop);
+        } else {
+            tcg_gen_atomic_fetch_add_i32(REG(ld_dst), REG(ld_adr),
+                                         op_arg, ctx->memidx, ld_mop);
+            if (op_dst != ld_dst) {
+                /* Note that mop sizes < 4 cannot use add_fetch
+                   because it won't carry into the higher bits.  */
+                tcg_gen_add_i32(REG(op_dst), REG(ld_dst), op_arg);
+            }
+        }
+        break;
+
+    case INDEX_op_and_i32:
+        if (op_dst != st_src) {
+            goto fail;
+        }
+        if (op_dst == ld_dst) {
+            tcg_gen_atomic_and_fetch_i32(REG(ld_dst), REG(ld_adr),
+                                         op_arg, ctx->memidx, ld_mop);
+        } else {
+            tcg_gen_atomic_fetch_and_i32(REG(ld_dst), REG(ld_adr),
+                                         op_arg, ctx->memidx, ld_mop);
+            tcg_gen_and_i32(REG(op_dst), REG(ld_dst), op_arg);
+        }
+        break;
+
+    case INDEX_op_or_i32:
+        if (op_dst != st_src) {
+            goto fail;
+        }
+        if (op_dst == ld_dst) {
+            tcg_gen_atomic_or_fetch_i32(REG(ld_dst), REG(ld_adr),
+                                        op_arg, ctx->memidx, ld_mop);
+        } else {
+            tcg_gen_atomic_fetch_or_i32(REG(ld_dst), REG(ld_adr),
+                                        op_arg, ctx->memidx, ld_mop);
+            tcg_gen_or_i32(REG(op_dst), REG(ld_dst), op_arg);
+        }
+        break;
+
+    case INDEX_op_xor_i32:
+        if (op_dst != st_src) {
+            goto fail;
+        }
+        if (op_dst == ld_dst) {
+            tcg_gen_atomic_xor_fetch_i32(REG(ld_dst), REG(ld_adr),
+                                         op_arg, ctx->memidx, ld_mop);
+        } else {
+            tcg_gen_atomic_fetch_xor_i32(REG(ld_dst), REG(ld_adr),
+                                         op_arg, ctx->memidx, ld_mop);
+            tcg_gen_xor_i32(REG(op_dst), REG(ld_dst), op_arg);
+        }
+        break;
+
+    case INDEX_op_setcond_i32:
+        if (st_src == ld_dst) {
+            goto fail;
+        }
+        tcg_gen_atomic_cmpxchg_i32(REG(ld_dst), REG(ld_adr), op_arg,
+                                   REG(st_src), ctx->memidx, ld_mop);
+        tcg_gen_setcond_i32(TCG_COND_EQ, cpu_sr_t, REG(ld_dst), op_arg);
+        if (mt_dst >= 0) {
+            tcg_gen_mov_i32(REG(mt_dst), cpu_sr_t);
+        }
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    /* If op_src is not a valid register, then op_arg was a constant.  */
+    if (op_src < 0) {
+        tcg_temp_free_i32(op_arg);
+    }
+
+    /* The entire region has been translated.  */
+    ctx->envflags &= ~GUSA_MASK;
+    ctx->pc = pc_end;
+    return max_insns;
+
+ fail:
     qemu_log_mask(LOG_UNIMP, "Unrecognized gUSA sequence %08x-%08x\n",
                   pc, pc_end);
 
-- 
2.9.4


* [Qemu-devel] [PATCH v3 08/30] linux-user/sh4: Notice gUSA regions during signal delivery
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (6 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 07/30] target/sh4: Recognize common gUSA sequences Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 09/30] linux-user/sh4: Clean env->flags on signal boundaries Richard Henderson
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

We translate gUSA regions atomically in a parallel context.
But in a serial context a gUSA region may be interrupted.
In that case, restart the region as the kernel would.
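
A minimal standalone sketch of the restart arithmetic, assuming the
usual gUSA convention (R0 = region end, R1 = saved SP, R15 = negative
region size).  The struct and function names are hypothetical stand-ins
for CPUSH4State and the helper in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the registers the check inspects.  */
struct gusa_regs {
    uint32_t r0;    /* end address of the gUSA region */
    uint32_t r1;    /* stack pointer saved before entering the region */
    uint32_t r15;   /* -(region size in bytes) while inside the region */
    uint32_t pc;
};

/* Returns 1 if the state was rewound to the region entry, else 0.  */
static int gusa_rewind(struct gusa_regs *regs)
{
    assert(regs);
    /* SP in [-128, -1] and PC still before the region end.  */
    if (regs->r15 >= -128u && regs->pc < regs->r0) {
        /* end + (-size) is the first insn of the region; the insn that
           loads SP with -size sits one 2-byte slot before it.  */
        regs->pc = regs->r0 + regs->r15 - 2;
        regs->r15 = regs->r1;
        return 1;
    }
    return 0;
}
```

For a 6-byte region ending at 0x1010, interrupted at 0x100c, this
rewinds the PC to 0x1008 and restores SP from R1.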

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 linux-user/signal.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index ddfd75c..27867a4 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -3471,6 +3471,30 @@ static abi_ulong get_sigframe(struct target_sigaction *ka,
     return (sp - frame_size) & -8ul;
 }
 
+/* Notice when we're in the middle of a gUSA region and reset.
+   Note that this will only occur for !parallel_cpus, as we will
+   translate such sequences differently in a parallel context.  */
+static void unwind_gusa(CPUSH4State *regs)
+{
+    /* If the stack pointer is sufficiently negative, and we haven't
+       completed the sequence, then reset to the entry to the region.  */
+    /* ??? The SH4 kernel checks for an address above 0xC0000000.
+       However, the page mappings in qemu linux-user aren't as restricted
+       and we wind up with the normal stack mapped above 0xF0000000.
+       That said, there is no reason why the kernel should be allowing
+       a gUSA region that spans 1GB.  Use a tighter check here, for what
+       can actually be enabled by the immediate move.  */
+    if (regs->gregs[15] >= -128u && regs->pc < regs->gregs[0]) {
+        /* Reset the PC to before the gUSA region, as computed from
+           R0 = region end, SP = -(region size), plus one more for the
+           insn that actually initializes SP to the region size.  */
+        regs->pc = regs->gregs[0] + regs->gregs[15] - 2;
+
+        /* Reset the SP to the saved version in R1.  */
+        regs->gregs[15] = regs->gregs[1];
+    }
+}
+
 static void setup_sigcontext(struct target_sigcontext *sc,
                              CPUSH4State *regs, unsigned long mask)
 {
@@ -3534,6 +3558,8 @@ static void setup_frame(int sig, struct target_sigaction *ka,
     abi_ulong frame_addr;
     int i;
 
+    unwind_gusa(regs);
+
     frame_addr = get_sigframe(ka, regs->gregs[15], sizeof(*frame));
     trace_user_setup_frame(regs, frame_addr);
     if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
@@ -3584,6 +3610,8 @@ static void setup_rt_frame(int sig, struct target_sigaction *ka,
     abi_ulong frame_addr;
     int i;
 
+    unwind_gusa(regs);
+
     frame_addr = get_sigframe(ka, regs->gregs[15], sizeof(*frame));
     trace_user_setup_rt_frame(regs, frame_addr);
     if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
-- 
2.9.4


* [Qemu-devel] [PATCH v3 09/30] linux-user/sh4: Clean env->flags on signal boundaries
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (7 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 08/30] linux-user/sh4: Notice gUSA regions during signal delivery Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 10/30] target/sh4: Hoist register bank selection Richard Henderson
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

If a signal is delivered during the execution of a delay slot,
or a gUSA region, clear those bits from the environment so that
the signal handler does not start in that same state.

Cleaning the bits on signal return is paranoid good sense.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 linux-user/signal.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 27867a4..426e330 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -3549,6 +3549,7 @@ static void restore_sigcontext(CPUSH4State *regs, struct target_sigcontext *sc)
     __get_user(regs->fpul, &sc->sc_fpul);
 
     regs->tra = -1;         /* disable syscall checks */
+    regs->flags &= ~(DELAY_SLOT_MASK | GUSA_MASK);
 }
 
 static void setup_frame(int sig, struct target_sigaction *ka,
@@ -3593,6 +3594,7 @@ static void setup_frame(int sig, struct target_sigaction *ka,
     regs->gregs[6] = frame_addr += offsetof(typeof(*frame), sc);
     regs->pc = (unsigned long) ka->_sa_handler;
     regs->lock_addr = -1;
+    regs->flags &= ~(DELAY_SLOT_MASK | GUSA_MASK);
 
     unlock_user_struct(frame, frame_addr, 1);
     return;
@@ -3656,6 +3658,7 @@ static void setup_rt_frame(int sig, struct target_sigaction *ka,
     regs->gregs[6] = frame_addr + offsetof(typeof(*frame), uc);
     regs->pc = (unsigned long) ka->_sa_handler;
     regs->lock_addr = -1;
+    regs->flags &= ~(DELAY_SLOT_MASK | GUSA_MASK);
 
     unlock_user_struct(frame, frame_addr, 1);
     return;
-- 
2.9.4


* [Qemu-devel] [PATCH v3 10/30] target/sh4: Hoist register bank selection
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (8 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 09/30] linux-user/sh4: Clean env->flags on signal boundaries Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 11/30] target/sh4: Unify cpu_fregs into FREG Richard Henderson
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Compute which register bank to use once at the start of translation.
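
The indexing trick can be sketched outside of TCG as follows.  This is
an illustration under the assumption that a plain int array stands in
for the TCGv array; slots[] and the function names are hypothetical.

```c
#include <assert.h>

/* Slots 0-7: bank 0 R0-R7, 8-15: R8-R15, 16-23: bank 1 R0-R7,
   24-31: aliases of 8-15 so that XOR with 0x10 is always in range.  */
static int slots[32];

static void init_slots(void)
{
    int i;

    for (i = 0; i < 24; i++) {
        slots[i] = i;
    }
    for (i = 0; i < 8; i++) {
        slots[24 + i] = slots[8 + i];
    }
}

/* gbank is 0x10 only when both SR.MD and SR.RB are set.  */
static int reg_slot(int reg, int md, int rb)
{
    int gbank = (md && rb) * 0x10;

    assert(reg >= 0 && reg < 16);
    return slots[reg ^ gbank];
}

/* The alternate bank is the same lookup with bit 4 flipped.  */
static int altreg_slot(int reg, int md, int rb)
{
    int gbank = (md && rb) * 0x10;

    assert(reg >= 0 && reg < 16);
    return slots[reg ^ gbank ^ 0x10];
}
```

Because R8-R15 are not banked, the aliased upper eight slots make both
lookups resolve to the same register without a range test on each use.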

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 0ae2ca6..ef6f674 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -41,6 +41,7 @@ typedef struct DisasContext {
     uint32_t envflags;   /* should stay in sync with env->flags using TCG ops */
     int bstate;
     int memidx;
+    int gbank;
     uint32_t delayed_pc;
     int singlestep_enabled;
     uint32_t features;
@@ -64,7 +65,7 @@ enum {
 
 /* global register indexes */
 static TCGv_env cpu_env;
-static TCGv cpu_gregs[24];
+static TCGv cpu_gregs[32];
 static TCGv cpu_sr, cpu_sr_m, cpu_sr_q, cpu_sr_t;
 static TCGv cpu_pc, cpu_ssr, cpu_spc, cpu_gbr;
 static TCGv cpu_vbr, cpu_sgr, cpu_dbr, cpu_mach, cpu_macl;
@@ -99,16 +100,19 @@ void sh4_translate_init(void)
         "FPR12_BANK1", "FPR13_BANK1", "FPR14_BANK1", "FPR15_BANK1",
     };
 
-    if (done_init)
+    if (done_init) {
         return;
+    }
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
     tcg_ctx.tcg_env = cpu_env;
 
-    for (i = 0; i < 24; i++)
+    for (i = 0; i < 24; i++) {
         cpu_gregs[i] = tcg_global_mem_new_i32(cpu_env,
                                               offsetof(CPUSH4State, gregs[i]),
                                               gregnames[i]);
+    }
+    memcpy(cpu_gregs + 24, cpu_gregs + 8, 8 * sizeof(TCGv));
 
     cpu_pc = tcg_global_mem_new_i32(cpu_env,
                                     offsetof(CPUSH4State, pc), "PC");
@@ -352,13 +356,8 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define B11_8 ((ctx->opcode >> 8) & 0xf)
 #define B15_12 ((ctx->opcode >> 12) & 0xf)
 
-#define REG(x) ((x) < 8 && (ctx->tbflags & (1u << SR_MD))\
-                        && (ctx->tbflags & (1u << SR_RB))\
-                ? (cpu_gregs[x + 16]) : (cpu_gregs[x]))
-
-#define ALTREG(x) ((x) < 8 && (!(ctx->tbflags & (1u << SR_MD))\
-                               || !(ctx->tbflags & (1u << SR_RB)))\
-		? (cpu_gregs[x + 16]) : (cpu_gregs[x]))
+#define REG(x)     cpu_gregs[(x) ^ ctx->gbank]
+#define ALTREG(x)  cpu_gregs[(x) ^ ctx->gbank ^ 0x10]
 
 #define FREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))
 #define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
@@ -2264,6 +2263,8 @@ void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
     ctx.singlestep_enabled = cs->singlestep_enabled;
     ctx.features = env->features;
     ctx.has_movcal = (ctx.tbflags & TB_FLAG_PENDING_MOVCA);
+    ctx.gbank = ((ctx.tbflags & (1 << SR_MD)) &&
+                 (ctx.tbflags & (1 << SR_RB))) * 0x10;
 
     max_insns = tb->cflags & CF_COUNT_MASK;
     if (max_insns == 0) {
-- 
2.9.4


* [Qemu-devel] [PATCH v3 11/30] target/sh4: Unify cpu_fregs into FREG
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (9 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 10/30] target/sh4: Hoist register bank selection Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 12/30] target/sh4: Pass DisasContext to fpr64 routines Richard Henderson
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

We were treating FREG as an index and REG as a TCGv.
Making FREG return a TCGv is both less confusing and
a step toward cleaner banking of cpu_fregs.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 125 ++++++++++++++++++++-----------------------------
 1 file changed, 52 insertions(+), 73 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index ef6f674..b3c3e8e 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -359,10 +359,11 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define REG(x)     cpu_gregs[(x) ^ ctx->gbank]
 #define ALTREG(x)  cpu_gregs[(x) ^ ctx->gbank ^ 0x10]
 
-#define FREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))
+#define FREG(x) cpu_fregs[ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x)]
 #define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
-#define XREG(x) (ctx->tbflags & FPSCR_FR ? XHACK(x) ^ 0x10 : XHACK(x))
-#define DREG(x) FREG(x) /* Assumes lsb of (x) is always 0 */
+#define XREG(x) FREG(XHACK(x))
+/* Assumes lsb of (x) is always 0 */
+#define DREG(x) (ctx->tbflags & FPSCR_FR ? (x) ^ 0x10 : (x))
 
 #define CHECK_NOT_DELAY_SLOT \
     if (ctx->envflags & DELAY_SLOT_MASK) {                           \
@@ -983,56 +984,51 @@ static void _decode_opc(DisasContext * ctx)
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_SZ) {
 	    TCGv_i64 fp = tcg_temp_new_i64();
-	    gen_load_fpr64(fp, XREG(B7_4));
-	    gen_store_fpr64(fp, XREG(B11_8));
+	    gen_load_fpr64(fp, XHACK(B7_4));
+	    gen_store_fpr64(fp, XHACK(B11_8));
 	    tcg_temp_free_i64(fp);
 	} else {
-	    tcg_gen_mov_i32(cpu_fregs[FREG(B11_8)], cpu_fregs[FREG(B7_4)]);
+	    tcg_gen_mov_i32(FREG(B11_8), FREG(B7_4));
 	}
 	return;
     case 0xf00a: /* fmov {F,D,X}Rm,@Rn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_SZ) {
 	    TCGv addr_hi = tcg_temp_new();
-	    int fr = XREG(B7_4);
+	    int fr = XHACK(B7_4);
 	    tcg_gen_addi_i32(addr_hi, REG(B11_8), 4);
-            tcg_gen_qemu_st_i32(cpu_fregs[fr], REG(B11_8),
-                                ctx->memidx, MO_TEUL);
-            tcg_gen_qemu_st_i32(cpu_fregs[fr+1], addr_hi,
-                                ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_st_i32(FREG(fr), REG(B11_8), ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_st_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
 	    tcg_temp_free(addr_hi);
 	} else {
-            tcg_gen_qemu_st_i32(cpu_fregs[FREG(B7_4)], REG(B11_8),
-                                ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_st_i32(FREG(B7_4), REG(B11_8), ctx->memidx, MO_TEUL);
 	}
 	return;
     case 0xf008: /* fmov @Rm,{F,D,X}Rn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_SZ) {
 	    TCGv addr_hi = tcg_temp_new();
-	    int fr = XREG(B11_8);
+	    int fr = XHACK(B11_8);
 	    tcg_gen_addi_i32(addr_hi, REG(B7_4), 4);
-            tcg_gen_qemu_ld_i32(cpu_fregs[fr], REG(B7_4), ctx->memidx, MO_TEUL);
-            tcg_gen_qemu_ld_i32(cpu_fregs[fr+1], addr_hi, ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_ld_i32(FREG(fr), REG(B7_4), ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_ld_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
 	    tcg_temp_free(addr_hi);
 	} else {
-            tcg_gen_qemu_ld_i32(cpu_fregs[FREG(B11_8)], REG(B7_4),
-                                ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_ld_i32(FREG(B11_8), REG(B7_4), ctx->memidx, MO_TEUL);
 	}
 	return;
     case 0xf009: /* fmov @Rm+,{F,D,X}Rn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_SZ) {
 	    TCGv addr_hi = tcg_temp_new();
-	    int fr = XREG(B11_8);
+	    int fr = XHACK(B11_8);
 	    tcg_gen_addi_i32(addr_hi, REG(B7_4), 4);
-            tcg_gen_qemu_ld_i32(cpu_fregs[fr], REG(B7_4), ctx->memidx, MO_TEUL);
-            tcg_gen_qemu_ld_i32(cpu_fregs[fr+1], addr_hi, ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_ld_i32(FREG(fr), REG(B7_4), ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_ld_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
 	    tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 8);
 	    tcg_temp_free(addr_hi);
 	} else {
-            tcg_gen_qemu_ld_i32(cpu_fregs[FREG(B11_8)], REG(B7_4),
-                                ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_ld_i32(FREG(B11_8), REG(B7_4), ctx->memidx, MO_TEUL);
 	    tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 4);
 	}
 	return;
@@ -1041,13 +1037,12 @@ static void _decode_opc(DisasContext * ctx)
         TCGv addr = tcg_temp_new_i32();
         tcg_gen_subi_i32(addr, REG(B11_8), 4);
         if (ctx->tbflags & FPSCR_SZ) {
-	    int fr = XREG(B7_4);
-            tcg_gen_qemu_st_i32(cpu_fregs[fr+1], addr, ctx->memidx, MO_TEUL);
+	    int fr = XHACK(B7_4);
+            tcg_gen_qemu_st_i32(FREG(fr + 1), addr, ctx->memidx, MO_TEUL);
 	    tcg_gen_subi_i32(addr, addr, 4);
-            tcg_gen_qemu_st_i32(cpu_fregs[fr], addr, ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_st_i32(FREG(fr), addr, ctx->memidx, MO_TEUL);
 	} else {
-            tcg_gen_qemu_st_i32(cpu_fregs[FREG(B7_4)], addr,
-                                ctx->memidx, MO_TEUL);
+            tcg_gen_qemu_st_i32(FREG(B7_4), addr, ctx->memidx, MO_TEUL);
 	}
         tcg_gen_mov_i32(REG(B11_8), addr);
         tcg_temp_free(addr);
@@ -1058,15 +1053,12 @@ static void _decode_opc(DisasContext * ctx)
 	    TCGv addr = tcg_temp_new_i32();
 	    tcg_gen_add_i32(addr, REG(B7_4), REG(0));
             if (ctx->tbflags & FPSCR_SZ) {
-		int fr = XREG(B11_8);
-                tcg_gen_qemu_ld_i32(cpu_fregs[fr], addr,
-                                    ctx->memidx, MO_TEUL);
+		int fr = XHACK(B11_8);
+                tcg_gen_qemu_ld_i32(FREG(fr), addr, ctx->memidx, MO_TEUL);
 		tcg_gen_addi_i32(addr, addr, 4);
-                tcg_gen_qemu_ld_i32(cpu_fregs[fr+1], addr,
-                                    ctx->memidx, MO_TEUL);
+                tcg_gen_qemu_ld_i32(FREG(fr + 1), addr, ctx->memidx, MO_TEUL);
 	    } else {
-                tcg_gen_qemu_ld_i32(cpu_fregs[FREG(B11_8)], addr,
-                                    ctx->memidx, MO_TEUL);
+                tcg_gen_qemu_ld_i32(FREG(B11_8), addr, ctx->memidx, MO_TEUL);
 	    }
 	    tcg_temp_free(addr);
 	}
@@ -1077,15 +1069,12 @@ static void _decode_opc(DisasContext * ctx)
 	    TCGv addr = tcg_temp_new();
 	    tcg_gen_add_i32(addr, REG(B11_8), REG(0));
             if (ctx->tbflags & FPSCR_SZ) {
-		int fr = XREG(B7_4);
-                tcg_gen_qemu_ld_i32(cpu_fregs[fr], addr,
-                                    ctx->memidx, MO_TEUL);
+		int fr = XHACK(B7_4);
+                tcg_gen_qemu_ld_i32(FREG(fr), addr, ctx->memidx, MO_TEUL);
 		tcg_gen_addi_i32(addr, addr, 4);
-                tcg_gen_qemu_ld_i32(cpu_fregs[fr+1], addr,
-                                    ctx->memidx, MO_TEUL);
+                tcg_gen_qemu_ld_i32(FREG(fr + 1), addr, ctx->memidx, MO_TEUL);
 	    } else {
-                tcg_gen_qemu_st_i32(cpu_fregs[FREG(B7_4)], addr,
-                                    ctx->memidx, MO_TEUL);
+                tcg_gen_qemu_st_i32(FREG(B7_4), addr, ctx->memidx, MO_TEUL);
 	    }
 	    tcg_temp_free(addr);
 	}
@@ -1133,34 +1122,28 @@ static void _decode_opc(DisasContext * ctx)
 	    } else {
                 switch (ctx->opcode & 0xf00f) {
                 case 0xf000:		/* fadd Rm,Rn */
-                    gen_helper_fadd_FT(cpu_fregs[FREG(B11_8)], cpu_env,
-                                       cpu_fregs[FREG(B11_8)],
-                                       cpu_fregs[FREG(B7_4)]);
+                    gen_helper_fadd_FT(FREG(B11_8), cpu_env,
+                                       FREG(B11_8), FREG(B7_4));
                     break;
                 case 0xf001:		/* fsub Rm,Rn */
-                    gen_helper_fsub_FT(cpu_fregs[FREG(B11_8)], cpu_env,
-                                       cpu_fregs[FREG(B11_8)],
-                                       cpu_fregs[FREG(B7_4)]);
+                    gen_helper_fsub_FT(FREG(B11_8), cpu_env,
+                                       FREG(B11_8), FREG(B7_4));
                     break;
                 case 0xf002:		/* fmul Rm,Rn */
-                    gen_helper_fmul_FT(cpu_fregs[FREG(B11_8)], cpu_env,
-                                       cpu_fregs[FREG(B11_8)],
-                                       cpu_fregs[FREG(B7_4)]);
+                    gen_helper_fmul_FT(FREG(B11_8), cpu_env,
+                                       FREG(B11_8), FREG(B7_4));
                     break;
                 case 0xf003:		/* fdiv Rm,Rn */
-                    gen_helper_fdiv_FT(cpu_fregs[FREG(B11_8)], cpu_env,
-                                       cpu_fregs[FREG(B11_8)],
-                                       cpu_fregs[FREG(B7_4)]);
+                    gen_helper_fdiv_FT(FREG(B11_8), cpu_env,
+                                       FREG(B11_8), FREG(B7_4));
                     break;
                 case 0xf004:		/* fcmp/eq Rm,Rn */
                     gen_helper_fcmp_eq_FT(cpu_sr_t, cpu_env,
-                                          cpu_fregs[FREG(B11_8)],
-                                          cpu_fregs[FREG(B7_4)]);
+                                          FREG(B11_8), FREG(B7_4));
                     return;
                 case 0xf005:		/* fcmp/gt Rm,Rn */
                     gen_helper_fcmp_gt_FT(cpu_sr_t, cpu_env,
-                                          cpu_fregs[FREG(B11_8)],
-                                          cpu_fregs[FREG(B7_4)]);
+                                          FREG(B11_8), FREG(B7_4));
                     return;
                 }
 	    }
@@ -1172,9 +1155,8 @@ static void _decode_opc(DisasContext * ctx)
             if (ctx->tbflags & FPSCR_PR) {
                 break; /* illegal instruction */
             } else {
-                gen_helper_fmac_FT(cpu_fregs[FREG(B11_8)], cpu_env,
-                                   cpu_fregs[FREG(0)], cpu_fregs[FREG(B7_4)],
-                                   cpu_fregs[FREG(B11_8)]);
+                gen_helper_fmac_FT(FREG(B11_8), cpu_env,
+                                   FREG(0), FREG(B7_4), FREG(B11_8));
                 return;
             }
         }
@@ -1705,11 +1687,11 @@ static void _decode_opc(DisasContext * ctx)
         return;
     case 0xf00d: /* fsts FPUL,FRn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
-	tcg_gen_mov_i32(cpu_fregs[FREG(B11_8)], cpu_fpul);
+	tcg_gen_mov_i32(FREG(B11_8), cpu_fpul);
 	return;
     case 0xf01d: /* flds FRm,FPUL - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
-	tcg_gen_mov_i32(cpu_fpul, cpu_fregs[FREG(B11_8)]);
+	tcg_gen_mov_i32(cpu_fpul, FREG(B11_8));
 	return;
     case 0xf02d: /* float FPUL,FRn/DRn - FPSCR: R[PR,Enable.I]/W[Cause,Flag] */
 	CHECK_FPU_ENABLED
@@ -1723,7 +1705,7 @@ static void _decode_opc(DisasContext * ctx)
 	    tcg_temp_free_i64(fp);
 	}
 	else {
-            gen_helper_float_FT(cpu_fregs[FREG(B11_8)], cpu_env, cpu_fpul);
+            gen_helper_float_FT(FREG(B11_8), cpu_env, cpu_fpul);
 	}
 	return;
     case 0xf03d: /* ftrc FRm/DRm,FPUL - FPSCR: R[PR,Enable.V]/W[Cause,Flag] */
@@ -1738,18 +1720,16 @@ static void _decode_opc(DisasContext * ctx)
 	    tcg_temp_free_i64(fp);
 	}
 	else {
-            gen_helper_ftrc_FT(cpu_fpul, cpu_env, cpu_fregs[FREG(B11_8)]);
+            gen_helper_ftrc_FT(cpu_fpul, cpu_env, FREG(B11_8));
 	}
 	return;
     case 0xf04d: /* fneg FRn/DRn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
-        tcg_gen_xori_i32(cpu_fregs[FREG(B11_8)], cpu_fregs[FREG(B11_8)],
-                         0x80000000);
+        tcg_gen_xori_i32(FREG(B11_8), FREG(B11_8), 0x80000000);
 	return;
     case 0xf05d: /* fabs FRn/DRn - FPCSR: Nothing */
 	CHECK_FPU_ENABLED
-        tcg_gen_andi_i32(cpu_fregs[FREG(B11_8)], cpu_fregs[FREG(B11_8)],
-                         0x7fffffff);
+        tcg_gen_andi_i32(FREG(B11_8), FREG(B11_8), 0x7fffffff);
 	return;
     case 0xf06d: /* fsqrt FRn */
 	CHECK_FPU_ENABLED
@@ -1762,8 +1742,7 @@ static void _decode_opc(DisasContext * ctx)
 	    gen_store_fpr64(fp, DREG(B11_8));
 	    tcg_temp_free_i64(fp);
 	} else {
-            gen_helper_fsqrt_FT(cpu_fregs[FREG(B11_8)], cpu_env,
-                                cpu_fregs[FREG(B11_8)]);
+            gen_helper_fsqrt_FT(FREG(B11_8), cpu_env, FREG(B11_8));
 	}
 	return;
     case 0xf07d: /* fsrra FRn */
@@ -1772,13 +1751,13 @@ static void _decode_opc(DisasContext * ctx)
     case 0xf08d: /* fldi0 FRn - FPSCR: R[PR] */
 	CHECK_FPU_ENABLED
         if (!(ctx->tbflags & FPSCR_PR)) {
-	    tcg_gen_movi_i32(cpu_fregs[FREG(B11_8)], 0);
+	    tcg_gen_movi_i32(FREG(B11_8), 0);
 	}
 	return;
     case 0xf09d: /* fldi1 FRn - FPSCR: R[PR] */
 	CHECK_FPU_ENABLED
         if (!(ctx->tbflags & FPSCR_PR)) {
-	    tcg_gen_movi_i32(cpu_fregs[FREG(B11_8)], 0x3f800000);
+	    tcg_gen_movi_i32(FREG(B11_8), 0x3f800000);
 	}
 	return;
     case 0xf0ad: /* fcnvsd FPUL,DRn */
-- 
2.9.4


* [Qemu-devel] [PATCH v3 12/30] target/sh4: Pass DisasContext to fpr64 routines
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (10 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 11/30] target/sh4: Unify cpu_fregs into FREG Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 14/30] target/sh4: Eliminate unused XREG macro Richard Henderson
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index b3c3e8e..caa4598 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -336,12 +336,12 @@ static void gen_delayed_conditional_jump(DisasContext * ctx)
     gen_jump(ctx);
 }
 
-static inline void gen_load_fpr64(TCGv_i64 t, int reg)
+static inline void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
     tcg_gen_concat_i32_i64(t, cpu_fregs[reg + 1], cpu_fregs[reg]);
 }
 
-static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
+static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
     tcg_gen_extr_i64_i32(cpu_fregs[reg + 1], cpu_fregs[reg], t);
 }
@@ -984,8 +984,8 @@ static void _decode_opc(DisasContext * ctx)
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_SZ) {
 	    TCGv_i64 fp = tcg_temp_new_i64();
-	    gen_load_fpr64(fp, XHACK(B7_4));
-	    gen_store_fpr64(fp, XHACK(B11_8));
+	    gen_load_fpr64(ctx, fp, XHACK(B7_4));
+	    gen_store_fpr64(ctx, fp, XHACK(B11_8));
 	    tcg_temp_free_i64(fp);
 	} else {
 	    tcg_gen_mov_i32(FREG(B11_8), FREG(B7_4));
@@ -1094,8 +1094,8 @@ static void _decode_opc(DisasContext * ctx)
 		    break; /* illegal instruction */
 		fp0 = tcg_temp_new_i64();
 		fp1 = tcg_temp_new_i64();
-		gen_load_fpr64(fp0, DREG(B11_8));
-		gen_load_fpr64(fp1, DREG(B7_4));
+		gen_load_fpr64(ctx, fp0, DREG(B11_8));
+		gen_load_fpr64(ctx, fp1, DREG(B7_4));
                 switch (ctx->opcode & 0xf00f) {
                 case 0xf000:		/* fadd Rm,Rn */
                     gen_helper_fadd_DT(fp0, cpu_env, fp0, fp1);
@@ -1116,7 +1116,7 @@ static void _decode_opc(DisasContext * ctx)
                     gen_helper_fcmp_gt_DT(cpu_sr_t, cpu_env, fp0, fp1);
                     return;
                 }
-		gen_store_fpr64(fp0, DREG(B11_8));
+		gen_store_fpr64(ctx, fp0, DREG(B11_8));
                 tcg_temp_free_i64(fp0);
                 tcg_temp_free_i64(fp1);
 	    } else {
@@ -1701,7 +1701,7 @@ static void _decode_opc(DisasContext * ctx)
 		break; /* illegal instruction */
 	    fp = tcg_temp_new_i64();
             gen_helper_float_DT(fp, cpu_env, cpu_fpul);
-	    gen_store_fpr64(fp, DREG(B11_8));
+	    gen_store_fpr64(ctx, fp, DREG(B11_8));
 	    tcg_temp_free_i64(fp);
 	}
 	else {
@@ -1715,7 +1715,7 @@ static void _decode_opc(DisasContext * ctx)
 	    if (ctx->opcode & 0x0100)
 		break; /* illegal instruction */
 	    fp = tcg_temp_new_i64();
-	    gen_load_fpr64(fp, DREG(B11_8));
+	    gen_load_fpr64(ctx, fp, DREG(B11_8));
             gen_helper_ftrc_DT(cpu_fpul, cpu_env, fp);
 	    tcg_temp_free_i64(fp);
 	}
@@ -1737,9 +1737,9 @@ static void _decode_opc(DisasContext * ctx)
 	    if (ctx->opcode & 0x0100)
 		break; /* illegal instruction */
 	    TCGv_i64 fp = tcg_temp_new_i64();
-	    gen_load_fpr64(fp, DREG(B11_8));
+	    gen_load_fpr64(ctx, fp, DREG(B11_8));
             gen_helper_fsqrt_DT(fp, cpu_env, fp);
-	    gen_store_fpr64(fp, DREG(B11_8));
+	    gen_store_fpr64(ctx, fp, DREG(B11_8));
 	    tcg_temp_free_i64(fp);
 	} else {
             gen_helper_fsqrt_FT(FREG(B11_8), cpu_env, FREG(B11_8));
@@ -1765,7 +1765,7 @@ static void _decode_opc(DisasContext * ctx)
 	{
 	    TCGv_i64 fp = tcg_temp_new_i64();
             gen_helper_fcnvsd_FT_DT(fp, cpu_env, cpu_fpul);
-	    gen_store_fpr64(fp, DREG(B11_8));
+	    gen_store_fpr64(ctx, fp, DREG(B11_8));
 	    tcg_temp_free_i64(fp);
 	}
 	return;
@@ -1773,7 +1773,7 @@ static void _decode_opc(DisasContext * ctx)
 	CHECK_FPU_ENABLED
 	{
 	    TCGv_i64 fp = tcg_temp_new_i64();
-	    gen_load_fpr64(fp, DREG(B11_8));
+	    gen_load_fpr64(ctx, fp, DREG(B11_8));
             gen_helper_fcnvds_DT_FT(cpu_fpul, cpu_env, fp);
 	    tcg_temp_free_i64(fp);
 	}
-- 
2.9.4


* [Qemu-devel] [PATCH v3 14/30] target/sh4: Eliminate unused XREG macro
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (11 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 12/30] target/sh4: Pass DisasContext to fpr64 routines Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 15/30] target/sh4: Merge DREG into fpr64 routines Richard Henderson
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 9473cbd..019862d 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -362,7 +362,6 @@ static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 #define FREG(x)    cpu_fregs[(x) ^ ctx->fbank]
 
 #define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
-#define XREG(x)  FREG(XHACK(x))
 /* Assumes lsb of (x) is always 0 */
 #define DREG(x)  ((x) ^ ctx->fbank)
 
-- 
2.9.4


* [Qemu-devel] [PATCH v3 15/30] target/sh4: Merge DREG into fpr64 routines
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (12 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 14/30] target/sh4: Eliminate unused XREG macro Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 16/30] target/sh4: Load/store Dr as 64-bit quantities Richard Henderson
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Also add a debugging assert that an illegal instruction has already
been signaled for odd double-precision register numbers.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
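For readers following the register arithmetic, here is a minimal
stand-alone sketch of what the merged helpers compute. It is not the
TCG code itself: the plain array, the function names and the
assumption that fbank is either 0 or 16 are illustrative only. The
even register holds the high word and the odd register the low word,
matching the concat/extr argument order in the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 32 single-precision slots, two banks of 16.
 * fbank is assumed to be 0 or 16, selecting the active bank. */
static int dr_index(int reg, int fbank)
{
    /* Odd Dr numbers must have been rejected by the caller. */
    assert((reg & 1) == 0);
    return reg ^ fbank;
}

/* Combine the pair into one 64-bit value: even slot is the high word. */
static uint64_t load_fpr64(const uint32_t fregs[32], int reg, int fbank)
{
    int i = dr_index(reg, fbank);
    return ((uint64_t)fregs[i] << 32) | fregs[i + 1];
}

/* Split a 64-bit value back into the register pair. */
static void store_fpr64(uint32_t fregs[32], int reg, int fbank, uint64_t val)
{
    int i = dr_index(reg, fbank);
    fregs[i] = (uint32_t)(val >> 32);
    fregs[i + 1] = (uint32_t)val;
}
```

Because the low bit of reg is known to be zero and fbank only touches
bit 4, applying the xor once inside the helper is equivalent to the
old DREG() at each call site.
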
 target/sh4/translate.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 019862d..9c320e4 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -339,11 +339,17 @@ static void gen_delayed_conditional_jump(DisasContext * ctx)
 
 static inline void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
+    /* We have already signaled illegal instruction for odd Dr.  */
+    tcg_debug_assert((reg & 1) == 0);
+    reg ^= ctx->fbank;
     tcg_gen_concat_i32_i64(t, cpu_fregs[reg + 1], cpu_fregs[reg]);
 }
 
 static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
+    /* We have already signaled illegal instruction for odd Dr.  */
+    tcg_debug_assert((reg & 1) == 0);
+    reg ^= ctx->fbank;
     tcg_gen_extr_i64_i32(cpu_fregs[reg + 1], cpu_fregs[reg], t);
 }
 
@@ -362,8 +368,6 @@ static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 #define FREG(x)    cpu_fregs[(x) ^ ctx->fbank]
 
 #define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
-/* Assumes lsb of (x) is always 0 */
-#define DREG(x)  ((x) ^ ctx->fbank)
 
 #define CHECK_NOT_DELAY_SLOT \
     if (ctx->envflags & DELAY_SLOT_MASK) {                           \
@@ -1094,8 +1098,8 @@ static void _decode_opc(DisasContext * ctx)
 		    break; /* illegal instruction */
 		fp0 = tcg_temp_new_i64();
 		fp1 = tcg_temp_new_i64();
-		gen_load_fpr64(ctx, fp0, DREG(B11_8));
-		gen_load_fpr64(ctx, fp1, DREG(B7_4));
+		gen_load_fpr64(ctx, fp0, B11_8);
+		gen_load_fpr64(ctx, fp1, B7_4);
                 switch (ctx->opcode & 0xf00f) {
                 case 0xf000:		/* fadd Rm,Rn */
                     gen_helper_fadd_DT(fp0, cpu_env, fp0, fp1);
@@ -1116,7 +1120,7 @@ static void _decode_opc(DisasContext * ctx)
                     gen_helper_fcmp_gt_DT(cpu_sr_t, cpu_env, fp0, fp1);
                     return;
                 }
-		gen_store_fpr64(ctx, fp0, DREG(B11_8));
+		gen_store_fpr64(ctx, fp0, B11_8);
                 tcg_temp_free_i64(fp0);
                 tcg_temp_free_i64(fp1);
 	    } else {
@@ -1701,7 +1705,7 @@ static void _decode_opc(DisasContext * ctx)
 		break; /* illegal instruction */
 	    fp = tcg_temp_new_i64();
             gen_helper_float_DT(fp, cpu_env, cpu_fpul);
-	    gen_store_fpr64(ctx, fp, DREG(B11_8));
+	    gen_store_fpr64(ctx, fp, B11_8);
 	    tcg_temp_free_i64(fp);
 	}
 	else {
@@ -1715,7 +1719,7 @@ static void _decode_opc(DisasContext * ctx)
 	    if (ctx->opcode & 0x0100)
 		break; /* illegal instruction */
 	    fp = tcg_temp_new_i64();
-	    gen_load_fpr64(ctx, fp, DREG(B11_8));
+	    gen_load_fpr64(ctx, fp, B11_8);
             gen_helper_ftrc_DT(cpu_fpul, cpu_env, fp);
 	    tcg_temp_free_i64(fp);
 	}
@@ -1737,9 +1741,9 @@ static void _decode_opc(DisasContext * ctx)
 	    if (ctx->opcode & 0x0100)
 		break; /* illegal instruction */
 	    TCGv_i64 fp = tcg_temp_new_i64();
-	    gen_load_fpr64(ctx, fp, DREG(B11_8));
+	    gen_load_fpr64(ctx, fp, B11_8);
             gen_helper_fsqrt_DT(fp, cpu_env, fp);
-	    gen_store_fpr64(ctx, fp, DREG(B11_8));
+	    gen_store_fpr64(ctx, fp, B11_8);
 	    tcg_temp_free_i64(fp);
 	} else {
             gen_helper_fsqrt_FT(FREG(B11_8), cpu_env, FREG(B11_8));
@@ -1765,7 +1769,7 @@ static void _decode_opc(DisasContext * ctx)
 	{
 	    TCGv_i64 fp = tcg_temp_new_i64();
             gen_helper_fcnvsd_FT_DT(fp, cpu_env, cpu_fpul);
-	    gen_store_fpr64(ctx, fp, DREG(B11_8));
+	    gen_store_fpr64(ctx, fp, B11_8);
 	    tcg_temp_free_i64(fp);
 	}
 	return;
@@ -1773,7 +1777,7 @@ static void _decode_opc(DisasContext * ctx)
 	CHECK_FPU_ENABLED
 	{
 	    TCGv_i64 fp = tcg_temp_new_i64();
-	    gen_load_fpr64(ctx, fp, DREG(B11_8));
+	    gen_load_fpr64(ctx, fp, B11_8);
             gen_helper_fcnvds_DT_FT(cpu_fpul, cpu_env, fp);
 	    tcg_temp_free_i64(fp);
 	}
-- 
2.9.4


* [Qemu-devel] [PATCH v3 16/30] target/sh4: Load/store Dr as 64-bit quantities
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (13 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 15/30] target/sh4: Merge DREG into fpr64 routines Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 17/30] target/sh4: Simplify 64-bit fp reg-reg move Richard Henderson
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

This enforces proper alignment and makes the register update
more natural.  Note that this also includes a more serious bug fix:
fmov {DX}Rn,@(R0,Rn) now uses a store instead of a load.

Signed-off-by: Richard Henderson <rth@twiddle.net>

---
V2: Fix pre-dec address error.
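A rough sketch of the pre-decrement ordering that the V2 fix is
about, using a hypothetical flat memory model rather than TCG ops
(the Guest struct and store_predec name are made up for
illustration). The address is computed as Rn minus the operand size,
the value is stored once, and only then is Rn written back.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical guest state, for illustration only. */
typedef struct {
    uint32_t rn;        /* address register */
    uint8_t mem[64];    /* tiny flat memory */
} Guest;

/* Pre-decrement store: size is 4 for a single, 8 for a register pair. */
static void store_predec(Guest *g, const void *val, uint32_t size)
{
    uint32_t addr = g->rn - size;      /* decrement by the full size */
    memcpy(&g->mem[addr], val, size);  /* one store of the whole operand */
    g->rn = addr;                      /* write back after the store */
}
```

With an 8-byte operand the address is Rn - 8, not Rn - 4, which is
the case the 64-bit path in the diff below handles separately.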
---
 target/sh4/translate.c | 75 ++++++++++++++++++++++++--------------------------
 1 file changed, 36 insertions(+), 39 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 9c320e4..bda6fa7 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -998,12 +998,10 @@ static void _decode_opc(DisasContext * ctx)
     case 0xf00a: /* fmov {F,D,X}Rm,@Rn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_SZ) {
-	    TCGv addr_hi = tcg_temp_new();
-	    int fr = XHACK(B7_4);
-	    tcg_gen_addi_i32(addr_hi, REG(B11_8), 4);
-            tcg_gen_qemu_st_i32(FREG(fr), REG(B11_8), ctx->memidx, MO_TEUL);
-            tcg_gen_qemu_st_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
-	    tcg_temp_free(addr_hi);
+            TCGv_i64 fp = tcg_temp_new_i64();
+            gen_load_fpr64(ctx, fp, XHACK(B7_4));
+            tcg_gen_qemu_st_i64(fp, REG(B11_8), ctx->memidx, MO_TEQ);
+            tcg_temp_free_i64(fp);
 	} else {
             tcg_gen_qemu_st_i32(FREG(B7_4), REG(B11_8), ctx->memidx, MO_TEUL);
 	}
@@ -1011,12 +1009,10 @@ static void _decode_opc(DisasContext * ctx)
     case 0xf008: /* fmov @Rm,{F,D,X}Rn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_SZ) {
-	    TCGv addr_hi = tcg_temp_new();
-	    int fr = XHACK(B11_8);
-	    tcg_gen_addi_i32(addr_hi, REG(B7_4), 4);
-            tcg_gen_qemu_ld_i32(FREG(fr), REG(B7_4), ctx->memidx, MO_TEUL);
-            tcg_gen_qemu_ld_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
-	    tcg_temp_free(addr_hi);
+            TCGv_i64 fp = tcg_temp_new_i64();
+            tcg_gen_qemu_ld_i64(fp, REG(B7_4), ctx->memidx, MO_TEQ);
+            gen_store_fpr64(ctx, fp, XHACK(B11_8));
+            tcg_temp_free_i64(fp);
 	} else {
             tcg_gen_qemu_ld_i32(FREG(B11_8), REG(B7_4), ctx->memidx, MO_TEUL);
 	}
@@ -1024,13 +1020,11 @@ static void _decode_opc(DisasContext * ctx)
     case 0xf009: /* fmov @Rm+,{F,D,X}Rn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_SZ) {
-	    TCGv addr_hi = tcg_temp_new();
-	    int fr = XHACK(B11_8);
-	    tcg_gen_addi_i32(addr_hi, REG(B7_4), 4);
-            tcg_gen_qemu_ld_i32(FREG(fr), REG(B7_4), ctx->memidx, MO_TEUL);
-            tcg_gen_qemu_ld_i32(FREG(fr + 1), addr_hi, ctx->memidx, MO_TEUL);
-	    tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 8);
-	    tcg_temp_free(addr_hi);
+            TCGv_i64 fp = tcg_temp_new_i64();
+            tcg_gen_qemu_ld_i64(fp, REG(B7_4), ctx->memidx, MO_TEQ);
+            gen_store_fpr64(ctx, fp, XHACK(B11_8));
+            tcg_temp_free_i64(fp);
+            tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 8);
 	} else {
             tcg_gen_qemu_ld_i32(FREG(B11_8), REG(B7_4), ctx->memidx, MO_TEUL);
 	    tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 4);
@@ -1038,18 +1032,21 @@ static void _decode_opc(DisasContext * ctx)
 	return;
     case 0xf00b: /* fmov {F,D,X}Rm,@-Rn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
-        TCGv addr = tcg_temp_new_i32();
-        tcg_gen_subi_i32(addr, REG(B11_8), 4);
-        if (ctx->tbflags & FPSCR_SZ) {
-	    int fr = XHACK(B7_4);
-            tcg_gen_qemu_st_i32(FREG(fr + 1), addr, ctx->memidx, MO_TEUL);
-	    tcg_gen_subi_i32(addr, addr, 4);
-            tcg_gen_qemu_st_i32(FREG(fr), addr, ctx->memidx, MO_TEUL);
-	} else {
-            tcg_gen_qemu_st_i32(FREG(B7_4), addr, ctx->memidx, MO_TEUL);
-	}
-        tcg_gen_mov_i32(REG(B11_8), addr);
-        tcg_temp_free(addr);
+        {
+            TCGv addr = tcg_temp_new_i32();
+            if (ctx->tbflags & FPSCR_SZ) {
+                TCGv_i64 fp = tcg_temp_new_i64();
+                gen_load_fpr64(ctx, fp, XHACK(B7_4));
+                tcg_gen_subi_i32(addr, REG(B11_8), 8);
+                tcg_gen_qemu_st_i64(fp, addr, ctx->memidx, MO_TEQ);
+                tcg_temp_free_i64(fp);
+            } else {
+                tcg_gen_subi_i32(addr, REG(B11_8), 4);
+                tcg_gen_qemu_st_i32(FREG(B7_4), addr, ctx->memidx, MO_TEUL);
+            }
+            tcg_gen_mov_i32(REG(B11_8), addr);
+            tcg_temp_free(addr);
+        }
 	return;
     case 0xf006: /* fmov @(R0,Rm),{F,D,X}Rm - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
@@ -1057,10 +1054,10 @@ static void _decode_opc(DisasContext * ctx)
 	    TCGv addr = tcg_temp_new_i32();
 	    tcg_gen_add_i32(addr, REG(B7_4), REG(0));
             if (ctx->tbflags & FPSCR_SZ) {
-		int fr = XHACK(B11_8);
-                tcg_gen_qemu_ld_i32(FREG(fr), addr, ctx->memidx, MO_TEUL);
-		tcg_gen_addi_i32(addr, addr, 4);
-                tcg_gen_qemu_ld_i32(FREG(fr + 1), addr, ctx->memidx, MO_TEUL);
+                TCGv_i64 fp = tcg_temp_new_i64();
+                tcg_gen_qemu_ld_i64(fp, addr, ctx->memidx, MO_TEQ);
+                gen_store_fpr64(ctx, fp, XHACK(B11_8));
+                tcg_temp_free_i64(fp);
 	    } else {
                 tcg_gen_qemu_ld_i32(FREG(B11_8), addr, ctx->memidx, MO_TEUL);
 	    }
@@ -1073,10 +1070,10 @@ static void _decode_opc(DisasContext * ctx)
 	    TCGv addr = tcg_temp_new();
 	    tcg_gen_add_i32(addr, REG(B11_8), REG(0));
             if (ctx->tbflags & FPSCR_SZ) {
-		int fr = XHACK(B7_4);
-                tcg_gen_qemu_ld_i32(FREG(fr), addr, ctx->memidx, MO_TEUL);
-		tcg_gen_addi_i32(addr, addr, 4);
-                tcg_gen_qemu_ld_i32(FREG(fr + 1), addr, ctx->memidx, MO_TEUL);
+                TCGv_i64 fp = tcg_temp_new_i64();
+                gen_load_fpr64(ctx, fp, XHACK(B7_4));
+                tcg_gen_qemu_st_i64(fp, addr, ctx->memidx, MO_TEQ);
+                tcg_temp_free_i64(fp);
 	    } else {
                 tcg_gen_qemu_st_i32(FREG(B7_4), addr, ctx->memidx, MO_TEUL);
 	    }
-- 
2.9.4


* [Qemu-devel] [PATCH v3 17/30] target/sh4: Simplify 64-bit fp reg-reg move
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (14 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 16/30] target/sh4: Load/store Dr as 64-bit quantities Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 18/30] target/sh4: Unify code for CHECK_NOT_DELAY_SLOT Richard Henderson
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

We do not need to form full 64-bit quantities in order to perform
the move.  This reduces code expansion on 64-bit hosts.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
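To make the index arithmetic concrete, a small sketch of the XHACK
mapping and the two 32-bit moves follows. The array model, the
function names and the assumption that fbank is 0 or 16 are
illustrative, not part of the patch.

```c
#include <stdint.h>

/* Same arithmetic as the XHACK macro: bit 0 of the encoding selects
 * the other bank (bit 4), bits 3..1 select the even register. */
static int xhack(int x)
{
    return ((x & 1) << 4) | (x & 0xe);
}

/* Move a register pair as two independent 32-bit moves.
 * fbank is assumed to be 0 or 16, as in FREG(x). */
static void fmov_pair(uint32_t fregs[32], int dst_enc, int src_enc, int fbank)
{
    int xsrc = xhack(src_enc);
    int xdst = xhack(dst_enc);
    fregs[xdst ^ fbank] = fregs[xsrc ^ fbank];
    fregs[(xdst + 1) ^ fbank] = fregs[(xsrc + 1) ^ fbank];
}
```

Since xhack() always yields an even index, adding 1 never carries
into bit 4, so the second move stays within the same bank.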
 target/sh4/translate.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index bda6fa7..1164f73 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -987,10 +987,10 @@ static void _decode_opc(DisasContext * ctx)
     case 0xf00c: /* fmov {F,D,X}Rm,{F,D,X}Rn - FPSCR: Nothing */
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_SZ) {
-	    TCGv_i64 fp = tcg_temp_new_i64();
-	    gen_load_fpr64(ctx, fp, XHACK(B7_4));
-	    gen_store_fpr64(ctx, fp, XHACK(B11_8));
-	    tcg_temp_free_i64(fp);
+            int xsrc = XHACK(B7_4);
+            int xdst = XHACK(B11_8);
+            tcg_gen_mov_i32(FREG(xdst), FREG(xsrc));
+            tcg_gen_mov_i32(FREG(xdst + 1), FREG(xsrc + 1));
 	} else {
 	    tcg_gen_mov_i32(FREG(B11_8), FREG(B7_4));
 	}
-- 
2.9.4


* [Qemu-devel] [PATCH v3 18/30] target/sh4: Unify code for CHECK_NOT_DELAY_SLOT
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (15 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 17/30] target/sh4: Simplify 64-bit fp reg-reg move Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 19/30] target/sh4: Unify code for CHECK_PRIVILEGED Richard Henderson
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

We do not need to emit N copies of the code that raises the exception.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
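The shape of the change, reduced to a toy decoder: the check macro
shrinks to a test plus a goto, and a single copy of the raise code
sits at the end of the function. The types, opcodes and return codes
here are hypothetical; only the control-flow pattern mirrors the
diff below.

```c
#include <stdbool.h>

typedef enum { EXC_NONE, EXC_ILLEGAL, EXC_SLOT_ILLEGAL } Exc;

typedef struct {
    bool in_delay_slot;   /* stands in for envflags & DELAY_SLOT_MASK */
} Ctx;

/* The macro only branches; the raise code exists once, below. */
#define CHECK_NOT_DELAY_SLOT \
    if (ctx->in_delay_slot) {   \
        goto do_illegal_slot;   \
    }

static Exc decode(const Ctx *ctx, unsigned opcode)
{
    switch (opcode) {
    case 0:                     /* ordinary instruction */
        return EXC_NONE;
    case 1:                     /* not permitted in a delay slot */
        CHECK_NOT_DELAY_SLOT
        return EXC_NONE;
    }

    /* Unknown opcode: fall out of the switch into the shared tail. */
    if (ctx->in_delay_slot) {
 do_illegal_slot:
        return EXC_SLOT_ILLEGAL;
    }
    return EXC_ILLEGAL;
}
```

Jumping to a label inside the if body is valid C and lets the macro
users and the fall-through path share one exception tail.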
 target/sh4/translate.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 1164f73..b0a3d79 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -370,11 +370,8 @@ static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 #define XHACK(x) ((((x) & 1 ) << 4) | ((x) & 0xe))
 
 #define CHECK_NOT_DELAY_SLOT \
-    if (ctx->envflags & DELAY_SLOT_MASK) {                           \
-        gen_save_cpu_state(ctx, true);                               \
-        gen_helper_raise_slot_illegal_instruction(cpu_env);          \
-        ctx->bstate = BS_EXCP;                                       \
-        return;                                                      \
+    if (ctx->envflags & DELAY_SLOT_MASK) {  \
+        goto do_illegal_slot;               \
     }
 
 #define CHECK_PRIVILEGED                                             \
@@ -1808,10 +1805,12 @@ static void _decode_opc(DisasContext * ctx)
 	    ctx->opcode, ctx->pc);
     fflush(stderr);
 #endif
-    gen_save_cpu_state(ctx, true);
     if (ctx->envflags & DELAY_SLOT_MASK) {
+ do_illegal_slot:
+        gen_save_cpu_state(ctx, true);
         gen_helper_raise_slot_illegal_instruction(cpu_env);
     } else {
+        gen_save_cpu_state(ctx, true);
         gen_helper_raise_illegal_instruction(cpu_env);
     }
     ctx->bstate = BS_EXCP;
-- 
2.9.4


* [Qemu-devel] [PATCH v3 19/30] target/sh4: Unify code for CHECK_PRIVILEGED
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (16 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 18/30] target/sh4: Unify code for CHECK_NOT_DELAY_SLOT Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 20/30] target/sh4: Unify code for CHECK_FPU_ENABLED Richard Henderson
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

We do not need to emit N copies of the code that raises the exception.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index b0a3d79..b40e52b 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -374,16 +374,9 @@ static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
         goto do_illegal_slot;               \
     }
 
-#define CHECK_PRIVILEGED                                             \
-    if (IS_USER(ctx)) {                                              \
-        gen_save_cpu_state(ctx, true);                               \
-        if (ctx->envflags & DELAY_SLOT_MASK) {                       \
-            gen_helper_raise_slot_illegal_instruction(cpu_env);      \
-        } else {                                                     \
-            gen_helper_raise_illegal_instruction(cpu_env);           \
-        }                                                            \
-        ctx->bstate = BS_EXCP;                                       \
-        return;                                                      \
+#define CHECK_PRIVILEGED \
+    if (IS_USER(ctx)) {                     \
+        goto do_illegal;                    \
     }
 
 #define CHECK_FPU_ENABLED                                            \
@@ -1805,6 +1798,7 @@ static void _decode_opc(DisasContext * ctx)
 	    ctx->opcode, ctx->pc);
     fflush(stderr);
 #endif
+ do_illegal:
     if (ctx->envflags & DELAY_SLOT_MASK) {
  do_illegal_slot:
         gen_save_cpu_state(ctx, true);
-- 
2.9.4


* [Qemu-devel] [PATCH v3 20/30] target/sh4: Unify code for CHECK_FPU_ENABLED
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (17 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 19/30] target/sh4: Unify code for CHECK_PRIVILEGED Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 21/30] target/sh4: Tidy misc illegal insn checks Richard Henderson
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

We do not need to emit N copies of the code that raises the exception.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index b40e52b..4c32248 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -379,16 +379,9 @@ static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
         goto do_illegal;                    \
     }
 
-#define CHECK_FPU_ENABLED                                            \
-    if (ctx->tbflags & (1u << SR_FD)) {                              \
-        gen_save_cpu_state(ctx, true);                               \
-        if (ctx->envflags & DELAY_SLOT_MASK) {                       \
-            gen_helper_raise_slot_fpu_disable(cpu_env);              \
-        } else {                                                     \
-            gen_helper_raise_fpu_disable(cpu_env);                   \
-        }                                                            \
-        ctx->bstate = BS_EXCP;                                       \
-        return;                                                      \
+#define CHECK_FPU_ENABLED \
+    if (ctx->tbflags & (1u << SR_FD)) {     \
+        goto do_fpu_disabled;               \
     }
 
 static void _decode_opc(DisasContext * ctx)
@@ -1808,6 +1801,17 @@ static void _decode_opc(DisasContext * ctx)
         gen_helper_raise_illegal_instruction(cpu_env);
     }
     ctx->bstate = BS_EXCP;
+    return;
+
+ do_fpu_disabled:
+    gen_save_cpu_state(ctx, true);
+    if (ctx->envflags & DELAY_SLOT_MASK) {
+        gen_helper_raise_slot_fpu_disable(cpu_env);
+    } else {
+        gen_helper_raise_fpu_disable(cpu_env);
+    }
+    ctx->bstate = BS_EXCP;
+    return;
 }
 
 static void decode_opc(DisasContext * ctx)
-- 
2.9.4


* [Qemu-devel] [PATCH v3 21/30] target/sh4: Tidy misc illegal insn checks
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (18 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 20/30] target/sh4: Unify code for CHECK_FPU_ENABLED Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 22/30] target/sh4: Introduce CHECK_FPSCR_PR_* Richard Henderson
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Now that we have a do_illegal label, use goto in order
to self-document the forcing of the exception.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 4c32248..09e4ace 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -1074,8 +1074,9 @@ static void _decode_opc(DisasContext * ctx)
             if (ctx->tbflags & FPSCR_PR) {
                 TCGv_i64 fp0, fp1;
 
-		if (ctx->opcode & 0x0110)
-		    break; /* illegal instruction */
+                if (ctx->opcode & 0x0110) {
+                    goto do_illegal;
+                }
 		fp0 = tcg_temp_new_i64();
 		fp1 = tcg_temp_new_i64();
 		gen_load_fpr64(ctx, fp0, B11_8);
@@ -1137,7 +1138,7 @@ static void _decode_opc(DisasContext * ctx)
         {
             CHECK_FPU_ENABLED
             if (ctx->tbflags & FPSCR_PR) {
-                break; /* illegal instruction */
+                goto do_illegal;
             } else {
                 gen_helper_fmac_FT(FREG(B11_8), cpu_env,
                                    FREG(0), FREG(B7_4), FREG(B11_8));
@@ -1681,8 +1682,9 @@ static void _decode_opc(DisasContext * ctx)
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_PR) {
 	    TCGv_i64 fp;
-	    if (ctx->opcode & 0x0100)
-		break; /* illegal instruction */
+            if (ctx->opcode & 0x0100) {
+                goto do_illegal;
+            }
 	    fp = tcg_temp_new_i64();
             gen_helper_float_DT(fp, cpu_env, cpu_fpul);
 	    gen_store_fpr64(ctx, fp, B11_8);
@@ -1696,8 +1698,9 @@ static void _decode_opc(DisasContext * ctx)
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_PR) {
 	    TCGv_i64 fp;
-	    if (ctx->opcode & 0x0100)
-		break; /* illegal instruction */
+            if (ctx->opcode & 0x0100) {
+		goto do_illegal;
+            }
 	    fp = tcg_temp_new_i64();
 	    gen_load_fpr64(ctx, fp, B11_8);
             gen_helper_ftrc_DT(cpu_fpul, cpu_env, fp);
@@ -1718,8 +1721,9 @@ static void _decode_opc(DisasContext * ctx)
     case 0xf06d: /* fsqrt FRn */
 	CHECK_FPU_ENABLED
         if (ctx->tbflags & FPSCR_PR) {
-	    if (ctx->opcode & 0x0100)
-		break; /* illegal instruction */
+            if (ctx->opcode & 0x0100) {
+		goto do_illegal;
+            }
 	    TCGv_i64 fp = tcg_temp_new_i64();
 	    gen_load_fpr64(ctx, fp, B11_8);
             gen_helper_fsqrt_DT(fp, cpu_env, fp);
-- 
2.9.4


* [Qemu-devel] [PATCH v3 22/30] target/sh4: Introduce CHECK_FPSCR_PR_*
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (19 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 21/30] target/sh4: Tidy misc illegal insn checks Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 23/30] target/sh4: Introduce CHECK_SH4A Richard Henderson
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 57 +++++++++++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 09e4ace..346f672 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -384,6 +384,16 @@ static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
         goto do_fpu_disabled;               \
     }
 
+#define CHECK_FPSCR_PR_0 \
+    if (ctx->tbflags & FPSCR_PR) {          \
+        goto do_illegal;                    \
+    }
+
+#define CHECK_FPSCR_PR_1 \
+    if (!(ctx->tbflags & FPSCR_PR)) {       \
+        goto do_illegal;                    \
+    }
+
 static void _decode_opc(DisasContext * ctx)
 {
     /* This code tries to make movcal emulation sufficiently
@@ -1135,16 +1145,11 @@ static void _decode_opc(DisasContext * ctx)
 	}
 	return;
     case 0xf00e: /* fmac FR0,RM,Rn */
-        {
-            CHECK_FPU_ENABLED
-            if (ctx->tbflags & FPSCR_PR) {
-                goto do_illegal;
-            } else {
-                gen_helper_fmac_FT(FREG(B11_8), cpu_env,
-                                   FREG(0), FREG(B7_4), FREG(B11_8));
-                return;
-            }
-        }
+        CHECK_FPU_ENABLED
+        CHECK_FPSCR_PR_0
+        gen_helper_fmac_FT(FREG(B11_8), cpu_env,
+                           FREG(0), FREG(B7_4), FREG(B11_8));
+        return;
     }
 
     switch (ctx->opcode & 0xff00) {
@@ -1738,16 +1743,14 @@ static void _decode_opc(DisasContext * ctx)
 	break;
     case 0xf08d: /* fldi0 FRn - FPSCR: R[PR] */
 	CHECK_FPU_ENABLED
-        if (!(ctx->tbflags & FPSCR_PR)) {
-	    tcg_gen_movi_i32(FREG(B11_8), 0);
-	}
-	return;
+        CHECK_FPSCR_PR_0
+        tcg_gen_movi_i32(FREG(B11_8), 0);
+        return;
     case 0xf09d: /* fldi1 FRn - FPSCR: R[PR] */
 	CHECK_FPU_ENABLED
-        if (!(ctx->tbflags & FPSCR_PR)) {
-	    tcg_gen_movi_i32(FREG(B11_8), 0x3f800000);
-	}
-	return;
+        CHECK_FPSCR_PR_0
+        tcg_gen_movi_i32(FREG(B11_8), 0x3f800000);
+        return;
     case 0xf0ad: /* fcnvsd FPUL,DRn */
 	CHECK_FPU_ENABLED
 	{
@@ -1768,10 +1771,10 @@ static void _decode_opc(DisasContext * ctx)
 	return;
     case 0xf0ed: /* fipr FVm,FVn */
         CHECK_FPU_ENABLED
-        if ((ctx->tbflags & FPSCR_PR) == 0) {
-            TCGv m, n;
-            m = tcg_const_i32((ctx->opcode >> 8) & 3);
-            n = tcg_const_i32((ctx->opcode >> 10) & 3);
+        CHECK_FPSCR_PR_1
+        {
+            TCGv m = tcg_const_i32((ctx->opcode >> 8) & 3);
+            TCGv n = tcg_const_i32((ctx->opcode >> 10) & 3);
             gen_helper_fipr(cpu_env, m, n);
             tcg_temp_free(m);
             tcg_temp_free(n);
@@ -1780,10 +1783,12 @@ static void _decode_opc(DisasContext * ctx)
         break;
     case 0xf0fd: /* ftrv XMTRX,FVn */
         CHECK_FPU_ENABLED
-        if ((ctx->opcode & 0x0300) == 0x0100 &&
-            (ctx->tbflags & FPSCR_PR) == 0) {
-            TCGv n;
-            n = tcg_const_i32((ctx->opcode >> 10) & 3);
+        CHECK_FPSCR_PR_1
+        {
+            if ((ctx->opcode & 0x0300) != 0x0100) {
+                goto do_illegal;
+            }
+            TCGv n = tcg_const_i32((ctx->opcode >> 10) & 3);
             gen_helper_ftrv(cpu_env, n);
             tcg_temp_free(n);
             return;
-- 
2.9.4


* [Qemu-devel] [PATCH v3 23/30] target/sh4: Introduce CHECK_SH4A
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (20 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 22/30] target/sh4: Introduce CHECK_FPSCR_PR_* Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 24/30] target/sh4: Implement fpchg Richard Henderson
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 66 +++++++++++++++++++++++---------------------------
 1 file changed, 30 insertions(+), 36 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 346f672..239a0d9 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -394,6 +394,11 @@ static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
         goto do_illegal;                    \
     }
 
+#define CHECK_SH4A \
+    if (!(ctx->features & SH_FEATURE_SH4A)) { \
+        goto do_illegal;                      \
+    }
+
 static void _decode_opc(DisasContext * ctx)
 {
     /* This code tries to make movcal emulation sufficiently
@@ -1473,7 +1478,7 @@ static void _decode_opc(DisasContext * ctx)
 	LDST(ssr,  0x403e, 0x4037, 0x0032, 0x4033, CHECK_PRIVILEGED)
 	LDST(spc,  0x404e, 0x4047, 0x0042, 0x4043, CHECK_PRIVILEGED)
 	ST(sgr,  0x003a, 0x4032, CHECK_PRIVILEGED)
-	LD(sgr,  0x403a, 0x4036, CHECK_PRIVILEGED if (!(ctx->features & SH_FEATURE_SH4A)) break;)
+	LD(sgr,  0x403a, 0x4036, CHECK_PRIVILEGED CHECK_SH4A)
 	LDST(dbr,  0x40fa, 0x40f6, 0x00fa, 0x40f2, CHECK_PRIVILEGED)
 	LDST(mach, 0x400a, 0x4006, 0x000a, 0x4002, {})
 	LDST(macl, 0x401a, 0x4016, 0x001a, 0x4012, {})
@@ -1523,21 +1528,19 @@ static void _decode_opc(DisasContext * ctx)
         ctx->has_movcal = 1;
 	return;
     case 0x40a9:                /* movua.l @Rm,R0 */
+        CHECK_SH4A
         /* Load non-boundary-aligned data */
-        if (ctx->features & SH_FEATURE_SH4A) {
-            tcg_gen_qemu_ld_i32(REG(0), REG(B11_8), ctx->memidx,
-                                MO_TEUL | MO_UNALN);
-            return;
-        }
+        tcg_gen_qemu_ld_i32(REG(0), REG(B11_8), ctx->memidx,
+                            MO_TEUL | MO_UNALN);
+        return;
         break;
     case 0x40e9:                /* movua.l @Rm+,R0 */
+        CHECK_SH4A
         /* Load non-boundary-aligned data */
-        if (ctx->features & SH_FEATURE_SH4A) {
-            tcg_gen_qemu_ld_i32(REG(0), REG(B11_8), ctx->memidx,
-                                MO_TEUL | MO_UNALN);
-            tcg_gen_addi_i32(REG(B11_8), REG(B11_8), 4);
-            return;
-        }
+        tcg_gen_qemu_ld_i32(REG(0), REG(B11_8), ctx->memidx,
+                            MO_TEUL | MO_UNALN);
+        tcg_gen_addi_i32(REG(B11_8), REG(B11_8), 4);
+        return;
         break;
     case 0x0029:		/* movt Rn */
         tcg_gen_mov_i32(REG(B11_8), cpu_sr_t);
@@ -1546,7 +1549,8 @@ static void _decode_opc(DisasContext * ctx)
         /* MOVCO.L: if (lock still held) R0 -> (Rn), T=1; else T=0.
            Approximate "lock still held" with a comparison of address
            from the MOVLI insn and a cmpxchg with the value read.  */
-        if (ctx->features & SH_FEATURE_SH4A) {
+        CHECK_SH4A
+        {
             TCGLabel *fail = gen_new_label();
             TCGLabel *done = gen_new_label();
 
@@ -1564,20 +1568,15 @@ static void _decode_opc(DisasContext * ctx)
             gen_set_label(done);
             tcg_gen_movi_i32(cpu_lock_addr, -1);
             return;
-        } else {
-            break;
         }
     case 0x0063:
         /* MOVLI.L @Rm -> R0, and remember the address and value loaded.  */
-        if (ctx->features & SH_FEATURE_SH4A) {
-            tcg_gen_qemu_ld_i32(cpu_lock_value, REG(B11_8),
-                                ctx->memidx, MO_TESL);
-            tcg_gen_mov_i32(cpu_lock_addr, REG(B11_8));
-            tcg_gen_mov_i32(REG(0), cpu_lock_value);
-            return;
-        } else {
-            break;
-        }
+        CHECK_SH4A
+        tcg_gen_qemu_ld_i32(cpu_lock_value, REG(B11_8),
+                            ctx->memidx, MO_TESL);
+        tcg_gen_mov_i32(cpu_lock_addr, REG(B11_8));
+        tcg_gen_mov_i32(REG(0), cpu_lock_value);
+        return;
     case 0x0093:		/* ocbi @Rn */
 	{
             gen_helper_ocbi(cpu_env, REG(B11_8));
@@ -1592,20 +1591,15 @@ static void _decode_opc(DisasContext * ctx)
     case 0x0083:		/* pref @Rn */
 	return;
     case 0x00d3:		/* prefi @Rn */
-	if (ctx->features & SH_FEATURE_SH4A)
-	    return;
-	else
-	    break;
+        CHECK_SH4A
+        return;
     case 0x00e3:		/* icbi @Rn */
-	if (ctx->features & SH_FEATURE_SH4A)
-	    return;
-	else
-	    break;
+        CHECK_SH4A
+        return;
     case 0x00ab:		/* synco */
-        if (ctx->features & SH_FEATURE_SH4A) {
-            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
-            return;
-        }
+        CHECK_SH4A
+        tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
+        return;
         break;
     case 0x4024:		/* rotcl Rn */
 	{
-- 
2.9.4


* [Qemu-devel] [PATCH v3 24/30] target/sh4: Implement fpchg
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (21 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 23/30] target/sh4: Introduce CHECK_SH4A Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 25/30] target/sh4: Add missing FPSCR.PR == 0 checks Richard Henderson
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 239a0d9..65cc7d1 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -486,6 +486,11 @@ static void _decode_opc(DisasContext * ctx)
         tcg_gen_xori_i32(cpu_fpscr, cpu_fpscr, FPSCR_SZ);
 	ctx->bstate = BS_STOP;
 	return;
+    case 0xf7fd:                /* fpchg */
+        CHECK_SH4A
+        tcg_gen_xori_i32(cpu_fpscr, cpu_fpscr, FPSCR_PR);
+        ctx->bstate = BS_STOP;
+        return;
     case 0x0009:		/* nop */
 	return;
     case 0x001b:		/* sleep */
-- 
2.9.4


* [Qemu-devel] [PATCH v3 25/30] target/sh4: Add missing FPSCR.PR == 0 checks
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (22 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 24/30] target/sh4: Implement fpchg Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 26/30] target/sh4: Implement fsrra Richard Henderson
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Both frchg and fschg require PR == 0, otherwise undefined_operation.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 65cc7d1..a5d07a0 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -479,10 +479,12 @@ static void _decode_opc(DisasContext * ctx)
         tcg_gen_movi_i32(cpu_sr_t, 1);
 	return;
     case 0xfbfd:		/* frchg */
+        CHECK_FPSCR_PR_0
 	tcg_gen_xori_i32(cpu_fpscr, cpu_fpscr, FPSCR_FR);
 	ctx->bstate = BS_STOP;
 	return;
     case 0xf3fd:		/* fschg */
+        CHECK_FPSCR_PR_0
         tcg_gen_xori_i32(cpu_fpscr, cpu_fpscr, FPSCR_SZ);
 	ctx->bstate = BS_STOP;
 	return;
-- 
2.9.4


* [Qemu-devel] [PATCH v3 26/30] target/sh4: Implement fsrra
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (23 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 25/30] target/sh4: Add missing FPSCR.PR == 0 checks Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 27/30] target/sh4: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>

---
V2: Fix argument types in helper.h
---
 target/sh4/helper.h    |  1 +
 target/sh4/op_helper.c | 16 ++++++++++++++++
 target/sh4/translate.c |  2 ++
 3 files changed, 19 insertions(+)

diff --git a/target/sh4/helper.h b/target/sh4/helper.h
index 6c6fa04..1e768fc 100644
--- a/target/sh4/helper.h
+++ b/target/sh4/helper.h
@@ -37,6 +37,7 @@ DEF_HELPER_FLAGS_3(fsub_FT, TCG_CALL_NO_WG, f32, env, f32, f32)
 DEF_HELPER_FLAGS_3(fsub_DT, TCG_CALL_NO_WG, f64, env, f64, f64)
 DEF_HELPER_FLAGS_2(fsqrt_FT, TCG_CALL_NO_WG, f32, env, f32)
 DEF_HELPER_FLAGS_2(fsqrt_DT, TCG_CALL_NO_WG, f64, env, f64)
+DEF_HELPER_FLAGS_2(fsrra_FT, TCG_CALL_NO_WG, f32, env, f32)
 DEF_HELPER_FLAGS_2(ftrc_FT, TCG_CALL_NO_WG, i32, env, f32)
 DEF_HELPER_FLAGS_2(ftrc_DT, TCG_CALL_NO_WG, i32, env, f64)
 DEF_HELPER_3(fipr, void, env, i32, i32)
diff --git a/target/sh4/op_helper.c b/target/sh4/op_helper.c
index 8513f38..d798f23 100644
--- a/target/sh4/op_helper.c
+++ b/target/sh4/op_helper.c
@@ -406,6 +406,22 @@ float64 helper_fsqrt_DT(CPUSH4State *env, float64 t0)
     return t0;
 }
 
+float32 helper_fsrra_FT(CPUSH4State *env, float32 t0)
+{
+    set_float_exception_flags(0, &env->fp_status);
+    /* "Approximate" 1/sqrt(x) via actual computation.  */
+    t0 = float32_sqrt(t0, &env->fp_status);
+    t0 = float32_div(float32_one, t0, &env->fp_status);
+    /* Since this is supposed to be an approximation, an imprecision
+       exception is required.  One supposes this also follows the usual
+       IEEE rule that other exceptions take precedence.  */
+    if (get_float_exception_flags(&env->fp_status) == 0) {
+        set_float_exception_flags(float_flag_inexact, &env->fp_status);
+    }
+    update_fpscr(env, GETPC());
+    return t0;
+}
+
 float32 helper_fsub_FT(CPUSH4State *env, float32 t0, float32 t1)
 {
     set_float_exception_flags(0, &env->fp_status);
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index a5d07a0..12d5ed7 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -1741,6 +1741,8 @@ static void _decode_opc(DisasContext * ctx)
 	return;
     case 0xf07d: /* fsrra FRn */
 	CHECK_FPU_ENABLED
+        CHECK_FPSCR_PR_0
+        gen_helper_fsrra_FT(FREG(B11_8), cpu_env, FREG(B11_8));
 	break;
     case 0xf08d: /* fldi0 FRn - FPSCR: R[PR] */
 	CHECK_FPU_ENABLED
-- 
2.9.4


* [Qemu-devel] [PATCH v3 27/30] target/sh4: Use tcg_gen_lookup_and_goto_ptr
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (24 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 26/30] target/sh4: Implement fsrra Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 28/30] tcg: Fix off-by-one in assert in page_set_flags Richard Henderson
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/translate.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 12d5ed7..ab81084 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -235,12 +235,15 @@ static inline void gen_save_cpu_state(DisasContext *ctx, bool save_pc)
     }
 }
 
+static inline bool use_exit_tb(DisasContext *ctx)
+{
+    return (ctx->tbflags & GUSA_EXCLUSIVE) != 0;
+}
+
 static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
 {
-    if (unlikely(ctx->singlestep_enabled)) {
-        return false;
-    }
-    if (ctx->tbflags & GUSA_EXCLUSIVE) {
+    /* Use a direct jump if in same page and singlestep not enabled */
+    if (unlikely(ctx->singlestep_enabled || use_exit_tb(ctx))) {
         return false;
     }
 #ifndef CONFIG_USER_ONLY
@@ -253,28 +256,35 @@ static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
 static void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
 {
     if (use_goto_tb(ctx, dest)) {
-	/* Use a direct jump if in same page and singlestep not enabled */
         tcg_gen_goto_tb(n);
         tcg_gen_movi_i32(cpu_pc, dest);
         tcg_gen_exit_tb((uintptr_t)ctx->tb + n);
     } else {
         tcg_gen_movi_i32(cpu_pc, dest);
-        if (ctx->singlestep_enabled)
+        if (ctx->singlestep_enabled) {
             gen_helper_debug(cpu_env);
-        tcg_gen_exit_tb(0);
+        } else if (use_exit_tb(ctx)) {
+            tcg_gen_exit_tb(0);
+        } else {
+            tcg_gen_lookup_and_goto_ptr(cpu_pc);
+        }
     }
 }
 
 static void gen_jump(DisasContext * ctx)
 {
-    if (ctx->delayed_pc == (uint32_t) - 1) {
+    if (ctx->delayed_pc == -1) {
 	/* Target is not statically known, it comes necessarily from a
 	   delayed jump as immediate jump are conditinal jumps */
 	tcg_gen_mov_i32(cpu_pc, cpu_delayed_pc);
         tcg_gen_discard_i32(cpu_delayed_pc);
-	if (ctx->singlestep_enabled)
+	if (ctx->singlestep_enabled) {
             gen_helper_debug(cpu_env);
-	tcg_gen_exit_tb(0);
+        } else if (use_exit_tb(ctx)) {
+            tcg_gen_exit_tb(0);
+        } else {
+            tcg_gen_lookup_and_goto_ptr(cpu_pc);
+        }
     } else {
 	gen_goto_tb(ctx, 0, ctx->delayed_pc);
     }
-- 
2.9.4


* [Qemu-devel] [PATCH v3 28/30] tcg: Fix off-by-one in assert in page_set_flags
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (25 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 27/30] target/sh4: Use tcg_gen_lookup_and_goto_ptr Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 29/30] linux-user: Tidy and enforce reserved_va initialization Richard Henderson
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Most of the users of page_set_flags pass (page, page + len) as
the end points.  One might consider this an error, since the other
users do supply an endpoint as the last byte of the region.

However, the first thing that page_set_flags does is round end UP
to the start of the next page.  Which means computing page + len - 1
is in the end pointless.  Therefore, accept this usage and do not
assert when given the exact size of the vm as the endpoint.
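
A minimal sketch of the argument, assuming 4 KiB pages and a
page-aligned start and len (the SK_* macros and both helper names are
hypothetical, not QEMU's): once the end is rounded up, the inclusive
and exclusive end points select the same pages, so the bound check
only needs to admit an end equal to the size of the address space.

```c
#include <stdint.h>

#define SK_PAGE_BITS 12u                          /* assumed 4 KiB pages */
#define SK_PAGE_SIZE ((uint64_t)1 << SK_PAGE_BITS)
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1))

/* Round an address up to a page boundary (no-op if already aligned). */
static uint64_t page_align_up(uint64_t addr)
{
    return (addr + SK_PAGE_SIZE - 1) & SK_PAGE_MASK;
}

/* Bound check in the style of the relaxed assert: end may equal the limit. */
static int end_in_range(uint64_t end, unsigned addr_space_bits)
{
    return end <= ((uint64_t)1 << addr_space_bits);
}
```

For example, with start 0x1000 and len 0x2000, both 0x3000 and 0x2fff
round up to 0x3000.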

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4e1831c..f304ee1 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2063,7 +2063,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
        guest address space.  If this assert fires, it probably indicates
        a missing call to h2g_valid.  */
 #if TARGET_ABI_BITS > L1_MAP_ADDR_SPACE_BITS
-    assert(end < ((target_ulong)1 << L1_MAP_ADDR_SPACE_BITS));
+    assert(end <= ((target_ulong)1 << L1_MAP_ADDR_SPACE_BITS));
 #endif
     assert(start < end);
     assert_memory_lock();
-- 
2.9.4


* [Qemu-devel] [PATCH v3 29/30] linux-user: Tidy and enforce reserved_va initialization
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (26 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 28/30] tcg: Fix off-by-one in assert in page_set_flags Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 30/30] linux-user/sh4: Reduce TARGET_VIRT_ADDR_SPACE_BITS to 31 Richard Henderson
  2017-07-18 21:02 ` [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Aurelien Jarno
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

We had a check using TARGET_VIRT_ADDR_SPACE_BITS to make sure
that the allocation coming in from the command-line option was
not too large, but that didn't include target-specific knowledge
about other restrictions on user-space.

Remove several target-specific hacks in linux-user/main.c.

For MIPS and Nios, we can replace them with proper adjustments
to the respective target's TARGET_VIRT_ADDR_SPACE_BITS definition.

For ARM, we had no existing ifdef but I suspect that the current
default value of 0xf7000000 was chosen with this in mind.  Define
a workable value in linux-user/arm/, and also document why the
special case is required.
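
The command-line check described above can be sketched roughly as
follows. This is an illustration only: scale_reserved_va() and its
parameters are hypothetical names, and max_va stands in for
MAX_RESERVED_VA, with 0 meaning that no target-specific limit applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scale value by 2^shift; reject the result if the shift overflowed or
   if it exceeds max_va.  max_va == 0 means "no limit".  The shift is
   assumed to be smaller than 64. */
static bool scale_reserved_va(uint64_t value, unsigned shift,
                              uint64_t max_va, uint64_t *out)
{
    uint64_t shifted = value << shift;

    if ((shifted >> shift) != value) {
        return false;           /* bits were lost in the shift */
    }
    if (max_va != 0 && shifted > max_va) {
        return false;           /* larger than the target allows */
    }
    *out = shifted;
    return true;
}
```

With a 2 GiB limit, "2G" (value 2, shift 30) is accepted and "3G" is
rejected; with no limit, "3G" is accepted.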

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 linux-user/arm/target_cpu.h |  4 ++++
 target/mips/mips-defs.h     |  6 +++++-
 target/nios2/cpu.h          |  6 +++++-
 linux-user/main.c           | 38 +++++++++++++++++++++++++-------------
 4 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/linux-user/arm/target_cpu.h b/linux-user/arm/target_cpu.h
index d888219..c4f79eb 100644
--- a/linux-user/arm/target_cpu.h
+++ b/linux-user/arm/target_cpu.h
@@ -19,6 +19,10 @@
 #ifndef ARM_TARGET_CPU_H
 #define ARM_TARGET_CPU_H
 
+/* We need to be able to map the commpage.
+   See validate_guest_space in linux-user/elfload.c.  */
+#define MAX_RESERVED_VA  0xfff00000ul
+
 static inline void cpu_clone_regs(CPUARMState *env, target_ulong newsp)
 {
     if (newsp) {
diff --git a/target/mips/mips-defs.h b/target/mips/mips-defs.h
index 047554e..d239069 100644
--- a/target/mips/mips-defs.h
+++ b/target/mips/mips-defs.h
@@ -15,7 +15,11 @@
 #else
 #define TARGET_LONG_BITS 32
 #define TARGET_PHYS_ADDR_SPACE_BITS 40
-#define TARGET_VIRT_ADDR_SPACE_BITS 32
+# ifdef CONFIG_USER_ONLY
+#  define TARGET_VIRT_ADDR_SPACE_BITS 31
+# else
+#  define TARGET_VIRT_ADDR_SPACE_BITS 32
+# endif
 #endif
 
 /* Masks used to mark instructions to indicate which ISA level they
diff --git a/target/nios2/cpu.h b/target/nios2/cpu.h
index 13931f3..da3f637 100644
--- a/target/nios2/cpu.h
+++ b/target/nios2/cpu.h
@@ -227,7 +227,11 @@ qemu_irq *nios2_cpu_pic_init(Nios2CPU *cpu);
 void nios2_check_interrupts(CPUNios2State *env);
 
 #define TARGET_PHYS_ADDR_SPACE_BITS 32
-#define TARGET_VIRT_ADDR_SPACE_BITS 32
+#ifdef CONFIG_USER_ONLY
+# define TARGET_VIRT_ADDR_SPACE_BITS 31
+#else
+# define TARGET_VIRT_ADDR_SPACE_BITS 32
+#endif
 
 #define cpu_init(cpu_model) CPU(cpu_nios2_init(cpu_model))
 
diff --git a/linux-user/main.c b/linux-user/main.c
index 30f0ae1..7693a62 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -60,23 +60,38 @@ do {                                                                    \
     }                                                                   \
 } while (0)
 
-#if (TARGET_LONG_BITS == 32) && (HOST_LONG_BITS == 64)
 /*
  * When running 32-on-64 we should make sure we can fit all of the possible
  * guest address space into a contiguous chunk of virtual host memory.
  *
  * This way we will never overlap with our own libraries or binaries or stack
  * or anything else that QEMU maps.
+ *
+ * Many cpus reserve the high bit (or more than one for some 64-bit cpus)
+ * of the address for the kernel.  Some cpus rely on this and user space
+ * uses the high bit(s) for pointer tagging and the like.  For them, we
+ * must preserve the expected address space.
  */
-# if defined(TARGET_MIPS) || defined(TARGET_NIOS2)
-/*
- * MIPS only supports 31 bits of virtual address space for user space.
- * Nios2 also only supports 31 bits.
- */
-unsigned long reserved_va = 0x77000000;
+#ifndef MAX_RESERVED_VA
+# if HOST_LONG_BITS > TARGET_VIRT_ADDR_SPACE_BITS
+#  if TARGET_VIRT_ADDR_SPACE_BITS == 32 && \
+      (TARGET_LONG_BITS == 32 || defined(TARGET_ABI32))
+/* There are a number of places where we assign reserved_va to a variable
+   of type abi_ulong and expect it to fit.  Avoid the last page.  */
+#   define MAX_RESERVED_VA  (0xfffffffful & TARGET_PAGE_MASK)
+#  else
+#   define MAX_RESERVED_VA  (1ul << TARGET_VIRT_ADDR_SPACE_BITS)
+#  endif
 # else
-unsigned long reserved_va = 0xf7000000;
+#  define MAX_RESERVED_VA  0
 # endif
+#endif
+
+/* That said, reserving *too* much vm space via mmap can run into problems
+   with rlimits, oom due to page table creation, etc.  We will still try it,
+   if directed by the command-line option, but not by default.  */
+#if HOST_LONG_BITS == 64 && TARGET_VIRT_ADDR_SPACE_BITS <= 32
+unsigned long reserved_va = MAX_RESERVED_VA;
 #else
 unsigned long reserved_va;
 #endif
@@ -3976,11 +3991,8 @@ static void handle_arg_reserved_va(const char *arg)
         unsigned long unshifted = reserved_va;
         p++;
         reserved_va <<= shift;
-        if (((reserved_va >> shift) != unshifted)
-#if HOST_LONG_BITS > TARGET_VIRT_ADDR_SPACE_BITS
-            || (reserved_va > (1ul << TARGET_VIRT_ADDR_SPACE_BITS))
-#endif
-            ) {
+        if (reserved_va >> shift != unshifted
+            || (MAX_RESERVED_VA && reserved_va > MAX_RESERVED_VA)) {
             fprintf(stderr, "Reserved virtual address too big\n");
             exit(EXIT_FAILURE);
         }
-- 
2.9.4


* [Qemu-devel] [PATCH v3 30/30] linux-user/sh4: Reduce TARGET_VIRT_ADDR_SPACE_BITS to 31
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (27 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 29/30] linux-user: Tidy and enforce reserved_va initialization Richard Henderson
@ 2017-07-18 20:02 ` Richard Henderson
  2017-07-18 21:02 ` [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Aurelien Jarno
  29 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 20:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

The real kernel has TASK_SIZE as 0x7c000000, due to quirks with
a couple of SH parts.  But nominally user-space is limited to 2GB.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/sh4/cpu.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index e3abb6a..3121d1e 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -45,7 +45,11 @@
 #define TARGET_PAGE_BITS 12	/* 4k XXXXX */
 
 #define TARGET_PHYS_ADDR_SPACE_BITS 32
-#define TARGET_VIRT_ADDR_SPACE_BITS 32
+#ifdef CONFIG_USER_ONLY
+# define TARGET_VIRT_ADDR_SPACE_BITS 31
+#else
+# define TARGET_VIRT_ADDR_SPACE_BITS 32
+#endif
 
 #define SR_MD 30
 #define SR_RB 29
-- 
2.9.4


* Re: [Qemu-devel] [PATCH v3 01/30] target/sh4: Use cmpxchg for movco
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 01/30] target/sh4: Use cmpxchg for movco Richard Henderson
@ 2017-07-18 20:19   ` Aurelien Jarno
  2017-07-18 21:36     ` Richard Henderson
  0 siblings, 1 reply; 34+ messages in thread
From: Aurelien Jarno @ 2017-07-18 20:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 2017-07-18 10:02, Richard Henderson wrote:
> As for other targets, cmpxchg isn't quite right for ll/sc,
> suffering from an ABA race, but is sufficient to implement
> portable atomic operations.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> 
> ---
> V2: Clear lock_addr in rte, do_interrupt, syscall entry, & signal delivery.
>     Fix movli to tolerate overlap between R0 and REG(B11_8).
> ---
>  target/sh4/cpu.h       |  3 ++-
>  linux-user/main.c      |  1 +
>  linux-user/signal.c    |  2 ++
>  target/sh4/helper.c    |  2 +-
>  target/sh4/translate.c | 72 +++++++++++++++++++++++++++++---------------------
>  5 files changed, 48 insertions(+), 32 deletions(-)

I still believe that for the system case, we should implement the
behaviour described in the manual, that is, setting ldst to 1 in movli
and clearing it on an interrupt. Otherwise the following plainly silly
instruction sequence will give different results than on real silicon:

    movli.l   @r1,r0
    add       #4, r1
    movco.l   r0, @r1

Yes, this is plainly silly to use movli/movco to copy data, but we have
also implemented silly behaviour for other CPUs. For the user case it's
different, we don't have real choice, plus we know that it will be used
to execute linux binaries, which are more likely to have a sane usage of
atomic instructions.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v3 07/30] target/sh4: Recognize common gUSA sequences
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 07/30] target/sh4: Recognize common gUSA sequences Richard Henderson
@ 2017-07-18 20:32   ` Aurelien Jarno
  0 siblings, 0 replies; 34+ messages in thread
From: Aurelien Jarno @ 2017-07-18 20:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 2017-07-18 10:02, Richard Henderson wrote:
> For many of the sequences produced by gcc or glibc,
> we can translate these as host atomic operations.
> Which saves the need to acquire the exclusive lock.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> 
> ---
> V2: Free constants loaded during the gUSA sequence.
> ---
>  target/sh4/translate.c | 321 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 321 insertions(+)

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements
  2017-07-18 20:02 [Qemu-devel] [PATCH v3 00/30] target/sh4 improvements Richard Henderson
                   ` (28 preceding siblings ...)
  2017-07-18 20:02 ` [Qemu-devel] [PATCH v3 30/30] linux-user/sh4: Reduce TARGET_VIRT_ADDR_SPACE_BITS to 31 Richard Henderson
@ 2017-07-18 21:02 ` Aurelien Jarno
  29 siblings, 0 replies; 34+ messages in thread
From: Aurelien Jarno @ 2017-07-18 21:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 2017-07-18 10:02, Richard Henderson wrote:
> The big ticket item in this series is the support for
> user-space atomics.  But a lot of other cleanup has
> crept in as well.
> 
> Changes since v2 incorporate feedback from Aurelien.
> I've tried to remember to add individual changelogs
> to the patches, but I may have forgotten some.
> 
> I do now include the linux-user reserved_va changes
> that I posted subsequent to posting v2.
> 
> I believe there are only 4 patches that have not 
> seen a Reviewed-by yet.

Thanks for this new version. I have applied patches 2-27 to my tree. I
believe patch 1 does not provide the correct (corner case) behaviour
in system mode. As for the last 3 patches, given that they touch
non-sh4-specific parts of linux-user, I guess they should at least be
acked by a linux-user maintainer.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v3 01/30] target/sh4: Use cmpxchg for movco
  2017-07-18 20:19   ` Aurelien Jarno
@ 2017-07-18 21:36     ` Richard Henderson
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2017-07-18 21:36 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 07/18/2017 10:19 AM, Aurelien Jarno wrote:
> On 2017-07-18 10:02, Richard Henderson wrote:
>> As for other targets, cmpxchg isn't quite right for ll/sc,
>> suffering from an ABA race, but is sufficient to implement
>> portable atomic operations.
>>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>
>> ---
>> V2: Clear lock_addr in rte, do_interrupt, syscall entry, & signal delivery.
>>      Fix movli to tolerate overlap between R0 and REG(B11_8).
>> ---
>>   target/sh4/cpu.h       |  3 ++-
>>   linux-user/main.c      |  1 +
>>   linux-user/signal.c    |  2 ++
>>   target/sh4/helper.c    |  2 +-
>>   target/sh4/translate.c | 72 +++++++++++++++++++++++++++++---------------------
>>   5 files changed, 48 insertions(+), 32 deletions(-)
> 
> I still believe that for the system case we should implement the
> behaviour described in the manual, that is, setting ldst to 1 in movli
> and clearing it on an interrupt. Otherwise the following plainly silly
> instruction sequence will give different results than on real silicon:
> 
>      movli.l   @r1,r0
>      add       #4, r1
>      movco.l   r0, @r1
> 
> Yes, it is plainly silly to use movli/movco to copy data, but we have
> also implemented silly behaviour for other CPUs. The user case is
> different: we don't have a real choice, and we know that it will be used
> to execute Linux binaries, which are more likely to use atomic
> instructions sanely.

Ok, I guess I didn't understand your last comment.
But now the "ifdeffing" portion of that makes sense.


r~


