* [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
@ 2007-10-12 23:00 J. Mayer
  2007-10-13  7:11 ` Blue Swirl
From: J. Mayer @ 2007-10-12 23:00 UTC
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2473 bytes --]

-------- Forwarded Message --------
> From: Jocelyn Mayer <l_indien@magic.fr>
> Reply-To: l_indien@magic.fr, qemu-devel@nongnu.org
> To: qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
> Date: Fri, 12 Oct 2007 20:24:44 +0200
> 
> On Fri, 2007-10-12 at 18:21 +0300, Blue Swirl wrote:
> > On 10/12/07, J. Mayer <l_indien@magic.fr> wrote:
> > > Here's a small patch that allows an optimisation for code fetch, at least
> > > for RISC CPU targets, as suggested by Fabrice Bellard.
> > > The main idea is that a translated block never spans a page
> > > boundary. As the tb_find_slow routine already gets the physical address
> > > of the page of code to be translated, the code translator could then
> > > fetch the code using raw host memory accesses instead of doing it
> > > through the softmmu routines.
> > > This patch could also be adapted to CISC CPU targets, with care for the
> > > last instruction of a page. For now, I have implemented it for alpha, arm,
> > > mips, PowerPC and SH4.
> > > I don't actually know whether the optimisation would bring a noticeable
> > > speed gain or whether it would be absolutely marginal.
> > >
> > > Please comment.
> > 
> > This will not work correctly for execution of MMIO registers, but
> > maybe that won't work on real hardware either. Who cares.
> 
> I wonder if this is important or not... But maybe, when retrieving the
> physical address, we could check whether it is inside ROM/RAM or an I/O
> area and, in the latter case, not give the phys_addr information to the
> translator. The translator would then keep using ldxx_code. I guess if
> we want to do that, a set of helpers would be welcome to avoid
> adding code like:
> if (phys_pc == 0)
>   opc = ldl_code(virt_pc);
> else
>   opc = ldl_raw(phys_pc);
> everywhere... I could also add another check so this set of macros would
> automatically use ldxx_code if we reach a page boundary, which would
> then make it easy to use this optimisation for CISC/VLE architectures too.
> 
> I'm not sure what the proper solution is for executing code from MMIO
> devices. But specific accessors to handle the CISC/VLE case still need
> to be added.

[...]
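The core idea of the proposal quoted above can be modelled in a few lines of C. This is a simplified sketch, not code from the patch: guest_ram, slow_ldl(), fetch_l() and fetch_block() are hypothetical names, target pages are assumed to be 4 KiB, and guest virtual addresses are assumed to map 1:1 onto physical ones. A fetch reads guest RAM directly as long as its last byte is still on the page where the block started, and falls back to a slow, softmmu-style access otherwise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TARGET_PAGE_BITS 12u                    /* assumed 4 KiB target pages */
#define TARGET_PAGE_SIZE (1u << TARGET_PAGE_BITS)

static uint8_t guest_ram[4 * TARGET_PAGE_SIZE]; /* hypothetical guest RAM */
static unsigned slow_fetches;                   /* counts slow-path fetches */

/* Hypothetical stand-in for the softmmu ldl_code() path.  In this model a
 * guest virtual address maps 1:1 onto a guest physical address. */
static uint32_t slow_ldl(uint32_t vaddr)
{
    uint32_t v;
    assert(vaddr + sizeof v <= sizeof guest_ram);
    slow_fetches++;
    memcpy(&v, &guest_ram[vaddr], sizeof v);
    return v;
}

/* Fetch 4 bytes at guest physical address phys.  start is the physical
 * address where the block begins: if the last byte of the access is not on
 * start's page, use the slow path, otherwise read guest RAM directly. */
static uint32_t fetch_l(uint32_t start, uint32_t phys, uint32_t vaddr)
{
    uint32_t v;
    if ((start ^ (phys + sizeof v - 1)) >> TARGET_PAGE_BITS)
        return slow_ldl(vaddr);
    memcpy(&v, &guest_ram[phys], sizeof v);     /* raw host access */
    return v;
}

/* Fetch n fixed-width 4-byte instructions starting at pc (as on a RISC
 * target) and return how many of them needed the slow path. */
static unsigned fetch_block(uint32_t pc, unsigned n)
{
    uint32_t start = pc, phys = pc;             /* identity mapping */
    slow_fetches = 0;
    for (unsigned i = 0; i < n; i++, pc += 4, phys += 4)
        (void)fetch_l(start, phys, pc);
    return slow_fetches;
}
```

With fetch_block(TARGET_PAGE_SIZE - 8, 3), the first two fetches stay on the first page and the third crosses into the next one, so exactly one fetch takes the slow path. The model applies the page test to guest physical addresses; the patch applies it to host addresses (phys_ram_base plus the physical address), which is only equivalent if phys_ram_base is aligned to the target page size.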

I have updated my patch along the lines of the proposal quoted above, and it
can now run x86 and PowerPC targets.
PowerPC is the easy case; x86 is probably the worst. I'm not really sure
about what I've done for Sparc, but the other targets should be safe.
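For variable-length instruction sets such as x86, the page test has to be applied on every fetch, because an operand can straddle the page boundary even when the opcode does not. The sketch below is again a hypothetical model, not code from the patch: fetch_ctx, read_le(), fetch(), decode_imm32_insn() and VIRT_OFFSET are invented for illustration, pages are assumed to be 4 KiB, and the instruction format (one opcode byte followed by a 32-bit little-endian immediate, like x86 0xB8, mov eax, imm32) is simplified. It also shows why the patch advances pc and phys_pc together everywhere.

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_BITS 12u                    /* assumed 4 KiB target pages */
#define TARGET_PAGE_SIZE (1u << TARGET_PAGE_BITS)
#define VIRT_OFFSET 0x40000000u                 /* hypothetical: virt = phys + offset */

static uint8_t guest_ram[2 * TARGET_PAGE_SIZE]; /* hypothetical guest RAM */
static unsigned slow_fetches;                   /* fetches that left the first page */

/* Hypothetical fetch state mirroring the pc / phys_pc / phys_pc_start fields
 * the patch adds to DisasContext; phys_* are guest physical addresses here. */
struct fetch_ctx {
    uint32_t pc, phys_pc, phys_pc_start;
};

/* Little-endian read of size bytes starting at guest physical address pa. */
static uint32_t read_le(uint32_t pa, unsigned size)
{
    uint32_t v = 0;
    assert(pa + size <= sizeof guest_ram);
    for (unsigned i = 0; i < size; i++)
        v |= (uint32_t)guest_ram[pa + i] << (8 * i);
    return v;
}

/* Fetch size (1..4) bytes of instruction stream.  The raw path is used only
 * when the whole access lies on the page where the block started; otherwise
 * the virtual pc is translated again, as the softmmu ldxx_code() path would. */
static uint32_t fetch(struct fetch_ctx *s, unsigned size)
{
    uint32_t v;
    if ((s->phys_pc_start ^ (s->phys_pc + size - 1)) >> TARGET_PAGE_BITS) {
        slow_fetches++;
        v = read_le(s->pc - VIRT_OFFSET, size);  /* slow path */
    } else {
        v = read_le(s->phys_pc, size);           /* raw path */
    }
    s->pc += size;            /* both cursors must advance in lockstep */
    s->phys_pc += size;
    return v;
}

/* Decode one "opcode + imm32" instruction at physical address pa and return
 * the immediate (the opcode byte itself is not checked in this model). */
static uint32_t decode_imm32_insn(uint32_t pa)
{
    struct fetch_ctx s = { pa + VIRT_OFFSET, pa, pa };
    (void)fetch(&s, 1);                          /* opcode byte */
    return fetch(&s, 4);                         /* 32-bit immediate */
}
```

If the opcode byte sits 3 bytes before the end of a page, it is read raw, while the immediate, which covers the last two bytes of that page and the first two of the next, is read through the slow path.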

Please comment.

-- 
J. Mayer <l_indien@magic.fr>
Never organized

[-- Attachment #2: code_raw_optim.diff --]
[-- Type: text/x-patch, Size: 55562 bytes --]

Index: cpu-all.h
===================================================================
RCS file: /sources/qemu/qemu/cpu-all.h,v
retrieving revision 1.76
diff -u -d -d -p -r1.76 cpu-all.h
--- cpu-all.h	23 Sep 2007 15:28:03 -0000	1.76
+++ cpu-all.h	12 Oct 2007 22:53:37 -0000
@@ -646,6 +646,13 @@ static inline void stfq_be_p(void *ptr, 
 #define ldl_code(p) ldl_raw(p)
 #define ldq_code(p) ldq_raw(p)
 
+#define ldub_code_p(sp, pp, p) ldub_raw(p)
+#define ldsb_code_p(sp, pp, p) ldsb_raw(p)
+#define lduw_code_p(sp, pp, p) lduw_raw(p)
+#define ldsw_code_p(sp, pp, p) ldsw_raw(p)
+#define ldl_code_p(sp, pp, p) ldl_raw(p)
+#define ldq_code_p(sp, pp, p) ldq_raw(p)
+
 #define ldub_kernel(p) ldub_raw(p)
 #define ldsb_kernel(p) ldsb_raw(p)
 #define lduw_kernel(p) lduw_raw(p)
Index: cpu-exec.c
===================================================================
RCS file: /sources/qemu/qemu/cpu-exec.c,v
retrieving revision 1.119
diff -u -d -d -p -r1.119 cpu-exec.c
--- cpu-exec.c	8 Oct 2007 13:16:13 -0000	1.119
+++ cpu-exec.c	12 Oct 2007 22:53:37 -0000
@@ -133,6 +133,7 @@ static TranslationBlock *tb_find_slow(ta
     tb->tc_ptr = tc_ptr;
     tb->cs_base = cs_base;
     tb->flags = flags;
+    tb->page_addr[0] = phys_page1;
     cpu_gen_code(env, tb, CODE_GEN_MAX_SIZE, &code_gen_size);
     code_gen_ptr = (void *)(((unsigned long)code_gen_ptr + code_gen_size + CODE_GEN_ALIGN - 1) & ~(CODE_GEN_ALIGN - 1));
 
Index: softmmu_header.h
===================================================================
RCS file: /sources/qemu/qemu/softmmu_header.h,v
retrieving revision 1.17
diff -u -d -d -p -r1.17 softmmu_header.h
--- softmmu_header.h	8 Oct 2007 13:16:14 -0000	1.17
+++ softmmu_header.h	12 Oct 2007 22:53:37 -0000
@@ -336,6 +336,60 @@ static inline void glue(glue(st, SUFFIX)
     }
 }
 
+#else
+
+#if DATA_SIZE <= 2
+static inline RES_TYPE glue(glue(glue(lds,SUFFIX),MEMSUFFIX),_p)(unsigned long *start_pc,
+                                                                 unsigned long phys_pc,
+                                                                 target_ulong virt_pc)
+{
+    RES_TYPE opc;
+
+    if (unlikely((*start_pc ^
+                  (phys_pc + DATA_SIZE - 1)) >> TARGET_PAGE_BITS)) {
+        /* Slow path: phys_pc is not in the same page as start_pc
+         *            or the insn spans two pages
+         */
+        opc = glue(glue(lds,SUFFIX),MEMSUFFIX)(virt_pc);
+        /* Avoid softmmu access on next load */
+        /* XXX: don't: phys PC is not correct anymore.
+         *      We could call get_phys_addr_code(env, pc); and remove the else
+         *      condition here.
+         */
+        //*start_pc = phys_pc;
+    } else {
+        opc = glue(glue(lds,SUFFIX),_raw)(phys_pc);
+    }
+
+    return opc;
+}
+#endif
+
+static inline RES_TYPE glue(glue(glue(ld,USUFFIX),MEMSUFFIX),_p)(unsigned long *start_pc,
+                                                                 unsigned long phys_pc,
+                                                                 target_ulong virt_pc)
+{
+    RES_TYPE opc;
+
+    if (unlikely((*start_pc ^
+                  (phys_pc + DATA_SIZE - 1)) >> TARGET_PAGE_BITS)) {
+        /* Slow path: phys_pc is not in the same page as start_pc
+         *            or the insn spans two pages
+         */
+        opc = glue(glue(ld,USUFFIX),MEMSUFFIX)(virt_pc);
+        /* Avoid softmmu access on next load */
+        /* XXX: don't: phys PC is not correct anymore.
+         *      We could call get_phys_addr_code(env, pc); and remove the else
+         *      condition here.
+         */
+        //*start_pc = phys_pc;
+    } else {
+        opc = glue(glue(ld,USUFFIX),_raw)(phys_pc);
+    }
+
+    return opc;
+}
+
 #endif /* ACCESS_TYPE != 3 */
 
 #endif /* !asm */
Index: target-alpha/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-alpha/translate.c,v
retrieving revision 1.5
diff -u -d -d -p -r1.5 translate.c
--- target-alpha/translate.c	16 Sep 2007 21:08:01 -0000	1.5
+++ target-alpha/translate.c	12 Oct 2007 22:53:38 -0000
@@ -1965,6 +1965,7 @@ int gen_intermediate_code_internal (CPUS
     static int insn_count;
 #endif
     DisasContext ctx, *ctxp = &ctx;
+    unsigned long phys_pc, phys_pc_start;
     target_ulong pc_start;
     uint32_t insn;
     uint16_t *gen_opc_end;
@@ -1972,6 +1973,9 @@ int gen_intermediate_code_internal (CPUS
     int ret;
 
     pc_start = tb->pc;
+    phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    phys_pc = phys_pc_start;
     gen_opc_ptr = gen_opc_buf;
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     gen_opparam_ptr = gen_opparam_buf;
@@ -2010,7 +2014,7 @@ int gen_intermediate_code_internal (CPUS
                     ctx.pc, ctx.mem_idx);
         }
 #endif
-        insn = ldl_code(ctx.pc);
+        insn = ldl_code_p(&phys_pc_start, phys_pc, ctx.pc);
 #if defined ALPHA_DEBUG_DISAS
         insn_count++;
         if (logfile != NULL) {
@@ -2018,6 +2022,7 @@ int gen_intermediate_code_internal (CPUS
         }
 #endif
         ctx.pc += 4;
+        phys_pc += 4;
         ret = translate_one(ctxp, insn);
         if (ret != 0)
             break;
Index: target-arm/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-arm/translate.c,v
retrieving revision 1.57
diff -u -d -d -p -r1.57 translate.c
--- target-arm/translate.c	17 Sep 2007 08:09:51 -0000	1.57
+++ target-arm/translate.c	12 Oct 2007 22:53:38 -0000
@@ -38,6 +38,8 @@
 /* internal defines */
 typedef struct DisasContext {
     target_ulong pc;
+    unsigned long phys_pc;
+    unsigned long phys_pc_start;
     int is_jmp;
     /* Nonzero if this instruction has been conditionally skipped.  */
     int condjmp;
@@ -2206,8 +2208,9 @@ static void disas_arm_insn(CPUState * en
 {
     unsigned int cond, insn, val, op1, i, shift, rm, rs, rn, rd, sh;
 
-    insn = ldl_code(s->pc);
+    insn = ldl_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 4;
+    s->phys_pc += 4;
 
     cond = insn >> 28;
     if (cond == 0xf){
@@ -2971,8 +2974,9 @@ static void disas_thumb_insn(DisasContex
     int32_t offset;
     int i;
 
-    insn = lduw_code(s->pc);
+    insn = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     switch (insn >> 12) {
     case 0: case 1:
@@ -3494,7 +3498,7 @@ static void disas_thumb_insn(DisasContex
             break;
         }
         offset = ((int32_t)insn << 21) >> 10;
-        insn = lduw_code(s->pc);
+        insn = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         offset |= insn & 0x7ff;
 
         val = (uint32_t)s->pc + 2;
@@ -3544,6 +3548,9 @@ static inline int gen_intermediate_code_
 
     dc->is_jmp = DISAS_NEXT;
     dc->pc = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     dc->singlestep_enabled = env->singlestep_enabled;
     dc->condjmp = 0;
     dc->thumb = env->thumb;
Index: target-cris/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-cris/translate.c,v
retrieving revision 1.1
diff -u -d -d -p -r1.1 translate.c
--- target-cris/translate.c	8 Oct 2007 12:49:08 -0000	1.1
+++ target-cris/translate.c	12 Oct 2007 22:53:38 -0000
@@ -100,6 +100,7 @@ enum {
 typedef struct DisasContext {
 	CPUState *env;
 	target_ulong pc, insn_pc;
+        unsigned long phys_pc, phys_pc_start;
 
 	/* Decoder.  */
 	uint32_t ir;
@@ -828,7 +829,8 @@ static int dec_prep_alu_m(DisasContext *
 		if (memsize == 1)
 			insn_len++;
 
-		imm = ldl_code(dc->pc + 2);
+                imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2,
+                                 dc->pc + 2);
 		if (memsize != 4) {
 			if (s_ext) {
 				imm = sign_extend(imm, (memsize * 8) - 1);
@@ -1962,7 +1964,7 @@ static unsigned int dec_lapc_im(DisasCon
 	rd = dc->op2;
 
 	cris_cc_mask(dc, 0);
-	imm = ldl_code(dc->pc + 2);
+	imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 	DIS(fprintf (logfile, "lapc 0x%x, $r%u\n", imm + dc->pc, dc->op2));
 	gen_op_movl_T0_im (dc->pc + imm);
 	gen_movl_reg_T0[rd] ();
@@ -1999,7 +2001,7 @@ static unsigned int dec_jas_im(DisasCont
 {
 	uint32_t imm;
 
-	imm = ldl_code(dc->pc + 2);
+	imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "jas 0x%x\n", imm));
 	cris_cc_mask(dc, 0);
@@ -2016,7 +2018,7 @@ static unsigned int dec_jasc_im(DisasCon
 {
 	uint32_t imm;
 
-	imm = ldl_code(dc->pc + 2);
+	imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "jasc 0x%x\n", imm));
 	cris_cc_mask(dc, 0);
@@ -2047,7 +2049,7 @@ static unsigned int dec_bcc_im(DisasCont
 	int32_t offset;
 	uint32_t cond = dc->op2;
 
-	offset = ldl_code(dc->pc + 2);
+	offset = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 	offset = sign_extend(offset, 15);
 
 	DIS(fprintf (logfile, "b%s %d pc=%x dst=%x\n",
@@ -2065,7 +2067,7 @@ static unsigned int dec_bas_im(DisasCont
 	int32_t simm;
 
 
-	simm = ldl_code(dc->pc + 2);
+	simm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "bas 0x%x, $p%u\n", dc->pc + simm, dc->op2));
 	cris_cc_mask(dc, 0);
@@ -2081,7 +2083,7 @@ static unsigned int dec_bas_im(DisasCont
 static unsigned int dec_basc_im(DisasContext *dc)
 {
 	int32_t simm;
-	simm = ldl_code(dc->pc + 2);
+	simm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "basc 0x%x, $p%u\n", dc->pc + simm, dc->op2));
 	cris_cc_mask(dc, 0);
@@ -2259,7 +2261,7 @@ cris_decoder(DisasContext *dc)
 	int i;
 
 	/* Load a halfword onto the instruction register.  */
-	tmp = ldl_code(dc->pc);
+	tmp = ldl_code_p(&dc->phys_pc_start, dc->phys_pc, dc->pc);
 	dc->ir = tmp & 0xffff;
 
 	/* Now decode it.  */
@@ -2313,6 +2315,9 @@ gen_intermediate_code_internal(CPUState 
 	uint32_t next_page_start;
 
 	pc_start = tb->pc;
+        dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+            (pc_start & ~TARGET_PAGE_MASK);
+        dc->phys_pc = dc->phys_pc_start;
 	dc->env = env;
 	dc->tb = tb;
 
@@ -2347,6 +2352,7 @@ gen_intermediate_code_internal(CPUState 
 		insn_len = cris_decoder(dc);
 		STATS(gen_op_exec_insn());
 		dc->pc += insn_len;
+                dc->phys_pc += insn_len;
 		if (!dc->flagx_live
 		    || (dc->flagx_live &&
 			!(dc->cc_op == CC_OP_FLAGS && dc->flags_x))) {
Index: target-i386/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-i386/translate.c,v
retrieving revision 1.72
diff -u -d -d -p -r1.72 translate.c
--- target-i386/translate.c	27 Sep 2007 01:52:00 -0000	1.72
+++ target-i386/translate.c	12 Oct 2007 22:53:39 -0000
@@ -73,6 +73,7 @@ typedef struct DisasContext {
     int prefix;
     int aflag, dflag;
     target_ulong pc; /* pc = eip + cs_base */
+    unsigned long phys_pc,phys_pc_start;
     int is_jmp; /* 1 = means jump (stop translation), 2 means CPU
                    static state change (stop translation) */
     /* current block context */
@@ -1451,7 +1452,7 @@ static void gen_lea_modrm(DisasContext *
 
         if (base == 4) {
             havesib = 1;
-            code = ldub_code(s->pc++);
+            code = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             scale = (code >> 6) & 3;
             index = ((code >> 3) & 7) | REX_X(s);
             base = (code & 7);
@@ -1462,8 +1463,10 @@ static void gen_lea_modrm(DisasContext *
         case 0:
             if ((base & 7) == 5) {
                 base = -1;
-                disp = (int32_t)ldl_code(s->pc);
+                disp = (int32_t)ldl_code_p(&s->phys_pc_start, s->phys_pc,
+                                           s->pc);
                 s->pc += 4;
+                s->phys_pc += 4;
                 if (CODE64(s) && !havesib) {
                     disp += s->pc + s->rip_offset;
                 }
@@ -1472,12 +1475,14 @@ static void gen_lea_modrm(DisasContext *
             }
             break;
         case 1:
-            disp = (int8_t)ldub_code(s->pc++);
+            disp = (int8_t)ldub_code_p(&s->phys_pc_start, s->phys_pc++,
+                                       s->pc++);
             break;
         default:
         case 2:
-            disp = ldl_code(s->pc);
+            disp = ldl_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 4;
+            s->phys_pc += 4;
             break;
         }
 
@@ -1545,8 +1550,9 @@ static void gen_lea_modrm(DisasContext *
         switch (mod) {
         case 0:
             if (rm == 6) {
-                disp = lduw_code(s->pc);
+                disp = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 s->pc += 2;
+                s->phys_pc += 2;
                 gen_op_movl_A0_im(disp);
                 rm = 0; /* avoid SS override */
                 goto no_rm;
@@ -1555,12 +1561,14 @@ static void gen_lea_modrm(DisasContext *
             }
             break;
         case 1:
-            disp = (int8_t)ldub_code(s->pc++);
+            disp = (int8_t)ldub_code_p(&s->phys_pc_start, s->phys_pc++,
+                                       s->pc++);
             break;
         default:
         case 2:
-            disp = lduw_code(s->pc);
+            disp = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
+            s->phys_pc += 2;
             break;
         }
         switch(rm) {
@@ -1629,7 +1637,7 @@ static void gen_nop_modrm(DisasContext *
         base = rm;
 
         if (base == 4) {
-            code = ldub_code(s->pc++);
+            code = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             base = (code & 7);
         }
 
@@ -1637,14 +1645,17 @@ static void gen_nop_modrm(DisasContext *
         case 0:
             if (base == 5) {
                 s->pc += 4;
+                s->phys_pc += 4;
             }
             break;
         case 1:
             s->pc++;
+            s->phys_pc++;
             break;
         default:
         case 2:
             s->pc += 4;
+            s->phys_pc += 4;
             break;
         }
     } else {
@@ -1652,14 +1663,17 @@ static void gen_nop_modrm(DisasContext *
         case 0:
             if (rm == 6) {
                 s->pc += 2;
+                s->phys_pc += 2;
             }
             break;
         case 1:
             s->pc++;
+            s->phys_pc++;
             break;
         default:
         case 2:
             s->pc += 2;
+            s->phys_pc += 2;
             break;
         }
     }
@@ -1727,17 +1741,20 @@ static inline uint32_t insn_get(DisasCon
 
     switch(ot) {
     case OT_BYTE:
-        ret = ldub_code(s->pc);
+        ret = ldub_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc++;
+        s->phys_pc++;
         break;
     case OT_WORD:
-        ret = lduw_code(s->pc);
+        ret = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         break;
     default:
     case OT_LONG:
-        ret = ldl_code(s->pc);
+        ret = ldl_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 4;
+        s->phys_pc += 4;
         break;
     }
     return ret;
@@ -2689,7 +2706,7 @@ static void gen_sse(DisasContext *s, int
         gen_op_enter_mmx();
     }
 
-    modrm = ldub_code(s->pc++);
+    modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
     reg = ((modrm >> 3) & 7);
     if (is_xmm)
         reg |= rex_r;
@@ -2962,7 +2979,7 @@ static void gen_sse(DisasContext *s, int
         case 0x171: /* shift xmm, im */
         case 0x172:
         case 0x173:
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (is_xmm) {
                 gen_op_movl_T0_im(val);
                 gen_op_movl_env_T0(offsetof(CPUX86State,xmm_t0.XMM_L(0)));
@@ -3082,7 +3099,7 @@ static void gen_sse(DisasContext *s, int
         case 0x1c4:
             s->rip_offset = 1;
             gen_ldst_modrm(s, modrm, OT_WORD, OR_TMP0, 0);
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (b1) {
                 val &= 7;
                 gen_op_pinsrw_xmm(offsetof(CPUX86State,xmm_regs[reg]), val);
@@ -3095,7 +3112,7 @@ static void gen_sse(DisasContext *s, int
         case 0x1c5:
             if (mod != 3)
                 goto illegal_op;
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (b1) {
                 val &= 7;
                 rm = (modrm & 7) | REX_B(s);
@@ -3213,13 +3230,13 @@ static void gen_sse(DisasContext *s, int
         switch(b) {
         case 0x70: /* pshufx insn */
         case 0xc6: /* pshufx insn */
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             sse_op3 = (GenOpFunc3 *)sse_op2;
             sse_op3(op1_offset, op2_offset, val);
             break;
         case 0xc2:
             /* compare insns */
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (val >= 8)
                 goto illegal_op;
             sse_op2 = sse_op_table4[val][b1];
@@ -3260,8 +3277,9 @@ static target_ulong disas_insn(DisasCont
 #endif
     s->rip_offset = 0; /* for relative ip address */
  next_byte:
-    b = ldub_code(s->pc);
+    b = ldub_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc++;
+    s->phys_pc++;
     /* check prefixes */
 #ifdef TARGET_X86_64
     if (CODE64(s)) {
@@ -3375,7 +3393,7 @@ static target_ulong disas_insn(DisasCont
     case 0x0f:
         /**************************/
         /* extended op code */
-        b = ldub_code(s->pc++) | 0x100;
+        b = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++) | 0x100;
         goto reswitch;
 
         /**************************/
@@ -3400,7 +3418,7 @@ static target_ulong disas_insn(DisasCont
 
             switch(f) {
             case 0: /* OP Ev, Gv */
-                modrm = ldub_code(s->pc++);
+                modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
                 reg = ((modrm >> 3) & 7) | rex_r;
                 mod = (modrm >> 6) & 3;
                 rm = (modrm & 7) | REX_B(s);
@@ -3422,7 +3440,7 @@ static target_ulong disas_insn(DisasCont
                 gen_op(s, op, ot, opreg);
                 break;
             case 1: /* OP Gv, Ev */
-                modrm = ldub_code(s->pc++);
+                modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
                 mod = (modrm >> 6) & 3;
                 reg = ((modrm >> 3) & 7) | rex_r;
                 rm = (modrm & 7) | REX_B(s);
@@ -3457,7 +3475,7 @@ static target_ulong disas_insn(DisasCont
             else
                 ot = dflag + OT_WORD;
 
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
             op = (modrm >> 3) & 7;
@@ -3506,7 +3524,7 @@ static target_ulong disas_insn(DisasCont
         else
             ot = dflag + OT_WORD;
 
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         op = (modrm >> 3) & 7;
@@ -3648,7 +3666,7 @@ static target_ulong disas_insn(DisasCont
         else
             ot = dflag + OT_WORD;
 
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         op = (modrm >> 3) & 7;
@@ -3754,7 +3772,7 @@ static target_ulong disas_insn(DisasCont
         else
             ot = dflag + OT_WORD;
 
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         reg = ((modrm >> 3) & 7) | rex_r;
@@ -3805,7 +3823,7 @@ static target_ulong disas_insn(DisasCont
     case 0x69: /* imul Gv, Ev, I */
     case 0x6b:
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         if (b == 0x69)
             s->rip_offset = insn_const_size(ot);
@@ -3841,7 +3859,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         if (mod == 3) {
@@ -3868,7 +3886,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         gen_op_mov_TN_reg[ot][1][reg]();
@@ -3885,7 +3903,7 @@ static target_ulong disas_insn(DisasCont
         s->cc_op = CC_OP_SUBB + ot;
         break;
     case 0x1c7: /* cmpxchg8b */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -3944,7 +3962,7 @@ static target_ulong disas_insn(DisasCont
         } else {
             ot = dflag + OT_WORD;
         }
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         gen_pop_T0(s);
         if (mod == 3) {
@@ -3963,9 +3981,10 @@ static target_ulong disas_insn(DisasCont
     case 0xc8: /* enter */
         {
             int level;
-            val = lduw_code(s->pc);
+            val = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
-            level = ldub_code(s->pc++);
+            s->phys_pc += 2;
+            level = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             gen_enter(s, val, level);
         }
         break;
@@ -4045,7 +4064,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
 
         /* generate a generic store */
@@ -4057,7 +4076,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod != 3) {
             s->rip_offset = insn_const_size(ot);
@@ -4076,14 +4095,14 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = OT_WORD + dflag;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
 
         gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
         gen_op_mov_reg_T0[ot][reg]();
         break;
     case 0x8e: /* mov seg, Gv */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = (modrm >> 3) & 7;
         if (reg >= 6 || reg == R_CS)
             goto illegal_op;
@@ -4103,7 +4122,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x8c: /* mov Gv, seg */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
         if (reg >= 6)
@@ -4126,7 +4145,7 @@ static target_ulong disas_insn(DisasCont
             d_ot = dflag + OT_WORD;
             /* ot is the size of source */
             ot = (b & 1) + OT_BYTE;
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             reg = ((modrm >> 3) & 7) | rex_r;
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
@@ -4163,7 +4182,7 @@ static target_ulong disas_insn(DisasCont
 
     case 0x8d: /* lea */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -4190,8 +4209,9 @@ static target_ulong disas_insn(DisasCont
                 ot = dflag + OT_WORD;
 #ifdef TARGET_X86_64
             if (s->aflag == 2) {
-                offset_addr = ldq_code(s->pc);
+                offset_addr = ldq_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 s->pc += 8;
+                s->phys_pc += 8;
                 if (offset_addr == (int32_t)offset_addr)
                     gen_op_movq_A0_im(offset_addr);
                 else
@@ -4243,8 +4263,9 @@ static target_ulong disas_insn(DisasCont
         if (dflag == 2) {
             uint64_t tmp;
             /* 64 bit case */
-            tmp = ldq_code(s->pc);
+            tmp = ldq_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 8;
+            s->phys_pc += 8;
             reg = (b & 7) | REX_B(s);
             gen_movtl_T0_im(tmp);
             gen_op_mov_reg_T0[OT_QUAD][reg]();
@@ -4270,7 +4291,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         if (mod == 3) {
@@ -4313,7 +4334,7 @@ static target_ulong disas_insn(DisasCont
         op = R_GS;
     do_lxx:
         ot = dflag ? OT_LONG : OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         if (mod == 3)
@@ -4345,7 +4366,7 @@ static target_ulong disas_insn(DisasCont
             else
                 ot = dflag + OT_WORD;
 
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             mod = (modrm >> 6) & 3;
             op = (modrm >> 3) & 7;
 
@@ -4364,7 +4385,8 @@ static target_ulong disas_insn(DisasCont
                 gen_shift(s, op, ot, opreg, OR_ECX);
             } else {
                 if (shift == 2) {
-                    shift = ldub_code(s->pc++);
+                    shift = ldub_code_p(&s->phys_pc_start, s->phys_pc++,
+                                        s->pc++);
                 }
                 gen_shifti(s, op, ot, opreg, shift);
             }
@@ -4398,7 +4420,7 @@ static target_ulong disas_insn(DisasCont
         shift = 0;
     do_shiftd:
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         reg = ((modrm >> 3) & 7) | rex_r;
@@ -4412,7 +4434,7 @@ static target_ulong disas_insn(DisasCont
         gen_op_mov_TN_reg[ot][1][reg]();
 
         if (shift) {
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (ot == OT_QUAD)
                 val &= 0x3f;
             else
@@ -4450,7 +4472,7 @@ static target_ulong disas_insn(DisasCont
             gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
             break;
         }
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = modrm & 7;
         op = ((b & 7) << 3) | ((modrm >> 3) & 7);
@@ -5013,7 +5035,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag ? OT_LONG : OT_WORD;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_movl_T0_im(val);
         gen_check_io(s, ot, 0, pc_start - s->cs_base);
         if (gen_svm_check_io(s, pc_start,
@@ -5029,7 +5051,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag ? OT_LONG : OT_WORD;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_movl_T0_im(val);
         gen_check_io(s, ot, 0, pc_start - s->cs_base);
         if (gen_svm_check_io(s, pc_start, svm_is_rep(prefixes) |
@@ -5073,8 +5095,9 @@ static target_ulong disas_insn(DisasCont
         /************************/
         /* control */
     case 0xc2: /* ret im */
-        val = ldsw_code(s->pc);
+        val = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         gen_pop_T0(s);
         if (CODE64(s) && s->dflag)
             s->dflag = 2;
@@ -5093,8 +5116,9 @@ static target_ulong disas_insn(DisasCont
         gen_eob(s);
         break;
     case 0xca: /* lret im */
-        val = ldsw_code(s->pc);
+        val = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
     do_lret:
         if (s->pe && !s->vm86) {
             if (s->cc_op != CC_OP_DYNAMIC)
@@ -5223,13 +5247,13 @@ static target_ulong disas_insn(DisasCont
         break;
 
     case 0x190 ... 0x19f: /* setcc Gv */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_setcc(s, b);
         gen_ldst_modrm(s, modrm, OT_BYTE, OR_TMP0, 1);
         break;
     case 0x140 ... 0x14f: /* cmov Gv, Ev */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         gen_setcc(s, b);
@@ -5338,7 +5362,7 @@ static target_ulong disas_insn(DisasCont
         /* bit operations */
     case 0x1ba: /* bt/bts/btr/btc Gv, im */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         op = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
@@ -5350,7 +5374,7 @@ static target_ulong disas_insn(DisasCont
             gen_op_mov_TN_reg[ot][0][rm]();
         }
         /* load shift */
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_movl_T1_im(val);
         if (op < 4)
             goto illegal_op;
@@ -5378,7 +5402,7 @@ static target_ulong disas_insn(DisasCont
         op = 3;
     do_btx:
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
@@ -5404,7 +5428,7 @@ static target_ulong disas_insn(DisasCont
     case 0x1bc: /* bsf */
     case 0x1bd: /* bsr */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
         /* NOTE: in order to handle the 0 case, we must load the
@@ -5451,7 +5475,7 @@ static target_ulong disas_insn(DisasCont
     case 0xd4: /* aam */
         if (CODE64(s))
             goto illegal_op;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         if (val == 0) {
             gen_exception(s, EXCP00_DIVZ, pc_start - s->cs_base);
         } else {
@@ -5462,7 +5486,7 @@ static target_ulong disas_insn(DisasCont
     case 0xd5: /* aad */
         if (CODE64(s))
             goto illegal_op;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_aad(val);
         s->cc_op = CC_OP_LOGICB;
         break;
@@ -5494,7 +5518,7 @@ static target_ulong disas_insn(DisasCont
         gen_interrupt(s, EXCP03_INT3, pc_start - s->cs_base, s->pc - s->cs_base);
         break;
     case 0xcd: /* int N */
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         if (gen_svm_check_intercept(s, pc_start, SVM_EXIT_SWINT))
             break;
         if (s->vm86 && s->iopl != 3) {
@@ -5567,7 +5591,7 @@ static target_ulong disas_insn(DisasCont
         if (CODE64(s))
             goto illegal_op;
         ot = dflag ? OT_LONG : OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
         if (mod == 3)
@@ -5738,7 +5762,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x100:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         switch(op) {
@@ -5808,7 +5832,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x101:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         rm = modrm & 7;
@@ -6022,7 +6046,7 @@ static target_ulong disas_insn(DisasCont
             /* d_ot is the size of destination */
             d_ot = dflag + OT_WORD;
 
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             reg = ((modrm >> 3) & 7) | rex_r;
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
@@ -6048,7 +6072,7 @@ static target_ulong disas_insn(DisasCont
             if (!s->pe || s->vm86)
                 goto illegal_op;
             ot = dflag ? OT_LONG : OT_WORD;
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             reg = (modrm >> 3) & 7;
             mod = (modrm >> 6) & 3;
             rm = modrm & 7;
@@ -6075,7 +6099,7 @@ static target_ulong disas_insn(DisasCont
         if (!s->pe || s->vm86)
             goto illegal_op;
         ot = dflag ? OT_LONG : OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
         gen_op_mov_TN_reg[ot][1][reg]();
@@ -6089,7 +6113,7 @@ static target_ulong disas_insn(DisasCont
         gen_op_mov_reg_T1[ot][reg]();
         break;
     case 0x118:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         switch(op) {
@@ -6108,7 +6132,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x119 ... 0x11f: /* nop (multi byte) */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_nop_modrm(s, modrm);
         break;
     case 0x120: /* mov reg, crN */
@@ -6116,7 +6140,7 @@ static target_ulong disas_insn(DisasCont
         if (s->cpl != 0) {
             gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
         } else {
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if ((modrm & 0xc0) != 0xc0)
                 goto illegal_op;
             rm = (modrm & 7) | REX_B(s);
@@ -6158,7 +6182,7 @@ static target_ulong disas_insn(DisasCont
         if (s->cpl != 0) {
             gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
         } else {
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if ((modrm & 0xc0) != 0xc0)
                 goto illegal_op;
             rm = (modrm & 7) | REX_B(s);
@@ -6199,7 +6223,7 @@ static target_ulong disas_insn(DisasCont
         if (!(s->cpuid_features & CPUID_SSE2))
             goto illegal_op;
         ot = s->dflag == 2 ? OT_QUAD : OT_LONG;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -6208,7 +6232,7 @@ static target_ulong disas_insn(DisasCont
         gen_ldst_modrm(s, modrm, ot, reg, 1);
         break;
     case 0x1ae:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         switch(op) {
@@ -6274,7 +6298,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x10d: /* prefetch */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
         /* ignore for now */
         break;
@@ -6752,6 +6776,9 @@ static inline int gen_intermediate_code_
 
     dc->is_jmp = DISAS_NEXT;
     pc_ptr = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     lj = -1;
 
     for(;;) {
Index: target-m68k/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-m68k/translate.c,v
retrieving revision 1.20
diff -u -d -d -p -r1.20 translate.c
--- target-m68k/translate.c	17 Sep 2007 08:09:53 -0000	1.20
+++ target-m68k/translate.c	12 Oct 2007 22:53:39 -0000
@@ -45,6 +45,8 @@ typedef struct DisasContext {
     CPUM68KState *env;
     target_ulong insn_pc; /* Start of the current instruction.  */
     target_ulong pc;
+    unsigned long phys_pc;
+    unsigned long phys_pc_start;
     int is_jmp;
     int cc_op;
     int user;
@@ -207,10 +209,12 @@ static int gen_ldst(DisasContext *s, int
 static inline uint32_t read_im32(DisasContext *s)
 {
     uint32_t im;
-    im = ((uint32_t)lduw_code(s->pc)) << 16;
+    im = ((uint32_t)lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc)) << 16;
     s->pc += 2;
-    im |= lduw_code(s->pc);
+    s->phys_pc += 2;
+    im |= lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     return im;
 }
 
@@ -244,8 +248,9 @@ static int gen_lea_indexed(DisasContext 
     uint32_t bd, od;
 
     offset = s->pc;
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     if ((ext & 0x800) == 0 && !m68k_feature(s->env, M68K_FEATURE_WORD_INDEX))
         return -1;
@@ -258,8 +263,10 @@ static int gen_lea_indexed(DisasContext 
         if ((ext & 0x30) > 0x10) {
             /* base displacement */
             if ((ext & 0x30) == 0x20) {
-                bd = (int16_t)lduw_code(s->pc);
+                bd = (int16_t)lduw_code_p(&s->phys_pc_start, s->phys_pc,
+                                          s->pc);
                 s->pc += 2;
+                s->phys_pc += 2;
             } else {
                 bd = read_im32(s);
             }
@@ -307,8 +314,10 @@ static int gen_lea_indexed(DisasContext 
             if ((ext & 3) > 1) {
                 /* outer displacement */
                 if ((ext & 3) == 2) {
-                    od = (int16_t)lduw_code(s->pc);
+                    od = (int16_t)lduw_code_p(&s->phys_pc_start, s->phys_pc,
+                                              s->pc);
                     s->pc += 2;
+                    s->phys_pc += 2;
                 } else {
                     od = read_im32(s);
                 }
@@ -455,8 +464,9 @@ static int gen_lea(DisasContext *s, uint
     case 5: /* Indirect displacement.  */
         reg += QREG_A0;
         tmp = gen_new_qreg(QMODE_I32);
-        ext = lduw_code(s->pc);
+        ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         gen_op_add32(tmp, reg, gen_im32((int16_t)ext));
         return tmp;
     case 6: /* Indirect index + displacement.  */
@@ -465,8 +475,9 @@ static int gen_lea(DisasContext *s, uint
     case 7: /* Other */
         switch (reg) {
         case 0: /* Absolute short.  */
-            offset = ldsw_code(s->pc);
+            offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
+            s->phys_pc += 2;
             return gen_im32(offset);
         case 1: /* Absolute long.  */
             offset = read_im32(s);
@@ -474,8 +485,9 @@ static int gen_lea(DisasContext *s, uint
         case 2: /* pc displacement  */
             tmp = gen_new_qreg(QMODE_I32);
             offset = s->pc;
-            offset += ldsw_code(s->pc);
+            offset += ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
+            s->phys_pc += 2;
             return gen_im32(offset);
         case 3: /* pc index+displacement.  */
             return gen_lea_indexed(s, opsize, -1);
@@ -581,18 +593,23 @@ static int gen_ea(DisasContext *s, uint1
             /* Sign extend values for consistency.  */
             switch (opsize) {
             case OS_BYTE:
-                if (val)
-                    offset = ldsb_code(s->pc + 1);
-                else
-                    offset = ldub_code(s->pc + 1);
+                if (val) {
+                    offset = ldsb_code_p(&s->phys_pc_start, s->phys_pc + 1,
+                                         s->pc + 1);
+                } else {
+                    offset = ldub_code_p(&s->phys_pc_start, s->phys_pc + 1,
+                                         s->pc + 1);
+                }
                 s->pc += 2;
+                s->phys_pc += 2;
                 break;
             case OS_WORD:
                 if (val)
-                    offset = ldsw_code(s->pc);
+                    offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 else
-                    offset = lduw_code(s->pc);
+                    offset = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 s->pc += 2;
+                s->phys_pc += 2;
                 break;
             case OS_LONG:
                 offset = read_im32(s);
@@ -879,8 +896,9 @@ DISAS_INSN(divl)
     int reg;
     uint16_t ext;
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (ext & 0x87f8) {
         gen_exception(s, s->pc - 4, EXCP_UNSUPPORTED);
         return;
@@ -1066,8 +1084,9 @@ DISAS_INSN(movem)
     int tmp;
     int is_load;
 
-    mask = lduw_code(s->pc);
+    mask = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     tmp = gen_lea(s, insn, OS_LONG);
     if (tmp == -1) {
         gen_addr_fault(s);
@@ -1111,8 +1130,9 @@ DISAS_INSN(bitop_im)
         opsize = OS_LONG;
     op = (insn >> 6) & 3;
 
-    bitnum = lduw_code(s->pc);
+    bitnum = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (bitnum & 0xff00) {
         disas_undef(s, insn);
         return;
@@ -1375,8 +1395,9 @@ static void gen_set_sr(DisasContext *s, 
     else if ((insn & 0x3f) == 0x3c)
       {
         uint16_t val;
-        val = lduw_code(s->pc);
+        val = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         gen_set_sr_im(s, val, ccr_only);
       }
     else
@@ -1502,8 +1523,9 @@ DISAS_INSN(mull)
 
     /* The upper 32 bits of the product are discarded, so
        muls.l and mulu.l are functionally equivalent.  */
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (ext & 0x87ff) {
         gen_exception(s, s->pc - 4, EXCP_UNSUPPORTED);
         return;
@@ -1523,8 +1545,9 @@ DISAS_INSN(link)
     int reg;
     int tmp;
 
-    offset = ldsw_code(s->pc);
+    offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     reg = AREG(insn, 0);
     tmp = gen_new_qreg(QMODE_I32);
     gen_op_sub32(tmp, QREG_SP, gen_im32(4));
@@ -1622,9 +1645,11 @@ DISAS_INSN(tpf)
     switch (insn & 7) {
     case 2: /* One extension word.  */
         s->pc += 2;
+        s->phys_pc += 2;
         break;
     case 3: /* Two extension words.  */
         s->pc += 4;
+        s->phys_pc += 4;
         break;
     case 4: /* No extension words.  */
         break;
@@ -1644,8 +1669,9 @@ DISAS_INSN(branch)
     op = (insn >> 8) & 0xf;
     offset = (int8_t)insn;
     if (offset == 0) {
-        offset = ldsw_code(s->pc);
+        offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
     } else if (offset == -1) {
         offset = read_im32(s);
     }
@@ -1957,14 +1983,16 @@ DISAS_INSN(strldsr)
     uint32_t addr;
 
     addr = s->pc - 2;
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (ext != 0x46FC) {
         gen_exception(s, addr, EXCP_UNSUPPORTED);
         return;
     }
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (IS_USER(s) || (ext & SR_S) == 0) {
         gen_exception(s, addr, EXCP_PRIVILEGE);
         return;
@@ -2032,8 +2060,9 @@ DISAS_INSN(stop)
         return;
     }
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     gen_set_sr_im(s, ext, 0);
     gen_jmp(s, gen_im32(s->pc));
@@ -2059,8 +2088,9 @@ DISAS_INSN(movec)
         return;
     }
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     if (ext & 0x8000) {
         reg = AREG(ext, 12);
@@ -2121,8 +2151,9 @@ DISAS_INSN(fpu)
     int round;
     int opsize;
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     opmode = ext & 0x7f;
     switch ((ext >> 13) & 7) {
     case 0: case 2:
@@ -2331,6 +2362,7 @@ DISAS_INSN(fpu)
     return;
 undef:
     s->pc -= 2;
+    s->phys_pc -= 2;
     disas_undef_fpu(s, insn);
 }
 
@@ -2343,11 +2375,14 @@ DISAS_INSN(fbcc)
     int l1;
 
     addr = s->pc;
-    offset = ldsw_code(s->pc);
+    offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (insn & (1 << 6)) {
-        offset = (offset << 16) | lduw_code(s->pc);
+        offset = (offset << 16) |
+            lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
     }
 
     l1 = gen_new_label();
@@ -2473,8 +2508,9 @@ DISAS_INSN(mac)
     int dual;
     int saved_flags = -1;
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     acc = ((insn >> 7) & 1) | ((ext >> 3) & 2);
     dual = ((insn & 0x30) != 0 && (ext & 3) != 0);
@@ -2882,8 +2918,9 @@ static void disas_m68k_insn(CPUState * e
 {
     uint16_t insn;
 
-    insn = lduw_code(s->pc);
+    insn = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     opcode_table[insn](s, insn);
 }
@@ -3169,6 +3206,9 @@ gen_intermediate_code_internal(CPUState 
     dc->env = env;
     dc->is_jmp = DISAS_NEXT;
     dc->pc = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     dc->cc_op = CC_OP_DYNAMIC;
     dc->singlestep_enabled = env->singlestep_enabled;
     dc->fpcr = env->fpcr;
Index: target-mips/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-mips/translate.c,v
retrieving revision 1.106
diff -u -d -d -p -r1.106 translate.c
--- target-mips/translate.c	9 Oct 2007 03:39:58 -0000	1.106
+++ target-mips/translate.c	12 Oct 2007 22:53:39 -0000
@@ -536,6 +536,7 @@ FOP_CONDS(abs, ps)
 typedef struct DisasContext {
     struct TranslationBlock *tb;
     target_ulong pc, saved_pc;
+    unsigned long phys_pc, phys_pc_start;
     uint32_t opcode;
     uint32_t fp_status;
     /* Routine used to access memory */
@@ -1764,6 +1765,7 @@ static void gen_compute_branch (DisasCon
             /* Skip the instruction in the delay slot */
             MIPS_DEBUG("bnever, link and skip");
             ctx->pc += 4;
+            ctx->phys_pc += 4;
             return;
         case OPC_BNEL:    /* rx != rx likely */
         case OPC_BGTZL:   /* 0 > 0 likely */
@@ -1771,6 +1773,7 @@ static void gen_compute_branch (DisasCon
             /* Skip the instruction in the delay slot */
             MIPS_DEBUG("bnever and skip");
             ctx->pc += 4;
+            ctx->phys_pc += 4;
             return;
         case OPC_J:
             ctx->hflags |= MIPS_HFLAG_B;
@@ -6495,6 +6498,9 @@ gen_intermediate_code_internal (CPUState
     gen_opparam_ptr = gen_opparam_buf;
     nb_gen_labels = 0;
     ctx.pc = pc_start;
+    ctx.phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    ctx.phys_pc = ctx.phys_pc_start;
     ctx.saved_pc = -1;
     ctx.tb = tb;
     ctx.bstate = BS_NONE;
@@ -6544,9 +6550,10 @@ gen_intermediate_code_internal (CPUState
             gen_opc_hflags[lj] = ctx.hflags & MIPS_HFLAG_BMASK;
             gen_opc_instr_start[lj] = 1;
         }
-        ctx.opcode = ldl_code(ctx.pc);
+        ctx.opcode = ldl_code_p(&ctx.phys_pc_start, ctx.phys_pc, ctx.pc);
         decode_opc(env, &ctx);
         ctx.pc += 4;
+        ctx.phys_pc += 4;
 
         if (env->singlestep_enabled)
             break;
Index: target-ppc/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-ppc/translate.c,v
retrieving revision 1.92
diff -u -d -d -p -r1.92 translate.c
--- target-ppc/translate.c	7 Oct 2007 23:10:08 -0000	1.92
+++ target-ppc/translate.c	12 Oct 2007 22:53:40 -0000
@@ -6678,6 +6678,7 @@ static always_inline int gen_intermediat
 {
     DisasContext ctx, *ctxp = &ctx;
     opc_handler_t **table, *handler;
+    unsigned long phys_pc, phys_pc_start;
     target_ulong pc_start;
     uint16_t *gen_opc_end;
     int supervisor;
@@ -6685,6 +6686,9 @@ static always_inline int gen_intermediat
     int j, lj = -1;
 
     pc_start = tb->pc;
+    phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    phys_pc = phys_pc_start;
     gen_opc_ptr = gen_opc_buf;
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     gen_opparam_ptr = gen_opparam_buf;
@@ -6763,7 +6767,7 @@ static always_inline int gen_intermediat
                     ctx.nip, 1 - msr_pr, msr_ir);
         }
 #endif
-        ctx.opcode = ldl_code(ctx.nip);
+        ctx.opcode = ldl_code_p(&phys_pc_start, phys_pc, ctx.nip);
         if (msr_le) {
             ctx.opcode = ((ctx.opcode & 0xFF000000) >> 24) |
                 ((ctx.opcode & 0x00FF0000) >> 8) |
@@ -6778,6 +6782,7 @@ static always_inline int gen_intermediat
         }
 #endif
         ctx.nip += 4;
+        phys_pc += 4;
         table = env->opcodes;
         handler = table[opc1(ctx.opcode)];
         if (is_indirect_opcode(handler)) {
Index: target-sh4/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-sh4/translate.c,v
retrieving revision 1.18
diff -u -d -d -p -r1.18 translate.c
--- target-sh4/translate.c	29 Sep 2007 19:52:22 -0000	1.18
+++ target-sh4/translate.c	12 Oct 2007 22:53:40 -0000
@@ -1150,11 +1150,15 @@ gen_intermediate_code_internal(CPUState 
 {
     DisasContext ctx;
     target_ulong pc_start;
+    unsigned long phys_pc, phys_pc_start;
     static uint16_t *gen_opc_end;
     uint32_t old_flags;
     int i, ii;
 
     pc_start = tb->pc;
+    phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    phys_pc = phys_pc_start;
     gen_opc_ptr = gen_opc_buf;
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     gen_opparam_ptr = gen_opparam_buf;
@@ -1210,9 +1214,10 @@ gen_intermediate_code_internal(CPUState 
 	fprintf(stderr, "Loading opcode at address 0x%08x\n", ctx.pc);
 	fflush(stderr);
 #endif
-	ctx.opcode = lduw_code(ctx.pc);
+	ctx.opcode = lduw_code_p(&phys_pc_start, phys_pc, ctx.pc);
 	decode_opc(&ctx);
 	ctx.pc += 2;
+        phys_pc += 2;
 	if ((ctx.pc & (TARGET_PAGE_SIZE - 1)) == 0)
 	    break;
 	if (env->singlestep_enabled)
Index: target-sparc/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-sparc/translate.c,v
retrieving revision 1.74
diff -u -d -d -p -r1.74 translate.c
--- target-sparc/translate.c	10 Oct 2007 19:11:54 -0000	1.74
+++ target-sparc/translate.c	12 Oct 2007 22:53:40 -0000
@@ -48,6 +48,8 @@ typedef struct DisasContext {
     target_ulong pc;    /* current Program Counter: integer or DYNAMIC_PC */
     target_ulong npc;   /* next PC: integer or DYNAMIC_PC or JUMP_PC */
     target_ulong jump_pc[2]; /* used when JUMP_PC pc value is used */
+    unsigned long phys_pc;
+    unsigned long phys_pc_start;
     int is_br;
     int mem_idx;
     int fpu_enabled;
@@ -1089,7 +1091,7 @@ static void disas_sparc_insn(DisasContex
 {
     unsigned int insn, opc, rs1, rs2, rd;
 
-    insn = ldl_code(dc->pc);
+    insn = ldl_code_p(&dc->phys_pc_start, dc->phys_pc, dc->pc);
     opc = GET_FIELD(insn, 0, 1);
 
     rd = GET_FIELD(insn, 2, 6);
@@ -3319,6 +3321,7 @@ static void disas_sparc_insn(DisasContex
     }
     /* default case for non jump instructions */
     if (dc->npc == DYNAMIC_PC) {
+        dc->phys_pc += DYNAMIC_PC - dc->pc;
         dc->pc = DYNAMIC_PC;
         gen_op_next_insn();
     } else if (dc->npc == JUMP_PC) {
@@ -3326,6 +3329,7 @@ static void disas_sparc_insn(DisasContex
         gen_branch2(dc, dc->jump_pc[0], dc->jump_pc[1]);
         dc->is_br = 1;
     } else {
+        dc->phys_pc += dc->npc - dc->pc;
         dc->pc = dc->npc;
         dc->npc = dc->npc + 4;
     }
@@ -3376,6 +3380,9 @@ static inline int gen_intermediate_code_
     dc->tb = tb;
     pc_start = tb->pc;
     dc->pc = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     last_pc = dc->pc;
     dc->npc = (target_ulong) tb->cs_base;
 #if defined(CONFIG_USER_ONLY)


* Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
  2007-10-12 23:00 [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation] J. Mayer
@ 2007-10-13  7:11 ` Blue Swirl
  2007-10-13  9:57   ` J. Mayer
  0 siblings, 1 reply; 8+ messages in thread
From: Blue Swirl @ 2007-10-13  7:11 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2836 bytes --]

On 10/13/07, J. Mayer <l_indien@magic.fr> wrote:
> -------- Forwarded Message --------
> > From: Jocelyn Mayer <l_indien@magic.fr>
> > Reply-To: l_indien@magic.fr, qemu-devel@nongnu.org
> > To: qemu-devel@nongnu.org
> > Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
> > Date: Fri, 12 Oct 2007 20:24:44 +0200
> >
> > On Fri, 2007-10-12 at 18:21 +0300, Blue Swirl wrote:
> > > On 10/12/07, J. Mayer <l_indien@magic.fr> wrote:
> > > > Here's a small patch that allows an optimisation for code fetch, at least
> > > > for RISC CPU targets, as suggested by Fabrice Bellard.
> > > > The main idea is that a translated block is never to span over a page
> > > > boundary. As the tb_find_slow routine already gets the physical address
> > > > of the page of code to be translated, the code translator could then
> > > > fetch the code using raw host memory accesses instead of doing it
> > > > through the softmmu routines.
> > > > This patch could also be adapted to CISC CPU targets, with care for the
> > > > last instruction of a page. For now, I did implement it for alpha, arm,
> > > > mips, PowerPC and SH4.
> > > > I don't actually know whether the optimisation would bring a noticeable speed
> > > > gain or whether it will be absolutely marginal.
> > > >
> > > > Please comment.
> > >
> > > This will not work correctly for execution of MMIO registers, but
> > > maybe that won't work on real hardware either. Who cares.
> >
> > I wonder whether this is important or not... But maybe, when retrieving the
> > physical address, we could check whether it is inside ROM/RAM or an I/O area
> > and, in the latter case, not pass the phys_addr information to the
> > translator. The translator would then keep using ldxx_code. I guess if
> > we want to do that, a set of helpers would be welcome to avoid
> > adding code like:
> > if (phys_pc == 0)
> >   opc = ldl_code(virt_pc);
> > else
> >   opc = ldl_raw(phys_pc);
> > everywhere... I could also add another check so that this set of macros would
> > automatically fall back to ldxx_code when we reach a page boundary, which would
> > then make it easy to use this optimisation for CISC/VLE architectures too.
> >
> > I'm not sure what the proper solution is for executing code from MMIO
> > devices. But adding specific accessors to handle the CISC/VLE case still
> > needs to be done.
>
> [...]
>
> I updated my patch along these lines, and it can now run x86
> and PowerPC targets.
> PowerPC is the easy case; x86 is probably the worst... Well, I'm not really
> sure about what I've done for Sparc, but the other targets should be safe.

It broke Sparc; delay slot handling makes things complicated. The
updated patch passes my tests.

For extra performance, I bypassed ldl_code_p. On Sparc,
instructions can't be split between two pages. Isn't translation
always confined to the same page for all targets like Sparc?
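A minimal C sketch of the page test that the ld*_code_p helpers in the patch perform, assuming 4 KiB pages. The names same_page, fetch32, slow_load32_fn and insn_straddles_page are hypothetical and are not QEMU API. ram_base stands in for phys_ram_base, and the raw read uses host byte order, so target endianness is ignored. The patch applies the XOR test to host addresses, which matches guest page boundaries only if the RAM base is page-aligned. This sketch tests guest-physical offsets instead. It also illustrates the Sparc point: a 4-byte, 4-byte-aligned instruction can never straddle a page, whereas a variable-length x86 instruction starting at offset 0xFFE can.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page geometry for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_BITS 12

/* Nonzero if the size-byte access starting at addr ends in the same page
 * as start. Same shape as the patch's test:
 * ((start ^ (addr + size - 1)) >> PAGE_BITS) == 0 */
static int same_page(uint32_t start, uint32_t addr, uint32_t size)
{
    assert(size > 0);
    return ((start ^ (addr + size - 1)) >> SKETCH_PAGE_BITS) == 0;
}

/* Hypothetical slow path standing in for ldl_code(): a softmmu-style
 * load by guest virtual address. */
typedef uint32_t (*slow_load32_fn)(uint32_t vaddr);

/* Fetch a 32-bit opcode. phys_start and phys are guest-physical offsets
 * into ram_base. Falls back to the slow path when the access leaves the
 * page the TB started in, or straddles a page boundary. */
static uint32_t fetch32(const uint8_t *ram_base, uint32_t phys_start,
                        uint32_t phys, uint32_t vaddr, slow_load32_fn slow)
{
    uint32_t v;

    if (!same_page(phys_start, phys, sizeof v))
        return slow(vaddr);
    memcpy(&v, ram_base + phys, sizeof v);  /* fast path: raw host read */
    return v;
}

/* For a fixed-width ISA whose instructions are size bytes and size-aligned
 * (size a power of two no larger than a page), this always returns 0. */
static int insn_straddles_page(uint32_t addr, uint32_t size)
{
    return !same_page(addr, addr, size);
}
```

With these definitions, same_page(0x0, 0xFFC, 4) is true while same_page(0x0, 0xFFE, 4) is false, so fetch32 would take the slow path for the second access.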

[-- Attachment #2: code_raw_optim.diff --]
[-- Type: text/x-diff; name="code_raw_optim.diff", Size: 49258 bytes --]

Index: qemu/cpu-all.h
===================================================================
--- qemu.orig/cpu-all.h	2007-10-13 06:22:35.000000000 +0000
+++ qemu/cpu-all.h	2007-10-13 06:25:46.000000000 +0000
@@ -646,6 +646,13 @@
 #define ldl_code(p) ldl_raw(p)
 #define ldq_code(p) ldq_raw(p)
 
+#define ldub_code_p(sp, pp, p) ldub_raw(p)
+#define ldsb_code_p(sp, pp, p) ldsb_raw(p)
+#define lduw_code_p(sp, pp, p) lduw_raw(p)
+#define ldsw_code_p(sp, pp, p) ldsw_raw(p)
+#define ldl_code_p(sp, pp, p) ldl_raw(p)
+#define ldq_code_p(sp, pp, p) ldq_raw(p)
+
 #define ldub_kernel(p) ldub_raw(p)
 #define ldsb_kernel(p) ldsb_raw(p)
 #define lduw_kernel(p) lduw_raw(p)
Index: qemu/cpu-exec.c
===================================================================
--- qemu.orig/cpu-exec.c	2007-10-13 06:22:35.000000000 +0000
+++ qemu/cpu-exec.c	2007-10-13 06:25:46.000000000 +0000
@@ -133,6 +133,7 @@
     tb->tc_ptr = tc_ptr;
     tb->cs_base = cs_base;
     tb->flags = flags;
+    tb->page_addr[0] = phys_page1;
     cpu_gen_code(env, tb, CODE_GEN_MAX_SIZE, &code_gen_size);
     code_gen_ptr = (void *)(((unsigned long)code_gen_ptr + code_gen_size + CODE_GEN_ALIGN - 1) & ~(CODE_GEN_ALIGN - 1));
 
Index: qemu/softmmu_header.h
===================================================================
--- qemu.orig/softmmu_header.h	2007-10-13 06:22:35.000000000 +0000
+++ qemu/softmmu_header.h	2007-10-13 06:25:46.000000000 +0000
@@ -336,6 +336,60 @@
     }
 }
 
+#else
+
+#if DATA_SIZE <= 2
+static inline RES_TYPE glue(glue(glue(lds,SUFFIX),MEMSUFFIX),_p)(unsigned long *start_pc,
+                                                                 unsigned long phys_pc,
+                                                                 target_ulong virt_pc)
+{
+    RES_TYPE opc;
+
+    if (unlikely((*start_pc ^
+                  (phys_pc + sizeof(RES_TYPE) - 1)) >> TARGET_PAGE_BITS)) {
+        /* Slow path: phys_pc is not in the same page than start_pc
+         *            or the insn is spanning two pages
+         */
+        opc = glue(glue(lds,SUFFIX),MEMSUFFIX)(virt_pc);
+        /* Avoid softmmu access on next load */
+        /* XXX: don't: phys PC is not correct anymore.
+         *      We could call get_phys_addr_code(env, pc) and remove the else
+         *      condition here.
+         */
+        //*start_pc = phys_pc;
+    } else {
+        opc = glue(glue(lds,SUFFIX),_raw)(phys_pc);
+    }
+
+    return opc;
+}
+#endif
+
+static inline RES_TYPE glue(glue(glue(ld,USUFFIX),MEMSUFFIX),_p)(unsigned long *start_pc,
+                                                                 unsigned long phys_pc,
+                                                                 target_ulong virt_pc)
+{
+    RES_TYPE opc;
+
+    if (unlikely((*start_pc ^
+                  (phys_pc + sizeof(RES_TYPE) - 1)) >> TARGET_PAGE_BITS)) {
+        /* Slow path: phys_pc is not in the same page as start_pc,
+         *            or the insn spans two pages.
+         */
+        opc = glue(glue(ld,USUFFIX),MEMSUFFIX)(virt_pc);
+        /* Avoid softmmu access on next load */
+        /* XXX: don't: phys PC is not correct anymore.
+         *      We could call get_phys_addr_code(env, pc) and remove the else
+         *      condition here.
+         */
+        //*start_pc = phys_pc;
+    } else {
+        opc = glue(glue(ld,USUFFIX),_raw)(phys_pc);
+    }
+
+    return opc;
+}
+
 #endif /* ACCESS_TYPE != 3 */
 
 #endif /* !asm */
Index: qemu/target-alpha/translate.c
===================================================================
--- qemu.orig/target-alpha/translate.c	2007-10-13 06:22:35.000000000 +0000
+++ qemu/target-alpha/translate.c	2007-10-13 06:25:46.000000000 +0000
@@ -1965,6 +1965,7 @@
     static int insn_count;
 #endif
     DisasContext ctx, *ctxp = &ctx;
+    unsigned long phys_pc, phys_pc_start;
     target_ulong pc_start;
     uint32_t insn;
     uint16_t *gen_opc_end;
@@ -1972,6 +1973,9 @@
     int ret;
 
     pc_start = tb->pc;
+    phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    phys_pc = phys_pc_start;
     gen_opc_ptr = gen_opc_buf;
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     gen_opparam_ptr = gen_opparam_buf;
@@ -2010,7 +2014,7 @@
                     ctx.pc, ctx.mem_idx);
         }
 #endif
-        insn = ldl_code(ctx.pc);
+        insn = ldl_code_p(&phys_pc_start, phys_pc, ctx.pc);
 #if defined ALPHA_DEBUG_DISAS
         insn_count++;
         if (logfile != NULL) {
@@ -2018,6 +2022,7 @@
         }
 #endif
         ctx.pc += 4;
+        phys_pc += 4;
         ret = translate_one(ctxp, insn);
         if (ret != 0)
             break;
Index: qemu/target-arm/translate.c
===================================================================
--- qemu.orig/target-arm/translate.c	2007-10-13 06:22:35.000000000 +0000
+++ qemu/target-arm/translate.c	2007-10-13 06:25:46.000000000 +0000
@@ -38,6 +38,8 @@
 /* internal defines */
 typedef struct DisasContext {
     target_ulong pc;
+    unsigned long phys_pc;
+    unsigned long phys_pc_start;
     int is_jmp;
     /* Nonzero if this instruction has been conditionally skipped.  */
     int condjmp;
@@ -2206,8 +2208,9 @@
 {
     unsigned int cond, insn, val, op1, i, shift, rm, rs, rn, rd, sh;
 
-    insn = ldl_code(s->pc);
+    insn = ldl_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 4;
+    s->phys_pc += 4;
 
     cond = insn >> 28;
     if (cond == 0xf){
@@ -2971,8 +2974,9 @@
     int32_t offset;
     int i;
 
-    insn = lduw_code(s->pc);
+    insn = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     switch (insn >> 12) {
     case 0: case 1:
@@ -3494,7 +3498,7 @@
             break;
         }
         offset = ((int32_t)insn << 21) >> 10;
-        insn = lduw_code(s->pc);
+        insn = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         offset |= insn & 0x7ff;
 
         val = (uint32_t)s->pc + 2;
@@ -3544,6 +3548,9 @@
 
     dc->is_jmp = DISAS_NEXT;
     dc->pc = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     dc->singlestep_enabled = env->singlestep_enabled;
     dc->condjmp = 0;
     dc->thumb = env->thumb;
Index: qemu/target-cris/translate.c
===================================================================
--- qemu.orig/target-cris/translate.c	2007-10-13 06:22:35.000000000 +0000
+++ qemu/target-cris/translate.c	2007-10-13 06:25:46.000000000 +0000
@@ -100,6 +100,7 @@
 typedef struct DisasContext {
 	CPUState *env;
 	target_ulong pc, insn_pc;
+        unsigned long phys_pc, phys_pc_start;
 
 	/* Decoder.  */
 	uint32_t ir;
@@ -828,7 +829,8 @@
 		if (memsize == 1)
 			insn_len++;
 
-		imm = ldl_code(dc->pc + 2);
+                imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2,
+                                 dc->pc + 2);
 		if (memsize != 4) {
 			if (s_ext) {
 				imm = sign_extend(imm, (memsize * 8) - 1);
@@ -1962,7 +1964,7 @@
 	rd = dc->op2;
 
 	cris_cc_mask(dc, 0);
-	imm = ldl_code(dc->pc + 2);
+	imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 	DIS(fprintf (logfile, "lapc 0x%x, $r%u\n", imm + dc->pc, dc->op2));
 	gen_op_movl_T0_im (dc->pc + imm);
 	gen_movl_reg_T0[rd] ();
@@ -1999,7 +2001,7 @@
 {
 	uint32_t imm;
 
-	imm = ldl_code(dc->pc + 2);
+	imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "jas 0x%x\n", imm));
 	cris_cc_mask(dc, 0);
@@ -2016,7 +2018,7 @@
 {
 	uint32_t imm;
 
-	imm = ldl_code(dc->pc + 2);
+	imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "jasc 0x%x\n", imm));
 	cris_cc_mask(dc, 0);
@@ -2047,7 +2049,7 @@
 	int32_t offset;
 	uint32_t cond = dc->op2;
 
-	offset = ldl_code(dc->pc + 2);
+	offset = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 	offset = sign_extend(offset, 15);
 
 	DIS(fprintf (logfile, "b%s %d pc=%x dst=%x\n",
@@ -2065,7 +2067,7 @@
 	int32_t simm;
 
 
-	simm = ldl_code(dc->pc + 2);
+	simm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "bas 0x%x, $p%u\n", dc->pc + simm, dc->op2));
 	cris_cc_mask(dc, 0);
@@ -2081,7 +2083,7 @@
 static unsigned int dec_basc_im(DisasContext *dc)
 {
 	int32_t simm;
-	simm = ldl_code(dc->pc + 2);
+	simm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "basc 0x%x, $p%u\n", dc->pc + simm, dc->op2));
 	cris_cc_mask(dc, 0);
@@ -2259,7 +2261,7 @@
 	int i;
 
 	/* Load a halfword onto the instruction register.  */
-	tmp = ldl_code(dc->pc);
+	tmp = ldl_code_p(&dc->phys_pc_start, dc->phys_pc, dc->pc);
 	dc->ir = tmp & 0xffff;
 
 	/* Now decode it.  */
@@ -2313,6 +2315,9 @@
 	uint32_t next_page_start;
 
 	pc_start = tb->pc;
+        dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+            (pc_start & ~TARGET_PAGE_MASK);
+        dc->phys_pc = dc->phys_pc_start;
 	dc->env = env;
 	dc->tb = tb;
 
@@ -2347,6 +2352,7 @@
 		insn_len = cris_decoder(dc);
 		STATS(gen_op_exec_insn());
 		dc->pc += insn_len;
+                dc->phys_pc += insn_len;
 		if (!dc->flagx_live
 		    || (dc->flagx_live &&
 			!(dc->cc_op == CC_OP_FLAGS && dc->flags_x))) {
Index: qemu/target-i386/translate.c
===================================================================
--- qemu.orig/target-i386/translate.c	2007-10-13 06:22:35.000000000 +0000
+++ qemu/target-i386/translate.c	2007-10-13 06:25:46.000000000 +0000
@@ -73,6 +73,7 @@
     int prefix;
     int aflag, dflag;
     target_ulong pc; /* pc = eip + cs_base */
+    unsigned long phys_pc, phys_pc_start;
     int is_jmp; /* 1 = means jump (stop translation), 2 means CPU
                    static state change (stop translation) */
     /* current block context */
@@ -1451,7 +1452,7 @@
 
         if (base == 4) {
             havesib = 1;
-            code = ldub_code(s->pc++);
+            code = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             scale = (code >> 6) & 3;
             index = ((code >> 3) & 7) | REX_X(s);
             base = (code & 7);
@@ -1462,8 +1463,10 @@
         case 0:
             if ((base & 7) == 5) {
                 base = -1;
-                disp = (int32_t)ldl_code(s->pc);
+                disp = (int32_t)ldl_code_p(&s->phys_pc_start, s->phys_pc,
+                                           s->pc);
                 s->pc += 4;
+                s->phys_pc += 4;
                 if (CODE64(s) && !havesib) {
                     disp += s->pc + s->rip_offset;
                 }
@@ -1472,12 +1475,14 @@
             }
             break;
         case 1:
-            disp = (int8_t)ldub_code(s->pc++);
+            disp = (int8_t)ldub_code_p(&s->phys_pc_start, s->phys_pc++,
+                                       s->pc++);
             break;
         default:
         case 2:
-            disp = ldl_code(s->pc);
+            disp = ldl_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 4;
+            s->phys_pc += 4;
             break;
         }
 
@@ -1545,8 +1550,9 @@
         switch (mod) {
         case 0:
             if (rm == 6) {
-                disp = lduw_code(s->pc);
+                disp = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 s->pc += 2;
+                s->phys_pc += 2;
                 gen_op_movl_A0_im(disp);
                 rm = 0; /* avoid SS override */
                 goto no_rm;
@@ -1555,12 +1561,14 @@
             }
             break;
         case 1:
-            disp = (int8_t)ldub_code(s->pc++);
+            disp = (int8_t)ldub_code_p(&s->phys_pc_start, s->phys_pc++,
+                                       s->pc++);
             break;
         default:
         case 2:
-            disp = lduw_code(s->pc);
+            disp = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
+            s->phys_pc += 2;
             break;
         }
         switch(rm) {
@@ -1629,7 +1637,7 @@
         base = rm;
 
         if (base == 4) {
-            code = ldub_code(s->pc++);
+            code = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             base = (code & 7);
         }
 
@@ -1637,14 +1645,17 @@
         case 0:
             if (base == 5) {
                 s->pc += 4;
+                s->phys_pc += 4;
             }
             break;
         case 1:
             s->pc++;
+            s->phys_pc++;
             break;
         default:
         case 2:
             s->pc += 4;
+            s->phys_pc += 4;
             break;
         }
     } else {
@@ -1652,14 +1663,17 @@
         case 0:
             if (rm == 6) {
                 s->pc += 2;
+                s->phys_pc += 2;
             }
             break;
         case 1:
             s->pc++;
+            s->phys_pc++;
             break;
         default:
         case 2:
             s->pc += 2;
+            s->phys_pc += 2;
             break;
         }
     }
@@ -1727,17 +1741,20 @@
 
     switch(ot) {
     case OT_BYTE:
-        ret = ldub_code(s->pc);
+        ret = ldub_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc++;
+        s->phys_pc++;
         break;
     case OT_WORD:
-        ret = lduw_code(s->pc);
+        ret = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         break;
     default:
     case OT_LONG:
-        ret = ldl_code(s->pc);
+        ret = ldl_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 4;
+        s->phys_pc += 4;
         break;
     }
     return ret;
@@ -2689,7 +2706,7 @@
         gen_op_enter_mmx();
     }
 
-    modrm = ldub_code(s->pc++);
+    modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
     reg = ((modrm >> 3) & 7);
     if (is_xmm)
         reg |= rex_r;
@@ -2962,7 +2979,7 @@
         case 0x171: /* shift xmm, im */
         case 0x172:
         case 0x173:
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (is_xmm) {
                 gen_op_movl_T0_im(val);
                 gen_op_movl_env_T0(offsetof(CPUX86State,xmm_t0.XMM_L(0)));
@@ -3082,7 +3099,7 @@
         case 0x1c4:
             s->rip_offset = 1;
             gen_ldst_modrm(s, modrm, OT_WORD, OR_TMP0, 0);
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (b1) {
                 val &= 7;
                 gen_op_pinsrw_xmm(offsetof(CPUX86State,xmm_regs[reg]), val);
@@ -3095,7 +3112,7 @@
         case 0x1c5:
             if (mod != 3)
                 goto illegal_op;
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (b1) {
                 val &= 7;
                 rm = (modrm & 7) | REX_B(s);
@@ -3213,13 +3230,13 @@
         switch(b) {
         case 0x70: /* pshufx insn */
         case 0xc6: /* pshufx insn */
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             sse_op3 = (GenOpFunc3 *)sse_op2;
             sse_op3(op1_offset, op2_offset, val);
             break;
         case 0xc2:
             /* compare insns */
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (val >= 8)
                 goto illegal_op;
             sse_op2 = sse_op_table4[val][b1];
@@ -3260,8 +3277,9 @@
 #endif
     s->rip_offset = 0; /* for relative ip address */
  next_byte:
-    b = ldub_code(s->pc);
+    b = ldub_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc++;
+    s->phys_pc++;
     /* check prefixes */
 #ifdef TARGET_X86_64
     if (CODE64(s)) {
@@ -3375,7 +3393,7 @@
     case 0x0f:
         /**************************/
         /* extended op code */
-        b = ldub_code(s->pc++) | 0x100;
+        b = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++) | 0x100;
         goto reswitch;
 
         /**************************/
@@ -3400,7 +3418,7 @@
 
             switch(f) {
             case 0: /* OP Ev, Gv */
-                modrm = ldub_code(s->pc++);
+                modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
                 reg = ((modrm >> 3) & 7) | rex_r;
                 mod = (modrm >> 6) & 3;
                 rm = (modrm & 7) | REX_B(s);
@@ -3422,7 +3440,7 @@
                 gen_op(s, op, ot, opreg);
                 break;
             case 1: /* OP Gv, Ev */
-                modrm = ldub_code(s->pc++);
+                modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
                 mod = (modrm >> 6) & 3;
                 reg = ((modrm >> 3) & 7) | rex_r;
                 rm = (modrm & 7) | REX_B(s);
@@ -3457,7 +3475,7 @@
             else
                 ot = dflag + OT_WORD;
 
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
             op = (modrm >> 3) & 7;
@@ -3506,7 +3524,7 @@
         else
             ot = dflag + OT_WORD;
 
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         op = (modrm >> 3) & 7;
@@ -3648,7 +3666,7 @@
         else
             ot = dflag + OT_WORD;
 
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         op = (modrm >> 3) & 7;
@@ -3754,7 +3772,7 @@
         else
             ot = dflag + OT_WORD;
 
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         reg = ((modrm >> 3) & 7) | rex_r;
@@ -3805,7 +3823,7 @@
     case 0x69: /* imul Gv, Ev, I */
     case 0x6b:
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         if (b == 0x69)
             s->rip_offset = insn_const_size(ot);
@@ -3841,7 +3859,7 @@
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         if (mod == 3) {
@@ -3868,7 +3886,7 @@
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         gen_op_mov_TN_reg[ot][1][reg]();
@@ -3885,7 +3903,7 @@
         s->cc_op = CC_OP_SUBB + ot;
         break;
     case 0x1c7: /* cmpxchg8b */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -3944,7 +3962,7 @@
         } else {
             ot = dflag + OT_WORD;
         }
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         gen_pop_T0(s);
         if (mod == 3) {
@@ -3963,9 +3981,10 @@
     case 0xc8: /* enter */
         {
             int level;
-            val = lduw_code(s->pc);
+            val = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
-            level = ldub_code(s->pc++);
+            s->phys_pc += 2;
+            level = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             gen_enter(s, val, level);
         }
         break;
@@ -4045,7 +4064,7 @@
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
 
         /* generate a generic store */
@@ -4057,7 +4076,7 @@
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod != 3) {
             s->rip_offset = insn_const_size(ot);
@@ -4076,14 +4095,14 @@
             ot = OT_BYTE;
         else
             ot = OT_WORD + dflag;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
 
         gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
         gen_op_mov_reg_T0[ot][reg]();
         break;
     case 0x8e: /* mov seg, Gv */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = (modrm >> 3) & 7;
         if (reg >= 6 || reg == R_CS)
             goto illegal_op;
@@ -4103,7 +4122,7 @@
         }
         break;
     case 0x8c: /* mov Gv, seg */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
         if (reg >= 6)
@@ -4126,7 +4145,7 @@
             d_ot = dflag + OT_WORD;
             /* ot is the size of source */
             ot = (b & 1) + OT_BYTE;
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             reg = ((modrm >> 3) & 7) | rex_r;
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
@@ -4163,7 +4182,7 @@
 
     case 0x8d: /* lea */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -4190,8 +4209,9 @@
                 ot = dflag + OT_WORD;
 #ifdef TARGET_X86_64
             if (s->aflag == 2) {
-                offset_addr = ldq_code(s->pc);
+                offset_addr = ldq_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 s->pc += 8;
+                s->phys_pc += 8;
                 if (offset_addr == (int32_t)offset_addr)
                     gen_op_movq_A0_im(offset_addr);
                 else
@@ -4243,8 +4263,9 @@
         if (dflag == 2) {
             uint64_t tmp;
             /* 64 bit case */
-            tmp = ldq_code(s->pc);
+            tmp = ldq_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 8;
+            s->phys_pc += 8;
             reg = (b & 7) | REX_B(s);
             gen_movtl_T0_im(tmp);
             gen_op_mov_reg_T0[OT_QUAD][reg]();
@@ -4270,7 +4291,7 @@
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         if (mod == 3) {
@@ -4313,7 +4334,7 @@
         op = R_GS;
     do_lxx:
         ot = dflag ? OT_LONG : OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         if (mod == 3)
@@ -4345,7 +4366,7 @@
             else
                 ot = dflag + OT_WORD;
 
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             mod = (modrm >> 6) & 3;
             op = (modrm >> 3) & 7;
 
@@ -4364,7 +4385,8 @@
                 gen_shift(s, op, ot, opreg, OR_ECX);
             } else {
                 if (shift == 2) {
-                    shift = ldub_code(s->pc++);
+                    shift = ldub_code_p(&s->phys_pc_start, s->phys_pc++,
+                                        s->pc++);
                 }
                 gen_shifti(s, op, ot, opreg, shift);
             }
@@ -4398,7 +4420,7 @@
         shift = 0;
     do_shiftd:
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         reg = ((modrm >> 3) & 7) | rex_r;
@@ -4412,7 +4434,7 @@
         gen_op_mov_TN_reg[ot][1][reg]();
 
         if (shift) {
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (ot == OT_QUAD)
                 val &= 0x3f;
             else
@@ -4450,7 +4472,7 @@
             gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
             break;
         }
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = modrm & 7;
         op = ((b & 7) << 3) | ((modrm >> 3) & 7);
@@ -5013,7 +5035,7 @@
             ot = OT_BYTE;
         else
             ot = dflag ? OT_LONG : OT_WORD;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_movl_T0_im(val);
         gen_check_io(s, ot, 0, pc_start - s->cs_base);
         if (gen_svm_check_io(s, pc_start,
@@ -5029,7 +5051,7 @@
             ot = OT_BYTE;
         else
             ot = dflag ? OT_LONG : OT_WORD;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_movl_T0_im(val);
         gen_check_io(s, ot, 0, pc_start - s->cs_base);
         if (gen_svm_check_io(s, pc_start, svm_is_rep(prefixes) |
@@ -5073,8 +5095,9 @@
         /************************/
         /* control */
     case 0xc2: /* ret im */
-        val = ldsw_code(s->pc);
+        val = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         gen_pop_T0(s);
         if (CODE64(s) && s->dflag)
             s->dflag = 2;
@@ -5093,8 +5116,9 @@
         gen_eob(s);
         break;
     case 0xca: /* lret im */
-        val = ldsw_code(s->pc);
+        val = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
     do_lret:
         if (s->pe && !s->vm86) {
             if (s->cc_op != CC_OP_DYNAMIC)
@@ -5223,13 +5247,13 @@
         break;
 
     case 0x190 ... 0x19f: /* setcc Gv */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_setcc(s, b);
         gen_ldst_modrm(s, modrm, OT_BYTE, OR_TMP0, 1);
         break;
     case 0x140 ... 0x14f: /* cmov Gv, Ev */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         gen_setcc(s, b);
@@ -5338,7 +5362,7 @@
         /* bit operations */
     case 0x1ba: /* bt/bts/btr/btc Gv, im */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         op = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
@@ -5350,7 +5374,7 @@
             gen_op_mov_TN_reg[ot][0][rm]();
         }
         /* load shift */
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_movl_T1_im(val);
         if (op < 4)
             goto illegal_op;
@@ -5378,7 +5402,7 @@
         op = 3;
     do_btx:
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
@@ -5404,7 +5428,7 @@
     case 0x1bc: /* bsf */
     case 0x1bd: /* bsr */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
         /* NOTE: in order to handle the 0 case, we must load the
@@ -5451,7 +5475,7 @@
     case 0xd4: /* aam */
         if (CODE64(s))
             goto illegal_op;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         if (val == 0) {
             gen_exception(s, EXCP00_DIVZ, pc_start - s->cs_base);
         } else {
@@ -5462,7 +5486,7 @@
     case 0xd5: /* aad */
         if (CODE64(s))
             goto illegal_op;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_aad(val);
         s->cc_op = CC_OP_LOGICB;
         break;
@@ -5494,7 +5518,7 @@
         gen_interrupt(s, EXCP03_INT3, pc_start - s->cs_base, s->pc - s->cs_base);
         break;
     case 0xcd: /* int N */
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         if (gen_svm_check_intercept(s, pc_start, SVM_EXIT_SWINT))
             break;
         if (s->vm86 && s->iopl != 3) {
@@ -5567,7 +5591,7 @@
         if (CODE64(s))
             goto illegal_op;
         ot = dflag ? OT_LONG : OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
         if (mod == 3)
@@ -5738,7 +5762,7 @@
         }
         break;
     case 0x100:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         switch(op) {
@@ -5808,7 +5832,7 @@
         }
         break;
     case 0x101:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         rm = modrm & 7;
@@ -6022,7 +6046,7 @@
             /* d_ot is the size of destination */
             d_ot = dflag + OT_WORD;
 
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             reg = ((modrm >> 3) & 7) | rex_r;
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
@@ -6048,7 +6072,7 @@
             if (!s->pe || s->vm86)
                 goto illegal_op;
             ot = dflag ? OT_LONG : OT_WORD;
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             reg = (modrm >> 3) & 7;
             mod = (modrm >> 6) & 3;
             rm = modrm & 7;
@@ -6075,7 +6099,7 @@
         if (!s->pe || s->vm86)
             goto illegal_op;
         ot = dflag ? OT_LONG : OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
         gen_op_mov_TN_reg[ot][1][reg]();
@@ -6089,7 +6113,7 @@
         gen_op_mov_reg_T1[ot][reg]();
         break;
     case 0x118:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         switch(op) {
@@ -6108,7 +6132,7 @@
         }
         break;
     case 0x119 ... 0x11f: /* nop (multi byte) */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_nop_modrm(s, modrm);
         break;
     case 0x120: /* mov reg, crN */
@@ -6116,7 +6140,7 @@
         if (s->cpl != 0) {
             gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
         } else {
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if ((modrm & 0xc0) != 0xc0)
                 goto illegal_op;
             rm = (modrm & 7) | REX_B(s);
@@ -6158,7 +6182,7 @@
         if (s->cpl != 0) {
             gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
         } else {
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if ((modrm & 0xc0) != 0xc0)
                 goto illegal_op;
             rm = (modrm & 7) | REX_B(s);
@@ -6199,7 +6223,7 @@
         if (!(s->cpuid_features & CPUID_SSE2))
             goto illegal_op;
         ot = s->dflag == 2 ? OT_QUAD : OT_LONG;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -6208,7 +6232,7 @@
         gen_ldst_modrm(s, modrm, ot, reg, 1);
         break;
     case 0x1ae:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         switch(op) {
@@ -6274,7 +6298,7 @@
         }
         break;
     case 0x10d: /* prefetch */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
         /* ignore for now */
         break;
@@ -6752,6 +6776,9 @@
 
     dc->is_jmp = DISAS_NEXT;
     pc_ptr = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     lj = -1;
 
     for(;;) {
Index: qemu/target-m68k/translate.c
===================================================================
--- qemu.orig/target-m68k/translate.c	2007-10-13 06:22:35.000000000 +0000
+++ qemu/target-m68k/translate.c	2007-10-13 06:25:46.000000000 +0000
@@ -45,6 +45,8 @@
     CPUM68KState *env;
     target_ulong insn_pc; /* Start of the current instruction.  */
     target_ulong pc;
+    unsigned long phys_pc;
+    unsigned long phys_pc_start;
     int is_jmp;
     int cc_op;
     int user;
@@ -207,10 +209,12 @@
 static inline uint32_t read_im32(DisasContext *s)
 {
     uint32_t im;
-    im = ((uint32_t)lduw_code(s->pc)) << 16;
+    im = ((uint32_t)lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc)) << 16;
     s->pc += 2;
-    im |= lduw_code(s->pc);
+    s->phys_pc += 2;
+    im |= lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     return im;
 }
 
@@ -244,8 +248,9 @@
     uint32_t bd, od;
 
     offset = s->pc;
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     if ((ext & 0x800) == 0 && !m68k_feature(s->env, M68K_FEATURE_WORD_INDEX))
         return -1;
@@ -258,8 +263,10 @@
         if ((ext & 0x30) > 0x10) {
             /* base displacement */
             if ((ext & 0x30) == 0x20) {
-                bd = (int16_t)lduw_code(s->pc);
+                bd = (int16_t)lduw_code_p(&s->phys_pc_start, s->phys_pc,
+                                          s->pc);
                 s->pc += 2;
+                s->phys_pc += 2;
             } else {
                 bd = read_im32(s);
             }
@@ -307,8 +314,10 @@
             if ((ext & 3) > 1) {
                 /* outer displacement */
                 if ((ext & 3) == 2) {
-                    od = (int16_t)lduw_code(s->pc);
+                    od = (int16_t)lduw_code_p(&s->phys_pc_start, s->phys_pc,
+                                              s->pc);
                     s->pc += 2;
+                    s->phys_pc += 2;
                 } else {
                     od = read_im32(s);
                 }
@@ -455,8 +464,9 @@
     case 5: /* Indirect displacement.  */
         reg += QREG_A0;
         tmp = gen_new_qreg(QMODE_I32);
-        ext = lduw_code(s->pc);
+        ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         gen_op_add32(tmp, reg, gen_im32((int16_t)ext));
         return tmp;
     case 6: /* Indirect index + displacement.  */
@@ -465,8 +475,9 @@
     case 7: /* Other */
         switch (reg) {
         case 0: /* Absolute short.  */
-            offset = ldsw_code(s->pc);
+            offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
+            s->phys_pc += 2;
             return gen_im32(offset);
         case 1: /* Absolute long.  */
             offset = read_im32(s);
@@ -474,8 +485,9 @@
         case 2: /* pc displacement  */
             tmp = gen_new_qreg(QMODE_I32);
             offset = s->pc;
-            offset += ldsw_code(s->pc);
+            offset += ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
+            s->phys_pc += 2;
             return gen_im32(offset);
         case 3: /* pc index+displacement.  */
             return gen_lea_indexed(s, opsize, -1);
@@ -581,18 +593,23 @@
             /* Sign extend values for consistency.  */
             switch (opsize) {
             case OS_BYTE:
-                if (val)
-                    offset = ldsb_code(s->pc + 1);
-                else
-                    offset = ldub_code(s->pc + 1);
+                if (val) {
+                    offset = ldsb_code_p(&s->phys_pc_start, s->phys_pc + 1,
+                                         s->pc + 1);
+                } else {
+                    offset = ldub_code_p(&s->phys_pc_start, s->phys_pc + 1,
+                                         s->pc + 1);
+                }
                 s->pc += 2;
+                s->phys_pc += 2;
                 break;
             case OS_WORD:
                 if (val)
-                    offset = ldsw_code(s->pc);
+                    offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 else
-                    offset = lduw_code(s->pc);
+                    offset = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 s->pc += 2;
+                s->phys_pc += 2;
                 break;
             case OS_LONG:
                 offset = read_im32(s);
@@ -879,8 +896,9 @@
     int reg;
     uint16_t ext;
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (ext & 0x87f8) {
         gen_exception(s, s->pc - 4, EXCP_UNSUPPORTED);
         return;
@@ -1066,8 +1084,9 @@
     int tmp;
     int is_load;
 
-    mask = lduw_code(s->pc);
+    mask = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     tmp = gen_lea(s, insn, OS_LONG);
     if (tmp == -1) {
         gen_addr_fault(s);
@@ -1111,8 +1130,9 @@
         opsize = OS_LONG;
     op = (insn >> 6) & 3;
 
-    bitnum = lduw_code(s->pc);
+    bitnum = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (bitnum & 0xff00) {
         disas_undef(s, insn);
         return;
@@ -1375,8 +1395,9 @@
     else if ((insn & 0x3f) == 0x3c)
       {
         uint16_t val;
-        val = lduw_code(s->pc);
+        val = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         gen_set_sr_im(s, val, ccr_only);
       }
     else
@@ -1502,8 +1523,9 @@
 
     /* The upper 32 bits of the product are discarded, so
        muls.l and mulu.l are functionally equivalent.  */
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (ext & 0x87ff) {
         gen_exception(s, s->pc - 4, EXCP_UNSUPPORTED);
         return;
@@ -1523,8 +1545,9 @@
     int reg;
     int tmp;
 
-    offset = ldsw_code(s->pc);
+    offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     reg = AREG(insn, 0);
     tmp = gen_new_qreg(QMODE_I32);
     gen_op_sub32(tmp, QREG_SP, gen_im32(4));
@@ -1622,9 +1645,11 @@
     switch (insn & 7) {
     case 2: /* One extension word.  */
         s->pc += 2;
+        s->phys_pc += 2;
         break;
     case 3: /* Two extension words.  */
         s->pc += 4;
+        s->phys_pc += 4;
         break;
     case 4: /* No extension words.  */
         break;
@@ -1644,8 +1669,9 @@
     op = (insn >> 8) & 0xf;
     offset = (int8_t)insn;
     if (offset == 0) {
-        offset = ldsw_code(s->pc);
+        offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
     } else if (offset == -1) {
         offset = read_im32(s);
     }
@@ -1957,14 +1983,16 @@
     uint32_t addr;
 
     addr = s->pc - 2;
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (ext != 0x46FC) {
         gen_exception(s, addr, EXCP_UNSUPPORTED);
         return;
     }
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (IS_USER(s) || (ext & SR_S) == 0) {
         gen_exception(s, addr, EXCP_PRIVILEGE);
         return;
@@ -2032,8 +2060,9 @@
         return;
     }
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     gen_set_sr_im(s, ext, 0);
     gen_jmp(s, gen_im32(s->pc));
@@ -2059,8 +2088,9 @@
         return;
     }
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     if (ext & 0x8000) {
         reg = AREG(ext, 12);
@@ -2121,8 +2151,9 @@
     int round;
     int opsize;
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     opmode = ext & 0x7f;
     switch ((ext >> 13) & 7) {
     case 0: case 2:
@@ -2331,6 +2362,7 @@
     return;
 undef:
     s->pc -= 2;
+    s->phys_pc -= 2;
     disas_undef_fpu(s, insn);
 }
 
@@ -2343,11 +2375,14 @@
     int l1;
 
     addr = s->pc;
-    offset = ldsw_code(s->pc);
+    offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (insn & (1 << 6)) {
-        offset = (offset << 16) | lduw_code(s->pc);
+        offset = (offset << 16) |
+            lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
     }
 
     l1 = gen_new_label();
@@ -2473,8 +2508,9 @@
     int dual;
     int saved_flags = -1;
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     acc = ((insn >> 7) & 1) | ((ext >> 3) & 2);
     dual = ((insn & 0x30) != 0 && (ext & 3) != 0);
@@ -2882,8 +2918,9 @@
 {
     uint16_t insn;
 
-    insn = lduw_code(s->pc);
+    insn = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     opcode_table[insn](s, insn);
 }
@@ -3169,6 +3206,9 @@
     dc->env = env;
     dc->is_jmp = DISAS_NEXT;
     dc->pc = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     dc->cc_op = CC_OP_DYNAMIC;
     dc->singlestep_enabled = env->singlestep_enabled;
     dc->fpcr = env->fpcr;
Index: qemu/target-mips/translate.c
===================================================================
--- qemu.orig/target-mips/translate.c	2007-10-13 06:22:35.000000000 +0000
+++ qemu/target-mips/translate.c	2007-10-13 06:25:46.000000000 +0000
@@ -536,6 +536,7 @@
 typedef struct DisasContext {
     struct TranslationBlock *tb;
     target_ulong pc, saved_pc;
+    unsigned long phys_pc, phys_pc_start;
     uint32_t opcode;
     uint32_t fp_status;
     /* Routine used to access memory */
@@ -1764,6 +1765,7 @@
             /* Skip the instruction in the delay slot */
             MIPS_DEBUG("bnever, link and skip");
             ctx->pc += 4;
+            ctx->phys_pc += 4;
             return;
         case OPC_BNEL:    /* rx != rx likely */
         case OPC_BGTZL:   /* 0 > 0 likely */
@@ -1771,6 +1773,7 @@
             /* Skip the instruction in the delay slot */
             MIPS_DEBUG("bnever and skip");
             ctx->pc += 4;
+            ctx->phys_pc += 4;
             return;
         case OPC_J:
             ctx->hflags |= MIPS_HFLAG_B;
@@ -6495,6 +6498,9 @@
     gen_opparam_ptr = gen_opparam_buf;
     nb_gen_labels = 0;
     ctx.pc = pc_start;
+    ctx.phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    ctx.phys_pc = ctx.phys_pc_start;
     ctx.saved_pc = -1;
     ctx.tb = tb;
     ctx.bstate = BS_NONE;
@@ -6544,9 +6550,10 @@
             gen_opc_hflags[lj] = ctx.hflags & MIPS_HFLAG_BMASK;
             gen_opc_instr_start[lj] = 1;
         }
-        ctx.opcode = ldl_code(ctx.pc);
+        ctx.opcode = ldl_code_p(&ctx.phys_pc_start, ctx.phys_pc, ctx.pc);
         decode_opc(env, &ctx);
         ctx.pc += 4;
+        ctx.phys_pc += 4;
 
         if (env->singlestep_enabled)
             break;
Index: qemu/target-ppc/translate.c
===================================================================
--- qemu.orig/target-ppc/translate.c	2007-10-13 06:22:35.000000000 +0000
+++ qemu/target-ppc/translate.c	2007-10-13 06:25:46.000000000 +0000
@@ -6678,6 +6678,7 @@
 {
     DisasContext ctx, *ctxp = &ctx;
     opc_handler_t **table, *handler;
+    unsigned long phys_pc, phys_pc_start;
     target_ulong pc_start;
     uint16_t *gen_opc_end;
     int supervisor;
@@ -6685,6 +6686,9 @@
     int j, lj = -1;
 
     pc_start = tb->pc;
+    phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    phys_pc = phys_pc_start;
     gen_opc_ptr = gen_opc_buf;
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     gen_opparam_ptr = gen_opparam_buf;
@@ -6763,7 +6767,7 @@
                     ctx.nip, 1 - msr_pr, msr_ir);
         }
 #endif
-        ctx.opcode = ldl_code(ctx.nip);
+        ctx.opcode = ldl_code_p(&phys_pc_start, phys_pc, ctx.nip);
         if (msr_le) {
             ctx.opcode = ((ctx.opcode & 0xFF000000) >> 24) |
                 ((ctx.opcode & 0x00FF0000) >> 8) |
@@ -6778,6 +6782,7 @@
         }
 #endif
         ctx.nip += 4;
+        phys_pc += 4;
         table = env->opcodes;
         handler = table[opc1(ctx.opcode)];
         if (is_indirect_opcode(handler)) {
Index: qemu/target-sh4/translate.c
===================================================================
--- qemu.orig/target-sh4/translate.c	2007-10-13 06:22:36.000000000 +0000
+++ qemu/target-sh4/translate.c	2007-10-13 06:25:46.000000000 +0000
@@ -1150,11 +1150,15 @@
 {
     DisasContext ctx;
     target_ulong pc_start;
+    unsigned long phys_pc, phys_pc_start;
     static uint16_t *gen_opc_end;
     uint32_t old_flags;
     int i, ii;
 
     pc_start = tb->pc;
+    phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    phys_pc = phys_pc_start;
     gen_opc_ptr = gen_opc_buf;
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     gen_opparam_ptr = gen_opparam_buf;
@@ -1210,9 +1214,10 @@
 	fprintf(stderr, "Loading opcode at address 0x%08x\n", ctx.pc);
 	fflush(stderr);
 #endif
-	ctx.opcode = lduw_code(ctx.pc);
+	ctx.opcode = lduw_code_p(&phys_pc_start, phys_pc, ctx.pc);
 	decode_opc(&ctx);
 	ctx.pc += 2;
+        phys_pc += 2;
 	if ((ctx.pc & (TARGET_PAGE_SIZE - 1)) == 0)
 	    break;
 	if (env->singlestep_enabled)
Index: qemu/target-sparc/translate.c
===================================================================
--- qemu.orig/target-sparc/translate.c	2007-10-13 06:22:36.000000000 +0000
+++ qemu/target-sparc/translate.c	2007-10-13 06:25:46.000000000 +0000
@@ -48,6 +48,8 @@
     target_ulong pc;    /* current Program Counter: integer or DYNAMIC_PC */
     target_ulong npc;   /* next PC: integer or DYNAMIC_PC or JUMP_PC */
     target_ulong jump_pc[2]; /* used when JUMP_PC pc value is used */
+    unsigned long phys_pc;
+    unsigned long phys_pc_start;
     int is_br;
     int mem_idx;
     int fpu_enabled;
@@ -1089,7 +1091,11 @@
 {
     unsigned int insn, opc, rs1, rs2, rd;
 
-    insn = ldl_code(dc->pc);
+#if defined(CONFIG_USER_ONLY)
+    insn = ldl_raw(dc->pc);
+#else
+    insn = ldl_raw(dc->phys_pc);
+#endif
     opc = GET_FIELD(insn, 0, 1);
 
     rd = GET_FIELD(insn, 2, 6);
@@ -3376,6 +3382,8 @@
     dc->tb = tb;
     pc_start = tb->pc;
     dc->pc = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
     last_pc = dc->pc;
     dc->npc = (target_ulong) tb->cs_base;
 #if defined(CONFIG_USER_ONLY)
@@ -3422,6 +3430,7 @@
             }
         }
         last_pc = dc->pc;
+        dc->phys_pc = dc->phys_pc_start + dc->pc - pc_start;
         disas_sparc_insn(dc);
 
         if (dc->is_br)


* Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
  2007-10-13  7:11 ` Blue Swirl
@ 2007-10-13  9:57   ` J. Mayer
  2007-10-13 11:05     ` J. Mayer
  2007-10-13 11:08     ` Blue Swirl
  0 siblings, 2 replies; 8+ messages in thread
From: J. Mayer @ 2007-10-13  9:57 UTC (permalink / raw)
  To: qemu-devel

On Sat, 2007-10-13 at 10:11 +0300, Blue Swirl wrote:
> On 10/13/07, J. Mayer <l_indien@magic.fr> wrote:
> > -------- Forwarded Message --------
> > > From: Jocelyn Mayer <l_indien@magic.fr>
> > > Reply-To: l_indien@magic.fr, qemu-devel@nongnu.org
> > > To: qemu-devel@nongnu.org
> > > Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
> > > Date: Fri, 12 Oct 2007 20:24:44 +0200
> > >
> > > On Fri, 2007-10-12 at 18:21 +0300, Blue Swirl wrote:
> > > > On 10/12/07, J. Mayer <l_indien@magic.fr> wrote:
> > > > > Here's a small patch that allow an optimisation for code fetch, at least
> > > > > for RISC CPU targets, as suggested by Fabrice Bellard.
> > > > > The main idea is that a translated block is never to span over a page
> > > > > boundary. As the tb_find_slow routine already gets the physical address
> > > > > of the page of code to be translated, the code translator could then
> > > > > fetch the code using raw host memory accesses instead of doing it
> > > > > through the softmmu routines.
> > > > > This patch could also be adapted to RISC CPU targets, with care for the
> > > > > last instruction of a page. For now, I did implement it for alpha, arm,
> > > > > mips, PowerPC and SH4.
> > > > > I don't actually know if the optimsation would bring a sensible speed
> > > > > gain or if it will be absolutelly marginal.
> > > > >
> > > > > Please comment.
> > > >
> > > > This will not work correctly for execution of MMIO registers, but
> > > > maybe that won't work on real hardware either. Who cares.
> > >
> > > I wonder if this is important or not... But maybe, when retrieving the
> > > physical address we could check if it is inside ROM/RAM or an I/O area
> > > and in the last case do not give the phys_addr information to the
> > > translator. In that case, it would go on using the ldxx_code. I guess if
> > > we want to do that, a set of helpers would be appreciated to avoid
> > > adding code like:
> > > if (phys_pc == 0)
> > >   opc = ldul_code(virt_pc)
> > > else
> > >   opc = ldul_raw(phys_pc)
> > > everywhere... I could also add another check so this set of macro would
> > > automatically use ldxx_code if we reach a page boundary, which would
> > > then make easy to use this optimisation for CISC/VLE architectures too.
> > >
> > > I'm not sure of the proper solution to allow executing code from mmio
> > > devices. But adding specific accessors to handle the CISC/VLE case is to
> > > be done.
> >
> > [...]
> >
> > I did update my patch following this way and it's now able to run x86
> > and PowerPC targets.
> > PowerPC is the easy case, x86 is maybe the worst... Well, I'm not really
> > sure of what I've done for Sparc, but other targets should be safe.
> 
> It broke Sparc, delay slot handling makes things complicated. The
> updated patch passes my tests.

OK. I will take a look at how you solved this issue.

> For extra performance, I bypassed the ldl_code_p. On Sparc,
> instructions can't be split between two pages. Isn't translation
> always contained to the same page for all targets like Sparc?

Yes, for RISC targets running in 32-bit mode, we always stop translation
when we reach the end of a code page. The problem comes with CISC
architectures, like x86 or m68k, or RISC architectures running mixed
16/32-bit code, like ARM in Thumb mode or PowerPC in VLE mode. In all
those cases, an instruction can span two pages, so we need the
ldx_code_p functions.
My idea in always using the ldx_code_p functions is that we may later
have the opportunity to make them cleverer and let the slow case handle
code execution in MMIO areas, once that becomes possible.
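Why the variable-length case needs a check can be seen in a small standalone sketch. It assumes 4 KiB pages, a flat byte array as guest memory and big-endian 16-bit fetches; crosses_page(), slow_lduw() and fetch_u16() are hypothetical names, not QEMU functions. crosses_page() uses the same XOR-and-shift test as the ldxx_code_p helpers in this thread: a load takes the raw path only when its last byte is still in the block's first page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages and a flat byte array standing
 * in for guest memory. None of these names come from QEMU. */
#define GUEST_PAGE_BITS 12
#define GUEST_PAGE_SIZE (1u << GUEST_PAGE_BITS)

/* Two consecutive guest pages; bytes 0x0ffe..0x1001 straddle the boundary. */
static uint8_t guest_mem[2 * GUEST_PAGE_SIZE] = {
    [0x0ffe] = 0x12, [0x0fff] = 0x34, [0x1000] = 0x56, [0x1001] = 0x78,
};

static unsigned slow_fetches;   /* counts fallbacks to the slow path */

/* Nonzero when the last byte of [addr, addr + size) is not in the page
 * containing start (XOR the addresses, keep only the page-number bits). */
static int crosses_page(uint32_t start, uint32_t addr, uint32_t size)
{
    assert(size > 0);
    return ((start ^ (addr + size - 1)) >> GUEST_PAGE_BITS) != 0;
}

/* Stand-in for the softmmu fetch (ldxx_code): a real emulator would
 * translate the virtual address here and might touch two pages. */
static uint16_t slow_lduw(uint32_t vaddr)
{
    slow_fetches++;
    return (uint16_t)((guest_mem[vaddr] << 8) | guest_mem[vaddr + 1]);
}

/* Big-endian 16-bit code fetch: raw access when the whole halfword lies
 * in the block's first page, slow path otherwise. */
static uint16_t fetch_u16(uint32_t block_start, uint32_t addr)
{
    if (crosses_page(block_start, addr, 2))
        return slow_lduw(addr);
    return (uint16_t)((guest_mem[addr] << 8) | guest_mem[addr + 1]);
}
```

With a block starting at 0x0ff8, fetch_u16(0x0ff8, 0x0ffe) returns 0x1234 through the raw path, while the fetches at 0x0fff (spanning the boundary) and 0x1000 (next page) both fall back to slow_lduw() and return 0x3456 and 0x5678. An aligned 32-bit fetch on a target that stops translating at the page end can never reach the middle case, which is why the check can be compiled out for pure RISC targets.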

-- 
J. Mayer <l_indien@magic.fr>
Never organized


* Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
  2007-10-13  9:57   ` J. Mayer
@ 2007-10-13 11:05     ` J. Mayer
  2007-10-13 11:58       ` Blue Swirl
  2007-10-13 19:07       ` Thiemo Seufer
  2007-10-13 11:08     ` Blue Swirl
  1 sibling, 2 replies; 8+ messages in thread
From: J. Mayer @ 2007-10-13 11:05 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4770 bytes --]

On Sat, 2007-10-13 at 11:57 +0200, J. Mayer wrote:
> On Sat, 2007-10-13 at 10:11 +0300, Blue Swirl wrote:
> > For extra performance, I bypassed the ldl_code_p. On Sparc,
> > instructions can't be split between two pages. Isn't translation
> > always contained to the same page for all targets like Sparc?
> 
> Yes, for RISC targets running 32 bits mode, we always stop translation
> when we reach the end of a code page. The problem comes with CISC
> architectures, like x86 or m68k, or RISC architecture running 16/32 bits
> code, like ARM in thumb mode or PowerPC in VLE mode. In all those case,
> there can be instructions spanning on 2 pages, then we need the
> ldx_code_p functions.
> My idea of always using the ldx_code_p function is that we may have the
> occasion to make it more cleaver and make the slow case handle code
> execution in mmio areas, when it will be possible.

Here's an updated patch. I added a TARGET_HAS_VLE_INSNS definition, which
is defined in the cris, i386, m68k and ppcemb cases. ARM already has
explicit support for 32-bit Thumb instructions spanning two pages, so it
should not need this define. When this define is not set, the
ldxxx_code_p functions just do ldxxx_raw(phys_pc) in the softmmu case
and ldxxx_raw(pc) in the user-mode-only case. This is optimal for pure
RISC architectures and does not need the #ifdef CONFIG_USER_ONLY you
added for Sparc in your version of the patch. I also added a provision
for a TARGET_MMIO_CODE define, which may be used later once Qemu
actually supports executing code from MMIO areas.
I also took your fixes for the Sparc phys_pc computation, but reverted
your change to use ldl_raw, as it should no longer be needed.
I successfully tested this new version of the patch with PowerPC in both
user-mode-only and softmmu modes, and with i386 in softmmu mode.
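For a pure RISC target the fast path reduces to a bit of address arithmetic done once per block, after which pc and phys_pc simply advance together. The standalone C sketch below illustrates that under stated assumptions: 4 KiB pages, fixed 32-bit big-endian instructions, and a flat array in the role phys_ram_base plays in the patch. guest_ram, block_host_start(), ldl_host() and fetch_block() are hypothetical stand-ins (not QEMU functions) for phys_ram_base, the phys_pc_start initialisation in each translator, ldl_raw() and a gen_intermediate_code loop; uintptr_t is used where the patch uses unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages, fixed 32-bit big-endian
 * instructions, flat array as guest RAM. All names are hypothetical. */
#define GUEST_PAGE_BITS 12
#define GUEST_PAGE_SIZE (1ul << GUEST_PAGE_BITS)
#define GUEST_PAGE_MASK (~(GUEST_PAGE_SIZE - 1))

/* Physical page 2 (offset 0x2000) holds two instructions at its end. */
static uint8_t guest_ram[4 * GUEST_PAGE_SIZE] = {
    [0x2ff8] = 0xde, 0xad, 0xbe, 0xef, 0x01, 0x02, 0x03, 0x04,
};

/* Host address of the block's first instruction: RAM base + physical
 * page address + offset of pc_start within its page. */
static uintptr_t block_host_start(uintptr_t page_addr, uint32_t pc_start)
{
    return (uintptr_t)guest_ram + page_addr + (pc_start & ~GUEST_PAGE_MASK);
}

/* Raw big-endian 32-bit load from a host address: no TLB lookup. */
static uint32_t ldl_host(uintptr_t host)
{
    const uint8_t *p = (const uint8_t *)host;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Minimal fixed-length fetch loop (the !TARGET_HAS_VLE_INSNS case):
 * pc and phys_pc advance together, and fetching stops at the page end,
 * so no page-span check is needed. Returns the instruction count. */
static int fetch_block(uintptr_t page_addr, uint32_t pc_start,
                       uint32_t *out, int max)
{
    uint32_t pc = pc_start;
    uintptr_t phys_pc = block_host_start(page_addr, pc_start);
    int n = 0;

    assert(pc_start % 4 == 0);              /* aligned instructions only */
    while (n < max) {
        out[n++] = ldl_host(phys_pc);
        pc += 4;
        phys_pc += 4;                       /* kept in lockstep with pc */
        if ((pc & (GUEST_PAGE_SIZE - 1)) == 0)
            break;                          /* never cross into the next page */
    }
    return n;
}
```

For a block at virtual address 0x40000ff8 whose page resolved to physical offset 0x2000, fetch_block() reads the words at host offsets 0x2ff8 and 0x2ffc (0xdeadbeef and 0x01020304) and then stops, because the next pc starts a new page. Recomputing phys_pc from pc on each iteration instead (phys_pc_start + pc - pc_start, as the Sparc hunk does) yields the same addresses and is presumably more robust when delay-slot handling changes pc outside the simple increment.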

-- 
J. Mayer <l_indien@magic.fr>
Never organized

[-- Attachment #2: code_raw_optim.diff --]
[-- Type: text/x-patch, Size: 57770 bytes --]

Index: cpu-all.h
===================================================================
RCS file: /sources/qemu/qemu/cpu-all.h,v
retrieving revision 1.76
diff -u -d -d -p -r1.76 cpu-all.h
--- cpu-all.h	23 Sep 2007 15:28:03 -0000	1.76
+++ cpu-all.h	13 Oct 2007 10:19:05 -0000
@@ -646,6 +646,13 @@ static inline void stfq_be_p(void *ptr, 
 #define ldl_code(p) ldl_raw(p)
 #define ldq_code(p) ldq_raw(p)
 
+#define ldub_code_p(sp, pp, p) ldub_raw(p)
+#define ldsb_code_p(sp, pp, p) ldsb_raw(p)
+#define lduw_code_p(sp, pp, p) lduw_raw(p)
+#define ldsw_code_p(sp, pp, p) ldsw_raw(p)
+#define ldl_code_p(sp, pp, p) ldl_raw(p)
+#define ldq_code_p(sp, pp, p) ldq_raw(p)
+
 #define ldub_kernel(p) ldub_raw(p)
 #define ldsb_kernel(p) ldsb_raw(p)
 #define lduw_kernel(p) lduw_raw(p)
Index: cpu-exec.c
===================================================================
RCS file: /sources/qemu/qemu/cpu-exec.c,v
retrieving revision 1.119
diff -u -d -d -p -r1.119 cpu-exec.c
--- cpu-exec.c	8 Oct 2007 13:16:13 -0000	1.119
+++ cpu-exec.c	13 Oct 2007 10:19:05 -0000
@@ -133,6 +133,7 @@ static TranslationBlock *tb_find_slow(ta
     tb->tc_ptr = tc_ptr;
     tb->cs_base = cs_base;
     tb->flags = flags;
+    tb->page_addr[0] = phys_page1;
     cpu_gen_code(env, tb, CODE_GEN_MAX_SIZE, &code_gen_size);
     code_gen_ptr = (void *)(((unsigned long)code_gen_ptr + code_gen_size + CODE_GEN_ALIGN - 1) & ~(CODE_GEN_ALIGN - 1));
 
Index: softmmu_header.h
===================================================================
RCS file: /sources/qemu/qemu/softmmu_header.h,v
retrieving revision 1.17
diff -u -d -d -p -r1.17 softmmu_header.h
--- softmmu_header.h	8 Oct 2007 13:16:14 -0000	1.17
+++ softmmu_header.h	13 Oct 2007 10:19:05 -0000
@@ -336,6 +336,68 @@ static inline void glue(glue(st, SUFFIX)
     }
 }
 
+#else
+
+#if DATA_SIZE <= 2
+static inline RES_TYPE glue(glue(glue(lds,SUFFIX),MEMSUFFIX),_p)(unsigned long *start_pc,
+                                                                 unsigned long phys_pc,
+                                                                 target_ulong virt_pc)
+{
+    RES_TYPE opc;
+
+    /* XXX: Executing target code from MMIO areas is not supported for now */
+#if defined(TARGET_HAS_VLE_INSNS) /* || defined(TARGET_MMIO_CODE) */
+    if (unlikely((*start_pc ^
+                  (phys_pc + sizeof(RES_TYPE) - 1)) >> TARGET_PAGE_BITS)) {
+        /* Slow path: phys_pc is not in the same page as start_pc
+         *            or the insn is spanning two pages
+         */
+        opc = glue(glue(lds,SUFFIX),MEMSUFFIX)(virt_pc);
+        /* Avoid softmmu access on next load */
+        /* XXX: don't: phys PC is no longer correct
+         *      We could call get_phys_addr_code(env, pc); and remove the else
+         *      condition, here.
+         */
+        //*start_pc = phys_pc;
+    } else
+#endif
+    {
+        opc = glue(glue(lds,SUFFIX),_raw)(phys_pc);
+    }
+
+    return opc;
+}
+#endif
+
+static inline RES_TYPE glue(glue(glue(ld,USUFFIX),MEMSUFFIX),_p)(unsigned long *start_pc,
+                                                                 unsigned long phys_pc,
+                                                                 target_ulong virt_pc)
+{
+    RES_TYPE opc;
+
+    /* XXX: Executing target code from MMIO areas is not supported for now */
+#if defined(TARGET_HAS_VLE_INSNS) /* || defined(TARGET_MMIO_CODE) */
+    if (unlikely((*start_pc ^
+                  (phys_pc + sizeof(RES_TYPE) - 1)) >> TARGET_PAGE_BITS)) {
+        /* Slow path: phys_pc is not in the same page as start_pc
+         *            or the insn is spanning two pages
+         */
+        opc = glue(glue(ld,USUFFIX),MEMSUFFIX)(virt_pc);
+        /* Avoid softmmu access on next load */
+        /* XXX: don't: phys PC is no longer correct
+         *      We could call get_phys_addr_code(env, pc); and remove the else
+         *      condition, here.
+         */
+        //*start_pc = phys_pc;
+    } else
+#endif
+    {
+        opc = glue(glue(ld,USUFFIX),_raw)(phys_pc);
+    }
+
+    return opc;
+}
+
 #endif /* ACCESS_TYPE != 3 */
 
 #endif /* !asm */
Index: target-alpha/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-alpha/translate.c,v
retrieving revision 1.5
diff -u -d -d -p -r1.5 translate.c
--- target-alpha/translate.c	16 Sep 2007 21:08:01 -0000	1.5
+++ target-alpha/translate.c	13 Oct 2007 10:19:06 -0000
@@ -1965,6 +1965,7 @@ int gen_intermediate_code_internal (CPUS
     static int insn_count;
 #endif
     DisasContext ctx, *ctxp = &ctx;
+    unsigned long phys_pc, phys_pc_start;
     target_ulong pc_start;
     uint32_t insn;
     uint16_t *gen_opc_end;
@@ -1972,6 +1973,9 @@ int gen_intermediate_code_internal (CPUS
     int ret;
 
     pc_start = tb->pc;
+    phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    phys_pc = phys_pc_start;
     gen_opc_ptr = gen_opc_buf;
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     gen_opparam_ptr = gen_opparam_buf;
@@ -2010,7 +2018,7 @@ int gen_intermediate_code_internal (CPUS
                     ctx.pc, ctx.mem_idx);
         }
 #endif
-        insn = ldl_code(ctx.pc);
+        insn = ldl_code_p(&phys_pc_start, phys_pc, ctx.pc);
 #if defined ALPHA_DEBUG_DISAS
         insn_count++;
         if (logfile != NULL) {
@@ -2018,6 +2026,7 @@ int gen_intermediate_code_internal (CPUS
         }
 #endif
         ctx.pc += 4;
+        phys_pc += 4;
         ret = translate_one(ctxp, insn);
         if (ret != 0)
             break;
Index: target-arm/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-arm/translate.c,v
retrieving revision 1.57
diff -u -d -d -p -r1.57 translate.c
--- target-arm/translate.c	17 Sep 2007 08:09:51 -0000	1.57
+++ target-arm/translate.c	13 Oct 2007 10:19:06 -0000
@@ -38,6 +38,8 @@
 /* internal defines */
 typedef struct DisasContext {
     target_ulong pc;
+    unsigned long phys_pc;
+    unsigned long phys_pc_start;
     int is_jmp;
     /* Nonzero if this instruction has been conditionally skipped.  */
     int condjmp;
@@ -2206,8 +2208,9 @@ static void disas_arm_insn(CPUState * en
 {
     unsigned int cond, insn, val, op1, i, shift, rm, rs, rn, rd, sh;
 
-    insn = ldl_code(s->pc);
+    insn = ldl_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 4;
+    s->phys_pc += 4;
 
     cond = insn >> 28;
     if (cond == 0xf){
@@ -2971,8 +2974,9 @@ static void disas_thumb_insn(DisasContex
     int32_t offset;
     int i;
 
-    insn = lduw_code(s->pc);
+    insn = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     switch (insn >> 12) {
     case 0: case 1:
@@ -3494,7 +3498,7 @@ static void disas_thumb_insn(DisasContex
             break;
         }
         offset = ((int32_t)insn << 21) >> 10;
-        insn = lduw_code(s->pc);
+        insn = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         offset |= insn & 0x7ff;
 
         val = (uint32_t)s->pc + 2;
@@ -3544,6 +3548,9 @@ static inline int gen_intermediate_code_
 
     dc->is_jmp = DISAS_NEXT;
     dc->pc = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     dc->singlestep_enabled = env->singlestep_enabled;
     dc->condjmp = 0;
     dc->thumb = env->thumb;
Index: target-cris/cpu.h
===================================================================
RCS file: /sources/qemu/qemu/target-cris/cpu.h,v
retrieving revision 1.1
diff -u -d -d -p -r1.1 cpu.h
--- target-cris/cpu.h	8 Oct 2007 13:04:02 -0000	1.1
+++ target-cris/cpu.h	13 Oct 2007 10:19:06 -0000
@@ -22,6 +22,8 @@
 #define CPU_CRIS_H
 
 #define TARGET_LONG_BITS 32
+/* need explicit support for instructions spanning 2 pages */
+#define TARGET_HAS_VLE_INSNS 1
 
 #include "cpu-defs.h"
 
Index: target-cris/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-cris/translate.c,v
retrieving revision 1.1
diff -u -d -d -p -r1.1 translate.c
--- target-cris/translate.c	8 Oct 2007 12:49:08 -0000	1.1
+++ target-cris/translate.c	13 Oct 2007 10:19:06 -0000
@@ -100,6 +100,7 @@ enum {
 typedef struct DisasContext {
 	CPUState *env;
 	target_ulong pc, insn_pc;
+        unsigned long phys_pc, phys_pc_start;
 
 	/* Decoder.  */
 	uint32_t ir;
@@ -828,7 +829,8 @@ static int dec_prep_alu_m(DisasContext *
 		if (memsize == 1)
 			insn_len++;
 
-		imm = ldl_code(dc->pc + 2);
+                imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2,
+                                 dc->pc + 2);
 		if (memsize != 4) {
 			if (s_ext) {
 				imm = sign_extend(imm, (memsize * 8) - 1);
@@ -1962,7 +1964,7 @@ static unsigned int dec_lapc_im(DisasCon
 	rd = dc->op2;
 
 	cris_cc_mask(dc, 0);
-	imm = ldl_code(dc->pc + 2);
+	imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 	DIS(fprintf (logfile, "lapc 0x%x, $r%u\n", imm + dc->pc, dc->op2));
 	gen_op_movl_T0_im (dc->pc + imm);
 	gen_movl_reg_T0[rd] ();
@@ -1999,7 +2001,7 @@ static unsigned int dec_jas_im(DisasCont
 {
 	uint32_t imm;
 
-	imm = ldl_code(dc->pc + 2);
+	imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "jas 0x%x\n", imm));
 	cris_cc_mask(dc, 0);
@@ -2016,7 +2018,7 @@ static unsigned int dec_jasc_im(DisasCon
 {
 	uint32_t imm;
 
-	imm = ldl_code(dc->pc + 2);
+	imm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "jasc 0x%x\n", imm));
 	cris_cc_mask(dc, 0);
@@ -2047,7 +2049,7 @@ static unsigned int dec_bcc_im(DisasCont
 	int32_t offset;
 	uint32_t cond = dc->op2;
 
-	offset = ldl_code(dc->pc + 2);
+	offset = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 	offset = sign_extend(offset, 15);
 
 	DIS(fprintf (logfile, "b%s %d pc=%x dst=%x\n",
@@ -2065,7 +2067,7 @@ static unsigned int dec_bas_im(DisasCont
 	int32_t simm;
 
 
-	simm = ldl_code(dc->pc + 2);
+	simm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "bas 0x%x, $p%u\n", dc->pc + simm, dc->op2));
 	cris_cc_mask(dc, 0);
@@ -2081,7 +2083,7 @@ static unsigned int dec_bas_im(DisasCont
 static unsigned int dec_basc_im(DisasContext *dc)
 {
 	int32_t simm;
-	simm = ldl_code(dc->pc + 2);
+	simm = ldl_code_p(&dc->phys_pc_start, dc->phys_pc + 2, dc->pc + 2);
 
 	DIS(fprintf (logfile, "basc 0x%x, $p%u\n", dc->pc + simm, dc->op2));
 	cris_cc_mask(dc, 0);
@@ -2259,7 +2261,7 @@ cris_decoder(DisasContext *dc)
 	int i;
 
 	/* Load a halfword onto the instruction register.  */
-	tmp = ldl_code(dc->pc);
+	tmp = ldl_code_p(&dc->phys_pc_start, dc->phys_pc, dc->pc);
 	dc->ir = tmp & 0xffff;
 
 	/* Now decode it.  */
@@ -2313,6 +2315,9 @@ gen_intermediate_code_internal(CPUState 
 	uint32_t next_page_start;
 
 	pc_start = tb->pc;
+        dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+            (pc_start & ~TARGET_PAGE_MASK);
+        dc->phys_pc = dc->phys_pc_start;
 	dc->env = env;
 	dc->tb = tb;
 
@@ -2347,6 +2356,7 @@ gen_intermediate_code_internal(CPUState 
 		insn_len = cris_decoder(dc);
 		STATS(gen_op_exec_insn());
 		dc->pc += insn_len;
+                dc->phys_pc += insn_len;
 		if (!dc->flagx_live
 		    || (dc->flagx_live &&
 			!(dc->cc_op == CC_OP_FLAGS && dc->flags_x))) {
Index: target-i386/cpu.h
===================================================================
RCS file: /sources/qemu/qemu/target-i386/cpu.h,v
retrieving revision 1.50
diff -u -d -d -p -r1.50 cpu.h
--- target-i386/cpu.h	27 Sep 2007 16:44:31 -0000	1.50
+++ target-i386/cpu.h	13 Oct 2007 10:19:06 -0000
@@ -33,6 +33,8 @@
 /* support for self modifying code even if the modified instruction is
    close to the modifying instruction */
 #define TARGET_HAS_PRECISE_SMC
+/* need explicit support for instructions spanning 2 pages */
+#define TARGET_HAS_VLE_INSNS 1
 
 #define TARGET_HAS_ICE 1
 
Index: target-i386/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-i386/translate.c,v
retrieving revision 1.72
diff -u -d -d -p -r1.72 translate.c
--- target-i386/translate.c	27 Sep 2007 01:52:00 -0000	1.72
+++ target-i386/translate.c	13 Oct 2007 10:19:07 -0000
@@ -73,6 +73,7 @@ typedef struct DisasContext {
     int prefix;
     int aflag, dflag;
     target_ulong pc; /* pc = eip + cs_base */
+    unsigned long phys_pc,phys_pc_start;
     int is_jmp; /* 1 = means jump (stop translation), 2 means CPU
                    static state change (stop translation) */
     /* current block context */
@@ -1451,7 +1452,7 @@ static void gen_lea_modrm(DisasContext *
 
         if (base == 4) {
             havesib = 1;
-            code = ldub_code(s->pc++);
+            code = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             scale = (code >> 6) & 3;
             index = ((code >> 3) & 7) | REX_X(s);
             base = (code & 7);
@@ -1462,8 +1463,10 @@ static void gen_lea_modrm(DisasContext *
         case 0:
             if ((base & 7) == 5) {
                 base = -1;
-                disp = (int32_t)ldl_code(s->pc);
+                disp = (int32_t)ldl_code_p(&s->phys_pc_start, s->phys_pc,
+                                           s->pc);
                 s->pc += 4;
+                s->phys_pc += 4;
                 if (CODE64(s) && !havesib) {
                     disp += s->pc + s->rip_offset;
                 }
@@ -1472,12 +1475,14 @@ static void gen_lea_modrm(DisasContext *
             }
             break;
         case 1:
-            disp = (int8_t)ldub_code(s->pc++);
+            disp = (int8_t)ldub_code_p(&s->phys_pc_start, s->phys_pc++,
+                                       s->pc++);
             break;
         default:
         case 2:
-            disp = ldl_code(s->pc);
+            disp = ldl_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 4;
+            s->phys_pc += 4;
             break;
         }
 
@@ -1545,8 +1550,9 @@ static void gen_lea_modrm(DisasContext *
         switch (mod) {
         case 0:
             if (rm == 6) {
-                disp = lduw_code(s->pc);
+                disp = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 s->pc += 2;
+                s->phys_pc += 2;
                 gen_op_movl_A0_im(disp);
                 rm = 0; /* avoid SS override */
                 goto no_rm;
@@ -1555,12 +1561,14 @@ static void gen_lea_modrm(DisasContext *
             }
             break;
         case 1:
-            disp = (int8_t)ldub_code(s->pc++);
+            disp = (int8_t)ldub_code_p(&s->phys_pc_start, s->phys_pc++,
+                                       s->pc++);
             break;
         default:
         case 2:
-            disp = lduw_code(s->pc);
+            disp = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
+            s->phys_pc += 2;
             break;
         }
         switch(rm) {
@@ -1629,7 +1637,7 @@ static void gen_nop_modrm(DisasContext *
         base = rm;
 
         if (base == 4) {
-            code = ldub_code(s->pc++);
+            code = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             base = (code & 7);
         }
 
@@ -1637,14 +1645,17 @@ static void gen_nop_modrm(DisasContext *
         case 0:
             if (base == 5) {
                 s->pc += 4;
+                s->phys_pc += 4;
             }
             break;
         case 1:
             s->pc++;
+            s->phys_pc++;
             break;
         default:
         case 2:
             s->pc += 4;
+            s->phys_pc += 4;
             break;
         }
     } else {
@@ -1652,14 +1663,17 @@ static void gen_nop_modrm(DisasContext *
         case 0:
             if (rm == 6) {
                 s->pc += 2;
+                s->phys_pc += 2;
             }
             break;
         case 1:
             s->pc++;
+            s->phys_pc++;
             break;
         default:
         case 2:
             s->pc += 2;
+            s->phys_pc += 2;
             break;
         }
     }
@@ -1727,17 +1741,20 @@ static inline uint32_t insn_get(DisasCon
 
     switch(ot) {
     case OT_BYTE:
-        ret = ldub_code(s->pc);
+        ret = ldub_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc++;
+        s->phys_pc++;
         break;
     case OT_WORD:
-        ret = lduw_code(s->pc);
+        ret = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         break;
     default:
     case OT_LONG:
-        ret = ldl_code(s->pc);
+        ret = ldl_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 4;
+        s->phys_pc += 4;
         break;
     }
     return ret;
@@ -2689,7 +2706,7 @@ static void gen_sse(DisasContext *s, int
         gen_op_enter_mmx();
     }
 
-    modrm = ldub_code(s->pc++);
+    modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
     reg = ((modrm >> 3) & 7);
     if (is_xmm)
         reg |= rex_r;
@@ -2962,7 +2979,7 @@ static void gen_sse(DisasContext *s, int
         case 0x171: /* shift xmm, im */
         case 0x172:
         case 0x173:
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (is_xmm) {
                 gen_op_movl_T0_im(val);
                 gen_op_movl_env_T0(offsetof(CPUX86State,xmm_t0.XMM_L(0)));
@@ -3082,7 +3099,7 @@ static void gen_sse(DisasContext *s, int
         case 0x1c4:
             s->rip_offset = 1;
             gen_ldst_modrm(s, modrm, OT_WORD, OR_TMP0, 0);
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (b1) {
                 val &= 7;
                 gen_op_pinsrw_xmm(offsetof(CPUX86State,xmm_regs[reg]), val);
@@ -3095,7 +3112,7 @@ static void gen_sse(DisasContext *s, int
         case 0x1c5:
             if (mod != 3)
                 goto illegal_op;
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (b1) {
                 val &= 7;
                 rm = (modrm & 7) | REX_B(s);
@@ -3213,13 +3230,13 @@ static void gen_sse(DisasContext *s, int
         switch(b) {
         case 0x70: /* pshufx insn */
         case 0xc6: /* pshufx insn */
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             sse_op3 = (GenOpFunc3 *)sse_op2;
             sse_op3(op1_offset, op2_offset, val);
             break;
         case 0xc2:
             /* compare insns */
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (val >= 8)
                 goto illegal_op;
             sse_op2 = sse_op_table4[val][b1];
@@ -3260,8 +3277,9 @@ static target_ulong disas_insn(DisasCont
 #endif
     s->rip_offset = 0; /* for relative ip address */
  next_byte:
-    b = ldub_code(s->pc);
+    b = ldub_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc++;
+    s->phys_pc++;
     /* check prefixes */
 #ifdef TARGET_X86_64
     if (CODE64(s)) {
@@ -3375,7 +3393,7 @@ static target_ulong disas_insn(DisasCont
     case 0x0f:
         /**************************/
         /* extended op code */
-        b = ldub_code(s->pc++) | 0x100;
+        b = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++) | 0x100;
         goto reswitch;
 
         /**************************/
@@ -3400,7 +3418,7 @@ static target_ulong disas_insn(DisasCont
 
             switch(f) {
             case 0: /* OP Ev, Gv */
-                modrm = ldub_code(s->pc++);
+                modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
                 reg = ((modrm >> 3) & 7) | rex_r;
                 mod = (modrm >> 6) & 3;
                 rm = (modrm & 7) | REX_B(s);
@@ -3422,7 +3440,7 @@ static target_ulong disas_insn(DisasCont
                 gen_op(s, op, ot, opreg);
                 break;
             case 1: /* OP Gv, Ev */
-                modrm = ldub_code(s->pc++);
+                modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
                 mod = (modrm >> 6) & 3;
                 reg = ((modrm >> 3) & 7) | rex_r;
                 rm = (modrm & 7) | REX_B(s);
@@ -3457,7 +3475,7 @@ static target_ulong disas_insn(DisasCont
             else
                 ot = dflag + OT_WORD;
 
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
             op = (modrm >> 3) & 7;
@@ -3506,7 +3524,7 @@ static target_ulong disas_insn(DisasCont
         else
             ot = dflag + OT_WORD;
 
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         op = (modrm >> 3) & 7;
@@ -3648,7 +3666,7 @@ static target_ulong disas_insn(DisasCont
         else
             ot = dflag + OT_WORD;
 
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         op = (modrm >> 3) & 7;
@@ -3754,7 +3772,7 @@ static target_ulong disas_insn(DisasCont
         else
             ot = dflag + OT_WORD;
 
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         reg = ((modrm >> 3) & 7) | rex_r;
@@ -3805,7 +3823,7 @@ static target_ulong disas_insn(DisasCont
     case 0x69: /* imul Gv, Ev, I */
     case 0x6b:
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         if (b == 0x69)
             s->rip_offset = insn_const_size(ot);
@@ -3841,7 +3859,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         if (mod == 3) {
@@ -3868,7 +3886,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         gen_op_mov_TN_reg[ot][1][reg]();
@@ -3885,7 +3903,7 @@ static target_ulong disas_insn(DisasCont
         s->cc_op = CC_OP_SUBB + ot;
         break;
     case 0x1c7: /* cmpxchg8b */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -3944,7 +3962,7 @@ static target_ulong disas_insn(DisasCont
         } else {
             ot = dflag + OT_WORD;
         }
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         gen_pop_T0(s);
         if (mod == 3) {
@@ -3963,9 +3981,10 @@ static target_ulong disas_insn(DisasCont
     case 0xc8: /* enter */
         {
             int level;
-            val = lduw_code(s->pc);
+            val = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
-            level = ldub_code(s->pc++);
+            s->phys_pc += 2;
+            level = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             gen_enter(s, val, level);
         }
         break;
@@ -4045,7 +4064,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
 
         /* generate a generic store */
@@ -4057,7 +4076,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod != 3) {
             s->rip_offset = insn_const_size(ot);
@@ -4076,14 +4095,14 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = OT_WORD + dflag;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
 
         gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
         gen_op_mov_reg_T0[ot][reg]();
         break;
     case 0x8e: /* mov seg, Gv */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = (modrm >> 3) & 7;
         if (reg >= 6 || reg == R_CS)
             goto illegal_op;
@@ -4103,7 +4122,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x8c: /* mov Gv, seg */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
         if (reg >= 6)
@@ -4126,7 +4145,7 @@ static target_ulong disas_insn(DisasCont
             d_ot = dflag + OT_WORD;
             /* ot is the size of source */
             ot = (b & 1) + OT_BYTE;
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             reg = ((modrm >> 3) & 7) | rex_r;
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
@@ -4163,7 +4182,7 @@ static target_ulong disas_insn(DisasCont
 
     case 0x8d: /* lea */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -4190,8 +4209,9 @@ static target_ulong disas_insn(DisasCont
                 ot = dflag + OT_WORD;
 #ifdef TARGET_X86_64
             if (s->aflag == 2) {
-                offset_addr = ldq_code(s->pc);
+                offset_addr = ldq_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 s->pc += 8;
+                s->phys_pc += 8;
                 if (offset_addr == (int32_t)offset_addr)
                     gen_op_movq_A0_im(offset_addr);
                 else
@@ -4243,8 +4263,9 @@ static target_ulong disas_insn(DisasCont
         if (dflag == 2) {
             uint64_t tmp;
             /* 64 bit case */
-            tmp = ldq_code(s->pc);
+            tmp = ldq_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 8;
+            s->phys_pc += 8;
             reg = (b & 7) | REX_B(s);
             gen_movtl_T0_im(tmp);
             gen_op_mov_reg_T0[OT_QUAD][reg]();
@@ -4270,7 +4291,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         if (mod == 3) {
@@ -4313,7 +4334,7 @@ static target_ulong disas_insn(DisasCont
         op = R_GS;
     do_lxx:
         ot = dflag ? OT_LONG : OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         if (mod == 3)
@@ -4345,7 +4366,7 @@ static target_ulong disas_insn(DisasCont
             else
                 ot = dflag + OT_WORD;
 
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             mod = (modrm >> 6) & 3;
             op = (modrm >> 3) & 7;
 
@@ -4364,7 +4385,8 @@ static target_ulong disas_insn(DisasCont
                 gen_shift(s, op, ot, opreg, OR_ECX);
             } else {
                 if (shift == 2) {
-                    shift = ldub_code(s->pc++);
+                    shift = ldub_code_p(&s->phys_pc_start, s->phys_pc++,
+                                        s->pc++);
                 }
                 gen_shifti(s, op, ot, opreg, shift);
             }
@@ -4398,7 +4420,7 @@ static target_ulong disas_insn(DisasCont
         shift = 0;
     do_shiftd:
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         reg = ((modrm >> 3) & 7) | rex_r;
@@ -4412,7 +4434,7 @@ static target_ulong disas_insn(DisasCont
         gen_op_mov_TN_reg[ot][1][reg]();
 
         if (shift) {
-            val = ldub_code(s->pc++);
+            val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if (ot == OT_QUAD)
                 val &= 0x3f;
             else
@@ -4450,7 +4472,7 @@ static target_ulong disas_insn(DisasCont
             gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
             break;
         }
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         rm = modrm & 7;
         op = ((b & 7) << 3) | ((modrm >> 3) & 7);
@@ -5013,7 +5035,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag ? OT_LONG : OT_WORD;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_movl_T0_im(val);
         gen_check_io(s, ot, 0, pc_start - s->cs_base);
         if (gen_svm_check_io(s, pc_start,
@@ -5029,7 +5051,7 @@ static target_ulong disas_insn(DisasCont
             ot = OT_BYTE;
         else
             ot = dflag ? OT_LONG : OT_WORD;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_movl_T0_im(val);
         gen_check_io(s, ot, 0, pc_start - s->cs_base);
         if (gen_svm_check_io(s, pc_start, svm_is_rep(prefixes) |
@@ -5073,8 +5095,9 @@ static target_ulong disas_insn(DisasCont
         /************************/
         /* control */
     case 0xc2: /* ret im */
-        val = ldsw_code(s->pc);
+        val = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         gen_pop_T0(s);
         if (CODE64(s) && s->dflag)
             s->dflag = 2;
@@ -5093,8 +5116,9 @@ static target_ulong disas_insn(DisasCont
         gen_eob(s);
         break;
     case 0xca: /* lret im */
-        val = ldsw_code(s->pc);
+        val = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
     do_lret:
         if (s->pe && !s->vm86) {
             if (s->cc_op != CC_OP_DYNAMIC)
@@ -5223,13 +5247,13 @@ static target_ulong disas_insn(DisasCont
         break;
 
     case 0x190 ... 0x19f: /* setcc Gv */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_setcc(s, b);
         gen_ldst_modrm(s, modrm, OT_BYTE, OR_TMP0, 1);
         break;
     case 0x140 ... 0x14f: /* cmov Gv, Ev */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         gen_setcc(s, b);
@@ -5338,7 +5362,7 @@ static target_ulong disas_insn(DisasCont
         /* bit operations */
     case 0x1ba: /* bt/bts/btr/btc Gv, im */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         op = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
@@ -5350,7 +5374,7 @@ static target_ulong disas_insn(DisasCont
             gen_op_mov_TN_reg[ot][0][rm]();
         }
         /* load shift */
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_movl_T1_im(val);
         if (op < 4)
             goto illegal_op;
@@ -5378,7 +5402,7 @@ static target_ulong disas_insn(DisasCont
         op = 3;
     do_btx:
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
@@ -5404,7 +5428,7 @@ static target_ulong disas_insn(DisasCont
     case 0x1bc: /* bsf */
     case 0x1bd: /* bsr */
         ot = dflag + OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
         /* NOTE: in order to handle the 0 case, we must load the
@@ -5451,7 +5475,7 @@ static target_ulong disas_insn(DisasCont
     case 0xd4: /* aam */
         if (CODE64(s))
             goto illegal_op;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         if (val == 0) {
             gen_exception(s, EXCP00_DIVZ, pc_start - s->cs_base);
         } else {
@@ -5462,7 +5486,7 @@ static target_ulong disas_insn(DisasCont
     case 0xd5: /* aad */
         if (CODE64(s))
             goto illegal_op;
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_op_aad(val);
         s->cc_op = CC_OP_LOGICB;
         break;
@@ -5494,7 +5518,7 @@ static target_ulong disas_insn(DisasCont
         gen_interrupt(s, EXCP03_INT3, pc_start - s->cs_base, s->pc - s->cs_base);
         break;
     case 0xcd: /* int N */
-        val = ldub_code(s->pc++);
+        val = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         if (gen_svm_check_intercept(s, pc_start, SVM_EXIT_SWINT))
             break;
         if (s->vm86 && s->iopl != 3) {
@@ -5567,7 +5591,7 @@ static target_ulong disas_insn(DisasCont
         if (CODE64(s))
             goto illegal_op;
         ot = dflag ? OT_LONG : OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
         if (mod == 3)
@@ -5738,7 +5762,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x100:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         switch(op) {
@@ -5808,7 +5832,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x101:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         rm = modrm & 7;
@@ -6022,7 +6046,7 @@ static target_ulong disas_insn(DisasCont
             /* d_ot is the size of destination */
             d_ot = dflag + OT_WORD;
 
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             reg = ((modrm >> 3) & 7) | rex_r;
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
@@ -6048,7 +6072,7 @@ static target_ulong disas_insn(DisasCont
             if (!s->pe || s->vm86)
                 goto illegal_op;
             ot = dflag ? OT_LONG : OT_WORD;
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             reg = (modrm >> 3) & 7;
             mod = (modrm >> 6) & 3;
             rm = modrm & 7;
@@ -6075,7 +6099,7 @@ static target_ulong disas_insn(DisasCont
         if (!s->pe || s->vm86)
             goto illegal_op;
         ot = dflag ? OT_LONG : OT_WORD;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         reg = ((modrm >> 3) & 7) | rex_r;
         gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
         gen_op_mov_TN_reg[ot][1][reg]();
@@ -6089,7 +6113,7 @@ static target_ulong disas_insn(DisasCont
         gen_op_mov_reg_T1[ot][reg]();
         break;
     case 0x118:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         switch(op) {
@@ -6108,7 +6132,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x119 ... 0x11f: /* nop (multi byte) */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_nop_modrm(s, modrm);
         break;
     case 0x120: /* mov reg, crN */
@@ -6116,7 +6140,7 @@ static target_ulong disas_insn(DisasCont
         if (s->cpl != 0) {
             gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
         } else {
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if ((modrm & 0xc0) != 0xc0)
                 goto illegal_op;
             rm = (modrm & 7) | REX_B(s);
@@ -6158,7 +6182,7 @@ static target_ulong disas_insn(DisasCont
         if (s->cpl != 0) {
             gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
         } else {
-            modrm = ldub_code(s->pc++);
+            modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
             if ((modrm & 0xc0) != 0xc0)
                 goto illegal_op;
             rm = (modrm & 7) | REX_B(s);
@@ -6199,7 +6223,7 @@ static target_ulong disas_insn(DisasCont
         if (!(s->cpuid_features & CPUID_SSE2))
             goto illegal_op;
         ot = s->dflag == 2 ? OT_QUAD : OT_LONG;
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -6208,7 +6232,7 @@ static target_ulong disas_insn(DisasCont
         gen_ldst_modrm(s, modrm, ot, reg, 1);
         break;
     case 0x1ae:
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         mod = (modrm >> 6) & 3;
         op = (modrm >> 3) & 7;
         switch(op) {
@@ -6274,7 +6298,7 @@ static target_ulong disas_insn(DisasCont
         }
         break;
     case 0x10d: /* prefetch */
-        modrm = ldub_code(s->pc++);
+        modrm = ldub_code_p(&s->phys_pc_start, s->phys_pc++, s->pc++);
         gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
         /* ignore for now */
         break;
@@ -6752,6 +6776,9 @@ static inline int gen_intermediate_code_
 
     dc->is_jmp = DISAS_NEXT;
     pc_ptr = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     lj = -1;
 
     for(;;) {
Index: target-m68k/cpu.h
===================================================================
RCS file: /sources/qemu/qemu/target-m68k/cpu.h,v
retrieving revision 1.13
diff -u -d -d -p -r1.13 cpu.h
--- target-m68k/cpu.h	17 Sep 2007 08:09:53 -0000	1.13
+++ target-m68k/cpu.h	13 Oct 2007 10:19:07 -0000
@@ -22,6 +22,8 @@
 #define CPU_M68K_H
 
 #define TARGET_LONG_BITS 32
+/* need explicit support for instructions spanning 2 pages */
+#define TARGET_HAS_VLE_INSNS 1
 
 #include "cpu-defs.h"
 
Index: target-m68k/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-m68k/translate.c,v
retrieving revision 1.20
diff -u -d -d -p -r1.20 translate.c
--- target-m68k/translate.c	17 Sep 2007 08:09:53 -0000	1.20
+++ target-m68k/translate.c	13 Oct 2007 10:19:07 -0000
@@ -45,6 +45,8 @@ typedef struct DisasContext {
     CPUM68KState *env;
     target_ulong insn_pc; /* Start of the current instruction.  */
     target_ulong pc;
+    unsigned long phys_pc;
+    unsigned long phys_pc_start;
     int is_jmp;
     int cc_op;
     int user;
@@ -207,10 +209,12 @@ static int gen_ldst(DisasContext *s, int
 static inline uint32_t read_im32(DisasContext *s)
 {
     uint32_t im;
-    im = ((uint32_t)lduw_code(s->pc)) << 16;
+    im = ((uint32_t)lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc)) << 16;
     s->pc += 2;
-    im |= lduw_code(s->pc);
+    s->phys_pc += 2;
+    im |= lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     return im;
 }
 
@@ -244,8 +248,9 @@ static int gen_lea_indexed(DisasContext 
     uint32_t bd, od;
 
     offset = s->pc;
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     if ((ext & 0x800) == 0 && !m68k_feature(s->env, M68K_FEATURE_WORD_INDEX))
         return -1;
@@ -258,8 +263,10 @@ static int gen_lea_indexed(DisasContext 
         if ((ext & 0x30) > 0x10) {
             /* base displacement */
             if ((ext & 0x30) == 0x20) {
-                bd = (int16_t)lduw_code(s->pc);
+                bd = (int16_t)lduw_code_p(&s->phys_pc_start, s->phys_pc,
+                                          s->pc);
                 s->pc += 2;
+                s->phys_pc += 2;
             } else {
                 bd = read_im32(s);
             }
@@ -307,8 +314,10 @@ static int gen_lea_indexed(DisasContext 
             if ((ext & 3) > 1) {
                 /* outer displacement */
                 if ((ext & 3) == 2) {
-                    od = (int16_t)lduw_code(s->pc);
+                    od = (int16_t)lduw_code_p(&s->phys_pc_start, s->phys_pc,
+                                              s->pc);
                     s->pc += 2;
+                    s->phys_pc += 2;
                 } else {
                     od = read_im32(s);
                 }
@@ -455,8 +464,9 @@ static int gen_lea(DisasContext *s, uint
     case 5: /* Indirect displacement.  */
         reg += QREG_A0;
         tmp = gen_new_qreg(QMODE_I32);
-        ext = lduw_code(s->pc);
+        ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         gen_op_add32(tmp, reg, gen_im32((int16_t)ext));
         return tmp;
     case 6: /* Indirect index + displacement.  */
@@ -465,8 +475,9 @@ static int gen_lea(DisasContext *s, uint
     case 7: /* Other */
         switch (reg) {
         case 0: /* Absolute short.  */
-            offset = ldsw_code(s->pc);
+            offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
+            s->phys_pc += 2;
             return gen_im32(offset);
         case 1: /* Absolute long.  */
             offset = read_im32(s);
@@ -474,8 +485,9 @@ static int gen_lea(DisasContext *s, uint
         case 2: /* pc displacement  */
             tmp = gen_new_qreg(QMODE_I32);
             offset = s->pc;
-            offset += ldsw_code(s->pc);
+            offset += ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
             s->pc += 2;
+            s->phys_pc += 2;
             return gen_im32(offset);
         case 3: /* pc index+displacement.  */
             return gen_lea_indexed(s, opsize, -1);
@@ -581,18 +593,23 @@ static int gen_ea(DisasContext *s, uint1
             /* Sign extend values for consistency.  */
             switch (opsize) {
             case OS_BYTE:
-                if (val)
-                    offset = ldsb_code(s->pc + 1);
-                else
-                    offset = ldub_code(s->pc + 1);
+                if (val) {
+                    offset = ldsb_code_p(&s->phys_pc_start, s->phys_pc + 1,
+                                         s->pc + 1);
+                } else {
+                    offset = ldub_code_p(&s->phys_pc_start, s->phys_pc + 1,
+                                         s->pc + 1);
+                }
                 s->pc += 2;
+                s->phys_pc += 2;
                 break;
             case OS_WORD:
                 if (val)
-                    offset = ldsw_code(s->pc);
+                    offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 else
-                    offset = lduw_code(s->pc);
+                    offset = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
                 s->pc += 2;
+                s->phys_pc += 2;
                 break;
             case OS_LONG:
                 offset = read_im32(s);
@@ -879,8 +896,9 @@ DISAS_INSN(divl)
     int reg;
     uint16_t ext;
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (ext & 0x87f8) {
         gen_exception(s, s->pc - 4, EXCP_UNSUPPORTED);
         return;
@@ -1066,8 +1084,9 @@ DISAS_INSN(movem)
     int tmp;
     int is_load;
 
-    mask = lduw_code(s->pc);
+    mask = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     tmp = gen_lea(s, insn, OS_LONG);
     if (tmp == -1) {
         gen_addr_fault(s);
@@ -1111,8 +1130,9 @@ DISAS_INSN(bitop_im)
         opsize = OS_LONG;
     op = (insn >> 6) & 3;
 
-    bitnum = lduw_code(s->pc);
+    bitnum = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (bitnum & 0xff00) {
         disas_undef(s, insn);
         return;
@@ -1375,8 +1395,9 @@ static void gen_set_sr(DisasContext *s, 
     else if ((insn & 0x3f) == 0x3c)
       {
         uint16_t val;
-        val = lduw_code(s->pc);
+        val = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
         gen_set_sr_im(s, val, ccr_only);
       }
     else
@@ -1502,8 +1523,9 @@ DISAS_INSN(mull)
 
     /* The upper 32 bits of the product are discarded, so
        muls.l and mulu.l are functionally equivalent.  */
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (ext & 0x87ff) {
         gen_exception(s, s->pc - 4, EXCP_UNSUPPORTED);
         return;
@@ -1523,8 +1545,9 @@ DISAS_INSN(link)
     int reg;
     int tmp;
 
-    offset = ldsw_code(s->pc);
+    offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     reg = AREG(insn, 0);
     tmp = gen_new_qreg(QMODE_I32);
     gen_op_sub32(tmp, QREG_SP, gen_im32(4));
@@ -1622,9 +1645,11 @@ DISAS_INSN(tpf)
     switch (insn & 7) {
     case 2: /* One extension word.  */
         s->pc += 2;
+        s->phys_pc += 2;
         break;
     case 3: /* Two extension words.  */
         s->pc += 4;
+        s->phys_pc += 4;
         break;
     case 4: /* No extension words.  */
         break;
@@ -1644,8 +1669,9 @@ DISAS_INSN(branch)
     op = (insn >> 8) & 0xf;
     offset = (int8_t)insn;
     if (offset == 0) {
-        offset = ldsw_code(s->pc);
+        offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
     } else if (offset == -1) {
         offset = read_im32(s);
     }
@@ -1957,14 +1983,16 @@ DISAS_INSN(strldsr)
     uint32_t addr;
 
     addr = s->pc - 2;
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (ext != 0x46FC) {
         gen_exception(s, addr, EXCP_UNSUPPORTED);
         return;
     }
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (IS_USER(s) || (ext & SR_S) == 0) {
         gen_exception(s, addr, EXCP_PRIVILEGE);
         return;
@@ -2032,8 +2060,9 @@ DISAS_INSN(stop)
         return;
     }
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     gen_set_sr_im(s, ext, 0);
     gen_jmp(s, gen_im32(s->pc));
@@ -2059,8 +2088,9 @@ DISAS_INSN(movec)
         return;
     }
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     if (ext & 0x8000) {
         reg = AREG(ext, 12);
@@ -2121,8 +2151,9 @@ DISAS_INSN(fpu)
     int round;
     int opsize;
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     opmode = ext & 0x7f;
     switch ((ext >> 13) & 7) {
     case 0: case 2:
@@ -2331,6 +2362,7 @@ DISAS_INSN(fpu)
     return;
 undef:
     s->pc -= 2;
+    s->phys_pc -= 2;
     disas_undef_fpu(s, insn);
 }
 
@@ -2343,11 +2375,14 @@ DISAS_INSN(fbcc)
     int l1;
 
     addr = s->pc;
-    offset = ldsw_code(s->pc);
+    offset = ldsw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
     if (insn & (1 << 6)) {
-        offset = (offset << 16) | lduw_code(s->pc);
+        offset = (offset << 16) |
+            lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
         s->pc += 2;
+        s->phys_pc += 2;
     }
 
     l1 = gen_new_label();
@@ -2473,8 +2508,9 @@ DISAS_INSN(mac)
     int dual;
     int saved_flags = -1;
 
-    ext = lduw_code(s->pc);
+    ext = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     acc = ((insn >> 7) & 1) | ((ext >> 3) & 2);
     dual = ((insn & 0x30) != 0 && (ext & 3) != 0);
@@ -2882,8 +2918,9 @@ static void disas_m68k_insn(CPUState * e
 {
     uint16_t insn;
 
-    insn = lduw_code(s->pc);
+    insn = lduw_code_p(&s->phys_pc_start, s->phys_pc, s->pc);
     s->pc += 2;
+    s->phys_pc += 2;
 
     opcode_table[insn](s, insn);
 }
@@ -3169,6 +3206,9 @@ gen_intermediate_code_internal(CPUState 
     dc->env = env;
     dc->is_jmp = DISAS_NEXT;
     dc->pc = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    dc->phys_pc = dc->phys_pc_start;
     dc->cc_op = CC_OP_DYNAMIC;
     dc->singlestep_enabled = env->singlestep_enabled;
     dc->fpcr = env->fpcr;
Index: target-mips/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-mips/translate.c,v
retrieving revision 1.106
diff -u -d -d -p -r1.106 translate.c
--- target-mips/translate.c	9 Oct 2007 03:39:58 -0000	1.106
+++ target-mips/translate.c	13 Oct 2007 10:19:07 -0000
@@ -536,6 +536,7 @@ FOP_CONDS(abs, ps)
 typedef struct DisasContext {
     struct TranslationBlock *tb;
     target_ulong pc, saved_pc;
+    unsigned long phys_pc, phys_pc_start;
     uint32_t opcode;
     uint32_t fp_status;
     /* Routine used to access memory */
@@ -1764,6 +1765,7 @@ static void gen_compute_branch (DisasCon
             /* Skip the instruction in the delay slot */
             MIPS_DEBUG("bnever, link and skip");
             ctx->pc += 4;
+            ctx->phys_pc += 4;
             return;
         case OPC_BNEL:    /* rx != rx likely */
         case OPC_BGTZL:   /* 0 > 0 likely */
@@ -1771,6 +1773,7 @@ static void gen_compute_branch (DisasCon
             /* Skip the instruction in the delay slot */
             MIPS_DEBUG("bnever and skip");
             ctx->pc += 4;
+            ctx->phys_pc += 4;
             return;
         case OPC_J:
             ctx->hflags |= MIPS_HFLAG_B;
@@ -6495,6 +6498,9 @@ gen_intermediate_code_internal (CPUState
     gen_opparam_ptr = gen_opparam_buf;
     nb_gen_labels = 0;
     ctx.pc = pc_start;
+    ctx.phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    ctx.phys_pc = ctx.phys_pc_start;
     ctx.saved_pc = -1;
     ctx.tb = tb;
     ctx.bstate = BS_NONE;
@@ -6544,9 +6554,10 @@ gen_intermediate_code_internal (CPUState
             gen_opc_hflags[lj] = ctx.hflags & MIPS_HFLAG_BMASK;
             gen_opc_instr_start[lj] = 1;
         }
-        ctx.opcode = ldl_code(ctx.pc);
+        ctx.opcode = ldl_code_p(&ctx.phys_pc_start, ctx.phys_pc, ctx.pc);
         decode_opc(env, &ctx);
         ctx.pc += 4;
+        ctx.phys_pc += 4;
 
         if (env->singlestep_enabled)
             break;
Index: target-ppc/cpu.h
===================================================================
RCS file: /sources/qemu/qemu/target-ppc/cpu.h,v
retrieving revision 1.79
diff -u -d -d -p -r1.79 cpu.h
--- target-ppc/cpu.h	12 Oct 2007 06:47:46 -0000	1.79
+++ target-ppc/cpu.h	13 Oct 2007 10:19:07 -0000
@@ -37,6 +37,8 @@ typedef uint64_t ppc_gpr_t;
 #define TARGET_GPR_BITS  64
 #define TARGET_LONG_BITS 32
 #define REGX "%016" PRIx64
+/* need explicit support for instructions spanning 2 pages for VLE code */
+#define TARGET_HAS_VLE_INSNS 1
 #if defined(CONFIG_USER_ONLY)
 /* It looks like a lot of Linux programs assume page size
  * is 4kB long. This is evil, but we have to deal with it...
Index: target-ppc/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-ppc/translate.c,v
retrieving revision 1.92
diff -u -d -d -p -r1.92 translate.c
--- target-ppc/translate.c	7 Oct 2007 23:10:08 -0000	1.92
+++ target-ppc/translate.c	13 Oct 2007 10:19:08 -0000
@@ -6678,6 +6678,7 @@ static always_inline int gen_intermediat
 {
     DisasContext ctx, *ctxp = &ctx;
     opc_handler_t **table, *handler;
+    unsigned long phys_pc, phys_pc_start;
     target_ulong pc_start;
     uint16_t *gen_opc_end;
     int supervisor;
@@ -6685,6 +6686,9 @@ static always_inline int gen_intermediat
     int j, lj = -1;
 
     pc_start = tb->pc;
+    phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    phys_pc = phys_pc_start;
     gen_opc_ptr = gen_opc_buf;
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     gen_opparam_ptr = gen_opparam_buf;
@@ -6763,7 +6771,7 @@ static always_inline int gen_intermediat
                     ctx.nip, 1 - msr_pr, msr_ir);
         }
 #endif
-        ctx.opcode = ldl_code(ctx.nip);
+        ctx.opcode = ldl_code_p(&phys_pc_start, phys_pc, ctx.nip);
         if (msr_le) {
             ctx.opcode = ((ctx.opcode & 0xFF000000) >> 24) |
                 ((ctx.opcode & 0x00FF0000) >> 8) |
@@ -6778,6 +6786,7 @@ static always_inline int gen_intermediat
         }
 #endif
         ctx.nip += 4;
+        phys_pc += 4;
         table = env->opcodes;
         handler = table[opc1(ctx.opcode)];
         if (is_indirect_opcode(handler)) {
Index: target-sh4/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-sh4/translate.c,v
retrieving revision 1.18
diff -u -d -d -p -r1.18 translate.c
--- target-sh4/translate.c	29 Sep 2007 19:52:22 -0000	1.18
+++ target-sh4/translate.c	13 Oct 2007 10:19:08 -0000
@@ -1150,11 +1150,15 @@ gen_intermediate_code_internal(CPUState 
 {
     DisasContext ctx;
     target_ulong pc_start;
+    unsigned long phys_pc, phys_pc_start;
     static uint16_t *gen_opc_end;
     uint32_t old_flags;
     int i, ii;
 
     pc_start = tb->pc;
+    phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
+    phys_pc = phys_pc_start;
     gen_opc_ptr = gen_opc_buf;
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     gen_opparam_ptr = gen_opparam_buf;
@@ -1210,9 +1218,10 @@ gen_intermediate_code_internal(CPUState 
 	fprintf(stderr, "Loading opcode at address 0x%08x\n", ctx.pc);
 	fflush(stderr);
 #endif
-	ctx.opcode = lduw_code(ctx.pc);
+	ctx.opcode = lduw_code_p(&phys_pc_start, phys_pc, ctx.pc);
 	decode_opc(&ctx);
 	ctx.pc += 2;
+        phys_pc += 2;
 	if ((ctx.pc & (TARGET_PAGE_SIZE - 1)) == 0)
 	    break;
 	if (env->singlestep_enabled)
Index: target-sparc/translate.c
===================================================================
RCS file: /sources/qemu/qemu/target-sparc/translate.c,v
retrieving revision 1.74
diff -u -d -d -p -r1.74 translate.c
--- target-sparc/translate.c	10 Oct 2007 19:11:54 -0000	1.74
+++ target-sparc/translate.c	13 Oct 2007 10:19:08 -0000
@@ -48,6 +48,8 @@ typedef struct DisasContext {
     target_ulong pc;    /* current Program Counter: integer or DYNAMIC_PC */
     target_ulong npc;   /* next PC: integer or DYNAMIC_PC or JUMP_PC */
     target_ulong jump_pc[2]; /* used when JUMP_PC pc value is used */
+    unsigned long phys_pc;
+    unsigned long phys_pc_start;
     int is_br;
     int mem_idx;
     int fpu_enabled;
@@ -1089,7 +1091,7 @@ static void disas_sparc_insn(DisasContex
 {
     unsigned int insn, opc, rs1, rs2, rd;
 
-    insn = ldl_code(dc->pc);
+    insn = ldl_code_p(&dc->phys_pc_start, dc->phys_pc, dc->pc);
     opc = GET_FIELD(insn, 0, 1);
 
     rd = GET_FIELD(insn, 2, 6);
@@ -3376,6 +3378,8 @@ static inline int gen_intermediate_code_
     dc->tb = tb;
     pc_start = tb->pc;
     dc->pc = pc_start;
+    dc->phys_pc_start = (unsigned long)phys_ram_base + tb->page_addr[0] +
+        (pc_start & ~TARGET_PAGE_MASK);
     last_pc = dc->pc;
     dc->npc = (target_ulong) tb->cs_base;
 #if defined(CONFIG_USER_ONLY)
@@ -3422,6 +3431,7 @@ static inline int gen_intermediate_code_
             }
         }
         last_pc = dc->pc;
+        dc->phys_pc = dc->phys_pc_start + dc->pc - pc_start;
         disas_sparc_insn(dc);
 
         if (dc->is_br)

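Every `gen_intermediate_code_internal` hunk above computes `phys_pc_start` as `phys_ram_base + tb->page_addr[0] + (pc_start & ~TARGET_PAGE_MASK)` and then advances a host-side `phys_pc` alongside the guest PC. The self-contained C sketch below models that arithmetic for a fixed-width RISC target. It also models the Sparc variant, which re-derives `phys_pc` from `pc` before each instruction, presumably because delay slots keep `pc` from advancing in simple steps. It is only an illustration: `guest_ram`, `host_code_addr`, `ldl_raw_host`, `fetch_block` and `phys_pc_for` are hypothetical names. The page size is assumed to be 4 KiB, and `uintptr_t` stands in for the `unsigned long` the patch uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the QEMU definitions used by the patch. */
#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE (1u << TARGET_PAGE_BITS)
#define TARGET_PAGE_MASK (~(TARGET_PAGE_SIZE - 1))

typedef uint32_t target_ulong;

/* Simulated guest RAM; phys_ram_base plays the role of QEMU's global. */
static uint8_t guest_ram[4 * TARGET_PAGE_SIZE];
static uint8_t *phys_ram_base = guest_ram;

/* Same formula as the patch: host base + guest-physical page address
 * (tb->page_addr[0]) + offset of pc_start within its page. */
static uintptr_t host_code_addr(uintptr_t page_addr, target_ulong pc_start)
{
    return (uintptr_t)phys_ram_base + page_addr +
           (pc_start & ~TARGET_PAGE_MASK);
}

/* Raw 32-bit load from host memory, host byte order. */
static uint32_t ldl_raw_host(uintptr_t host_addr)
{
    uint32_t v;
    memcpy(&v, (const void *)host_addr, sizeof(v));
    return v;
}

/* Fixed-width RISC loop: pc and phys_pc advance in lockstep and the block
 * ends at the guest page boundary, so phys_pc never leaves the page.
 * Returns the number of instruction words fetched into out[]. */
static int fetch_block(uintptr_t page_addr, target_ulong pc_start,
                       uint32_t *out, int max_insns)
{
    target_ulong pc = pc_start;
    uintptr_t phys_pc = host_code_addr(page_addr, pc_start);
    int n = 0;

    assert((pc_start & 3) == 0);            /* aligned 4-byte insns */
    while (n < max_insns) {
        out[n++] = ldl_raw_host(phys_pc);
        pc += 4;
        phys_pc += 4;
        if ((pc & ~TARGET_PAGE_MASK) == 0)  /* next insn is on a new page */
            break;
    }
    return n;
}

/* Sparc-style variant: re-derive the host address from pc before each
 * instruction instead of incrementing it alongside pc. */
static uintptr_t phys_pc_for(uintptr_t phys_pc_start, target_ulong pc_start,
                             target_ulong pc)
{
    return phys_pc_start + pc - pc_start;
}
```

Because the block stops as soon as `pc` reaches a page boundary, the raw host pointer is never dereferenced outside the page whose guest-physical address the block was started with.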
^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
  2007-10-13  9:57   ` J. Mayer
  2007-10-13 11:05     ` J. Mayer
@ 2007-10-13 11:08     ` Blue Swirl
  1 sibling, 0 replies; 8+ messages in thread
From: Blue Swirl @ 2007-10-13 11:08 UTC (permalink / raw)
  To: qemu-devel

On 10/13/07, J. Mayer <l_indien@magic.fr> wrote:
> On Sat, 2007-10-13 at 10:11 +0300, Blue Swirl wrote:
> > On 10/13/07, J. Mayer <l_indien@magic.fr> wrote:
> > > -------- Forwarded Message --------
> > > > From: Jocelyn Mayer <l_indien@magic.fr>
> > > > Reply-To: l_indien@magic.fr, qemu-devel@nongnu.org
> > > > To: qemu-devel@nongnu.org
> > > > Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
> > > > Date: Fri, 12 Oct 2007 20:24:44 +0200
> > > >
> > > > On Fri, 2007-10-12 at 18:21 +0300, Blue Swirl wrote:
> > > > > On 10/12/07, J. Mayer <l_indien@magic.fr> wrote:
> > > > > > Here's a small patch that allows an optimisation for code fetch, at least
> > > > > > for RISC CPU targets, as suggested by Fabrice Bellard.
> > > > > > The main idea is that a translated block is never to span over a page
> > > > > > boundary. As the tb_find_slow routine already gets the physical address
> > > > > > of the page of code to be translated, the code translator could then
> > > > > > fetch the code using raw host memory accesses instead of doing it
> > > > > > through the softmmu routines.
> > > > > > This patch could also be adapted to RISC CPU targets, with care for the
> > > > > > last instruction of a page. For now, I did implement it for alpha, arm,
> > > > > > mips, PowerPC and SH4.
> > > > > > I don't actually know whether the optimisation would bring a noticeable speed
> > > > > > gain or whether it will be absolutely marginal.
> > > > > >
> > > > > > Please comment.
> > > > >
> > > > > This will not work correctly for execution of MMIO registers, but
> > > > > maybe that won't work on real hardware either. Who cares.
> > > >
> > > > I wonder if this is important or not... But maybe, when retrieving the
> > > > physical address we could check if it is inside ROM/RAM or an I/O area
> > > > and in the latter case not pass the phys_addr information to the
> > > > translator. In that case, it would keep using ldxx_code. I guess if
> > > > we want to do that, a set of helpers would be appreciated to avoid
> > > > adding code like:
> > > > if (phys_pc == 0)
> > > >   opc = ldul_code(virt_pc)
> > > > else
> > > >   opc = ldul_raw(phys_pc)
> > > > everywhere... I could also add another check so this set of macros would
> > > > automatically use ldxx_code if we reach a page boundary, which would
> > > > then make it easy to use this optimisation for CISC/VLE architectures too.
> > > >
> > > > I'm not sure of the proper solution to allow executing code from mmio
> > > > devices. But adding specific accessors to handle the CISC/VLE case is to
> > > > be done.
> > >
> > > [...]
> > >
> > > I did update my patch following this way and it's now able to run x86
> > > and PowerPC targets.
> > > PowerPC is the easy case, x86 is maybe the worst... Well, I'm not really
> > > sure of what I've done for Sparc, but other targets should be safe.
> >
> > It broke Sparc, delay slot handling makes things complicated. The
> > updated patch passes my tests.
>
> OK. I will take a look at how you solved this issue.
>
> > For extra performance, I bypassed the ldl_code_p. On Sparc,
> > instructions can't be split between two pages. Isn't translation
> > always contained to the same page for all targets like Sparc?
>
> Yes, for RISC targets running in 32-bit mode, we always stop translation
> when we reach the end of a code page. The problem comes with CISC
> architectures, like x86 or m68k, or RISC architectures running 16/32-bit
> code, like ARM in Thumb mode or PowerPC in VLE mode. In all those cases,
> there can be instructions spanning 2 pages, so we need the
> ldx_code_p functions.

I see.

> My idea in always using the ldx_code_p functions is that we may later get the
> chance to make them cleverer and have the slow case handle code
> execution in MMIO areas, once that becomes possible.

The fast path could be enabled for the RISC targets conditionally, for example on
#define TARGET_CODE_ALIGNED
or something similar.
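A minimal sketch of the helper idea discussed in this exchange: load code through a raw host pointer when the whole access lies inside the block's first page, and fall back to the softmmu-style loader when there is no host mapping (for example MMIO), when the access straddles the page end, or when a CISC/VLE block has moved onto a later page. Defining the suggested `TARGET_CODE_ALIGNED` compiles the checks out entirely. Everything here is hypothetical: `CodeFetchCtx`, `fetch_u16`, `slow_lduw_code` and `GUEST_VBASE` are illustrative names, the slow path merely simulates `lduw_code()` over a small array, and the real `ldxx_code_p` helpers in the patch may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE (1u << TARGET_PAGE_BITS)
#define TARGET_PAGE_MASK (~(TARGET_PAGE_SIZE - 1))
#define GUEST_VBASE 0x40000000u   /* hypothetical guest virtual base */

/* Simulated guest code spanning two virtual pages. */
static uint8_t guest_mem[2 * TARGET_PAGE_SIZE];
static unsigned slow_fetches;     /* how often the slow path ran */

/* Stand-in for lduw_code(): the softmmu-style path, valid for any address,
 * including a 16-bit word that straddles two pages. Big-endian, as m68k. */
static uint16_t slow_lduw_code(uint32_t pc)
{
    assert(pc >= GUEST_VBASE && pc - GUEST_VBASE + 2 <= sizeof(guest_mem));
    const uint8_t *p = &guest_mem[pc - GUEST_VBASE];
    slow_fetches++;
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical per-block fetch state. */
typedef struct {
    uintptr_t phys_pc_start;  /* host address of pc_start; 0 = no fast path */
    uint32_t pc_start;        /* guest virtual address of the block start */
} CodeFetchCtx;

/* Hypothetical fetch helper: raw host load when the whole access lies in
 * the block's first page, slow path otherwise. */
static uint16_t fetch_u16(const CodeFetchCtx *ctx, uint32_t pc)
{
#ifndef TARGET_CODE_ALIGNED
    int same_page = (pc & TARGET_PAGE_MASK) ==
                    (ctx->pc_start & TARGET_PAGE_MASK);
    int fits = (pc & ~TARGET_PAGE_MASK) + 2 <= TARGET_PAGE_SIZE;

    if (ctx->phys_pc_start == 0 || !same_page || !fits)
        return slow_lduw_code(pc);
#endif
    /* Summed in uintptr_t so the offset arithmetic wraps consistently. */
    const uint8_t *p =
        (const uint8_t *)(ctx->phys_pc_start + pc - ctx->pc_start);
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

Deriving the host address from `pc` on every call, rather than carrying a separate `phys_pc`, keeps the helper correct even when a caller skips or rewinds bytes; the cost is one subtraction per fetch.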


* Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
  2007-10-13 11:05     ` J. Mayer
@ 2007-10-13 11:58       ` Blue Swirl
  2007-10-13 19:07       ` Thiemo Seufer
  1 sibling, 0 replies; 8+ messages in thread
From: Blue Swirl @ 2007-10-13 11:58 UTC (permalink / raw)
  To: qemu-devel

On 10/13/07, J. Mayer <l_indien@magic.fr> wrote:
> On Sat, 2007-10-13 at 11:57 +0200, J. Mayer wrote:
> > On Sat, 2007-10-13 at 10:11 +0300, Blue Swirl wrote:
> > > On 10/13/07, J. Mayer <l_indien@magic.fr> wrote:
> > > > -------- Forwarded Message --------
> > > > > From: Jocelyn Mayer <l_indien@magic.fr>
> > > > > Reply-To: l_indien@magic.fr, qemu-devel@nongnu.org
> > > > > To: qemu-devel@nongnu.org
> > > > > Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
> > > > > Date: Fri, 12 Oct 2007 20:24:44 +0200
> > > > >
> > > > > On Fri, 2007-10-12 at 18:21 +0300, Blue Swirl wrote:
> > > > > > On 10/12/07, J. Mayer <l_indien@magic.fr> wrote:
> > > > > > > Here's a small patch that allows an optimisation for code fetch, at least
> > > > > > > for RISC CPU targets, as suggested by Fabrice Bellard.
> > > > > > > The main idea is that a translated block is never to span over a page
> > > > > > > boundary. As the tb_find_slow routine already gets the physical address
> > > > > > > of the page of code to be translated, the code translator could then
> > > > > > > fetch the code using raw host memory accesses instead of doing it
> > > > > > > through the softmmu routines.
> > > > > > > This patch could also be adapted to RISC CPU targets, with care for the
> > > > > > > last instruction of a page. For now, I did implement it for alpha, arm,
> > > > > > > mips, PowerPC and SH4.
> > > > > > > I don't actually know whether the optimisation would bring a noticeable speed
> > > > > > > gain or whether it will be absolutely marginal.
> > > > > > >
> > > > > > > Please comment.
> > > > > >
> > > > > > This will not work correctly for execution of MMIO registers, but
> > > > > > maybe that won't work on real hardware either. Who cares.
> > > > >
> > > > > I wonder if this is important or not... But maybe, when retrieving the
> > > > > physical address we could check if it is inside ROM/RAM or an I/O area
> > > > > and in the latter case not pass the phys_addr information to the
> > > > > translator. In that case, it would keep using ldxx_code. I guess if
> > > > > we want to do that, a set of helpers would be appreciated to avoid
> > > > > adding code like:
> > > > > if (phys_pc == 0)
> > > > >   opc = ldul_code(virt_pc)
> > > > > else
> > > > >   opc = ldul_raw(phys_pc)
> > > > > everywhere... I could also add another check so this set of macros would
> > > > > automatically use ldxx_code if we reach a page boundary, which would
> > > > > then make it easy to use this optimisation for CISC/VLE architectures too.
> > > > >
> > > > > I'm not sure of the proper solution to allow executing code from mmio
> > > > > devices. But adding specific accessors to handle the CISC/VLE case is to
> > > > > be done.
> > > >
> > > > [...]
> > > >
> > > > I did update my patch following this way and it's now able to run x86
> > > > and PowerPC targets.
> > > > PowerPC is the easy case, x86 is maybe the worst... Well, I'm not really
> > > > sure of what I've done for Sparc, but other targets should be safe.
> > >
> > > It broke Sparc, delay slot handling makes things complicated. The
> > > updated patch passes my tests.
> >
> > OK. I will take a look at how you solved this issue.
> >
> > > For extra performance, I bypassed the ldl_code_p. On Sparc,
> > > instructions can't be split between two pages. Isn't translation
> > > always contained to the same page for all targets like Sparc?
> >
> > Yes, for RISC targets running in 32-bit mode, we always stop translation
> > when we reach the end of a code page. The problem comes with CISC
> > architectures, like x86 or m68k, or RISC architectures running 16/32-bit
> > code, like ARM in Thumb mode or PowerPC in VLE mode. In all those cases,
> > there can be instructions spanning 2 pages, so we need the
> > ldx_code_p functions.
> > My idea in always using the ldx_code_p functions is that we may later get the
> > chance to make them cleverer and have the slow case handle code
> > execution in MMIO areas, once that becomes possible.
>
> Here's an updated patch. I added a definition TARGET_HAS_VLE_INSNS which
> is defined in the cris, i386, m68k and ppcemb cases. Arm already has
> explicit support for 32-bit Thumb instructions spanning 2 pages, so it
> should not need this define. When this define is not set, the
> ldxxx_code_p function just does ldxxx_raw(phys_pc) in the softmmu case
> and ldxxx_raw(pc) in the user-mode only case. This is optimal for pure
> RISC architectures and does not need the #ifdef CONFIG_USER_ONLY you
> added for Sparc in your patch version. I also added a provision for a
> TARGET_MMIO_CODE define which may be used later, once this is really
> supported by Qemu.
> I also took your fixes for Sparc phys_pc computation, but reverted your
> change to use ldl_raw, as it should no longer be needed.
> I did test PowerPC in user-mode only and softmmu mode and i386 in
> softmmu successfully using this new version of the patch.

OK for Sparc.
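The selection described in the patch notes quoted above can be modelled with a few preprocessor branches: targets with `TARGET_HAS_VLE_INSNS` keep a checked path, while pure RISC targets load directly from `phys_pc` under softmmu or from `pc` in user-mode emulation. This is a hedged sketch, not the patch's code: `ldl_code_p_model`, `ldl_raw_host` and `ldl_code_checked` are hypothetical names, the checked path is only a placeholder, and the user-mode branch assumes the guest address can be used directly as a host address, as the notes imply.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw host load, host byte order (stand-in for ldl_raw). */
static uint32_t ldl_raw_host(uintptr_t host_addr)
{
    uint32_t v;
    memcpy(&v, (const void *)host_addr, sizeof(v));
    return v;
}

static unsigned checked_fetches;

/* Placeholder for the page-crossing-aware path a VLE/CISC target keeps;
 * a real one would fall back to the softmmu loader near page ends. */
static uint32_t ldl_code_checked(uintptr_t *phys_pc_start, uintptr_t phys_pc,
                                 uintptr_t pc)
{
    (void)phys_pc_start; (void)pc;
    checked_fetches++;
    return ldl_raw_host(phys_pc);
}

/* Hypothetical model of the compile-time selection described above. */
static inline uint32_t ldl_code_p_model(uintptr_t *phys_pc_start,
                                        uintptr_t phys_pc, uintptr_t pc)
{
#if defined(TARGET_HAS_VLE_INSNS)
    return ldl_code_checked(phys_pc_start, phys_pc, pc);
#elif defined(CONFIG_USER_ONLY)
    (void)phys_pc_start; (void)phys_pc;
    return ldl_raw_host(pc);       /* user mode: guest addr used as host addr */
#else
    (void)phys_pc_start; (void)pc;
    return ldl_raw_host(phys_pc);  /* softmmu: precomputed host address */
#endif
}
```

Since the choice is made by the preprocessor, a pure RISC target pays no runtime test at all, which is why the per-target `#ifdef CONFIG_USER_ONLY` in the translator becomes unnecessary.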


* Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
  2007-10-13 11:05     ` J. Mayer
  2007-10-13 11:58       ` Blue Swirl
@ 2007-10-13 19:07       ` Thiemo Seufer
  2007-10-13 21:11         ` J. Mayer
  1 sibling, 1 reply; 8+ messages in thread
From: Thiemo Seufer @ 2007-10-13 19:07 UTC (permalink / raw)
  To: J. Mayer; +Cc: qemu-devel

J. Mayer wrote:
[snip]
> > My idea in always using the ldx_code_p functions is that we may later get the
> > chance to make them cleverer and have the slow case handle code
> > execution in MMIO areas, once that becomes possible.
> 
> Here's an updated patch. I added a definition TARGET_HAS_VLE_INSNS which
> is defined in the cris, i386, m68k and ppcemb cases. Arm already has
> explicit support for 32-bit Thumb instructions spanning 2 pages, so it
> should not need this define. When this define is not set, the
> ldxxx_code_p function just does ldxxx_raw(phys_pc) in the softmmu case
> and ldxxx_raw(pc) in the user-mode only case. This is optimal for pure
> RISC architectures and does not need the #ifdef CONFIG_USER_ONLY you
> added for Sparc in your patch version. I also added a provision for a
> TARGET_MMIO_CODE define which may be used later, once this is really
> supported by Qemu.
> I also took your fixes for Sparc phys_pc computation, but reverted your
> change to use ldl_raw, as it should no longer be needed.
> I did test PowerPC in user-mode only and softmmu mode and i386 in
> softmmu successfully using this new version of the patch.

Works ok for MIPS. There's no obvious change in performance; I guess
the slow TLB emulation drowns out any possible improvement.


Thiemo


* Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
  2007-10-13 19:07       ` Thiemo Seufer
@ 2007-10-13 21:11         ` J. Mayer
  0 siblings, 0 replies; 8+ messages in thread
From: J. Mayer @ 2007-10-13 21:11 UTC (permalink / raw)
  To: Thiemo Seufer; +Cc: qemu-devel

On Sat, 2007-10-13 at 20:07 +0100, Thiemo Seufer wrote:
> J. Mayer wrote:
> [snip]
> > > My idea in always using the ldx_code_p functions is that we may later get the
> > > chance to make them cleverer and have the slow case handle code
> > > execution in MMIO areas, once that becomes possible.
> > 
> > Here's an updated patch. I added a definition TARGET_HAS_VLE_INSNS which
> > is defined in the cris, i386, m68k and ppcemb cases. Arm already has
> > explicit support for 32-bit Thumb instructions spanning 2 pages, so it
> > should not need this define. When this define is not set, the
> > ldxxx_code_p function just does ldxxx_raw(phys_pc) in the softmmu case
> > and ldxxx_raw(pc) in the user-mode only case. This is optimal for pure
> > RISC architectures and does not need the #ifdef CONFIG_USER_ONLY you
> > added for Sparc in your patch version. I also added a provision for a
> > TARGET_MMIO_CODE define which may be used later, once this is really
> > supported by Qemu.
> > I also took your fixes for Sparc phys_pc computation, but reverted your
> > change to use ldl_raw, as it should no longer be needed.
> > I did test PowerPC in user-mode only and softmmu mode and i386 in
> > softmmu successfully using this new version of the patch.
> 
> Works ok for MIPS. There's no obvious change in performance; I guess
> the slow TLB emulation drowns out any possible improvement.

Great!
Yes, the optimisation we got here is more a case of 'don't waste our time
doing unneeded things' than a big performance boost. Running a long test
program in Linux user mode seems to indicate that it spends a little less
time in user mode when the patch is applied, but this is not very
significant compared to the total execution time. Maybe gprof would show
us whether we really spend less time in the translation process...

-- 
J. Mayer <l_indien@magic.fr>
Never organized


end of thread, other threads:[~2007-10-13 21:11 UTC | newest]

Thread overview: 8+ messages
2007-10-12 23:00 [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation] J. Mayer
2007-10-13  7:11 ` Blue Swirl
2007-10-13  9:57   ` J. Mayer
2007-10-13 11:05     ` J. Mayer
2007-10-13 11:58       ` Blue Swirl
2007-10-13 19:07       ` Thiemo Seufer
2007-10-13 21:11         ` J. Mayer
2007-10-13 11:08     ` Blue Swirl
