All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs
@ 2012-07-05 13:23 Yeongkyoon Lee
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 1/4] tcg: add declarations and templates of extended MMU helpers Yeongkyoon Lee
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-05 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, chenwj, e.voevodin, Yeongkyoon Lee

Hi, all.

I think the generated codes from qemu_ld/st IRs are relatively heavy, which are up to 12 instructions for TLB hit case on i386 host.
This patch series enhances the code quality of TCG qemu_ld/st IRs by reducing jump and enhancing locality.
Main idea is simple and has been already described in the comments in tcg-target.c, which separates slow path (TLB miss case), and generates it at the end of TB.

For example, the generated code from qemu_ld changes as follow.
Before:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (5)
(3) TLB hit case: Load value from host memory
(4) Jump to next code (6)
(5) TLB miss case: call MMU helper
(6) ... (next code)

After:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (7)
(3) TLB hit case: Load value from host memory
(4) ... (next code)
...
(7) TLB miss case: call MMU helper
(8) Return to next code (4)

Following is some performance results which was measured based on qemu 1.0.
Although there was measurement error, the results was not negligible.

* EEMBC CoreMark (before -> after)
  - Guest: i386, Linux (Tizen platform)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results: 1135.6 -> 1179.9 (+3.9%)

* nbench (before -> after)
  - Guest: i386, Linux (linux-0.2.img included in QEMU source)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results
    . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
    . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
    . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)

Summarized feature is as following.
 - All the changes are wrapped by macro "CONFIG_QEMU_LDST_OPTIMIZATION" and disabled by default.
 - They are enabled by "configure --enable-ldst-optimization" and need CONFIG_SOFTMMU.
 - They do not work with CONFIG_TCG_PASS_AREG0 because it looks better apply them after areg0 codes come steady.
 - Currently, they support only x86 and x86-64 and have been tested with x86 and ARM linux targets on x86/x86-64 host platforms.
 - Build test has been done for all targets.

In addition, I have tried to remove the generated codes of calling MMU helpers for TLB miss case from end of TB, however, have not found good solution yet. In my opinion, TLB hit case performance could be degraded if removing the calling codes, because it needs to set runtime parameters, such as, data, mmu index and return address, in register or stack though they are not used in TLB hit case. This remains as a further issue.

Yeongkyoon Lee (4):
  tcg: add declarations and templates of extended MMU helpers
  tcg: add extended MMU helpers to softmmu targets
  tcg: add optimized TCG qemu_ld/st generation
  configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
    optimization

 configure                     |   15 ++
 softmmu_defs.h                |   13 ++
 softmmu_template.h            |   51 +++++--
 target-alpha/mem_helper.c     |   22 +++
 target-arm/op_helper.c        |   23 +++
 target-cris/op_helper.c       |   22 +++
 target-i386/mem_helper.c      |   22 +++
 target-lm32/op_helper.c       |   23 +++-
 target-m68k/op_helper.c       |   22 +++
 target-microblaze/op_helper.c |   22 +++
 target-mips/op_helper.c       |   22 +++
 target-ppc/mem_helper.c       |   22 +++
 target-s390x/op_helper.c      |   22 +++
 target-sh4/op_helper.c        |   22 +++
 target-sparc/ldst_helper.c    |   23 +++
 target-xtensa/op_helper.c     |   22 +++
 tcg/i386/tcg-target.c         |  328 +++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c                     |   12 ++
 tcg/tcg.h                     |   35 +++++
 19 files changed, 732 insertions(+), 11 deletions(-)

-- 
1.7.4.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC][PATCH v2 1/4] tcg: add declarations and templates of extended MMU helpers
  2012-07-05 13:23 [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
@ 2012-07-05 13:23 ` Yeongkyoon Lee
  2012-07-05 13:40   ` Peter Maydell
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 2/4] tcg: add extended MMU helpers to softmmu targets Yeongkyoon Lee
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-05 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, chenwj, e.voevodin, Yeongkyoon Lee

Add declarations and templates of extended MMU helpers which can take return address argument to what helper functions return. These extended helper functions are called only by generated code.

Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
---
 softmmu_defs.h     |   13 +++++++++++++
 softmmu_template.h |   51 +++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/softmmu_defs.h b/softmmu_defs.h
index 8d59f9d..728c51b 100644
--- a/softmmu_defs.h
+++ b/softmmu_defs.h
@@ -19,6 +19,19 @@ void __stl_mmu(target_ulong addr, uint32_t val, int mmu_idx);
 uint64_t __ldq_mmu(target_ulong addr, int mmu_idx);
 void __stq_mmu(target_ulong addr, uint64_t val, int mmu_idx);
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Extended versions of MMU helpers for qemu_ld/st optimization.
+   They get return address arguments because the caller PCs are not where helpers return to. */
+uint8_t __ext_ldb_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
+void __ext_stb_mmu(target_ulong addr, uint8_t val, int mmu_idx, uintptr_t ra);
+uint16_t __ext_ldw_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
+void __ext_stw_mmu(target_ulong addr, uint16_t val, int mmu_idx, uintptr_t ra);
+uint32_t __ext_ldl_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
+void __ext_stl_mmu(target_ulong addr, uint32_t val, int mmu_idx, uintptr_t ra);
+uint64_t __ext_ldq_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
+void __ext_stq_mmu(target_ulong addr, uint64_t val, int mmu_idx, uintptr_t ra);
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 uint8_t __ldb_cmmu(target_ulong addr, int mmu_idx);
 void __stb_cmmu(target_ulong addr, uint8_t val, int mmu_idx);
 uint16_t __ldw_cmmu(target_ulong addr, int mmu_idx);
diff --git a/softmmu_template.h b/softmmu_template.h
index b8bd700..d549800 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -66,6 +66,25 @@
 #define HELPER_PREFIX helper_
 #endif
 
+#ifdef USE_EXTENDED_HELPER
+/* Exteneded helper funtions have one more argument of address
+   to which pc is returned after setting TLB entry */
+#if !defined(CONFIG_QEMU_LDST_OPTIMIZATION) || defined(CONFIG_TCG_PASS_AREG0)
+#error You need CONFIG_QEMU_LDST_OPTIMIZATION or to disable CONFIG_TCG_PASS_AREG0!
+#endif
+#undef HELPER_PREFIX
+#define HELPER_PREFIX __ext_
+#define RET_PARAM , uintptr_t raddr
+#define RET_VAR raddr
+#define GET_RET_ADDR() RET_VAR
+#else
+#define RET_PARAM
+#define RET_VAR
+#define GET_RET_ADDR() GETPC()
+#endif  /* USE_EXTENDED_HELPER */
+
+
+#ifndef USE_EXTENDED_HELPER
 static DATA_TYPE glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(ENV_PARAM
                                                         target_ulong addr,
                                                         int mmu_idx,
@@ -101,12 +120,14 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(ENV_PARAM
 #endif /* SHIFT > 2 */
     return res;
 }
+#endif  /* !USE_EXTENDED_HELPER */
 
 /* handle all cases except unaligned access which span two pages */
 DATA_TYPE
 glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
                                                        target_ulong addr,
-                                                       int mmu_idx)
+                                                       int mmu_idx
+                                                       RET_PARAM)
 {
     DATA_TYPE res;
     int index;
@@ -124,13 +145,13 @@ glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
-            retaddr = GETPC();
+            retaddr = GET_RET_ADDR();
             ioaddr = env->iotlb[mmu_idx][index];
             res = glue(io_read, SUFFIX)(ENV_VAR ioaddr, addr, retaddr);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
             /* slow unaligned access (it spans two pages or IO) */
         do_unaligned_access:
-            retaddr = GETPC();
+            retaddr = GET_RET_ADDR();
 #ifdef ALIGNED_ONLY
             do_unaligned_access(ENV_VAR addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
 #endif
@@ -141,7 +162,7 @@ glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
             uintptr_t addend;
 #ifdef ALIGNED_ONLY
             if ((addr & (DATA_SIZE - 1)) != 0) {
-                retaddr = GETPC();
+                retaddr = GET_RET_ADDR();
                 do_unaligned_access(ENV_VAR addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
             }
 #endif
@@ -151,7 +172,7 @@ glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
         }
     } else {
         /* the page is not in the TLB : fill it */
-        retaddr = GETPC();
+        retaddr = GET_RET_ADDR();
 #ifdef ALIGNED_ONLY
         if ((addr & (DATA_SIZE - 1)) != 0)
             do_unaligned_access(ENV_VAR addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
@@ -162,6 +183,7 @@ glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
     return res;
 }
 
+#ifndef USE_EXTENDED_HELPER
 /* handle all unaligned cases */
 static DATA_TYPE
 glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(ENV_PARAM
@@ -213,9 +235,11 @@ glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(ENV_PARAM
     }
     return res;
 }
+#endif  /* !USE_EXTENDED_HELPER */
 
 #ifndef SOFTMMU_CODE_ACCESS
 
+#ifndef USE_EXTENDED_HELPER
 static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(ENV_PARAM
                                                    target_ulong addr,
                                                    DATA_TYPE val,
@@ -252,11 +276,13 @@ static inline void glue(io_write, SUFFIX)(ENV_PARAM
 #endif
 #endif /* SHIFT > 2 */
 }
+#endif  /* !USE_EXTENDED_HELPER */
 
 void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
                                                             target_ulong addr,
                                                             DATA_TYPE val,
-                                                            int mmu_idx)
+                                                            int mmu_idx
+                                                            RET_PARAM)
 {
     target_phys_addr_t ioaddr;
     target_ulong tlb_addr;
@@ -271,12 +297,12 @@ void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
-            retaddr = GETPC();
+            retaddr = GET_RET_ADDR();
             ioaddr = env->iotlb[mmu_idx][index];
             glue(io_write, SUFFIX)(ENV_VAR ioaddr, val, addr, retaddr);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
         do_unaligned_access:
-            retaddr = GETPC();
+            retaddr = GET_RET_ADDR();
 #ifdef ALIGNED_ONLY
             do_unaligned_access(ENV_VAR addr, 1, mmu_idx, retaddr);
 #endif
@@ -287,7 +313,7 @@ void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
             uintptr_t addend;
 #ifdef ALIGNED_ONLY
             if ((addr & (DATA_SIZE - 1)) != 0) {
-                retaddr = GETPC();
+                retaddr = GET_RET_ADDR();
                 do_unaligned_access(ENV_VAR addr, 1, mmu_idx, retaddr);
             }
 #endif
@@ -297,7 +323,7 @@ void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
         }
     } else {
         /* the page is not in the TLB : fill it */
-        retaddr = GETPC();
+        retaddr = GET_RET_ADDR();
 #ifdef ALIGNED_ONLY
         if ((addr & (DATA_SIZE - 1)) != 0)
             do_unaligned_access(ENV_VAR addr, 1, mmu_idx, retaddr);
@@ -307,6 +333,7 @@ void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
     }
 }
 
+#ifndef USE_EXTENDED_HELPER
 /* handles all unaligned cases */
 static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(ENV_PARAM
                                                    target_ulong addr,
@@ -356,6 +383,7 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(ENV_PARAM
         goto redo;
     }
 }
+#endif  /* !USE_EXTENDED_HELPER */
 
 #endif /* !defined(SOFTMMU_CODE_ACCESS) */
 
@@ -370,3 +398,6 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(ENV_PARAM
 #undef ENV_VAR
 #undef CPU_PREFIX
 #undef HELPER_PREFIX
+#undef RET_PARAM
+#undef RET_VAR
+#undef GET_RET_ADDR
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC][PATCH v2 2/4] tcg: add extended MMU helpers to softmmu targets
  2012-07-05 13:23 [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 1/4] tcg: add declarations and templates of extended MMU helpers Yeongkyoon Lee
@ 2012-07-05 13:23 ` Yeongkyoon Lee
  2012-07-05 13:43   ` Peter Maydell
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 3/4] tcg: add optimized TCG qemu_ld/st generation Yeongkyoon Lee
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-05 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, chenwj, e.voevodin, Yeongkyoon Lee

Add extended MMU helpers to softmmu targets, where the targets are alpha, arm, cris, i386, lm32, m68k, microblaze, mips, ppc, s390x, sh4, sparc and xtensa.

Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
---
 target-alpha/mem_helper.c     |   22 ++++++++++++++++++++++
 target-arm/op_helper.c        |   23 +++++++++++++++++++++++
 target-cris/op_helper.c       |   22 ++++++++++++++++++++++
 target-i386/mem_helper.c      |   22 ++++++++++++++++++++++
 target-lm32/op_helper.c       |   23 ++++++++++++++++++++++-
 target-m68k/op_helper.c       |   22 ++++++++++++++++++++++
 target-microblaze/op_helper.c |   22 ++++++++++++++++++++++
 target-mips/op_helper.c       |   22 ++++++++++++++++++++++
 target-ppc/mem_helper.c       |   22 ++++++++++++++++++++++
 target-s390x/op_helper.c      |   22 ++++++++++++++++++++++
 target-sh4/op_helper.c        |   22 ++++++++++++++++++++++
 target-sparc/ldst_helper.c    |   23 +++++++++++++++++++++++
 target-xtensa/op_helper.c     |   22 ++++++++++++++++++++++
 13 files changed, 288 insertions(+), 1 deletions(-)

diff --git a/target-alpha/mem_helper.c b/target-alpha/mem_helper.c
index 87cada4..ef880a7 100644
--- a/target-alpha/mem_helper.c
+++ b/target-alpha/mem_helper.c
@@ -132,6 +132,28 @@ void cpu_unassigned_access(CPUAlphaState *env, target_phys_addr_t addr,
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 /* try to fill the TLB and return an exception if error. If retaddr is
    NULL, it means that the function was called in C code (i.e. not
    from generated code or from helper.c) */
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 490111c..8ff1209 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -69,6 +69,29 @@ uint32_t HELPER(neon_tbl)(uint32_t ireg, uint32_t def,
 #define SHIFT 3
 #include "softmmu_template.h"
 
+
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 /* try to fill the TLB and return an exception if error. If retaddr is
    NULL, it means that the function was called in C code (i.e. not
    from generated code or from helper.c) */
diff --git a/target-cris/op_helper.c b/target-cris/op_helper.c
index ac7c98c..cc34f3c 100644
--- a/target-cris/op_helper.c
+++ b/target-cris/op_helper.c
@@ -52,6 +52,28 @@
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 /* Try to fill the TLB and return an exception if error. If retaddr is
    NULL, it means that the function was called in C code (i.e. not
    from generated code or from helper.c) */
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index 91353c0..59fb03c 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -126,6 +126,28 @@ void helper_boundl(target_ulong a0, int v)
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 #endif
 
 #if !defined(CONFIG_USER_ONLY)
diff --git a/target-lm32/op_helper.c b/target-lm32/op_helper.c
index 51edc1a..bcdddf7 100644
--- a/target-lm32/op_helper.c
+++ b/target-lm32/op_helper.c
@@ -18,6 +18,28 @@
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 void helper_raise_exception(uint32_t index)
 {
     env->exception_index = index;
@@ -101,4 +123,3 @@ void tlb_fill(CPULM32State *env1, target_ulong addr, int is_write, int mmu_idx,
     env = saved_env;
 }
 #endif
-
diff --git a/target-m68k/op_helper.c b/target-m68k/op_helper.c
index 1971a57..454fd71 100644
--- a/target-m68k/op_helper.c
+++ b/target-m68k/op_helper.c
@@ -51,6 +51,28 @@ extern int semihosting_enabled;
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 /* Try to fill the TLB and return an exception if error. If retaddr is
    NULL, it means that the function was called in C code (i.e. not
    from generated code or from helper.c) */
diff --git a/target-microblaze/op_helper.c b/target-microblaze/op_helper.c
index 3b1f072..12be3bd 100644
--- a/target-microblaze/op_helper.c
+++ b/target-microblaze/op_helper.c
@@ -39,6 +39,28 @@
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 /* Try to fill the TLB and return an exception if error. If retaddr is
    NULL, it means that the function was called in C code (i.e. not
    from generated code or from helper.c) */
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 66037ac..c2fd2db 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -2303,6 +2303,28 @@ static void QEMU_NORETURN do_unaligned_access(target_ulong addr, int is_write,
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 static void do_unaligned_access(target_ulong addr, int is_write,
                                 int is_user, uintptr_t retaddr)
 {
diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
index 5b5f1bd..01a9ac6 100644
--- a/target-ppc/mem_helper.c
+++ b/target-ppc/mem_helper.c
@@ -268,6 +268,28 @@ STVE(stvewx, cpu_stl_data, bswap32, u32)
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 /* try to fill the TLB and return an exception if error. If retaddr is
    NULL, it means that the function was called in C code (i.e. not
    from generated code or from helper.c) */
diff --git a/target-s390x/op_helper.c b/target-s390x/op_helper.c
index 7b72473..05109c5 100644
--- a/target-s390x/op_helper.c
+++ b/target-s390x/op_helper.c
@@ -52,6 +52,28 @@
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 /* try to fill the TLB and return an exception if error. If retaddr is
    NULL, it means that the function was called in C code (i.e. not
    from generated code or from helper.c) */
diff --git a/target-sh4/op_helper.c b/target-sh4/op_helper.c
index 4054791..f7c1f67 100644
--- a/target-sh4/op_helper.c
+++ b/target-sh4/op_helper.c
@@ -53,6 +53,28 @@ static void cpu_restore_state_from_retaddr(uintptr_t retaddr)
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 void tlb_fill(CPUSH4State *env1, target_ulong addr, int is_write, int mmu_idx,
               uintptr_t retaddr)
 {
diff --git a/target-sparc/ldst_helper.c b/target-sparc/ldst_helper.c
index 9bec7a9..1ceb588 100644
--- a/target-sparc/ldst_helper.c
+++ b/target-sparc/ldst_helper.c
@@ -80,6 +80,29 @@
 
 #define SHIFT 3
 #include "softmmu_template.h"
+
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 #endif
 
 #if defined(TARGET_SPARC64) && !defined(CONFIG_USER_ONLY)
diff --git a/target-xtensa/op_helper.c b/target-xtensa/op_helper.c
index 2659c0e..536e1fb 100644
--- a/target-xtensa/op_helper.c
+++ b/target-xtensa/op_helper.c
@@ -47,6 +47,28 @@ static void do_unaligned_access(CPUXtensaState *env,
 #define SHIFT 3
 #include "softmmu_template.h"
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Exteneded MMU helper funtions for qemu_ld/st optimization
+   Note that normal helper functions should be defined above
+   to avoid duplication of common functions, slow_ld/st and io_read/write.
+ */
+#define USE_EXTENDED_HELPER
+
+#define SHIFT 0
+#include "softmmu_template.h"
+
+#define SHIFT 1
+#include "softmmu_template.h"
+
+#define SHIFT 2
+#include "softmmu_template.h"
+
+#define SHIFT 3
+#include "softmmu_template.h"
+
+#undef USE_EXTENDED_HELPER
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 static void do_restore_state(CPUXtensaState *env, uintptr_t pc)
 {
     TranslationBlock *tb;
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC][PATCH v2 3/4] tcg: add optimized TCG qemu_ld/st generation
  2012-07-05 13:23 [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 1/4] tcg: add declarations and templates of extended MMU helpers Yeongkyoon Lee
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 2/4] tcg: add extended MMU helpers to softmmu targets Yeongkyoon Lee
@ 2012-07-05 13:23 ` Yeongkyoon Lee
  2012-07-05 14:04   ` Peter Maydell
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
  2012-07-10  9:12 ` [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
  4 siblings, 1 reply; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-05 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, chenwj, e.voevodin, Yeongkyoon Lee

Add optimized TCG qemu_ld/st generation which generates the code for TLB miss case handling at the end of TB after generating other IRs.

Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
---
 tcg/i386/tcg-target.c |  328 +++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c             |   12 ++
 tcg/tcg.h             |   35 +++++
 3 files changed, 375 insertions(+), 0 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index da17bba..3f2f640 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -984,6 +984,8 @@ static const void *qemu_st_helpers[4] = {
     helper_stq_mmu,
 };
 #else
+
+#ifndef CONFIG_QEMU_LDST_OPTIMIZATION
 /* legacy helper signature: __ld_mmu(target_ulong addr, int
    mmu_idx) */
 static void *qemu_ld_helpers[4] = {
@@ -1001,6 +1003,35 @@ static void *qemu_st_helpers[4] = {
     __stl_mmu,
     __stq_mmu,
 };
+#else
+/* extended legacy helper signature: __ext_ld_mmu(target_ulong addr, int
+   mmu_idx, uintptr raddr) */
+static void *qemu_ld_helpers[4] = {
+    __ext_ldb_mmu,
+    __ext_ldw_mmu,
+    __ext_ldl_mmu,
+    __ext_ldq_mmu,
+};
+
+/* extended legacy helper signature: __ext_st_mmu(target_ulong addr, uintxx_t val,
+   int mmu_idx) */
+static void *qemu_st_helpers[4] = {
+    __ext_stb_mmu,
+    __ext_stw_mmu,
+    __ext_stl_mmu,
+    __ext_stq_mmu,
+};
+
+static void add_qemu_ldst_label(TCGContext *s,
+                                int opc_ext,
+                                int data_reg,
+                                int data_reg2,
+                                int addrlo_reg,
+                                int addrhi_reg,
+                                int mem_index,
+                                uint8_t *raddr,
+                                uint8_t **label_ptr);
+#endif  /* !CONFIG_QEMU_LDST_OPTIMIZATION */
 #endif
 
 /* Perform the TLB load and compare.
@@ -1061,19 +1092,36 @@ static inline void tcg_out_tlb_load(TCGContext *s, int addrlo_idx,
 
     tcg_out_mov(s, type, r0, addrlo);
 
+#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
+    /* jne slow_path */
+    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
+    if (!label_ptr) {
+        tcg_abort();
+    }
+    label_ptr[0] = s->code_ptr;
+    s->code_ptr += 4;
+#else
     /* jne label1 */
     tcg_out8(s, OPC_JCC_short + JCC_JNE);
     label_ptr[0] = s->code_ptr;
     s->code_ptr++;
+#endif
 
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         /* cmp 4(r1), addrhi */
         tcg_out_modrm_offset(s, OPC_CMP_GvEv, args[addrlo_idx+1], r1, 4);
 
+#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
+        /* jne slow_path */
+        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
+        label_ptr[1] = s->code_ptr;
+        s->code_ptr += 4;
+#else
         /* jne label1 */
         tcg_out8(s, OPC_JCC_short + JCC_JNE);
         label_ptr[1] = s->code_ptr;
         s->code_ptr++;
+#endif
     }
 
     /* TLB Hit.  */
@@ -1171,11 +1219,13 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
     int addrlo_idx;
 #if defined(CONFIG_SOFTMMU)
     int mem_index, s_bits;
+#if !defined(CONFIG_QEMU_LDST_OPTIMIZATION)
 #if TCG_TARGET_REG_BITS == 64
     int arg_idx;
 #else
     int stack_adjust;
 #endif
+#endif  /* !CONFIG_QEMU_LDST_OPTIMIZATION */
     uint8_t *label_ptr[3];
 #endif
 
@@ -1197,6 +1247,18 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
     tcg_out_qemu_ld_direct(s, data_reg, data_reg2,
                            tcg_target_call_iarg_regs[0], 0, opc);
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+    /* helper stub will be jumped back here */
+    add_qemu_ldst_label(s,
+                        opc,
+                        data_reg,
+                        data_reg2,
+                        args[addrlo_idx],
+                        args[addrlo_idx + 1],
+                        mem_index,
+                        s->code_ptr,
+                        label_ptr);
+#else
     /* jmp label2 */
     tcg_out8(s, OPC_JMP_short);
     label_ptr[2] = s->code_ptr;
@@ -1292,6 +1354,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
 
     /* label2: */
     *label_ptr[2] = s->code_ptr - label_ptr[2] - 1;
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
 #else
     {
         int32_t offset = GUEST_BASE;
@@ -1385,7 +1448,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
     int addrlo_idx;
 #if defined(CONFIG_SOFTMMU)
     int mem_index, s_bits;
+#if !defined(CONFIG_QEMU_LDST_OPTIMIZATION)
     int stack_adjust;
+#endif
     uint8_t *label_ptr[3];
 #endif
 
@@ -1407,6 +1472,18 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
     tcg_out_qemu_st_direct(s, data_reg, data_reg2,
                            tcg_target_call_iarg_regs[0], 0, opc);
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+    /* helper stub will be jumped back here */
+    add_qemu_ldst_label(s,
+                        opc | HL_ST_MASK,
+                        data_reg,
+                        data_reg2,
+                        args[addrlo_idx],
+                        args[addrlo_idx + 1],
+                        mem_index,
+                        s->code_ptr,
+                        label_ptr);
+#else
     /* jmp label2 */
     tcg_out8(s, OPC_JMP_short);
     label_ptr[2] = s->code_ptr;
@@ -1469,6 +1546,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
 
     /* label2: */
     *label_ptr[2] = s->code_ptr - label_ptr[2] - 1;
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
 #else
     {
         int32_t offset = GUEST_BASE;
@@ -1496,6 +1574,256 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
 #endif
 }
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* optimization to reduce jump overheads for qemu_ld/st IRs */
+
+/*
+ * qemu_ld/st code generator call add_qemu_ldst_label,
+ * so that slow case(TLB miss or I/O rw) is handled at the end of TB
+ */
+static void add_qemu_ldst_label(TCGContext *s,
+                                int opc_ext,
+                                int data_reg,
+                                int data_reg2,
+                                int addrlo_reg,
+                                int addrhi_reg,
+                                int mem_index,
+                                uint8_t *raddr,
+                                uint8_t **label_ptr)
+{
+    int idx;
+    TCGLabelQemuLdst *label;
+
+    if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST)
+        tcg_abort();
+
+    idx = s->nb_qemu_ldst_labels++;
+    label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
+    label->opc_ext = opc_ext;
+    label->datalo_reg = data_reg;
+    label->datahi_reg = data_reg2;
+    label->addrlo_reg = addrlo_reg;
+    label->addrhi_reg = addrhi_reg;
+    label->mem_index = mem_index;
+    label->raddr = raddr;
+    if (!label_ptr) {
+        tcg_abort();
+    }
+    label->label_ptr[0] = label_ptr[0];
+    label->label_ptr[1] = label_ptr[1];
+}
+
+/* generates slow case of qemu_ld at the end of TB */
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
+{
+    int s_bits;
+    int opc = label->opc_ext & HL_OPC_MASK;
+    int mem_index = label->mem_index;
+#if TCG_TARGET_REG_BITS == 64
+    int arg_idx;
+#else
+    int stack_adjust;
+    int addrlo_reg = label->addrlo_reg;
+    int addrhi_reg = label->addrhi_reg;
+#endif
+    int data_reg = label->datalo_reg;
+    int data_reg2 = label->datahi_reg;
+    uint8_t *raddr = label->raddr;
+    uint8_t **label_ptr = &label->label_ptr[0];
+
+    s_bits = opc & 3;
+
+    /* resolve label address */
+    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
+    }
+
+    /* extended helper signature: __ext_ld_mmu(target_ulong addr, int mmu_idx,
+       uintptr_t raddr) */
+#if TCG_TARGET_REG_BITS == 32
+    tcg_out_pushi(s, (uintptr_t)(raddr - 1)); /* return address */
+    stack_adjust = 4;
+    tcg_out_pushi(s, mem_index);        /* mmu index */
+    stack_adjust += 4;
+    if (TARGET_LONG_BITS == 64) {
+        tcg_out_push(s, addrhi_reg);
+        stack_adjust += 4;
+    }
+    tcg_out_push(s, addrlo_reg); /* guest addr */
+    stack_adjust += 4;
+#ifdef CONFIG_TCG_PASS_AREG0
+    tcg_out_push(s, TCG_AREG0);
+    stack_adjust += 4;
+#endif
+#else
+    /* The first argument is already loaded with addrlo.  */
+    arg_idx = 1;
+    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[arg_idx++],
+                 mem_index);
+    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[arg_idx++],
+                 (uintptr_t)(raddr - 1));
+#ifdef CONFIG_TCG_PASS_AREG0
+    /* XXX/FIXME: suboptimal */
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3],
+                tcg_target_call_iarg_regs[2]);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
+                tcg_target_call_iarg_regs[1]);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[1],
+                tcg_target_call_iarg_regs[0]);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0],
+                TCG_AREG0);
+#endif
+#endif
+
+    tcg_out_calli(s, (tcg_target_long)qemu_ld_helpers[s_bits]);
+
+#if TCG_TARGET_REG_BITS == 32
+    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
+        /* Pop and discard.  This is 2 bytes smaller than the add.  */
+        tcg_out_pop(s, TCG_REG_ECX);
+    } else if (stack_adjust != 0) {
+        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
+    }
+#endif
+
+    switch(opc) {
+    case 0 | 4:
+        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
+        break;
+    case 1 | 4:
+        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
+        break;
+    case 0:
+        tcg_out_ext8u(s, data_reg, TCG_REG_EAX);
+        break;
+    case 1:
+        tcg_out_ext16u(s, data_reg, TCG_REG_EAX);
+        break;
+    case 2:
+        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
+        break;
+#if TCG_TARGET_REG_BITS == 64
+    case 2 | 4:
+        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
+        break;
+#endif
+    case 3:
+        if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
+        } else if (data_reg == TCG_REG_EDX) {
+            /* xchg %edx, %eax */
+            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EAX);
+        } else {
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EDX);
+        }
+        break;
+    default:
+        tcg_abort();
+    }
+
+    /* jump back to original code */
+    tcg_out_jmp(s, (tcg_target_long) raddr);
+}
+
+/* generates slow case of qemu_st at the end of TB */
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
+{
+    int s_bits;
+    int stack_adjust;
+    int opc = label->opc_ext & HL_OPC_MASK;
+    int mem_index = label->mem_index;
+    int data_reg = label->datalo_reg;
+#if TCG_TARGET_REG_BITS == 32
+    int data_reg2 = label->datahi_reg;
+    int addrlo_reg = label->addrlo_reg;
+    int addrhi_reg = label->addrhi_reg;
+#endif
+    uint8_t *raddr = label->raddr;
+    uint8_t **label_ptr = &label->label_ptr[0];
+
+    s_bits = opc & 3;
+
+    /* resolve label address */
+    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
+    }
+
+    /* extended helper signature: __ext_st_mmu(target_ulong addr, uintxx_t val,
+       int mmu_idx, uintptr_t raddr) */
+#if TCG_TARGET_REG_BITS == 32
+    tcg_out_pushi(s, (uintptr_t)(raddr - 1)); /* return address */
+    stack_adjust = 4;
+    tcg_out_pushi(s, mem_index); /* mmu index */
+    stack_adjust += 4;
+    if (opc == 3) {
+        tcg_out_push(s, data_reg2);
+        stack_adjust += 4;
+    }
+    tcg_out_push(s, data_reg);   /* guest data */
+    stack_adjust += 4;
+    if (TARGET_LONG_BITS == 64) {
+        tcg_out_push(s, addrhi_reg);
+        stack_adjust += 4;
+    }
+    tcg_out_push(s, addrlo_reg); /* guest addr */
+    stack_adjust += 4;
+#ifdef CONFIG_TCG_PASS_AREG0
+    tcg_out_push(s, TCG_AREG0);
+    stack_adjust += 4;
+#endif
+#else
+    tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
+                tcg_target_call_iarg_regs[1], data_reg);
+    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], mem_index);
+    tcg_out_movi(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3], (uintptr_t)(raddr - 1));
+    stack_adjust = 0;
+#ifdef CONFIG_TCG_PASS_AREG0
+    /* XXX/FIXME: suboptimal */
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3],
+                tcg_target_call_iarg_regs[2]);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
+                tcg_target_call_iarg_regs[1]);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[1],
+                tcg_target_call_iarg_regs[0]);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0],
+                TCG_AREG0);
+#endif
+#endif
+
+    tcg_out_calli(s, (tcg_target_long)qemu_st_helpers[s_bits]);
+
+    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
+        /* Pop and discard.  This is 2 bytes smaller than the add.  */
+        tcg_out_pop(s, TCG_REG_ECX);
+    } else if (stack_adjust != 0) {
+        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
+    }
+
+    /* jump back to original code */
+    tcg_out_jmp(s, (tcg_target_long) raddr);
+}
+
+/* generates all of the slow cases of qemu_ld/st at the end of TB */
+void tcg_out_qemu_ldst_slow_path(TCGContext *s)
+{
+    int i;
+    TCGLabelQemuLdst *label;
+
+    for (i = 0; i < s->nb_qemu_ldst_labels; i++) {
+        label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[i];
+        if (IS_QEMU_LD_LABEL(label)) {
+            tcg_out_qemu_ld_slow_path(s, label);
+        } else {
+            tcg_out_qemu_st_slow_path(s, label);
+        }
+    }
+}
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                               const TCGArg *args, const int *const_args)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8386b70..8009069 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -301,6 +301,14 @@ void tcg_func_start(TCGContext *s)
 
     gen_opc_ptr = gen_opc_buf;
     gen_opparam_ptr = gen_opparam_buf;
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+    /* initialize qemu_ld/st labels which help to generate TLB miss case codes at the end of TB */
+    s->qemu_ldst_labels = tcg_malloc(sizeof(TCGLabelQemuLdst) * TCG_MAX_QEMU_LDST);
+    if (!s->qemu_ldst_labels) {
+        tcg_abort();
+    }
+    s->nb_qemu_ldst_labels = 0;
+#endif
 }
 
 static inline void tcg_temp_alloc(TCGContext *s, int n)
@@ -2169,6 +2177,10 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
 #endif
     }
  the_end:
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+    /* Generate MMU call helpers at the end of block (currently only for qemu_ld/st) */
+    tcg_out_qemu_ldst_slow_path(s);
+#endif
     return -1;
 }
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index d710694..b174cdb 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -187,6 +187,29 @@ typedef tcg_target_ulong TCGArg;
    are aliases for target_ulong and host pointer sized values respectively.
  */
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Macros and structures for qemu_ld/st IR code optimization:
+   It looks good for TCG_MAX_HELPER_LABELS to be half of OPC_BUF_SIZE in exec-all.h. */
+#define TCG_MAX_QEMU_LDST       320
+#define HL_LDST_SHIFT           4
+#define HL_LDST_MASK            (1 << HL_LDST_SHIFT)
+#define HL_ST_MASK              HL_LDST_MASK
+#define HL_OPC_MASK             (HL_LDST_MASK - 1)
+#define IS_QEMU_LD_LABEL(L)     (!((L)->opc_ext & HL_LDST_MASK))
+#define IS_QEMU_ST_LABEL(L)     ((L)->opc_ext & HL_LDST_MASK)
+
+typedef struct TCGLabelQemuLdst {
+    int opc_ext;                /* | 27bit (reserved) | 1bit (ld/st flag) | 4bit (opc) | */
+    int addrlo_reg;             /* reg index for the low word of guest virtual address */
+    int addrhi_reg;             /* reg index for the high word of guest virtual address */
+    int datalo_reg;             /* reg index for the low word to be loaded or to be stored */
+    int datahi_reg;             /* reg index for the high word to be loaded or to be stored */
+    int mem_index;              /* soft MMU memory index */
+    uint8_t *raddr;             /* return address (located end of TB) */
+    uint8_t *label_ptr[2];      /* label pointers to be updated */
+} TCGLabelQemuLdst;
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 #ifdef CONFIG_DEBUG_TCG
 #define DEBUG_TCGV 1
 #endif
@@ -389,6 +412,13 @@ struct TCGContext {
 #ifdef CONFIG_DEBUG_TCG
     int temps_in_use;
 #endif
+
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+    /* labels info for qemu_ld/st IRs
+       The labels help to generate TLB miss case codes at the end of TB */
+    TCGLabelQemuLdst *qemu_ldst_labels;
+    int nb_qemu_ldst_labels;
+#endif
 };
 
 extern TCGContext tcg_ctx;
@@ -588,3 +618,8 @@ extern uint8_t code_gen_prologue[];
 #endif
 
 void tcg_register_jit(void *buf, size_t buf_size);
+
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* qemu_ld/st generation at the end of TB */
+void tcg_out_qemu_ldst_slow_path(TCGContext *s);
+#endif
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  2012-07-05 13:23 [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
                   ` (2 preceding siblings ...)
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 3/4] tcg: add optimized TCG qemu_ld/st generation Yeongkyoon Lee
@ 2012-07-05 13:23 ` Yeongkyoon Lee
  2012-07-05 13:55   ` Andreas Färber
  2012-07-05 14:06   ` Peter Maydell
  2012-07-10  9:12 ` [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
  4 siblings, 2 replies; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-05 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, chenwj, e.voevodin, Yeongkyoon Lee

Add an option "--enable-ldst-optimization" to enable CONFIG_QEMU_LDST_OPTIMIZATION macro for TCG qemu_ld/st optimization. It only works with CONFIG_SOFTMMU and doesn't work with CONFIG_TCG_PASS_AREG0.

Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
---
 configure |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 9f071b7..2b364cc 100755
--- a/configure
+++ b/configure
@@ -171,6 +171,7 @@ bsd="no"
 linux="no"
 solaris="no"
 profiler="no"
+ldst_optimization="no"
 cocoa="no"
 softmmu="yes"
 linux_user="no"
@@ -714,6 +715,8 @@ for opt do
   ;;
   --enable-profiler) profiler="yes"
   ;;
+  --enable-ldst-optimization) ldst_optimization="yes"
+  ;;
   --disable-cocoa) cocoa="no"
   ;;
   --enable-cocoa)
@@ -3463,6 +3466,11 @@ echo "EXESUF=$EXESUF" >> $config_host_mak
 echo "LIBS_QGA+=$libs_qga" >> $config_host_mak
 echo "POD2MAN=$POD2MAN" >> $config_host_mak
 
+if [ "$ldst_optimization" = "yes" -a "$cpu" != "i386" -a "$cpu" != "x86_64" ] ; then
+  echo "ERROR: qemu_ld/st optimization is only available on i386 or x86_64 hosts"
+  exit 1
+fi
+
 # generate list of library paths for linker script
 
 $ld --verbose -v 2> /dev/null | grep SEARCH_DIR > ${config_host_ld}
@@ -3696,11 +3704,18 @@ fi
 symlink "$source_path/Makefile.target" "$target_dir/Makefile"
 
 
+target_ldst_optimization="$ldst_optimization"
+
 case "$target_arch2" in
   alpha | sparc* | xtensa* | ppc*)
     echo "CONFIG_TCG_PASS_AREG0=y" >> $config_target_mak
+    # qemu_ld/st optimization is not available with CONFIG_TCG_PASS_AREG0
+    target_ldst_optimization="no"
   ;;
 esac
+if [ "$target_ldst_optimization" = "yes" -a "$target_softmmu" = "yes" ] ; then
+    echo "CONFIG_QEMU_LDST_OPTIMIZATION=y" >> $config_target_mak
+fi
 
 echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> $config_target_mak
 echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 1/4] tcg: add declarations and templates of extended MMU helpers
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 1/4] tcg: add declarations and templates of extended MMU helpers Yeongkyoon Lee
@ 2012-07-05 13:40   ` Peter Maydell
  2012-07-06 10:30     ` Yeongkyoon Lee
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2012-07-05 13:40 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: laurent.desnogues, e.voevodin, qemu-devel, chenwj

On 5 July 2012 14:23, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
> Add declarations and templates of extended MMU helpers which can take return address argument to what helper functions return. These extended helper functions are called only by generated code.

It's not entirely clear from this description what the
return address argument actually is.

Also, please line wrap your commit messages.

>
> Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
> ---
>  softmmu_defs.h     |   13 +++++++++++++
>  softmmu_template.h |   51 +++++++++++++++++++++++++++++++++++++++++----------
>  2 files changed, 54 insertions(+), 10 deletions(-)
>
> diff --git a/softmmu_defs.h b/softmmu_defs.h
> index 8d59f9d..728c51b 100644
> --- a/softmmu_defs.h
> +++ b/softmmu_defs.h
> @@ -19,6 +19,19 @@ void __stl_mmu(target_ulong addr, uint32_t val, int mmu_idx);
>  uint64_t __ldq_mmu(target_ulong addr, int mmu_idx);
>  void __stq_mmu(target_ulong addr, uint64_t val, int mmu_idx);
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Extended versions of MMU helpers for qemu_ld/st optimization.
> +   They get return address arguments because the caller PCs are not where helpers return to. */

This is >80 characters ; please use checkpatch.pl.

> +uint8_t __ext_ldb_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);

'__' is a prefix reserved for the system. I know we have existing usage
of it, but I think it would be better to avoid adding new uses.

> +void __ext_stb_mmu(target_ulong addr, uint8_t val, int mmu_idx, uintptr_t ra);
> +uint16_t __ext_ldw_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
> +void __ext_stw_mmu(target_ulong addr, uint16_t val, int mmu_idx, uintptr_t ra);
> +uint32_t __ext_ldl_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
> +void __ext_stl_mmu(target_ulong addr, uint32_t val, int mmu_idx, uintptr_t ra);
> +uint64_t __ext_ldq_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
> +void __ext_stq_mmu(target_ulong addr, uint64_t val, int mmu_idx, uintptr_t ra);
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  uint8_t __ldb_cmmu(target_ulong addr, int mmu_idx);
>  void __stb_cmmu(target_ulong addr, uint8_t val, int mmu_idx);
>  uint16_t __ldw_cmmu(target_ulong addr, int mmu_idx);
> diff --git a/softmmu_template.h b/softmmu_template.h
> index b8bd700..d549800 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -66,6 +66,25 @@
>  #define HELPER_PREFIX helper_
>  #endif
>
> +#ifdef USE_EXTENDED_HELPER
> +/* Exteneded helper funtions have one more argument of address
> +   to which pc is returned after setting TLB entry */

"Extended". "functions".

> +#if !defined(CONFIG_QEMU_LDST_OPTIMIZATION) || defined(CONFIG_TCG_PASS_AREG0)
> +#error You need CONFIG_QEMU_LDST_OPTIMIZATION or to disable CONFIG_TCG_PASS_AREG0!
> +#endif
> +#undef HELPER_PREFIX
> +#define HELPER_PREFIX __ext_
> +#define RET_PARAM , uintptr_t raddr
> +#define RET_VAR raddr
> +#define GET_RET_ADDR() RET_VAR
> +#else
> +#define RET_PARAM
> +#define RET_VAR
> +#define GET_RET_ADDR() GETPC()
> +#endif  /* USE_EXTENDED_HELPER */
> +
> +
> +#ifndef USE_EXTENDED_HELPER
>  static DATA_TYPE glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(ENV_PARAM
>                                                          target_ulong addr,
>                                                          int mmu_idx,
> @@ -101,12 +120,14 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(ENV_PARAM
>  #endif /* SHIFT > 2 */
>      return res;
>  }
> +#endif  /* !USE_EXTENDED_HELPER */
>
>  /* handle all cases except unaligned access which span two pages */
>  DATA_TYPE
>  glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
>                                                         target_ulong addr,
> -                                                       int mmu_idx)
> +                                                       int mmu_idx
> +                                                       RET_PARAM)
>  {
>      DATA_TYPE res;
>      int index;
> @@ -124,13 +145,13 @@ glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
>              /* IO access */
>              if ((addr & (DATA_SIZE - 1)) != 0)
>                  goto do_unaligned_access;
> -            retaddr = GETPC();
> +            retaddr = GET_RET_ADDR();
>              ioaddr = env->iotlb[mmu_idx][index];
>              res = glue(io_read, SUFFIX)(ENV_VAR ioaddr, addr, retaddr);
>          } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
>              /* slow unaligned access (it spans two pages or IO) */
>          do_unaligned_access:
> -            retaddr = GETPC();
> +            retaddr = GET_RET_ADDR();
>  #ifdef ALIGNED_ONLY
>              do_unaligned_access(ENV_VAR addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>  #endif
> @@ -141,7 +162,7 @@ glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
>              uintptr_t addend;
>  #ifdef ALIGNED_ONLY
>              if ((addr & (DATA_SIZE - 1)) != 0) {
> -                retaddr = GETPC();
> +                retaddr = GET_RET_ADDR();
>                  do_unaligned_access(ENV_VAR addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>              }
>  #endif
> @@ -151,7 +172,7 @@ glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
>          }
>      } else {
>          /* the page is not in the TLB : fill it */
> -        retaddr = GETPC();
> +        retaddr = GET_RET_ADDR();
>  #ifdef ALIGNED_ONLY
>          if ((addr & (DATA_SIZE - 1)) != 0)
>              do_unaligned_access(ENV_VAR addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> @@ -162,6 +183,7 @@ glue(glue(glue(HELPER_PREFIX, ld), SUFFIX), MMUSUFFIX)(ENV_PARAM
>      return res;
>  }
>
> +#ifndef USE_EXTENDED_HELPER
>  /* handle all unaligned cases */
>  static DATA_TYPE
>  glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(ENV_PARAM
> @@ -213,9 +235,11 @@ glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(ENV_PARAM
>      }
>      return res;
>  }
> +#endif  /* !USE_EXTENDED_HELPER */
>
>  #ifndef SOFTMMU_CODE_ACCESS
>
> +#ifndef USE_EXTENDED_HELPER
>  static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(ENV_PARAM
>                                                     target_ulong addr,
>                                                     DATA_TYPE val,
> @@ -252,11 +276,13 @@ static inline void glue(io_write, SUFFIX)(ENV_PARAM
>  #endif
>  #endif /* SHIFT > 2 */
>  }
> +#endif  /* !USE_EXTENDED_HELPER */
>
>  void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
>                                                              target_ulong addr,
>                                                              DATA_TYPE val,
> -                                                            int mmu_idx)
> +                                                            int mmu_idx
> +                                                            RET_PARAM)
>  {
>      target_phys_addr_t ioaddr;
>      target_ulong tlb_addr;
> @@ -271,12 +297,12 @@ void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
>              /* IO access */
>              if ((addr & (DATA_SIZE - 1)) != 0)
>                  goto do_unaligned_access;
> -            retaddr = GETPC();
> +            retaddr = GET_RET_ADDR();
>              ioaddr = env->iotlb[mmu_idx][index];
>              glue(io_write, SUFFIX)(ENV_VAR ioaddr, val, addr, retaddr);
>          } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
>          do_unaligned_access:
> -            retaddr = GETPC();
> +            retaddr = GET_RET_ADDR();
>  #ifdef ALIGNED_ONLY
>              do_unaligned_access(ENV_VAR addr, 1, mmu_idx, retaddr);
>  #endif
> @@ -287,7 +313,7 @@ void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
>              uintptr_t addend;
>  #ifdef ALIGNED_ONLY
>              if ((addr & (DATA_SIZE - 1)) != 0) {
> -                retaddr = GETPC();
> +                retaddr = GET_RET_ADDR();
>                  do_unaligned_access(ENV_VAR addr, 1, mmu_idx, retaddr);
>              }
>  #endif
> @@ -297,7 +323,7 @@ void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
>          }
>      } else {
>          /* the page is not in the TLB : fill it */
> -        retaddr = GETPC();
> +        retaddr = GET_RET_ADDR();
>  #ifdef ALIGNED_ONLY
>          if ((addr & (DATA_SIZE - 1)) != 0)
>              do_unaligned_access(ENV_VAR addr, 1, mmu_idx, retaddr);
> @@ -307,6 +333,7 @@ void glue(glue(glue(HELPER_PREFIX, st), SUFFIX), MMUSUFFIX)(ENV_PARAM
>      }
>  }
>
> +#ifndef USE_EXTENDED_HELPER
>  /* handles all unaligned cases */
>  static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(ENV_PARAM
>                                                     target_ulong addr,
> @@ -356,6 +383,7 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(ENV_PARAM
>          goto redo;
>      }
>  }
> +#endif  /* !USE_EXTENDED_HELPER */
>
>  #endif /* !defined(SOFTMMU_CODE_ACCESS) */
>
> @@ -370,3 +398,6 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(ENV_PARAM
>  #undef ENV_VAR
>  #undef CPU_PREFIX
>  #undef HELPER_PREFIX
> +#undef RET_PARAM
> +#undef RET_VAR
> +#undef GET_RET_ADDR
> --
> 1.7.4.1
>

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 2/4] tcg: add extended MMU helpers to softmmu targets
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 2/4] tcg: add extended MMU helpers to softmmu targets Yeongkyoon Lee
@ 2012-07-05 13:43   ` Peter Maydell
  2012-07-05 18:49     ` Blue Swirl
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2012-07-05 13:43 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: laurent.desnogues, e.voevodin, qemu-devel, chenwj

On 5 July 2012 14:23, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
> Add extended MMU helpers to softmmu targets, where the targets are alpha, arm, cris, i386, lm32, m68k, microblaze, mips, ppc, s390x, sh4, sparc and xtensa.
>
> Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
> ---
>  target-alpha/mem_helper.c     |   22 ++++++++++++++++++++++
>  target-arm/op_helper.c        |   23 +++++++++++++++++++++++
>  target-cris/op_helper.c       |   22 ++++++++++++++++++++++
>  target-i386/mem_helper.c      |   22 ++++++++++++++++++++++
>  target-lm32/op_helper.c       |   23 ++++++++++++++++++++++-
>  target-m68k/op_helper.c       |   22 ++++++++++++++++++++++
>  target-microblaze/op_helper.c |   22 ++++++++++++++++++++++
>  target-mips/op_helper.c       |   22 ++++++++++++++++++++++
>  target-ppc/mem_helper.c       |   22 ++++++++++++++++++++++
>  target-s390x/op_helper.c      |   22 ++++++++++++++++++++++
>  target-sh4/op_helper.c        |   22 ++++++++++++++++++++++
>  target-sparc/ldst_helper.c    |   23 +++++++++++++++++++++++
>  target-xtensa/op_helper.c     |   22 ++++++++++++++++++++++
>  13 files changed, 288 insertions(+), 1 deletions(-)

This makes the already slightly repetitive inclusion of the
softmmu_templates even more repetitive. Perhaps we could abstract
it all out into a single header which the targets can include?

>
> diff --git a/target-alpha/mem_helper.c b/target-alpha/mem_helper.c
> index 87cada4..ef880a7 100644
> --- a/target-alpha/mem_helper.c
> +++ b/target-alpha/mem_helper.c
> @@ -132,6 +132,28 @@ void cpu_unassigned_access(CPUAlphaState *env, target_phys_addr_t addr,
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization

"Extended".

> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  /* try to fill the TLB and return an exception if error. If retaddr is
>     NULL, it means that the function was called in C code (i.e. not
>     from generated code or from helper.c) */
> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
> index 490111c..8ff1209 100644
> --- a/target-arm/op_helper.c
> +++ b/target-arm/op_helper.c
> @@ -69,6 +69,29 @@ uint32_t HELPER(neon_tbl)(uint32_t ireg, uint32_t def,
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  /* try to fill the TLB and return an exception if error. If retaddr is
>     NULL, it means that the function was called in C code (i.e. not
>     from generated code or from helper.c) */
> diff --git a/target-cris/op_helper.c b/target-cris/op_helper.c
> index ac7c98c..cc34f3c 100644
> --- a/target-cris/op_helper.c
> +++ b/target-cris/op_helper.c
> @@ -52,6 +52,28 @@
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  /* Try to fill the TLB and return an exception if error. If retaddr is
>     NULL, it means that the function was called in C code (i.e. not
>     from generated code or from helper.c) */
> diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
> index 91353c0..59fb03c 100644
> --- a/target-i386/mem_helper.c
> +++ b/target-i386/mem_helper.c
> @@ -126,6 +126,28 @@ void helper_boundl(target_ulong a0, int v)
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  #endif
>
>  #if !defined(CONFIG_USER_ONLY)
> diff --git a/target-lm32/op_helper.c b/target-lm32/op_helper.c
> index 51edc1a..bcdddf7 100644
> --- a/target-lm32/op_helper.c
> +++ b/target-lm32/op_helper.c
> @@ -18,6 +18,28 @@
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  void helper_raise_exception(uint32_t index)
>  {
>      env->exception_index = index;
> @@ -101,4 +123,3 @@ void tlb_fill(CPULM32State *env1, target_ulong addr, int is_write, int mmu_idx,
>      env = saved_env;
>  }
>  #endif
> -
> diff --git a/target-m68k/op_helper.c b/target-m68k/op_helper.c
> index 1971a57..454fd71 100644
> --- a/target-m68k/op_helper.c
> +++ b/target-m68k/op_helper.c
> @@ -51,6 +51,28 @@ extern int semihosting_enabled;
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  /* Try to fill the TLB and return an exception if error. If retaddr is
>     NULL, it means that the function was called in C code (i.e. not
>     from generated code or from helper.c) */
> diff --git a/target-microblaze/op_helper.c b/target-microblaze/op_helper.c
> index 3b1f072..12be3bd 100644
> --- a/target-microblaze/op_helper.c
> +++ b/target-microblaze/op_helper.c
> @@ -39,6 +39,28 @@
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  /* Try to fill the TLB and return an exception if error. If retaddr is
>     NULL, it means that the function was called in C code (i.e. not
>     from generated code or from helper.c) */
> diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
> index 66037ac..c2fd2db 100644
> --- a/target-mips/op_helper.c
> +++ b/target-mips/op_helper.c
> @@ -2303,6 +2303,28 @@ static void QEMU_NORETURN do_unaligned_access(target_ulong addr, int is_write,
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  static void do_unaligned_access(target_ulong addr, int is_write,
>                                  int is_user, uintptr_t retaddr)
>  {
> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
> index 5b5f1bd..01a9ac6 100644
> --- a/target-ppc/mem_helper.c
> +++ b/target-ppc/mem_helper.c
> @@ -268,6 +268,28 @@ STVE(stvewx, cpu_stl_data, bswap32, u32)
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  /* try to fill the TLB and return an exception if error. If retaddr is
>     NULL, it means that the function was called in C code (i.e. not
>     from generated code or from helper.c) */
> diff --git a/target-s390x/op_helper.c b/target-s390x/op_helper.c
> index 7b72473..05109c5 100644
> --- a/target-s390x/op_helper.c
> +++ b/target-s390x/op_helper.c
> @@ -52,6 +52,28 @@
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  /* try to fill the TLB and return an exception if error. If retaddr is
>     NULL, it means that the function was called in C code (i.e. not
>     from generated code or from helper.c) */
> diff --git a/target-sh4/op_helper.c b/target-sh4/op_helper.c
> index 4054791..f7c1f67 100644
> --- a/target-sh4/op_helper.c
> +++ b/target-sh4/op_helper.c
> @@ -53,6 +53,28 @@ static void cpu_restore_state_from_retaddr(uintptr_t retaddr)
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  void tlb_fill(CPUSH4State *env1, target_ulong addr, int is_write, int mmu_idx,
>                uintptr_t retaddr)
>  {
> diff --git a/target-sparc/ldst_helper.c b/target-sparc/ldst_helper.c
> index 9bec7a9..1ceb588 100644
> --- a/target-sparc/ldst_helper.c
> +++ b/target-sparc/ldst_helper.c
> @@ -80,6 +80,29 @@
>
>  #define SHIFT 3
>  #include "softmmu_template.h"
> +
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  #endif
>
>  #if defined(TARGET_SPARC64) && !defined(CONFIG_USER_ONLY)
> diff --git a/target-xtensa/op_helper.c b/target-xtensa/op_helper.c
> index 2659c0e..536e1fb 100644
> --- a/target-xtensa/op_helper.c
> +++ b/target-xtensa/op_helper.c
> @@ -47,6 +47,28 @@ static void do_unaligned_access(CPUXtensaState *env,
>  #define SHIFT 3
>  #include "softmmu_template.h"
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
> +   Note that normal helper functions should be defined above
> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
> + */
> +#define USE_EXTENDED_HELPER
> +
> +#define SHIFT 0
> +#include "softmmu_template.h"
> +
> +#define SHIFT 1
> +#include "softmmu_template.h"
> +
> +#define SHIFT 2
> +#include "softmmu_template.h"
> +
> +#define SHIFT 3
> +#include "softmmu_template.h"
> +
> +#undef USE_EXTENDED_HELPER
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  static void do_restore_state(CPUXtensaState *env, uintptr_t pc)
>  {
>      TranslationBlock *tb;
> --
> 1.7.4.1
>

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
@ 2012-07-05 13:55   ` Andreas Färber
  2012-07-06  3:13     ` Evgeny Voevodin
  2012-07-05 14:06   ` Peter Maydell
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Färber @ 2012-07-05 13:55 UTC (permalink / raw)
  To: Yeongkyoon Lee
  Cc: Peter Maydell, chenwj, e.voevodin, qemu-devel, Blue Swirl,
	laurent.desnogues

Am 05.07.2012 15:23, schrieb Yeongkyoon Lee:
> Add an option "--enable-ldst-optimization" to enable CONFIG_QEMU_LDST_OPTIMIZATION macro for TCG qemu_ld/st optimization. It only works with CONFIG_SOFTMMU and doesn't work with CONFIG_TCG_PASS_AREG0.
> 
> Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
> ---
>  configure |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/configure b/configure
> index 9f071b7..2b364cc 100755
> --- a/configure
> +++ b/configure
[...]
> @@ -3463,6 +3466,11 @@ echo "EXESUF=$EXESUF" >> $config_host_mak
>  echo "LIBS_QGA+=$libs_qga" >> $config_host_mak
>  echo "POD2MAN=$POD2MAN" >> $config_host_mak
>  
> +if [ "$ldst_optimization" = "yes" -a "$cpu" != "i386" -a "$cpu" != "x86_64" ] ; then
> +  echo "ERROR: qemu_ld/st optimization is only available on i386 or x86_64 hosts"
> +  exit 1
> +fi
[snip]

I assume that Samsung is interested in optimizing the Exynos emulation.
I think there was already a patchset posted converting target-arm to
CONFIG_PASS_TCG_AREG0, only with some slowdowns to be investigated...
What is the obstacle for supporting AREG0 mode in your optimization?

Regards,
Andreas

> +
>  # generate list of library paths for linker script
>  
>  $ld --verbose -v 2> /dev/null | grep SEARCH_DIR > ${config_host_ld}
> @@ -3696,11 +3704,18 @@ fi
>  symlink "$source_path/Makefile.target" "$target_dir/Makefile"
>  
>  
> +target_ldst_optimization="$ldst_optimization"
> +
>  case "$target_arch2" in
>    alpha | sparc* | xtensa* | ppc*)
>      echo "CONFIG_TCG_PASS_AREG0=y" >> $config_target_mak
> +    # qemu_ld/st optimization is not available with CONFIG_TCG_PASS_AREG0
> +    target_ldst_optimization="no"
>    ;;
>  esac
> +if [ "$target_ldst_optimization" = "yes" -a "$target_softmmu" = "yes" ] ; then
> +    echo "CONFIG_QEMU_LDST_OPTIMIZATION=y" >> $config_target_mak
> +fi
>  
>  echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> $config_target_mak
>  echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 3/4] tcg: add optimized TCG qemu_ld/st generation
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 3/4] tcg: add optimized TCG qemu_ld/st generation Yeongkyoon Lee
@ 2012-07-05 14:04   ` Peter Maydell
  2012-07-06 11:20     ` Yeongkyoon Lee
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2012-07-05 14:04 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: laurent.desnogues, e.voevodin, qemu-devel, chenwj

On 5 July 2012 14:23, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
> Add optimized TCG qemu_ld/st generation which generates the code for TLB miss case handling at the end of TB after generating other IRs.
>
> Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
> ---
>  tcg/i386/tcg-target.c |  328 +++++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg.c             |   12 ++
>  tcg/tcg.h             |   35 +++++
>  3 files changed, 375 insertions(+), 0 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index da17bba..3f2f640 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -984,6 +984,8 @@ static const void *qemu_st_helpers[4] = {
>      helper_stq_mmu,
>  };
>  #else
> +
> +#ifndef CONFIG_QEMU_LDST_OPTIMIZATION
>  /* legacy helper signature: __ld_mmu(target_ulong addr, int
>     mmu_idx) */
>  static void *qemu_ld_helpers[4] = {
> @@ -1001,6 +1003,35 @@ static void *qemu_st_helpers[4] = {
>      __stl_mmu,
>      __stq_mmu,
>  };
> +#else

Is it really worth having this as a CONFIG_ switch? If we think
it's better to do this out of line we should just switch to
always generating the out of line code, I think. There's not much
point in retaining the old code path if it's disabled -- it will
just bitrot.

> +/* extended legacy helper signature: __ext_ld_mmu(target_ulong addr, int
> +   mmu_idx, uintptr raddr) */
> +static void *qemu_ld_helpers[4] = {
> +    __ext_ldb_mmu,
> +    __ext_ldw_mmu,
> +    __ext_ldl_mmu,
> +    __ext_ldq_mmu,
> +};
> +
> +/* extended legacy helper signature: __ext_st_mmu(target_ulong addr, uintxx_t val,
> +   int mmu_idx) */
> +static void *qemu_st_helpers[4] = {
> +    __ext_stb_mmu,
> +    __ext_stw_mmu,
> +    __ext_stl_mmu,
> +    __ext_stq_mmu,
> +};
> +
> +static void add_qemu_ldst_label(TCGContext *s,
> +                                int opc_ext,
> +                                int data_reg,
> +                                int data_reg2,
> +                                int addrlo_reg,
> +                                int addrhi_reg,
> +                                int mem_index,
> +                                uint8_t *raddr,
> +                                uint8_t **label_ptr);
> +#endif  /* !CONFIG_QEMU_LDST_OPTIMIZATION */
>  #endif
>
>  /* Perform the TLB load and compare.
> @@ -1061,19 +1092,36 @@ static inline void tcg_out_tlb_load(TCGContext *s, int addrlo_idx,
>
>      tcg_out_mov(s, type, r0, addrlo);
>
> +#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
> +    /* jne slow_path */
> +    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
> +    if (!label_ptr) {
> +        tcg_abort();
> +    }

There's no point in this check and abort -- label_ptr will always be
non-NULL (it would be an internal error if it wasn't), and if it is
by some future bug NULL, we'll just crash on the next line, which is
just as good. The existing code didn't feel the need to make this
check, we don't need to do it in the new code.

> +    label_ptr[0] = s->code_ptr;
> +    s->code_ptr += 4;
> +#else
>      /* jne label1 */
>      tcg_out8(s, OPC_JCC_short + JCC_JNE);
>      label_ptr[0] = s->code_ptr;
>      s->code_ptr++;
> +#endif
>
>      if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
>          /* cmp 4(r1), addrhi */
>          tcg_out_modrm_offset(s, OPC_CMP_GvEv, args[addrlo_idx+1], r1, 4);
>
> +#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
> +        /* jne slow_path */
> +        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
> +        label_ptr[1] = s->code_ptr;
> +        s->code_ptr += 4;
> +#else
>          /* jne label1 */
>          tcg_out8(s, OPC_JCC_short + JCC_JNE);
>          label_ptr[1] = s->code_ptr;
>          s->code_ptr++;
> +#endif
>      }
>
>      /* TLB Hit.  */
> @@ -1171,11 +1219,13 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
>      int addrlo_idx;
>  #if defined(CONFIG_SOFTMMU)
>      int mem_index, s_bits;
> +#if !defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>  #if TCG_TARGET_REG_BITS == 64
>      int arg_idx;
>  #else
>      int stack_adjust;
>  #endif
> +#endif  /* !CONFIG_QEMU_LDST_OPTIMIZATION */
>      uint8_t *label_ptr[3];
>  #endif
>
> @@ -1197,6 +1247,18 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
>      tcg_out_qemu_ld_direct(s, data_reg, data_reg2,
>                             tcg_target_call_iarg_regs[0], 0, opc);
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +    /* helper stub will be jumped back here */

"will jump back here".

> +    add_qemu_ldst_label(s,
> +                        opc,
> +                        data_reg,
> +                        data_reg2,
> +                        args[addrlo_idx],
> +                        args[addrlo_idx + 1],
> +                        mem_index,
> +                        s->code_ptr,
> +                        label_ptr);
> +#else
>      /* jmp label2 */
>      tcg_out8(s, OPC_JMP_short);
>      label_ptr[2] = s->code_ptr;
> @@ -1292,6 +1354,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
>
>      /* label2: */
>      *label_ptr[2] = s->code_ptr - label_ptr[2] - 1;
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>  #else
>      {
>          int32_t offset = GUEST_BASE;
> @@ -1385,7 +1448,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
>      int addrlo_idx;
>  #if defined(CONFIG_SOFTMMU)
>      int mem_index, s_bits;
> +#if !defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>      int stack_adjust;
> +#endif
>      uint8_t *label_ptr[3];
>  #endif
>
> @@ -1407,6 +1472,18 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
>      tcg_out_qemu_st_direct(s, data_reg, data_reg2,
>                             tcg_target_call_iarg_regs[0], 0, opc);
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +    /* helper stub will be jumped back here */

ditto.

> +    add_qemu_ldst_label(s,
> +                        opc | HL_ST_MASK,
> +                        data_reg,
> +                        data_reg2,
> +                        args[addrlo_idx],
> +                        args[addrlo_idx + 1],
> +                        mem_index,
> +                        s->code_ptr,
> +                        label_ptr);
> +#else
>      /* jmp label2 */
>      tcg_out8(s, OPC_JMP_short);
>      label_ptr[2] = s->code_ptr;
> @@ -1469,6 +1546,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
>
>      /* label2: */
>      *label_ptr[2] = s->code_ptr - label_ptr[2] - 1;
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>  #else
>      {
>          int32_t offset = GUEST_BASE;
> @@ -1496,6 +1574,256 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
>  #endif
>  }
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* optimization to reduce jump overheads for qemu_ld/st IRs */
> +
> +/*
> + * qemu_ld/st code generator call add_qemu_ldst_label,
> + * so that slow case(TLB miss or I/O rw) is handled at the end of TB
> + */

This comment isn't really describing the purpose of this function,
which is something more along the lines of "Record the context of
a call to the out of line helper code for the slow path for a
load or store, so that we can later generate the correct helper
code".

> +static void add_qemu_ldst_label(TCGContext *s,
> +                                int opc_ext,
> +                                int data_reg,
> +                                int data_reg2,
> +                                int addrlo_reg,
> +                                int addrhi_reg,
> +                                int mem_index,
> +                                uint8_t *raddr,
> +                                uint8_t **label_ptr)
> +{
> +    int idx;
> +    TCGLabelQemuLdst *label;
> +
> +    if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST)
> +        tcg_abort();

QEMU coding style requires braces. Please use checkpatch.pl.

> +
> +    idx = s->nb_qemu_ldst_labels++;
> +    label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
> +    label->opc_ext = opc_ext;
> +    label->datalo_reg = data_reg;
> +    label->datahi_reg = data_reg2;
> +    label->addrlo_reg = addrlo_reg;
> +    label->addrhi_reg = addrhi_reg;
> +    label->mem_index = mem_index;
> +    label->raddr = raddr;
> +    if (!label_ptr) {
> +        tcg_abort();
> +    }

Another pointless abort.

> +    label->label_ptr[0] = label_ptr[0];
> +    label->label_ptr[1] = label_ptr[1];
> +}
> +
> +/* generates slow case of qemu_ld at the end of TB */
> +static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
> +{
> +    int s_bits;
> +    int opc = label->opc_ext & HL_OPC_MASK;
> +    int mem_index = label->mem_index;
> +#if TCG_TARGET_REG_BITS == 64
> +    int arg_idx;
> +#else
> +    int stack_adjust;
> +    int addrlo_reg = label->addrlo_reg;
> +    int addrhi_reg = label->addrhi_reg;
> +#endif
> +    int data_reg = label->datalo_reg;
> +    int data_reg2 = label->datahi_reg;
> +    uint8_t *raddr = label->raddr;
> +    uint8_t **label_ptr = &label->label_ptr[0];
> +
> +    s_bits = opc & 3;
> +
> +    /* resolve label address */
> +    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
> +    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
> +    }
> +
> +    /* extended helper signature: __ext_ld_mmu(target_ulong addr, int mmu_idx,
> +       uintptr_t raddr) */
> +#if TCG_TARGET_REG_BITS == 32
> +    tcg_out_pushi(s, (uintptr_t)(raddr - 1)); /* return address */
> +    stack_adjust = 4;
> +    tcg_out_pushi(s, mem_index);        /* mmu index */
> +    stack_adjust += 4;
> +    if (TARGET_LONG_BITS == 64) {
> +        tcg_out_push(s, addrhi_reg);
> +        stack_adjust += 4;
> +    }
> +    tcg_out_push(s, addrlo_reg); /* guest addr */
> +    stack_adjust += 4;
> +#ifdef CONFIG_TCG_PASS_AREG0
> +    tcg_out_push(s, TCG_AREG0);
> +    stack_adjust += 4;
> +#endif
> +#else
> +    /* The first argument is already loaded with addrlo.  */
> +    arg_idx = 1;
> +    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[arg_idx++],
> +                 mem_index);
> +    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[arg_idx++],
> +                 (uintptr_t)(raddr - 1));
> +#ifdef CONFIG_TCG_PASS_AREG0
> +    /* XXX/FIXME: suboptimal */
> +    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3],
> +                tcg_target_call_iarg_regs[2]);
> +    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
> +                tcg_target_call_iarg_regs[1]);
> +    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[1],
> +                tcg_target_call_iarg_regs[0]);
> +    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0],
> +                TCG_AREG0);
> +#endif
> +#endif
> +
> +    tcg_out_calli(s, (tcg_target_long)qemu_ld_helpers[s_bits]);
> +
> +#if TCG_TARGET_REG_BITS == 32
> +    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
> +        /* Pop and discard.  This is 2 bytes smaller than the add.  */
> +        tcg_out_pop(s, TCG_REG_ECX);
> +    } else if (stack_adjust != 0) {
> +        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
> +    }
> +#endif
> +
> +    switch(opc) {
> +    case 0 | 4:
> +        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
> +        break;
> +    case 1 | 4:
> +        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
> +        break;
> +    case 0:
> +        tcg_out_ext8u(s, data_reg, TCG_REG_EAX);
> +        break;
> +    case 1:
> +        tcg_out_ext16u(s, data_reg, TCG_REG_EAX);
> +        break;
> +    case 2:
> +        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
> +        break;
> +#if TCG_TARGET_REG_BITS == 64
> +    case 2 | 4:
> +        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
> +        break;
> +#endif
> +    case 3:
> +        if (TCG_TARGET_REG_BITS == 64) {
> +            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
> +        } else if (data_reg == TCG_REG_EDX) {
> +            /* xchg %edx, %eax */
> +            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
> +            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EAX);
> +        } else {
> +            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
> +            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EDX);
> +        }
> +        break;
> +    default:
> +        tcg_abort();
> +    }
> +
> +    /* jump back to original code */
> +    tcg_out_jmp(s, (tcg_target_long) raddr);
> +}
> +
> +/* generates slow case of qemu_st at the end of TB */
> +static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
> +{
> +    int s_bits;
> +    int stack_adjust;
> +    int opc = label->opc_ext & HL_OPC_MASK;
> +    int mem_index = label->mem_index;
> +    int data_reg = label->datalo_reg;
> +#if TCG_TARGET_REG_BITS == 32
> +    int data_reg2 = label->datahi_reg;
> +    int addrlo_reg = label->addrlo_reg;
> +    int addrhi_reg = label->addrhi_reg;
> +#endif
> +    uint8_t *raddr = label->raddr;
> +    uint8_t **label_ptr = &label->label_ptr[0];
> +
> +    s_bits = opc & 3;
> +
> +    /* resolve label address */
> +    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
> +    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
> +        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
> +    }
> +
> +    /* extended helper signature: __ext_st_mmu(target_ulong addr, uintxx_t val,
> +       int mmu_idx, uintptr_t raddr) */
> +#if TCG_TARGET_REG_BITS == 32
> +    tcg_out_pushi(s, (uintptr_t)(raddr - 1)); /* return address */
> +    stack_adjust = 4;
> +    tcg_out_pushi(s, mem_index); /* mmu index */
> +    stack_adjust += 4;
> +    if (opc == 3) {
> +        tcg_out_push(s, data_reg2);
> +        stack_adjust += 4;
> +    }
> +    tcg_out_push(s, data_reg);   /* guest data */
> +    stack_adjust += 4;
> +    if (TARGET_LONG_BITS == 64) {
> +        tcg_out_push(s, addrhi_reg);
> +        stack_adjust += 4;
> +    }
> +    tcg_out_push(s, addrlo_reg); /* guest addr */
> +    stack_adjust += 4;
> +#ifdef CONFIG_TCG_PASS_AREG0
> +    tcg_out_push(s, TCG_AREG0);
> +    stack_adjust += 4;
> +#endif
> +#else
> +    tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
> +                tcg_target_call_iarg_regs[1], data_reg);
> +    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], mem_index);
> +    tcg_out_movi(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3], (uintptr_t)(raddr - 1));
> +    stack_adjust = 0;
> +#ifdef CONFIG_TCG_PASS_AREG0
> +    /* XXX/FIXME: suboptimal */
> +    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3],
> +                tcg_target_call_iarg_regs[2]);
> +    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
> +                tcg_target_call_iarg_regs[1]);
> +    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[1],
> +                tcg_target_call_iarg_regs[0]);
> +    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0],
> +                TCG_AREG0);
> +#endif
> +#endif
> +
> +    tcg_out_calli(s, (tcg_target_long)qemu_st_helpers[s_bits]);
> +
> +    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
> +        /* Pop and discard.  This is 2 bytes smaller than the add.  */
> +        tcg_out_pop(s, TCG_REG_ECX);
> +    } else if (stack_adjust != 0) {
> +        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
> +    }
> +
> +    /* jump back to original code */
> +    tcg_out_jmp(s, (tcg_target_long) raddr);
> +}
> +
> +/* generates all of the slow cases of qemu_ld/st at the end of TB */
> +void tcg_out_qemu_ldst_slow_path(TCGContext *s)
> +{
> +    int i;
> +    TCGLabelQemuLdst *label;
> +
> +    for (i = 0; i < s->nb_qemu_ldst_labels; i++) {
> +        label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[i];
> +        if (IS_QEMU_LD_LABEL(label)) {
> +            tcg_out_qemu_ld_slow_path(s, label);
> +        } else {
> +            tcg_out_qemu_st_slow_path(s, label);
> +        }
> +    }
> +}
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>                                const TCGArg *args, const int *const_args)
>  {
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 8386b70..8009069 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -301,6 +301,14 @@ void tcg_func_start(TCGContext *s)
>
>      gen_opc_ptr = gen_opc_buf;
>      gen_opparam_ptr = gen_opparam_buf;
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +    /* initialize qemu_ld/st labels which help to generate TLB miss case codes at the end of TB */
> +    s->qemu_ldst_labels = tcg_malloc(sizeof(TCGLabelQemuLdst) * TCG_MAX_QEMU_LDST);
> +    if (!s->qemu_ldst_labels) {
> +        tcg_abort();
> +    }

Unnecessary check -- tcg_malloc() can never return 0.

> +    s->nb_qemu_ldst_labels = 0;
> +#endif
>  }
>
>  static inline void tcg_temp_alloc(TCGContext *s, int n)
> @@ -2169,6 +2177,10 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
>  #endif
>      }
>   the_end:
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +    /* Generate MMU call helpers at the end of block (currently only for qemu_ld/st) */
> +    tcg_out_qemu_ldst_slow_path(s);
> +#endif
>      return -1;
>  }
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index d710694..b174cdb 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -187,6 +187,29 @@ typedef tcg_target_ulong TCGArg;
>     are aliases for target_ulong and host pointer sized values respectively.
>   */
>
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* Macros and structures for qemu_ld/st IR code optimization:
> +   It looks good for TCG_MAX_HELPER_LABELS to be half of OPC_BUF_SIZE in exec-all.h. */
> +#define TCG_MAX_QEMU_LDST       320

Is that true even if you have a huge block with nothing but simple
guest load instructions in it?

> +#define HL_LDST_SHIFT           4
> +#define HL_LDST_MASK            (1 << HL_LDST_SHIFT)
> +#define HL_ST_MASK              HL_LDST_MASK
> +#define HL_OPC_MASK             (HL_LDST_MASK - 1)
> +#define IS_QEMU_LD_LABEL(L)     (!((L)->opc_ext & HL_LDST_MASK))
> +#define IS_QEMU_ST_LABEL(L)     ((L)->opc_ext & HL_LDST_MASK)
> +
> +typedef struct TCGLabelQemuLdst {
> +    int opc_ext;                /* | 27bit (reserved) | 1bit (ld/st flag) | 4bit (opc) | */
> +    int addrlo_reg;             /* reg index for the low word of guest virtual address */
> +    int addrhi_reg;             /* reg index for the high word of guest virtual address */
> +    int datalo_reg;             /* reg index for the low word to be loaded or to be stored */
> +    int datahi_reg;             /* reg index for the high word to be loaded or to be stored */
> +    int mem_index;              /* soft MMU memory index */
> +    uint8_t *raddr;             /* return address (located end of TB) */
> +    uint8_t *label_ptr[2];      /* label pointers to be updated */
> +} TCGLabelQemuLdst;
> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
> +
>  #ifdef CONFIG_DEBUG_TCG
>  #define DEBUG_TCGV 1
>  #endif
> @@ -389,6 +412,13 @@ struct TCGContext {
>  #ifdef CONFIG_DEBUG_TCG
>      int temps_in_use;
>  #endif
> +
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +    /* labels info for qemu_ld/st IRs
> +       The labels help to generate TLB miss case codes at the end of TB */
> +    TCGLabelQemuLdst *qemu_ldst_labels;
> +    int nb_qemu_ldst_labels;
> +#endif
>  };
>
>  extern TCGContext tcg_ctx;
> @@ -588,3 +618,8 @@ extern uint8_t code_gen_prologue[];
>  #endif
>
>  void tcg_register_jit(void *buf, size_t buf_size);
> +
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
> +/* qemu_ld/st generation at the end of TB */
> +void tcg_out_qemu_ldst_slow_path(TCGContext *s);
> +#endif
> --
> 1.7.4.1
>

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
  2012-07-05 13:55   ` Andreas Färber
@ 2012-07-05 14:06   ` Peter Maydell
  2012-07-05 14:26     ` Laurent Desnogues
  2012-07-06 11:43     ` Yeongkyoon Lee
  1 sibling, 2 replies; 22+ messages in thread
From: Peter Maydell @ 2012-07-05 14:06 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: laurent.desnogues, e.voevodin, qemu-devel, chenwj

On 5 July 2012 14:23, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
> Add an option "--enable-ldst-optimization" to enable CONFIG_QEMU_LDST_OPTIMIZATION macro for TCG qemu_ld/st optimization. It only works with CONFIG_SOFTMMU and doesn't work with CONFIG_TCG_PASS_AREG0.

This shouldn't be a user settable config option -- we should
just identify what the optimal setting is for the guest/target
combination and use it.

>  case "$target_arch2" in
>    alpha | sparc* | xtensa* | ppc*)
>      echo "CONFIG_TCG_PASS_AREG0=y" >> $config_target_mak
> +    # qemu_ld/st optimization is not available with CONFIG_TCG_PASS_AREG0
> +    target_ldst_optimization="no"

PASS_AREG0 is the way of the future -- you need to fix the ldst
optimization to work with it.

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  2012-07-05 14:06   ` Peter Maydell
@ 2012-07-05 14:26     ` Laurent Desnogues
  2012-07-06 11:43     ` Yeongkyoon Lee
  1 sibling, 0 replies; 22+ messages in thread
From: Laurent Desnogues @ 2012-07-05 14:26 UTC (permalink / raw)
  To: Peter Maydell; +Cc: e.voevodin, qemu-devel, chenwj, Yeongkyoon Lee

On Thu, Jul 5, 2012 at 4:06 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
[...]
>>  case "$target_arch2" in
>>    alpha | sparc* | xtensa* | ppc*)
>>      echo "CONFIG_TCG_PASS_AREG0=y" >> $config_target_mak
>> +    # qemu_ld/st optimization is not available with CONFIG_TCG_PASS_AREG0
>> +    target_ldst_optimization="no"
>
> PASS_AREG0 is the way of the future -- you need to fix the ldst
> optimization to work with it.

Agreed.  But what's the point of speeding up on one side and
losing speed on the other side?  AREG0 slowdown would be
acceptable if the ARM target was using less helpers, until this
happens I don't think it is a good idea to push AREG0,
hence I'm not sure it is a prerequisite that Yeongkyoon Lee's
patch supports it.  By the way, it's also a good reason to have
CONFIG_QEMU_LDST_OPTIMIZATION, which you
commented on in patch 3.

Basically, and in my humble opinion, this is not a good
enough reason to reject the patch :-)  Of course the support
should be added as soon as possible once the rest has
been discussed.


Laurent

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 2/4] tcg: add extended MMU helpers to softmmu targets
  2012-07-05 13:43   ` Peter Maydell
@ 2012-07-05 18:49     ` Blue Swirl
  2012-07-06 12:16       ` Yeongkyoon Lee
  0 siblings, 1 reply; 22+ messages in thread
From: Blue Swirl @ 2012-07-05 18:49 UTC (permalink / raw)
  To: Peter Maydell
  Cc: laurent.desnogues, chenwj, qemu-devel, e.voevodin, Yeongkyoon Lee

On Thu, Jul 5, 2012 at 1:43 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 5 July 2012 14:23, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
>> Add extended MMU helpers to softmmu targets, where the targets are alpha, arm, cris, i386, lm32, m68k, microblaze, mips, ppc, s390x, sh4, sparc and xtensa.
>>
>> Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>> ---
>>  target-alpha/mem_helper.c     |   22 ++++++++++++++++++++++
>>  target-arm/op_helper.c        |   23 +++++++++++++++++++++++
>>  target-cris/op_helper.c       |   22 ++++++++++++++++++++++
>>  target-i386/mem_helper.c      |   22 ++++++++++++++++++++++
>>  target-lm32/op_helper.c       |   23 ++++++++++++++++++++++-
>>  target-m68k/op_helper.c       |   22 ++++++++++++++++++++++
>>  target-microblaze/op_helper.c |   22 ++++++++++++++++++++++
>>  target-mips/op_helper.c       |   22 ++++++++++++++++++++++
>>  target-ppc/mem_helper.c       |   22 ++++++++++++++++++++++
>>  target-s390x/op_helper.c      |   22 ++++++++++++++++++++++
>>  target-sh4/op_helper.c        |   22 ++++++++++++++++++++++
>>  target-sparc/ldst_helper.c    |   23 +++++++++++++++++++++++
>>  target-xtensa/op_helper.c     |   22 ++++++++++++++++++++++
>>  13 files changed, 288 insertions(+), 1 deletions(-)
>
> This makes the already slightly repetitive inclusion of the
> softmmu_templates even more repetitive. Perhaps we could abstract
> it all out into a single header which the targets can include?

I'd just replace standard versions with extended versions
unconditionally, no CONFIG_*. Both AREG0 and !AREG0 cases must be
handled.

>>
>> diff --git a/target-alpha/mem_helper.c b/target-alpha/mem_helper.c
>> index 87cada4..ef880a7 100644
>> --- a/target-alpha/mem_helper.c
>> +++ b/target-alpha/mem_helper.c
>> @@ -132,6 +132,28 @@ void cpu_unassigned_access(CPUAlphaState *env, target_phys_addr_t addr,
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>
> "Extended".
>
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  /* try to fill the TLB and return an exception if error. If retaddr is
>>     NULL, it means that the function was called in C code (i.e. not
>>     from generated code or from helper.c) */
>> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
>> index 490111c..8ff1209 100644
>> --- a/target-arm/op_helper.c
>> +++ b/target-arm/op_helper.c
>> @@ -69,6 +69,29 @@ uint32_t HELPER(neon_tbl)(uint32_t ireg, uint32_t def,
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  /* try to fill the TLB and return an exception if error. If retaddr is
>>     NULL, it means that the function was called in C code (i.e. not
>>     from generated code or from helper.c) */
>> diff --git a/target-cris/op_helper.c b/target-cris/op_helper.c
>> index ac7c98c..cc34f3c 100644
>> --- a/target-cris/op_helper.c
>> +++ b/target-cris/op_helper.c
>> @@ -52,6 +52,28 @@
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  /* Try to fill the TLB and return an exception if error. If retaddr is
>>     NULL, it means that the function was called in C code (i.e. not
>>     from generated code or from helper.c) */
>> diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
>> index 91353c0..59fb03c 100644
>> --- a/target-i386/mem_helper.c
>> +++ b/target-i386/mem_helper.c
>> @@ -126,6 +126,28 @@ void helper_boundl(target_ulong a0, int v)
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  #endif
>>
>>  #if !defined(CONFIG_USER_ONLY)
>> diff --git a/target-lm32/op_helper.c b/target-lm32/op_helper.c
>> index 51edc1a..bcdddf7 100644
>> --- a/target-lm32/op_helper.c
>> +++ b/target-lm32/op_helper.c
>> @@ -18,6 +18,28 @@
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  void helper_raise_exception(uint32_t index)
>>  {
>>      env->exception_index = index;
>> @@ -101,4 +123,3 @@ void tlb_fill(CPULM32State *env1, target_ulong addr, int is_write, int mmu_idx,
>>      env = saved_env;
>>  }
>>  #endif
>> -
>> diff --git a/target-m68k/op_helper.c b/target-m68k/op_helper.c
>> index 1971a57..454fd71 100644
>> --- a/target-m68k/op_helper.c
>> +++ b/target-m68k/op_helper.c
>> @@ -51,6 +51,28 @@ extern int semihosting_enabled;
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  /* Try to fill the TLB and return an exception if error. If retaddr is
>>     NULL, it means that the function was called in C code (i.e. not
>>     from generated code or from helper.c) */
>> diff --git a/target-microblaze/op_helper.c b/target-microblaze/op_helper.c
>> index 3b1f072..12be3bd 100644
>> --- a/target-microblaze/op_helper.c
>> +++ b/target-microblaze/op_helper.c
>> @@ -39,6 +39,28 @@
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  /* Try to fill the TLB and return an exception if error. If retaddr is
>>     NULL, it means that the function was called in C code (i.e. not
>>     from generated code or from helper.c) */
>> diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
>> index 66037ac..c2fd2db 100644
>> --- a/target-mips/op_helper.c
>> +++ b/target-mips/op_helper.c
>> @@ -2303,6 +2303,28 @@ static void QEMU_NORETURN do_unaligned_access(target_ulong addr, int is_write,
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  static void do_unaligned_access(target_ulong addr, int is_write,
>>                                  int is_user, uintptr_t retaddr)
>>  {
>> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
>> index 5b5f1bd..01a9ac6 100644
>> --- a/target-ppc/mem_helper.c
>> +++ b/target-ppc/mem_helper.c
>> @@ -268,6 +268,28 @@ STVE(stvewx, cpu_stl_data, bswap32, u32)
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  /* try to fill the TLB and return an exception if error. If retaddr is
>>     NULL, it means that the function was called in C code (i.e. not
>>     from generated code or from helper.c) */
>> diff --git a/target-s390x/op_helper.c b/target-s390x/op_helper.c
>> index 7b72473..05109c5 100644
>> --- a/target-s390x/op_helper.c
>> +++ b/target-s390x/op_helper.c
>> @@ -52,6 +52,28 @@
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  /* try to fill the TLB and return an exception if error. If retaddr is
>>     NULL, it means that the function was called in C code (i.e. not
>>     from generated code or from helper.c) */
>> diff --git a/target-sh4/op_helper.c b/target-sh4/op_helper.c
>> index 4054791..f7c1f67 100644
>> --- a/target-sh4/op_helper.c
>> +++ b/target-sh4/op_helper.c
>> @@ -53,6 +53,28 @@ static void cpu_restore_state_from_retaddr(uintptr_t retaddr)
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  void tlb_fill(CPUSH4State *env1, target_ulong addr, int is_write, int mmu_idx,
>>                uintptr_t retaddr)
>>  {
>> diff --git a/target-sparc/ldst_helper.c b/target-sparc/ldst_helper.c
>> index 9bec7a9..1ceb588 100644
>> --- a/target-sparc/ldst_helper.c
>> +++ b/target-sparc/ldst_helper.c
>> @@ -80,6 +80,29 @@
>>
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>> +
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  #endif
>>
>>  #if defined(TARGET_SPARC64) && !defined(CONFIG_USER_ONLY)
>> diff --git a/target-xtensa/op_helper.c b/target-xtensa/op_helper.c
>> index 2659c0e..536e1fb 100644
>> --- a/target-xtensa/op_helper.c
>> +++ b/target-xtensa/op_helper.c
>> @@ -47,6 +47,28 @@ static void do_unaligned_access(CPUXtensaState *env,
>>  #define SHIFT 3
>>  #include "softmmu_template.h"
>>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Exteneded MMU helper funtions for qemu_ld/st optimization
>> +   Note that normal helper functions should be defined above
>> +   to avoid duplication of common functions, slow_ld/st and io_read/write.
>> + */
>> +#define USE_EXTENDED_HELPER
>> +
>> +#define SHIFT 0
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 1
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 2
>> +#include "softmmu_template.h"
>> +
>> +#define SHIFT 3
>> +#include "softmmu_template.h"
>> +
>> +#undef USE_EXTENDED_HELPER
>> +#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
>> +
>>  static void do_restore_state(CPUXtensaState *env, uintptr_t pc)
>>  {
>>      TranslationBlock *tb;
>> --
>> 1.7.4.1
>>
>
> -- PMM
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  2012-07-05 13:55   ` Andreas Färber
@ 2012-07-06  3:13     ` Evgeny Voevodin
  0 siblings, 0 replies; 22+ messages in thread
From: Evgeny Voevodin @ 2012-07-06  3:13 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Peter Maydell, chenwj, Yeongkyoon Lee, qemu-devel, Blue Swirl,
	laurent.desnogues

On 05.07.2012 17:55, Andreas Färber wrote:
> Am 05.07.2012 15:23, schrieb Yeongkyoon Lee:
>> Add an option "--enable-ldst-optimization" to enable CONFIG_QEMU_LDST_OPTIMIZATION macro for TCG qemu_ld/st optimization. It only works with CONFIG_SOFTMMU and doesn't work with CONFIG_TCG_PASS_AREG0.
>>
>> Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>> ---
>>   configure |   15 +++++++++++++++
>>   1 files changed, 15 insertions(+), 0 deletions(-)
>>
>> diff --git a/configure b/configure
>> index 9f071b7..2b364cc 100755
>> --- a/configure
>> +++ b/configure
> [...]
>> @@ -3463,6 +3466,11 @@ echo "EXESUF=$EXESUF" >> $config_host_mak
>>   echo "LIBS_QGA+=$libs_qga" >> $config_host_mak
>>   echo "POD2MAN=$POD2MAN" >> $config_host_mak
>>   
>> +if [ "$ldst_optimization" = "yes" -a "$cpu" != "i386" -a "$cpu" != "x86_64" ] ; then
>> +  echo "ERROR: qemu_ld/st optimization is only available on i386 or x86_64 hosts"
>> +  exit 1
>> +fi
> [snip]
>
> I assume that Samsung is interested in optimizing the Exynos emulation.

Nope ) Originally it's from x86 Tizen emulator )

> I think there was already a patchset posted converting target-arm to
> CONFIG_PASS_TCG_AREG0, only with some slowdowns to be investigated...
> What is the obstacle for supporting AREG0 mode in your optimization?
>
> Regards,
> Andreas
>
>> +
>>   # generate list of library paths for linker script
>>   
>>   $ld --verbose -v 2> /dev/null | grep SEARCH_DIR > ${config_host_ld}
>> @@ -3696,11 +3704,18 @@ fi
>>   symlink "$source_path/Makefile.target" "$target_dir/Makefile"
>>   
>>   
>> +target_ldst_optimization="$ldst_optimization"
>> +
>>   case "$target_arch2" in
>>     alpha | sparc* | xtensa* | ppc*)
>>       echo "CONFIG_TCG_PASS_AREG0=y" >> $config_target_mak
>> +    # qemu_ld/st optimization is not available with CONFIG_TCG_PASS_AREG0
>> +    target_ldst_optimization="no"
>>     ;;
>>   esac
>> +if [ "$target_ldst_optimization" = "yes" -a "$target_softmmu" = "yes" ] ; then
>> +    echo "CONFIG_QEMU_LDST_OPTIMIZATION=y" >> $config_target_mak
>> +fi
>>   
>>   echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> $config_target_mak
>>   echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak


-- 
Kind regards,
Evgeny Voevodin,
Technical Leader,
Mobile Group, SMRC, Samsung Electronics
e-mail: e.voevodin@samsung.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 1/4] tcg: add declarations and templates of extended MMU helpers
  2012-07-05 13:40   ` Peter Maydell
@ 2012-07-06 10:30     ` Yeongkyoon Lee
  2012-07-06 10:35       ` 陳韋任 (Wei-Ren Chen)
  0 siblings, 1 reply; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-06 10:30 UTC (permalink / raw)
  To: Peter Maydell; +Cc: laurent.desnogues, chenwj, qemu-devel, e.voevodin

>> Add declarations and templates of extended MMU helpers which can take return address argument to what helper functions return. These extended helper functions are called only by generated code.
> It's not entirely clear from this description what the
> return address argument actually is.
My commit message might give confusion. The return address is originally 
expressed as "retaddr" in softmmu_template.h. It means the runtime host 
pc which access guest memory. In previous standard MMU helpers, the 
address is the caller's pc of MMU helper calculated from GETPC(), 
however, in new optimized MMU helpers, the address is different from the 
caller's pc because the call site is located end of TB.
> Also, please line wrap your commit messages.
I didn't know the line wrap rule of commit message. Is the rule included 
in checkpatch.pl? Let me check it.
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Extended versions of MMU helpers for qemu_ld/st optimization.
>> +   They get return address arguments because the caller PCs are not where helpers return to. */
> This is >80 characters ; please use checkpatch.pl.
Ok.
>> +uint8_t __ext_ldb_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
> '__' is a prefix reserved for the system. I know we have existing usage
> of it, but I think it would be better to avoid adding new uses.
Ok, I'll fix it.
>> +#ifdef USE_EXTENDED_HELPER
>> +/* Exteneded helper funtions have one more argument of address
>> +   to which pc is returned after setting TLB entry */
> "Extended". "functions".
Ok.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 1/4] tcg: add declarations and templates of extended MMU helpers
  2012-07-06 10:30     ` Yeongkyoon Lee
@ 2012-07-06 10:35       ` 陳韋任 (Wei-Ren Chen)
  0 siblings, 0 replies; 22+ messages in thread
From: 陳韋任 (Wei-Ren Chen) @ 2012-07-06 10:35 UTC (permalink / raw)
  To: Yeongkyoon Lee
  Cc: laurent.desnogues, Peter Maydell, chenwj, qemu-devel, e.voevodin

> > Also, please line wrap your commit messages.
> I didn't know the line wrap rule of commit message. Is the rule included 
> in checkpatch.pl? Let me check it.

  I guess it's 80 char length rule?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 3/4] tcg: add optimized TCG qemu_ld/st generation
  2012-07-05 14:04   ` Peter Maydell
@ 2012-07-06 11:20     ` Yeongkyoon Lee
  2012-07-06 11:28       ` Peter Maydell
  0 siblings, 1 reply; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-06 11:20 UTC (permalink / raw)
  To: Peter Maydell; +Cc: laurent.desnogues, e.voevodin, qemu-devel, chenwj

> Is it really worth having this as a CONFIG_ switch? If we think
> it's better to do this out of line we should just switch to
> always generating the out of line code, I think. There's not much
> point in retaining the old code path if it's disabled -- it will
> just bitrot.

I agree.
However, it is just a safe guard because I have not test all the targets 
of qemu.
I've only tested x86 and ARM targets on x86 and x86-64 hosts.
If agreed to remove conditional macro, then I'll fix it.


>> +#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
>> +    /* jne slow_path */
>> +    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
>> +    if (!label_ptr) {
>> +        tcg_abort();
>> +    }
> There's no point in this check and abort -- label_ptr will always be 
> non-NULL (it would be an internal error if it wasn't), and if it is by 
> some future bug NULL, we'll just crash on the next line, which is just 
> as good. The existing code didn't feel the need to make this check, we 
> don't need to do it in the new code. 

It cannot be happened now as you said. It is just for a possible future bug.
But I cannot understand "we'll just crash on the next line" you 
mentioned above.

>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +    /* helper stub will be jumped back here */
> "will jump back here".

Ok.

>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +    /* helper stub will be jumped back here */
> ditto.

Ok.

>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* optimization to reduce jump overheads for qemu_ld/st IRs */
>> +
>> +/*
>> + * qemu_ld/st code generator call add_qemu_ldst_label,
>> + * so that slow case(TLB miss or I/O rw) is handled at the end of TB
>> + */
> This comment isn't really describing the purpose of this function,
> which is something more along the lines of "Record the context of
> a call to the out of line helper code for the slow path for a
> load or store, so that we can later generate the correct helper
> code".

I agree. Your description looks better.

>
>> +    if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST)
>> +        tcg_abort();
> QEMU coding style requires braces. Please use checkpatch.pl.

Ok.

>
>> +
>> +    idx = s->nb_qemu_ldst_labels++;
>> +    label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
>> +    label->opc_ext = opc_ext;
>> +    label->datalo_reg = data_reg;
>> +    label->datahi_reg = data_reg2;
>> +    label->addrlo_reg = addrlo_reg;
>> +    label->addrhi_reg = addrhi_reg;
>> +    label->mem_index = mem_index;
>> +    label->raddr = raddr;
>> +    if (!label_ptr) {
>> +        tcg_abort();
>> +    }
> Another pointless abort.

ditto.

>
>> diff --git a/tcg/tcg.c b/tcg/tcg.c
>> index 8386b70..8009069 100644
>> --- a/tcg/tcg.c
>> +++ b/tcg/tcg.c
>> @@ -301,6 +301,14 @@ void tcg_func_start(TCGContext *s)
>>
>>       gen_opc_ptr = gen_opc_buf;
>>       gen_opparam_ptr = gen_opparam_buf;
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +    /* initialize qemu_ld/st labels which help to generate TLB miss case codes at the end of TB */
>> +    s->qemu_ldst_labels = tcg_malloc(sizeof(TCGLabelQemuLdst) * TCG_MAX_QEMU_LDST);
>> +    if (!s->qemu_ldst_labels) {
>> +        tcg_abort();
>> +    }
> Unnecessary check -- tcg_malloc() can never return 0.

Ok.

>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>> +/* Macros and structures for qemu_ld/st IR code optimization:
>> +   It looks good for TCG_MAX_HELPER_LABELS to be half of OPC_BUF_SIZE in exec-all.h. */
>> +#define TCG_MAX_QEMU_LDST       320
> Is that true even if you have a huge block with nothing but simple
> guest load instructions in it?

I agree. It needs to be set as same size with OPC_BUF_SIZE for covering 
extreme cases.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 3/4] tcg: add optimized TCG qemu_ld/st generation
  2012-07-06 11:20     ` Yeongkyoon Lee
@ 2012-07-06 11:28       ` Peter Maydell
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Maydell @ 2012-07-06 11:28 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: laurent.desnogues, e.voevodin, qemu-devel, chenwj

On 6 July 2012 12:20, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
>
>>> +#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
>>> +    /* jne slow_path */
>>> +    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
>>> +    if (!label_ptr) {
>>> +        tcg_abort();
>>> +    }
>>
>> There's no point in this check and abort -- label_ptr will always be
>> non-NULL (it would be an internal error if it wasn't), and if it is by some
>> future bug NULL, we'll just crash on the next line, which is just as good.
>> The existing code didn't feel the need to make this check, we don't need to
>> do it in the new code.
>
> It cannot be happened now as you said. It is just for a possible future bug.
> But I cannot understand "we'll just crash on the next line" you mentioned
> above.

If the check was not present and label_ptr was somehow NULL, then
attempting to execute "label_ptr[0] = s->code_ptr;" will crash.
This is just as helpful for debugging purposes as an abort.
It's sometimes worth having sanity-checking assertions when the
code would otherwise proceed for a long time doing something wrong
but not crashing, because the assert means that you get an early
indication of failure near the point of failure. However the check
you have here is delaying the failure by exactly one line, which is
not useful.

>>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
>>> +/* Macros and structures for qemu_ld/st IR code optimization:
>>> +   It looks good for TCG_MAX_HELPER_LABELS to be half of OPC_BUF_SIZE in
>>> exec-all.h. */
>>> +#define TCG_MAX_QEMU_LDST       320
>>
>> Is that true even if you have a huge block with nothing but simple
>> guest load instructions in it?
>
> I agree. It needs to be set as same size with OPC_BUF_SIZE for covering
> extreme cases.

The point here, incidentally, is that guest code should never be able
to make qemu crash or abort, so any fixed sized buffer has to be able
to handle the worst case.

-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  2012-07-05 14:06   ` Peter Maydell
  2012-07-05 14:26     ` Laurent Desnogues
@ 2012-07-06 11:43     ` Yeongkyoon Lee
  2012-07-07  7:51       ` Blue Swirl
  1 sibling, 1 reply; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-06 11:43 UTC (permalink / raw)
  To: Peter Maydell; +Cc: laurent.desnogues, e.voevodin, qemu-devel, chenwj

On 2012년 07월 05일 23:06, Peter Maydell wrote:
> On 5 July 2012 14:23, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
>> Add an option "--enable-ldst-optimization" to enable CONFIG_QEMU_LDST_OPTIMIZATION macro for TCG qemu_ld/st optimization. It only works with CONFIG_SOFTMMU and doesn't work with CONFIG_TCG_PASS_AREG0.
> This shouldn't be a user settable config option -- we should
> just identify what the optimal setting is for the guest/target
> combination and use it.

It looks better remove the option after it is confirmed that it works 
well for all the guest target architectures.

>>   case "$target_arch2" in
>>     alpha | sparc* | xtensa* | ppc*)
>>       echo "CONFIG_TCG_PASS_AREG0=y" >> $config_target_mak
>> +    # qemu_ld/st optimization is not available with CONFIG_TCG_PASS_AREG0
>> +    target_ldst_optimization="no"
> PASS_AREG0 is the way of the future -- you need to fix the ldst
> optimization to work with it.

There are two reasons to prevent working with PASS_AREG0.
The first one is I'm not sure the history and future PASS_AREG0 and have 
not tested PASS_AREG0 which is only enabled for some guest architectures..
The second is a problem of too many conditional paths in sources.
I think it is not late to combine the ldst optimization with PASS_AREG0 
after the ldst optimization agreed as default.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 2/4] tcg: add extended MMU helpers to softmmu targets
  2012-07-05 18:49     ` Blue Swirl
@ 2012-07-06 12:16       ` Yeongkyoon Lee
  0 siblings, 0 replies; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-06 12:16 UTC (permalink / raw)
  To: Blue Swirl
  Cc: laurent.desnogues, Peter Maydell, chenwj, qemu-devel, e.voevodin

On 2012년 07월 06일 03:49, Blue Swirl wrote:
> On Thu, Jul 5, 2012 at 1:43 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 5 July 2012 14:23, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
>>> Add extended MMU helpers to softmmu targets, where the targets are alpha, arm, cris, i386, lm32, m68k, microblaze, mips, ppc, s390x, sh4, sparc and xtensa.
>>>
>>> Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>> ---
>>>   target-alpha/mem_helper.c     |   22 ++++++++++++++++++++++
>>>   target-arm/op_helper.c        |   23 +++++++++++++++++++++++
>>>   target-cris/op_helper.c       |   22 ++++++++++++++++++++++
>>>   target-i386/mem_helper.c      |   22 ++++++++++++++++++++++
>>>   target-lm32/op_helper.c       |   23 ++++++++++++++++++++++-
>>>   target-m68k/op_helper.c       |   22 ++++++++++++++++++++++
>>>   target-microblaze/op_helper.c |   22 ++++++++++++++++++++++
>>>   target-mips/op_helper.c       |   22 ++++++++++++++++++++++
>>>   target-ppc/mem_helper.c       |   22 ++++++++++++++++++++++
>>>   target-s390x/op_helper.c      |   22 ++++++++++++++++++++++
>>>   target-sh4/op_helper.c        |   22 ++++++++++++++++++++++
>>>   target-sparc/ldst_helper.c    |   23 +++++++++++++++++++++++
>>>   target-xtensa/op_helper.c     |   22 ++++++++++++++++++++++
>>>   13 files changed, 288 insertions(+), 1 deletions(-)
>> This makes the already slightly repetitive inclusion of the
>> softmmu_templates even more repetitive. Perhaps we could abstract
>> it all out into a single header which the targets can include?
> I'd just replace standard versions with extended versions
> unconditionally, no CONFIG_*. Both AREG0 and !AREG0 cases must be
> handled.
>

I've only modified the code related to the MMU call from generated code 
because this patch is focused optimize that kind of runtime generated 
code. As I mentioned another thread, basically I agree to apply AREG0 
case if ldst optimization is accepted as default (no macros).
But when replacing standard versions with extended versions, there is an 
issue of non-x86 (and x64) hosts which has no ldst optimization impl. It 
needs to conserve the usage of conditional macro for standard versions, 
however, it looks better have a type of version per a host, which is 
different from current my patch.
How do you think about it?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  2012-07-06 11:43     ` Yeongkyoon Lee
@ 2012-07-07  7:51       ` Blue Swirl
  2012-07-08  8:35         ` Yeongkyoon Lee
  0 siblings, 1 reply; 22+ messages in thread
From: Blue Swirl @ 2012-07-07  7:51 UTC (permalink / raw)
  To: Yeongkyoon Lee
  Cc: laurent.desnogues, Peter Maydell, chenwj, qemu-devel, e.voevodin

On Fri, Jul 6, 2012 at 11:43 AM, Yeongkyoon Lee
<yeongkyoon.lee@samsung.com> wrote:
> On 2012년 07월 05일 23:06, Peter Maydell wrote:
>>
>> On 5 July 2012 14:23, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
>>>
>>> Add an option "--enable-ldst-optimization" to enable
>>> CONFIG_QEMU_LDST_OPTIMIZATION macro for TCG qemu_ld/st optimization. It only
>>> works with CONFIG_SOFTMMU and doesn't work with CONFIG_TCG_PASS_AREG0.
>>
>> This shouldn't be a user settable config option -- we should
>> just identify what the optimal setting is for the guest/target
>> combination and use it.
>
>
> It looks better remove the option after it is confirmed that it works well
> for all the guest target architectures.
>
>
>>>   case "$target_arch2" in
>>>     alpha | sparc* | xtensa* | ppc*)
>>>       echo "CONFIG_TCG_PASS_AREG0=y" >> $config_target_mak
>>> +    # qemu_ld/st optimization is not available with
>>> CONFIG_TCG_PASS_AREG0
>>> +    target_ldst_optimization="no"
>>
>> PASS_AREG0 is the way of the future -- you need to fix the ldst
>> optimization to work with it.
>
>
> There are two reasons to prevent working with PASS_AREG0.
> The first one is I'm not sure the history and future PASS_AREG0 and have not
> tested PASS_AREG0 which is only enabled for some guest architectures..
> The second is a problem of too many conditional paths in sources.
> I think it is not late to combine the ldst optimization with PASS_AREG0
> after the ldst optimization agreed as default.

It looks like you already support PASS_AREG0, the slow path code just
passes AREG0 as the first call argument.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  2012-07-07  7:51       ` Blue Swirl
@ 2012-07-08  8:35         ` Yeongkyoon Lee
  0 siblings, 0 replies; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-08  8:35 UTC (permalink / raw)
  To: Blue Swirl
  Cc: laurent.desnogues, Peter Maydell, e.voevodin, qemu-devel, chenwj

On 2012년 07월 07일 16:51, Blue Swirl wrote:
> On Fri, Jul 6, 2012 at 11:43 AM, Yeongkyoon Lee
> <yeongkyoon.lee@samsung.com> wrote:
>> On 2012년 07월 05일 23:06, Peter Maydell wrote:
>>> On 5 July 2012 14:23, Yeongkyoon Lee <yeongkyoon.lee@samsung.com> wrote:
>>>> Add an option "--enable-ldst-optimization" to enable
>>>> CONFIG_QEMU_LDST_OPTIMIZATION macro for TCG qemu_ld/st optimization. It only
>>>> works with CONFIG_SOFTMMU and doesn't work with CONFIG_TCG_PASS_AREG0.
>>> This shouldn't be a user settable config option -- we should
>>> just identify what the optimal setting is for the guest/target
>>> combination and use it.
>>
>> It looks better remove the option after it is confirmed that it works well
>> for all the guest target architectures.
>>
>>
>>>>    case "$target_arch2" in
>>>>      alpha | sparc* | xtensa* | ppc*)
>>>>        echo "CONFIG_TCG_PASS_AREG0=y" >> $config_target_mak
>>>> +    # qemu_ld/st optimization is not available with
>>>> CONFIG_TCG_PASS_AREG0
>>>> +    target_ldst_optimization="no"
>>> PASS_AREG0 is the way of the future -- you need to fix the ldst
>>> optimization to work with it.
>>
>> There are two reasons to prevent working with PASS_AREG0.
>> The first one is I'm not sure the history and future PASS_AREG0 and have not
>> tested PASS_AREG0 which is only enabled for some guest architectures..
>> The second is a problem of too many conditional paths in sources.
>> I think it is not late to combine the ldst optimization with PASS_AREG0
>> after the ldst optimization agreed as default.
> It looks like you already support PASS_AREG0, the slow path code just
> passes AREG0 as the first call argument.
>
>

The code in slow path is currently dead code just for future PASS_AREG0 
porting.
It needs some porting for actual working.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs
  2012-07-05 13:23 [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
                   ` (3 preceding siblings ...)
  2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
@ 2012-07-10  9:12 ` Yeongkyoon Lee
  4 siblings, 0 replies; 22+ messages in thread
From: Yeongkyoon Lee @ 2012-07-10  9:12 UTC (permalink / raw)
  To: Yeongkyoon Lee
  Cc: Peter Maydell, e.voevodin, chenwj, qemu-devel, laurent.desnogues,
	afaerber

On 2012년 07월 05일 22:23, Yeongkyoon Lee wrote:
> Summarized feature is as following.
>   - All the changes are wrapped by macro "CONFIG_QEMU_LDST_OPTIMIZATION" and disabled by default.
>   - They are enabled by "configure --enable-ldst-optimization" and need CONFIG_SOFTMMU.
>   - They do not work with CONFIG_TCG_PASS_AREG0 because it looks better apply them after areg0 codes come steady.
>   - Currently, they support only x86 and x86-64 and have been tested with x86 and ARM linux targets on x86/x86-64 host platforms.
>   - Build test has been done for all targets.

I'd like to summarize community's feedbacks/observations and propose new 
patch for ldst optimization.

* Feedbacks/observations
1. It needs to work with PASS_AREG0 (CONFIG_TCG_PASS_AREG0).
2. It does not need to be configured by user.
3. It looks good for a target to be ldst-optimized on x86/64 hosts not 
optionally.
4. CONFIG_QEMU_LDST_OPTIMIZATION looks necessary because common code 
(e.g. tcg.h/tcg.c) is used on-x86 hosts and it should support 
non-softmmu targets.
5. It might need two versions of MMU helpers, standard and extended, 
simultaneously because C code might want to call the standard version.

* Modification proposals
1. Apply ldst optimization also when PASS_AREG0 enabled.
2. Make softmmu targets always to use ldst optimization on x86/64 hosts. 
But testing for many targets is an issue...
3. Make target mem(op) helpers to provide extended MMU helpers and 
softmmu_header.h to provide standard MMU helpers.
4. Fix some mistypings and redundant checks.

I'm not sure whether my proposals are feasible, however, I'd like to try 
them.
How do you think about it?

> Yeongkyoon Lee (4):
>    tcg: add declarations and templates of extended MMU helpers
>    tcg: add extended MMU helpers to softmmu targets
>    tcg: add optimized TCG qemu_ld/st generation
>    configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
>      optimization
>
>   configure                     |   15 ++
>   softmmu_defs.h                |   13 ++
>   softmmu_template.h            |   51 +++++--
>   target-alpha/mem_helper.c     |   22 +++
>   target-arm/op_helper.c        |   23 +++
>   target-cris/op_helper.c       |   22 +++
>   target-i386/mem_helper.c      |   22 +++
>   target-lm32/op_helper.c       |   23 +++-
>   target-m68k/op_helper.c       |   22 +++
>   target-microblaze/op_helper.c |   22 +++
>   target-mips/op_helper.c       |   22 +++
>   target-ppc/mem_helper.c       |   22 +++
>   target-s390x/op_helper.c      |   22 +++
>   target-sh4/op_helper.c        |   22 +++
>   target-sparc/ldst_helper.c    |   23 +++
>   target-xtensa/op_helper.c     |   22 +++
>   tcg/i386/tcg-target.c         |  328 +++++++++++++++++++++++++++++++++++++++++
>   tcg/tcg.c                     |   12 ++
>   tcg/tcg.h                     |   35 +++++
>   19 files changed, 732 insertions(+), 11 deletions(-)
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-07-10  9:12 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-05 13:23 [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 1/4] tcg: add declarations and templates of extended MMU helpers Yeongkyoon Lee
2012-07-05 13:40   ` Peter Maydell
2012-07-06 10:30     ` Yeongkyoon Lee
2012-07-06 10:35       ` 陳韋任 (Wei-Ren Chen)
2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 2/4] tcg: add extended MMU helpers to softmmu targets Yeongkyoon Lee
2012-07-05 13:43   ` Peter Maydell
2012-07-05 18:49     ` Blue Swirl
2012-07-06 12:16       ` Yeongkyoon Lee
2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 3/4] tcg: add optimized TCG qemu_ld/st generation Yeongkyoon Lee
2012-07-05 14:04   ` Peter Maydell
2012-07-06 11:20     ` Yeongkyoon Lee
2012-07-06 11:28       ` Peter Maydell
2012-07-05 13:23 ` [Qemu-devel] [RFC][PATCH v2 4/4] configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
2012-07-05 13:55   ` Andreas Färber
2012-07-06  3:13     ` Evgeny Voevodin
2012-07-05 14:06   ` Peter Maydell
2012-07-05 14:26     ` Laurent Desnogues
2012-07-06 11:43     ` Yeongkyoon Lee
2012-07-07  7:51       ` Blue Swirl
2012-07-08  8:35         ` Yeongkyoon Lee
2012-07-10  9:12 ` [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.