* [PATCH v2 00/19] Mirror map JIT memory for TCG
@ 2020-10-30  0:49 Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 01/19] tcg: Enhance flush_icache_range with separate data pointer Richard Henderson
                   ` (20 more replies)
  0 siblings, 21 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

This is my take on Joelle's patch set:
https://lists.nongnu.org/archive/html/qemu-devel/2020-10/msg07837.html

First, lots more patches.  For the most part, I convert one interface
at a time, instead of trying to do it all at once.  Then, convert the
tcg backends one at a time, allowing for a backend to say that it has
not been updated and not to use the split.  This takes care of TCI,
for one, which would never be converted, as it makes no sense.  But I
don't expect to ever try to convert mips either -- the memory mapping
constraints there are ugly.

There are many more places that "const" could logically be pushed.
I stopped at several major interfaces and left TODO comments.

I have only converted tcg/i386 and tcg/aarch64 so far.  That should
certainly be sufficient for immediate darwin/iOS testing.

Second, I've taken the start with rw and offset to rx approach, which
is the opposite of Joelle's patch set.  It's a close call, but this
direction may be slightly cleaner.

Third, there are almost no ifdefs.  The only ones are related to host
specific support.  That means that this is always available, modulo
the actual tcg backend support.  When the feature is disabled, we will
be adding and subtracting a 0 stored in a global variable.

Fourth, I have renamed the command-line parameter to "split-rwx".
I don't think this is perfect, and I'm not even sure if it's better
than "mirror-jit".  What this has done, though, is left the code
with inconsistent language -- "mirror" in some places, "split" in
others.  I'll clean that up once we decide on naming.

Fifth, I have auto-enabled the feature for CONFIG_DEBUG_TCG.  In that
case it falls back to disabled without error when host support is
incomplete, which makes sure that the feature is regularly tested.
If you try to enable it from the command line without complete host
support, however, a fatal error will be generated.
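
The enable policy from the fifth point can be sketched as a small
decision function.  The enum names, the function name and its
parameters are hypothetical and are not the interface used by this
series; the sketch only models "auto-enabled falls back silently,
explicitly requested fails hard".

```c
#include <stdbool.h>

/* Hypothetical tri-state for the option; not QEMU's real representation. */
enum split_request {
    SPLIT_DEFAULT,   /* option not given on the command line */
    SPLIT_OFF,       /* explicitly disabled */
    SPLIT_ON         /* explicitly enabled */
};

enum split_result {
    SPLIT_DISABLED,
    SPLIT_ENABLED,
    SPLIT_FATAL      /* explicit request that the host cannot satisfy */
};

/* Decide whether the split mapping is used (illustrative only). */
static enum split_result resolve_split(enum split_request req,
                                       bool debug_build, bool host_supported)
{
    switch (req) {
    case SPLIT_OFF:
        return SPLIT_DISABLED;
    case SPLIT_ON:
        /* An explicit request must not be silently ignored. */
        return host_supported ? SPLIT_ENABLED : SPLIT_FATAL;
    case SPLIT_DEFAULT:
        break;
    }
    /* Auto-enable for debug builds, falling back without error. */
    return (debug_build && host_supported) ? SPLIT_ENABLED : SPLIT_DISABLED;
}
```

With this shape, a debug build on an unconverted backend resolves to
SPLIT_DISABLED, while an explicit request on the same host resolves
to SPLIT_FATAL.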


r~


Richard Henderson (19):
  tcg: Enhance flush_icache_range with separate data pointer
  tcg: Move tcg prologue pointer out of TCGContext
  tcg: Move tcg epilogue pointer out of TCGContext
  tcg: Introduce tcg_mirror_rw_to_rx/tcg_mirror_rx_to_rw
  tcg: Adjust tcg_out_call for const
  tcg: Adjust tcg_out_label for const
  tcg: Adjust tcg_register_jit for const
  tcg: Adjust tb_target_set_jmp_target for split rwx
  tcg: Make DisasContextBase.tb const
  tcg: Make tb arg to synchronize_from_tb const
  tcg: Use Error with alloc_code_gen_buffer
  tcg: Add --accel tcg,split-rwx property
  accel/tcg: Support split-rwx for linux with memfd
  RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap
  tcg: Return the rx mirror of TranslationBlock from exit_tb
  tcg/i386: Support split-rwx code generation
  tcg/aarch64: Use B not BL for tcg_out_goto_long
  tcg/aarch64: Implement flush_idcache_range manually
  tcg/aarch64: Support split-rwx code generation

 accel/tcg/tcg-runtime.h      |   2 +-
 include/disas/disas.h        |   2 +-
 include/exec/exec-all.h      |   2 +-
 include/exec/gen-icount.h    |   4 +-
 include/exec/log.h           |   2 +-
 include/exec/translator.h    |   2 +-
 include/hw/core/cpu.h        |   3 +-
 include/sysemu/tcg.h         |   2 +-
 include/tcg/tcg-op.h         |   2 +-
 include/tcg/tcg.h            |  37 +++--
 tcg/aarch64/tcg-target.h     |   9 +-
 tcg/arm/tcg-target.h         |  11 +-
 tcg/i386/tcg-target.h        |  10 +-
 tcg/mips/tcg-target.h        |  11 +-
 tcg/ppc/tcg-target.h         |   5 +-
 tcg/riscv/tcg-target.h       |  11 +-
 tcg/s390/tcg-target.h        |  12 +-
 tcg/sparc/tcg-target.h       |  11 +-
 tcg/tci/tcg-target.h         |  12 +-
 accel/tcg/cpu-exec.c         |  41 +++---
 accel/tcg/tcg-all.c          |  26 +++-
 accel/tcg/tcg-runtime.c      |   4 +-
 accel/tcg/translate-all.c    | 255 ++++++++++++++++++++++++++++-------
 accel/tcg/translator.c       |   4 +-
 bsd-user/main.c              |   2 +-
 disas.c                      |   4 +-
 linux-user/main.c            |   2 +-
 softmmu/physmem.c            |   9 +-
 target/arm/cpu.c             |   3 +-
 target/arm/translate-a64.c   |   2 +-
 target/avr/cpu.c             |   3 +-
 target/hppa/cpu.c            |   3 +-
 target/i386/cpu.c            |   3 +-
 target/microblaze/cpu.c      |   3 +-
 target/mips/cpu.c            |   3 +-
 target/riscv/cpu.c           |   3 +-
 target/rx/cpu.c              |   3 +-
 target/sh4/cpu.c             |   3 +-
 target/sparc/cpu.c           |   3 +-
 target/tricore/cpu.c         |   2 +-
 tcg/tcg-op.c                 |  15 ++-
 tcg/tcg.c                    |  85 +++++++++---
 tcg/tci.c                    |   4 +-
 accel/tcg/trace-events       |   2 +-
 tcg/aarch64/tcg-target.c.inc | 130 +++++++++++++-----
 tcg/arm/tcg-target.c.inc     |   6 +-
 tcg/i386/tcg-target.c.inc    |  38 +++---
 tcg/mips/tcg-target.c.inc    |  18 +--
 tcg/ppc/tcg-target.c.inc     |  45 ++++---
 tcg/riscv/tcg-target.c.inc   |  12 +-
 tcg/s390/tcg-target.c.inc    |   8 +-
 tcg/sparc/tcg-target.c.inc   |  22 +--
 tcg/tcg-pool.c.inc           |   6 +-
 tcg/tci/tcg-target.c.inc     |   2 +-
 54 files changed, 655 insertions(+), 269 deletions(-)

-- 
2.25.1




* [PATCH v2 01/19] tcg: Enhance flush_icache_range with separate data pointer
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-11-01  6:54   ` Joelle van Dyne
  2020-10-30  0:49 ` [PATCH v2 02/19] tcg: Move tcg prologue pointer out of TCGContext Richard Henderson
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

We are shortly going to have a split rw/rx jit buffer.  Depending
on the host, we need to flush the dcache at the rw data pointer and
flush the icache at the rx code pointer.

For now, the two passed pointers are identical, so there is no
effective change in behaviour.
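
A minimal sketch of the new (rx, rw, len) calling convention,
assuming a host where the GCC/Clang builtin __builtin___clear_cache
is sufficient.  The names line_floor, line_ceil and
sketch_flush_idcache_range are hypothetical; the rounding follows
the start/stop computation used by the ppc implementation in this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to a multiple of line; line must be a power of two. */
static uintptr_t line_floor(uintptr_t addr, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return addr & ~(uintptr_t)(line - 1);
}

/* Round addr up to a multiple of line; line must be a power of two. */
static uintptr_t line_ceil(uintptr_t addr, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return (addr + line - 1) & ~(uintptr_t)(line - 1);
}

/*
 * Generic variant: flush the writable alias only when it is a
 * different address from the executable one, then flush the
 * executable range.
 */
static void sketch_flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
{
    if (rw != rx) {
        __builtin___clear_cache((char *)rw, (char *)(rw + len));
    }
    __builtin___clear_cache((char *)rx, (char *)(rx + len));
}
```

A former call flush_icache_range(start, stop) becomes
flush_idcache_range(start, start, stop - start).  When rx == rw, as
at every call site in this patch, only one flush is performed.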

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |  9 +++++++--
 tcg/arm/tcg-target.h         |  8 ++++++--
 tcg/i386/tcg-target.h        |  3 ++-
 tcg/mips/tcg-target.h        |  8 ++++++--
 tcg/ppc/tcg-target.h         |  2 +-
 tcg/riscv/tcg-target.h       |  8 ++++++--
 tcg/s390/tcg-target.h        |  3 ++-
 tcg/sparc/tcg-target.h       |  8 +++++---
 tcg/tci/tcg-target.h         |  3 ++-
 softmmu/physmem.c            |  9 ++++++++-
 tcg/tcg.c                    |  5 +++--
 tcg/aarch64/tcg-target.c.inc |  2 +-
 tcg/mips/tcg-target.c.inc    |  2 +-
 tcg/ppc/tcg-target.c.inc     | 21 +++++++++++----------
 tcg/sparc/tcg-target.c.inc   |  4 ++--
 15 files changed, 63 insertions(+), 32 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 663dd0b95e..d0a6a059b7 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -148,9 +148,14 @@ typedef enum {
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
-static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
+/* Flush the dcache at RW, and the icache at RX, as necessary. */
+static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
-    __builtin___clear_cache((char *)start, (char *)stop);
+    /* TODO: Copy this from gcc, so that we use 2 loops instead of 4. */
+    if (rw != rx) {
+        __builtin___clear_cache((char *)rw, (char *)(rw + len));
+    }
+    __builtin___clear_cache((char *)rx, (char *)(rx + len));
 }
 
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 17e771374d..fa88b24e43 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -134,9 +134,13 @@ enum {
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
-static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
+/* Flush the dcache at RW, and the icache at RX, as necessary. */
+static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
-    __builtin___clear_cache((char *) start, (char *) stop);
+    if (rw != rx) {
+        __builtin___clear_cache((char *)rw, (char *)(rw + len));
+    }
+    __builtin___clear_cache((char *)rx, (char *)(rx + len));
 }
 
 /* not defined -- call should be eliminated at compile time */
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index abd4ac7fc0..8323e72639 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -206,7 +206,8 @@ extern bool have_avx2;
 #define TCG_TARGET_extract_i64_valid(ofs, len) \
     (((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)
 
-static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
+/* Flush the dcache at RW, and the icache at RX, as necessary. */
+static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
 }
 
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index c6b091d849..47b1226ee9 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -207,9 +207,13 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
-static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
+/* Flush the dcache at RW, and the icache at RX, as necessary. */
+static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
-    cacheflush ((void *)start, stop-start, ICACHE);
+    if (rx != rw) {
+        cacheflush((void *)rw, len, DCACHE);
+    }
+    cacheflush((void *)rx, len, ICACHE);
 }
 
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index be10363956..fbb6dc1b47 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -175,7 +175,7 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_bitsel_vec       have_vsx
 #define TCG_TARGET_HAS_cmpsel_vec       0
 
-void flush_icache_range(uintptr_t start, uintptr_t stop);
+void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len);
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
 #define TCG_TARGET_DEFAULT_MO (0)
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 032439d806..0fa6ae358e 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -159,9 +159,13 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i64        1
 #endif
 
-static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
+/* Flush the dcache at RW, and the icache at RX, as necessary. */
+static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
-    __builtin___clear_cache((char *)start, (char *)stop);
+    if (rx != rw) {
+        __builtin___clear_cache((char *)rw, (char *)(rw + len));
+    }
+    __builtin___clear_cache((char *)rx, (char *)(rx + len));
 }
 
 /* not defined -- call should be eliminated at compile time */
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 63c8797bd3..c3dc2e8938 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -145,7 +145,8 @@ enum {
     TCG_AREG0 = TCG_REG_R10,
 };
 
-static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
+/* Flush the dcache at RW, and the icache at RX, as necessary. */
+static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
 }
 
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 633841ebf2..c27c40231e 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -168,10 +168,12 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
-static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
+/* Flush the dcache at RW, and the icache at RX, as necessary. */
+static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
-    uintptr_t p;
-    for (p = start & -8; p < ((stop + 7) & -8); p += 8) {
+    /* No additional data flush to the RW virtual address required. */
+    uintptr_t p, end = (rx + len + 7) & -8;
+    for (p = rx & -8; p < end; p += 8) {
         __asm__ __volatile__("flush\t%0" : : "r" (p));
     }
 }
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 8c1c1d265d..6460449719 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -191,7 +191,8 @@ void tci_disas(uint8_t opc);
 
 #define HAVE_TCG_QEMU_TB_EXEC
 
-static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
+/* Flush the dcache at RW, and the icache at RX, as necessary. */
+static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
 }
 
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index a9adedb9f8..b23c1fef54 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2954,7 +2954,14 @@ static inline MemTxResult address_space_write_rom_internal(AddressSpace *as,
                 invalidate_and_set_dirty(mr, addr1, l);
                 break;
             case FLUSH_CACHE:
-                flush_icache_range((uintptr_t)ram_ptr, (uintptr_t)ram_ptr + l);
+                /*
+                 * FIXME: This function is currently located in tcg/host/,
+                 * but we never come here when tcg is enabled; only for
+                 * real hardware acceleration.  This can actively fail
+                 * when TCI is configured, since that function is a nop.
+                 * We should move this to util/ or something.
+                 */
+                flush_idcache_range((uintptr_t)ram_ptr, (uintptr_t)ram_ptr, l);
                 break;
             }
         }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index a8c28440e2..3bf36e0cfe 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1076,7 +1076,7 @@ void tcg_prologue_init(TCGContext *s)
 #endif
 
     buf1 = s->code_ptr;
-    flush_icache_range((uintptr_t)buf0, (uintptr_t)buf1);
+    flush_idcache_range((uintptr_t)buf0, (uintptr_t)buf0, buf1 - buf0);
 
     /* Deduct the prologue from the buffer.  */
     prologue_size = tcg_current_code_size(s);
@@ -4268,7 +4268,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 
     /* flush instruction cache */
-    flush_icache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_ptr);
+    flush_idcache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_buf,
+                        s->code_ptr - s->code_buf);
 
     return tcg_current_code_size(s);
 }
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 26f71cb599..83af3108a4 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1363,7 +1363,7 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
     }
     pair = (uint64_t)i2 << 32 | i1;
     qatomic_set((uint64_t *)jmp_addr, pair);
-    flush_icache_range(jmp_addr, jmp_addr + 8);
+    flush_idcache_range(jmp_addr, jmp_addr, 8);
 }
 
 static inline void tcg_out_goto_label(TCGContext *s, TCGLabel *l)
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 41be574e89..c255ecb444 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -2660,7 +2660,7 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
                               uintptr_t addr)
 {
     qatomic_set((uint32_t *)jmp_addr, deposit32(OPC_J, 0, 26, addr >> 2));
-    flush_icache_range(jmp_addr, jmp_addr + 4);
+    flush_idcache_range(jmp_addr, jmp_addr, 4);
 }
 
 typedef struct {
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 18ee989f95..a848e98383 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1753,12 +1753,12 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
         /* As per the enclosing if, this is ppc64.  Avoid the _Static_assert
            within qatomic_set that would fail to build a ppc32 host.  */
         qatomic_set__nocheck((uint64_t *)jmp_addr, pair);
-        flush_icache_range(jmp_addr, jmp_addr + 8);
+        flush_idcache_range(jmp_addr, jmp_addr, 8);
     } else {
         intptr_t diff = addr - jmp_addr;
         tcg_debug_assert(in_range_b(diff));
         qatomic_set((uint32_t *)jmp_addr, B | (diff & 0x3fffffc));
-        flush_icache_range(jmp_addr, jmp_addr + 4);
+        flush_idcache_range(jmp_addr, jmp_addr, 4);
     }
 }
 
@@ -3864,22 +3864,23 @@ void tcg_register_jit(void *buf, size_t buf_size)
 }
 #endif /* __ELF__ */
 
-void flush_icache_range(uintptr_t start, uintptr_t stop)
+/* Flush the dcache at RW, and the icache at RX, as necessary. */
+void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
-    uintptr_t p, start1, stop1;
+    uintptr_t p, start, stop;
     size_t dsize = qemu_dcache_linesize;
     size_t isize = qemu_icache_linesize;
 
-    start1 = start & ~(dsize - 1);
-    stop1 = (stop + dsize - 1) & ~(dsize - 1);
-    for (p = start1; p < stop1; p += dsize) {
+    start = rw & ~(dsize - 1);
+    stop = (rw + len + dsize - 1) & ~(dsize - 1);
+    for (p = start; p < stop; p += dsize) {
         asm volatile ("dcbst 0,%0" : : "r"(p) : "memory");
     }
     asm volatile ("sync" : : : "memory");
 
-    start &= start & ~(isize - 1);
-    stop1 = (stop + isize - 1) & ~(isize - 1);
-    for (p = start1; p < stop1; p += isize) {
+    start = rx & ~(isize - 1);
+    stop = (rx + len + isize - 1) & ~(isize - 1);
+    for (p = start; p < stop; p += isize) {
         asm volatile ("icbi 0,%0" : : "r"(p) : "memory");
     }
     asm volatile ("sync" : : : "memory");
diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 6775bd30fc..6e2d755f6a 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -1836,7 +1836,7 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
     if (!USE_REG_TB) {
         qatomic_set((uint32_t *)jmp_addr,
 		    deposit32(CALL, 0, 30, br_disp >> 2));
-        flush_icache_range(jmp_addr, jmp_addr + 4);
+        flush_idcache_range(jmp_addr, jmp_addr, 4);
         return;
     }
 
@@ -1860,5 +1860,5 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
     }
 
     qatomic_set((uint64_t *)jmp_addr, deposit64(i2, 32, 32, i1));
-    flush_icache_range(jmp_addr, jmp_addr + 8);
+    flush_idcache_range(jmp_addr, jmp_addr, 8);
 }
-- 
2.25.1




* [PATCH v2 02/19] tcg: Move tcg prologue pointer out of TCGContext
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 01/19] tcg: Enhance flush_icache_range with separate data pointer Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 03/19] tcg: Move tcg epilogue " Richard Henderson
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

This value is constant across all thread-local copies of TCGContext,
so we might as well move it out of thread-local storage.

Use the correct function pointer type, and name the variable
tcg_qemu_tb_exec, which means that we are able to remove the
macro that does the casting.

Replace HAVE_TCG_QEMU_TB_EXEC with CONFIG_TCG_INTERPRETER,
as this is somewhat clearer in intent.
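
The difference between the casting macro and a typed pointer can be
sketched in isolation.  Everything prefixed with demo_ or Demo is
hypothetical, and an ordinary C function stands in for the generated
prologue; only the shape of the typedef follows tcg_prologue_fn.

```c
#include <stdint.h>

/* Stand-in for CPUArchState; for illustration only. */
typedef struct DemoEnv {
    int unused;
} DemoEnv;

/* A function type (not a pointer type), like tcg_prologue_fn. */
typedef uintptr_t demo_prologue_fn(DemoEnv *env, void *tb_ptr);

/* One process-wide pointer instead of a per-context void * plus a cast. */
static demo_prologue_fn *demo_tb_exec;

/* Stand-in for the generated prologue: echoes tb_ptr as the exit value. */
static uintptr_t demo_prologue(DemoEnv *env, void *tb_ptr)
{
    (void)env;
    return (uintptr_t)tb_ptr;
}

/* Set once at startup; callers then invoke demo_tb_exec(env, tb) directly. */
static void demo_prologue_init(void)
{
    demo_tb_exec = demo_prologue;
}
```

Because the pointer carries its own type, the compiler checks the
arguments at every call site, which the old macro's cast to
uintptr_t (*)(void *, void *) could not do.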

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h    | 9 ++++-----
 tcg/tci/tcg-target.h | 2 --
 tcg/tcg.c            | 9 ++++++++-
 tcg/tci.c            | 3 ++-
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 8804a8c4a2..5ff5bf2a73 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -621,7 +621,6 @@ struct TCGContext {
        here, because there's too much arithmetic throughout that relies
        on addition and subtraction working on bytes.  Rely on the GCC
        extension that allows arithmetic on void*.  */
-    void *code_gen_prologue;
     void *code_gen_epilogue;
     void *code_gen_buffer;
     size_t code_gen_buffer_size;
@@ -1220,11 +1219,11 @@ static inline unsigned get_mmuidx(TCGMemOpIdx oi)
 #define TB_EXIT_IDXMAX    1
 #define TB_EXIT_REQUESTED 3
 
-#ifdef HAVE_TCG_QEMU_TB_EXEC
-uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr);
+#ifdef CONFIG_TCG_INTERPRETER
+uintptr_t tcg_qemu_tb_exec(CPUArchState *env, void *tb_ptr);
 #else
-# define tcg_qemu_tb_exec(env, tb_ptr) \
-    ((uintptr_t (*)(void *, void *))tcg_ctx->code_gen_prologue)(env, tb_ptr)
+typedef uintptr_t tcg_prologue_fn(CPUArchState *env, void *tb_ptr);
+extern tcg_prologue_fn *tcg_qemu_tb_exec;
 #endif
 
 void tcg_register_jit(void *buf, size_t buf_size);
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 6460449719..49f3291f8a 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -189,8 +189,6 @@ typedef enum {
 
 void tci_disas(uint8_t opc);
 
-#define HAVE_TCG_QEMU_TB_EXEC
-
 /* Flush the dcache at RW, and the icache at RX, as necessary. */
 static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 3bf36e0cfe..8d63c714fb 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -161,6 +161,10 @@ static TCGContext **tcg_ctxs;
 static unsigned int n_tcg_ctxs;
 TCGv_env cpu_env = 0;
 
+#ifndef CONFIG_TCG_INTERPRETER
+tcg_prologue_fn *tcg_qemu_tb_exec;
+#endif
+
 struct tcg_region_tree {
     QemuMutex lock;
     GTree *tree;
@@ -1053,7 +1057,10 @@ void tcg_prologue_init(TCGContext *s)
     s->code_ptr = buf0;
     s->code_buf = buf0;
     s->data_gen_ptr = NULL;
-    s->code_gen_prologue = buf0;
+
+#ifndef CONFIG_TCG_INTERPRETER
+    tcg_qemu_tb_exec = (tcg_prologue_fn *)buf0;
+#endif
 
     /* Compute a high-water mark, at which we voluntarily flush the buffer
        and start over.  The size here is arbitrary, significantly larger
diff --git a/tcg/tci.c b/tcg/tci.c
index 82039fd163..d996eb7cf8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -475,8 +475,9 @@ static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
 #endif
 
 /* Interpret pseudo code in tb. */
-uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
+uintptr_t tcg_qemu_tb_exec(CPUArchState *env, void *v_tb_ptr)
 {
+    uint8_t *tb_ptr = v_tb_ptr;
     tcg_target_ulong regs[TCG_TARGET_NB_REGS];
     long tcg_temps[CPU_TEMP_BUF_NLONGS];
     uintptr_t sp_value = (uintptr_t)(tcg_temps + CPU_TEMP_BUF_NLONGS);
-- 
2.25.1




* [PATCH v2 03/19] tcg: Move tcg epilogue pointer out of TCGContext
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 01/19] tcg: Enhance flush_icache_range with separate data pointer Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 02/19] tcg: Move tcg prologue pointer out of TCGContext Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 04/19] tcg: Introduce tcg_mirror_rw_to_rx/tcg_mirror_rx_to_rw Richard Henderson
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

This value is constant across all thread-local copies of TCGContext,
so we might as well move it out of thread-local storage.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h            | 2 +-
 accel/tcg/tcg-runtime.c      | 2 +-
 tcg/tcg.c                    | 3 ++-
 tcg/aarch64/tcg-target.c.inc | 4 ++--
 tcg/arm/tcg-target.c.inc     | 2 +-
 tcg/i386/tcg-target.c.inc    | 4 ++--
 tcg/mips/tcg-target.c.inc    | 2 +-
 tcg/ppc/tcg-target.c.inc     | 2 +-
 tcg/riscv/tcg-target.c.inc   | 4 ++--
 tcg/s390/tcg-target.c.inc    | 4 ++--
 tcg/sparc/tcg-target.c.inc   | 2 +-
 11 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 5ff5bf2a73..3c56a90abc 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -621,7 +621,6 @@ struct TCGContext {
        here, because there's too much arithmetic throughout that relies
        on addition and subtraction working on bytes.  Rely on the GCC
        extension that allows arithmetic on void*.  */
-    void *code_gen_epilogue;
     void *code_gen_buffer;
     size_t code_gen_buffer_size;
     void *code_gen_ptr;
@@ -678,6 +677,7 @@ struct TCGContext {
 
 extern TCGContext tcg_init_ctx;
 extern __thread TCGContext *tcg_ctx;
+extern void *tcg_code_gen_epilogue;
 extern TCGv_env cpu_env;
 
 static inline size_t temp_idx(TCGTemp *ts)
diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index 446465a09a..f85dfefeab 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -154,7 +154,7 @@ void *HELPER(lookup_tb_ptr)(CPUArchState *env)
 
     tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, curr_cflags());
     if (tb == NULL) {
-        return tcg_ctx->code_gen_epilogue;
+        return tcg_code_gen_epilogue;
     }
     qemu_log_mask_and_addr(CPU_LOG_EXEC, pc,
                            "Chain %d: %p ["
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8d63c714fb..1916a818d9 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -160,6 +160,7 @@ static int tcg_out_ldst_finalize(TCGContext *s);
 static TCGContext **tcg_ctxs;
 static unsigned int n_tcg_ctxs;
 TCGv_env cpu_env = 0;
+void *tcg_code_gen_epilogue;
 
 #ifndef CONFIG_TCG_INTERPRETER
 tcg_prologue_fn *tcg_qemu_tb_exec;
@@ -1128,7 +1129,7 @@ void tcg_prologue_init(TCGContext *s)
 
     /* Assert that goto_ptr is implemented completely.  */
     if (TCG_TARGET_HAS_goto_ptr) {
-        tcg_debug_assert(s->code_gen_epilogue != NULL);
+        tcg_debug_assert(tcg_code_gen_epilogue != NULL);
     }
 }
 
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 83af3108a4..76f8ae48ad 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1873,7 +1873,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_exit_tb:
         /* Reuse the zeroing that exists for goto_ptr.  */
         if (a0 == 0) {
-            tcg_out_goto_long(s, s->code_gen_epilogue);
+            tcg_out_goto_long(s, tcg_code_gen_epilogue);
         } else {
             tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, a0);
             tcg_out_goto_long(s, tb_ret_addr);
@@ -2894,7 +2894,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
      * and fall through to the rest of the epilogue.
      */
-    s->code_gen_epilogue = s->code_ptr;
+    tcg_code_gen_epilogue = s->code_ptr;
     tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_X0, 0);
 
     /* TB epilogue */
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 62c37a954b..1e32bf42b8 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -2297,7 +2297,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
      * and fall through to the rest of the epilogue.
      */
-    s->code_gen_epilogue = s->code_ptr;
+    tcg_code_gen_epilogue = s->code_ptr;
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, 0);
     tcg_out_epilogue(s);
 }
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index d8797ed398..424dd1cdcf 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -2267,7 +2267,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_exit_tb:
         /* Reuse the zeroing that exists for goto_ptr.  */
         if (a0 == 0) {
-            tcg_out_jmp(s, s->code_gen_epilogue);
+            tcg_out_jmp(s, tcg_code_gen_epilogue);
         } else {
             tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, a0);
             tcg_out_jmp(s, tb_ret_addr);
@@ -3825,7 +3825,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
      * and fall through to the rest of the epilogue.
      */
-    s->code_gen_epilogue = s->code_ptr;
+    tcg_code_gen_epilogue = s->code_ptr;
     tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_EAX, 0);
 
     /* TB epilogue */
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index c255ecb444..f641105f9a 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -2483,7 +2483,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
      * and fall through to the rest of the epilogue.
      */
-    s->code_gen_epilogue = s->code_ptr;
+    tcg_code_gen_epilogue = s->code_ptr;
     tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_V0, TCG_REG_ZERO);
 
     /* TB epilogue */
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index a848e98383..be116c6164 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2341,7 +2341,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out32(s, BCCTR | BO_ALWAYS);
 
     /* Epilogue */
-    s->code_gen_epilogue = tb_ret_addr = s->code_ptr;
+    tcg_code_gen_epilogue = tb_ret_addr = s->code_ptr;
 
     tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R0, TCG_REG_R1, FRAME_SIZE+LR_OFFSET);
     for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); ++i) {
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index d536f3ccc1..ab08af7457 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -1288,7 +1288,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_exit_tb:
         /* Reuse the zeroing that exists for goto_ptr.  */
         if (a0 == 0) {
-            tcg_out_call_int(s, s->code_gen_epilogue, true);
+            tcg_out_call_int(s, tcg_code_gen_epilogue, true);
         } else {
             tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_A0, a0);
             tcg_out_call_int(s, tb_ret_addr, true);
@@ -1822,7 +1822,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_opc_imm(s, OPC_JALR, TCG_REG_ZERO, tcg_target_call_iarg_regs[1], 0);
 
     /* Return path for goto_ptr. Set return value to 0 */
-    s->code_gen_epilogue = s->code_ptr;
+    tcg_code_gen_epilogue = s->code_ptr;
     tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_A0, TCG_REG_ZERO);
 
     /* TB epilogue */
diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
index c5e096449b..ac99ccea73 100644
--- a/tcg/s390/tcg-target.c.inc
+++ b/tcg/s390/tcg-target.c.inc
@@ -1756,7 +1756,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         /* Reuse the zeroing that exists for goto_ptr.  */
         a0 = args[0];
         if (a0 == 0) {
-            tgen_gotoi(s, S390_CC_ALWAYS, s->code_gen_epilogue);
+            tgen_gotoi(s, S390_CC_ALWAYS, tcg_code_gen_epilogue);
         } else {
             tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R2, a0);
             tgen_gotoi(s, S390_CC_ALWAYS, tb_ret_addr);
@@ -2561,7 +2561,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
      * and fall through to the rest of the epilogue.
      */
-    s->code_gen_epilogue = s->code_ptr;
+    tcg_code_gen_epilogue = s->code_ptr;
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R2, 0);
 
     /* TB epilogue */
diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 6e2d755f6a..5b3bc91b05 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -1038,7 +1038,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_nop(s);
 
     /* Epilogue for goto_ptr.  */
-    s->code_gen_epilogue = s->code_ptr;
+    tcg_code_gen_epilogue = s->code_ptr;
     tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
     /* delay slot */
     tcg_out_movi_imm13(s, TCG_REG_O0, 0);
-- 
2.25.1




* [PATCH v2 04/19] tcg: Introduce tcg_mirror_rw_to_rx/tcg_mirror_rx_to_rw
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (2 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 03/19] tcg: Move tcg epilogue " Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 05/19] tcg: Adjust tcg_out_call for const Richard Henderson
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

Add two helper functions, using a global variable to hold
the displacement.  The displacement is currently always 0,
so no change in behaviour.

Begin using the functions in tcg common code only.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
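Reviewer note (not part of the commit): the two helpers can be sketched
outside of QEMU roughly as below.  This is a minimal illustration under
stated assumptions, not the code from this patch.  Every sketch_* name is
hypothetical; sketch_mirror_diff stands in for tcg_rx_mirror_diff, and the
explicit bounds arguments stand in for region.start/region.end.  The sketch
goes through uintptr_t rather than relying on the GNU extension for
arithmetic on void pointers that the patch itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for tcg_rx_mirror_diff: byte offset rw -> rx. */
static uintptr_t sketch_mirror_diff;

/* Translate a writable (rw) pointer to its executable (rx) alias. */
static const void *sketch_rw_to_rx(void *rw)
{
    /* NULL passes through unchanged, as in the patch. */
    return rw ? (const void *)((uintptr_t)rw + sketch_mirror_diff) : NULL;
}

/* Translate an executable (rx) pointer back to its writable (rw) alias. */
static void *sketch_rx_to_rw(const void *rx)
{
    return rx ? (void *)((uintptr_t)rx - sketch_mirror_diff) : NULL;
}

/*
 * Debug-style variant: check that a non-NULL rw pointer lies inside
 * [start, end] before translating, similar in spirit to the
 * CONFIG_DEBUG_TCG versions added to tcg.c.
 */
static const void *sketch_rw_to_rx_checked(void *rw, void *start, void *end)
{
    if (rw) {
        assert((uintptr_t)rw >= (uintptr_t)start &&
               (uintptr_t)rw <= (uintptr_t)end);
    }
    return sketch_rw_to_rx(rw);
}

/* Return 1 if rw -> rx -> rw recovers the original pointer for 'diff'. */
static int sketch_round_trip_ok(void *rw, uintptr_t diff)
{
    sketch_mirror_diff = diff;
    return sketch_rx_to_rw(sketch_rw_to_rx(rw)) == rw;
}
```

With a displacement of 0, which is the only value this patch ever sets, both
translations are the identity, which is why no change in behaviour is
expected yet.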
 accel/tcg/tcg-runtime.h      |  2 +-
 include/disas/disas.h        |  2 +-
 include/exec/exec-all.h      |  2 +-
 include/exec/log.h           |  2 +-
 include/tcg/tcg.h            | 28 +++++++++++++----
 accel/tcg/cpu-exec.c         |  2 +-
 accel/tcg/tcg-runtime.c      |  2 +-
 accel/tcg/translate-all.c    | 29 ++++++++---------
 disas.c                      |  4 ++-
 tcg/tcg.c                    | 60 +++++++++++++++++++++++++++++++-----
 tcg/tci.c                    |  5 +--
 accel/tcg/trace-events       |  2 +-
 tcg/aarch64/tcg-target.c.inc |  2 +-
 13 files changed, 101 insertions(+), 41 deletions(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 4eda24e63a..c276c8beb5 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -24,7 +24,7 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 
-DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
+DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, cptr, env)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
diff --git a/include/disas/disas.h b/include/disas/disas.h
index 36c33f6f19..d363e95ede 100644
--- a/include/disas/disas.h
+++ b/include/disas/disas.h
@@ -7,7 +7,7 @@
 #include "cpu.h"
 
 /* Disassemble this for me please... (debugging). */
-void disas(FILE *out, void *code, unsigned long size);
+void disas(FILE *out, const void *code, unsigned long size);
 void target_disas(FILE *out, CPUState *cpu, target_ulong code,
                   target_ulong size);
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 4707ac140c..aa65103702 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -448,7 +448,7 @@ int probe_access_flags(CPUArchState *env, target_ulong addr,
  * Note: the address of search data can be obtained by adding @size to @ptr.
  */
 struct tb_tc {
-    void *ptr;    /* pointer to the translated code */
+    const void *ptr;    /* pointer to the translated code */
     size_t size;
 };
 
diff --git a/include/exec/log.h b/include/exec/log.h
index e02fff5de1..3c7fa65ead 100644
--- a/include/exec/log.h
+++ b/include/exec/log.h
@@ -56,7 +56,7 @@ static inline void log_target_disas(CPUState *cpu, target_ulong start,
     rcu_read_unlock();
 }
 
-static inline void log_disas(void *code, unsigned long size)
+static inline void log_disas(const void *code, unsigned long size)
 {
     QemuLogFile *logfile;
     rcu_read_lock();
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 3c56a90abc..f6f84421b2 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -261,7 +261,7 @@ struct TCGLabel {
     unsigned refs : 16;
     union {
         uintptr_t value;
-        tcg_insn_unit *value_ptr;
+        const tcg_insn_unit *value_ptr;
     } u;
     QSIMPLEQ_HEAD(, TCGRelocation) relocs;
     QSIMPLEQ_ENTRY(TCGLabel) next;
@@ -678,8 +678,24 @@ struct TCGContext {
 extern TCGContext tcg_init_ctx;
 extern __thread TCGContext *tcg_ctx;
 extern void *tcg_code_gen_epilogue;
+extern uintptr_t tcg_rx_mirror_diff;
 extern TCGv_env cpu_env;
 
+#ifdef CONFIG_DEBUG_TCG
+const void *tcg_mirror_rw_to_rx(void *rw);
+void *tcg_mirror_rx_to_rw(const void *rx);
+#else
+static inline const void *tcg_mirror_rw_to_rx(void *rw)
+{
+    return rw ? rw + tcg_rx_mirror_diff : NULL;
+}
+
+static inline void *tcg_mirror_rx_to_rw(const void *rx)
+{
+    return rx ? (void *)rx - tcg_rx_mirror_diff : NULL;
+}
+#endif
+
 static inline size_t temp_idx(TCGTemp *ts)
 {
     ptrdiff_t n = ts - tcg_ctx->temps;
@@ -1098,7 +1114,7 @@ static inline TCGLabel *arg_label(TCGArg i)
  * correct result.
  */
 
-static inline ptrdiff_t tcg_ptr_byte_diff(void *a, void *b)
+static inline ptrdiff_t tcg_ptr_byte_diff(const void *a, const void *b)
 {
     return a - b;
 }
@@ -1112,9 +1128,9 @@ static inline ptrdiff_t tcg_ptr_byte_diff(void *a, void *b)
  * to the destination address.
  */
 
-static inline ptrdiff_t tcg_pcrel_diff(TCGContext *s, void *target)
+static inline ptrdiff_t tcg_pcrel_diff(TCGContext *s, const void *target)
 {
-    return tcg_ptr_byte_diff(target, s->code_ptr);
+    return tcg_ptr_byte_diff(target, tcg_mirror_rw_to_rx(s->code_ptr));
 }
 
 /**
@@ -1220,9 +1236,9 @@ static inline unsigned get_mmuidx(TCGMemOpIdx oi)
 #define TB_EXIT_REQUESTED 3
 
 #ifdef CONFIG_TCG_INTERPRETER
-uintptr_t tcg_qemu_tb_exec(CPUArchState *env, void *tb_ptr);
+uintptr_t tcg_qemu_tb_exec(CPUArchState *env, const void *tb_ptr);
 #else
-typedef uintptr_t tcg_prologue_fn(CPUArchState *env, void *tb_ptr);
+typedef uintptr_t tcg_prologue_fn(CPUArchState *env, const void *tb_ptr);
 extern tcg_prologue_fn *tcg_qemu_tb_exec;
 #endif
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 58aea605d8..1e3cb570f6 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -150,7 +150,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
     uintptr_t ret;
     TranslationBlock *last_tb;
     int tb_exit;
-    uint8_t *tb_ptr = itb->tc.ptr;
+    const void *tb_ptr = itb->tc.ptr;
 
     qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
                            "Trace %d: %p ["
diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index f85dfefeab..d736f4ff55 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -145,7 +145,7 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
     return ctpop64(arg);
 }
 
-void *HELPER(lookup_tb_ptr)(CPUArchState *env)
+const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
 {
     CPUState *cpu = env_cpu(env);
     TranslationBlock *tb;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index d76097296d..c3e35bdee6 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -269,9 +269,9 @@ static uint8_t *encode_sleb128(uint8_t *p, target_long val)
 
 /* Decode a signed leb128 sequence at *PP; increment *PP past the
    decoded value.  Return the decoded value.  */
-static target_long decode_sleb128(uint8_t **pp)
+static target_long decode_sleb128(const uint8_t **pp)
 {
-    uint8_t *p = *pp;
+    const uint8_t *p = *pp;
     target_long val = 0;
     int byte, shift = 0;
 
@@ -342,7 +342,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     target_ulong data[TARGET_INSN_START_WORDS] = { tb->pc };
     uintptr_t host_pc = (uintptr_t)tb->tc.ptr;
     CPUArchState *env = cpu->env_ptr;
-    uint8_t *p = tb->tc.ptr + tb->tc.size;
+    const uint8_t *p = tb->tc.ptr + tb->tc.size;
     int i, j, num_insns = tb->icount;
 #ifdef CONFIG_PROFILER
     TCGProfile *prof = &tcg_ctx->prof;
@@ -1722,7 +1722,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 
     gen_code_buf = tcg_ctx->code_gen_ptr;
-    tb->tc.ptr = gen_code_buf;
+    tb->tc.ptr = tcg_mirror_rw_to_rx(gen_code_buf);
     tb->pc = pc;
     tb->cs_base = cs_base;
     tb->flags = flags;
@@ -1816,15 +1816,19 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
         qemu_log_in_addr_range(tb->pc)) {
         FILE *logfile = qemu_log_lock();
-        int code_size, data_size = 0;
+        int code_size, data_size;
+        const tcg_target_ulong *rx_data_gen_ptr;
         size_t chunk_start;
         int insn = 0;
 
         if (tcg_ctx->data_gen_ptr) {
-            code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr;
+            rx_data_gen_ptr = tcg_mirror_rw_to_rx(tcg_ctx->data_gen_ptr);
+            code_size = (const void *)rx_data_gen_ptr - tb->tc.ptr;
             data_size = gen_code_size - code_size;
         } else {
+            rx_data_gen_ptr = 0;
             code_size = gen_code_size;
+            data_size = 0;
         }
 
         /* Dump header and the first instruction */
@@ -1859,16 +1863,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         if (data_size) {
             int i;
             qemu_log("  data: [size=%d]\n", data_size);
-            for (i = 0; i < data_size; i += sizeof(tcg_target_ulong)) {
-                if (sizeof(tcg_target_ulong) == 8) {
-                    qemu_log("0x%08" PRIxPTR ":  .quad  0x%016" PRIx64 "\n",
-                             (uintptr_t)tcg_ctx->data_gen_ptr + i,
-                             *(uint64_t *)(tcg_ctx->data_gen_ptr + i));
-                } else {
-                    qemu_log("0x%08" PRIxPTR ":  .long  0x%08x\n",
-                             (uintptr_t)tcg_ctx->data_gen_ptr + i,
-                             *(uint32_t *)(tcg_ctx->data_gen_ptr + i));
-                }
+            for (i = 0; i < data_size / sizeof(tcg_target_ulong); i++) {
+                qemu_log("0x%08" PRIxPTR ":  .quad  0x%" TCG_PRIlx "\n",
+                         (uintptr_t)&rx_data_gen_ptr[i], rx_data_gen_ptr[i]);
             }
         }
         qemu_log("\n");
diff --git a/disas.c b/disas.c
index 7c18d7d2a7..de1de7be94 100644
--- a/disas.c
+++ b/disas.c
@@ -299,8 +299,10 @@ char *plugin_disas(CPUState *cpu, uint64_t addr, size_t size)
 }
 
 /* Disassemble this for me please... (debugging). */
-void disas(FILE *out, void *code, unsigned long size)
+void disas(FILE *out, const void *ccode, unsigned long size)
 {
+    /* TODO: Push constness through the disas backends. */
+    void *code = (void *)ccode;
     uintptr_t pc;
     int count;
     CPUDebug s;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 1916a818d9..88b13b9321 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -161,6 +161,7 @@ static TCGContext **tcg_ctxs;
 static unsigned int n_tcg_ctxs;
 TCGv_env cpu_env = 0;
 void *tcg_code_gen_epilogue;
+uintptr_t tcg_rx_mirror_diff;
 
 #ifndef CONFIG_TCG_INTERPRETER
 tcg_prologue_fn *tcg_qemu_tb_exec;
@@ -304,7 +305,7 @@ static void tcg_out_label(TCGContext *s, TCGLabel *l, tcg_insn_unit *ptr)
 {
     tcg_debug_assert(!l->has_value);
     l->has_value = 1;
-    l->u.value_ptr = ptr;
+    l->u.value_ptr = tcg_mirror_rw_to_rx(ptr);
 }
 
 TCGLabel *gen_new_label(void)
@@ -404,8 +405,9 @@ static void tcg_region_trees_init(void)
     }
 }
 
-static struct tcg_region_tree *tc_ptr_to_region_tree(void *p)
+static struct tcg_region_tree *tc_ptr_to_region_tree(const void *cp)
 {
+    void *p = tcg_mirror_rx_to_rw(cp);
     size_t region_idx;
 
     if (p < region.start_aligned) {
@@ -699,6 +701,7 @@ void tcg_region_init(void)
     size_t region_size;
     size_t n_regions;
     size_t i;
+    uintptr_t mirror_diff;
 
     n_regions = tcg_n_regions();
 
@@ -729,6 +732,7 @@ void tcg_region_init(void)
     region.end -= page_size;
 
     /* set guard pages */
+    mirror_diff = tcg_rx_mirror_diff;
     for (i = 0; i < region.n; i++) {
         void *start, *end;
         int rc;
@@ -736,6 +740,10 @@ void tcg_region_init(void)
         tcg_region_bounds(i, &start, &end);
         rc = qemu_mprotect_none(end, page_size);
         g_assert(!rc);
+        if (mirror_diff) {
+            rc = qemu_mprotect_none(end + mirror_diff, page_size);
+            g_assert(!rc);
+        }
     }
 
     tcg_region_trees_init();
@@ -750,6 +758,29 @@ void tcg_region_init(void)
 #endif
 }
 
+#ifdef CONFIG_DEBUG_TCG
+const void *tcg_mirror_rw_to_rx(void *rw)
+{
+    /* Pass NULL pointers unchanged. */
+    if (rw) {
+        g_assert(rw >= region.start && rw <= region.end);
+        rw += tcg_rx_mirror_diff;
+    }
+    return rw;
+}
+
+void *tcg_mirror_rx_to_rw(const void *rx)
+{
+    /* Pass NULL pointers unchanged. */
+    if (rx) {
+        rx -= tcg_rx_mirror_diff;
+        /* Assert that we end with a pointer in the rw region. */
+        g_assert(rx >= region.start && rx <= region.end);
+    }
+    return (void *)rx;
+}
+#endif /* CONFIG_DEBUG_TCG */
+
 static void alloc_tcg_plugin_context(TCGContext *s)
 {
 #ifdef CONFIG_PLUGIN
@@ -1059,8 +1090,15 @@ void tcg_prologue_init(TCGContext *s)
     s->code_buf = buf0;
     s->data_gen_ptr = NULL;
 
+    /*
+     * The region trees are not yet configured, but tcg_mirror_rw_to_rx
+     * needs the bounds for an assert.
+     */
+    region.start = buf0;
+    region.end = buf0 + total_size;
+
 #ifndef CONFIG_TCG_INTERPRETER
-    tcg_qemu_tb_exec = (tcg_prologue_fn *)buf0;
+    tcg_qemu_tb_exec = (tcg_prologue_fn *)tcg_mirror_rw_to_rx(buf0);
 #endif
 
     /* Compute a high-water mark, at which we voluntarily flush the buffer
@@ -1084,7 +1122,8 @@ void tcg_prologue_init(TCGContext *s)
 #endif
 
     buf1 = s->code_ptr;
-    flush_idcache_range((uintptr_t)buf0, (uintptr_t)buf0, buf1 - buf0);
+    flush_idcache_range((uintptr_t)tcg_mirror_rw_to_rx(buf0),
+                        (uintptr_t)buf0, buf1 - buf0);
 
     /* Deduct the prologue from the buffer.  */
     prologue_size = tcg_current_code_size(s);
@@ -4171,8 +4210,13 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 
     tcg_reg_alloc_start(s);
 
-    s->code_buf = tb->tc.ptr;
-    s->code_ptr = tb->tc.ptr;
+    /*
+     * Reset the buffer pointers when restarting after overflow.
+     * TODO: Move this into translate-all.c with the rest of the
+     * buffer management.  Having only this done here is confusing.
+     */
+    s->code_buf = tcg_mirror_rx_to_rw(tb->tc.ptr);
+    s->code_ptr = s->code_buf;
 
 #ifdef TCG_TARGET_NEED_LDST_LABELS
     QSIMPLEQ_INIT(&s->ldst_labels);
@@ -4276,8 +4320,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 
     /* flush instruction cache */
-    flush_idcache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_buf,
-                        s->code_ptr - s->code_buf);
+    flush_idcache_range((uintptr_t)tcg_mirror_rw_to_rx(s->code_buf),
+                        (uintptr_t)s->code_buf, s->code_ptr - s->code_buf);
 
     return tcg_current_code_size(s);
 }
diff --git a/tcg/tci.c b/tcg/tci.c
index d996eb7cf8..262a2b39ce 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -475,9 +475,10 @@ static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
 #endif
 
 /* Interpret pseudo code in tb. */
-uintptr_t tcg_qemu_tb_exec(CPUArchState *env, void *v_tb_ptr)
+uintptr_t tcg_qemu_tb_exec(CPUArchState *env, const void *v_tb_ptr)
 {
-    uint8_t *tb_ptr = v_tb_ptr;
+    /* TODO: Propagate const through this file. */
+    uint8_t *tb_ptr = (uint8_t *)v_tb_ptr;
     tcg_target_ulong regs[TCG_TARGET_NB_REGS];
     long tcg_temps[CPU_TEMP_BUF_NLONGS];
     uintptr_t sp_value = (uintptr_t)(tcg_temps + CPU_TEMP_BUF_NLONGS);
diff --git a/accel/tcg/trace-events b/accel/tcg/trace-events
index 385b9f749b..6eefb37f5d 100644
--- a/accel/tcg/trace-events
+++ b/accel/tcg/trace-events
@@ -7,4 +7,4 @@ exec_tb_nocache(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
 exec_tb_exit(void *last_tb, unsigned int flags) "tb:%p flags=0x%x"
 
 # translate-all.c
-translate_block(void *tb, uintptr_t pc, uint8_t *tb_code) "tb:%p, pc:0x%"PRIxPTR", tb_code:%p"
+translate_block(void *tb, uintptr_t pc, const void *tb_code) "tb:%p, pc:0x%"PRIxPTR", tb_code:%p"
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 76f8ae48ad..96dc9f4d0b 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1306,7 +1306,7 @@ static void tcg_out_cmp(TCGContext *s, TCGType ext, TCGReg a,
     }
 }
 
-static inline void tcg_out_goto(TCGContext *s, tcg_insn_unit *target)
+static void tcg_out_goto(TCGContext *s, const tcg_insn_unit *target)
 {
     ptrdiff_t offset = target - s->code_ptr;
     tcg_debug_assert(offset == sextract64(offset, 0, 26));
-- 
2.25.1




* [PATCH v2 05/19] tcg: Adjust tcg_out_call for const
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (3 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 04/19] tcg: Introduce tcg_mirror_rw_to_rx/tcg_mirror_rx_to_rw Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 06/19] tcg: Adjust tcg_out_label " Richard Henderson
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

We must change all targets at once, since all must match
the declaration in tcg.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.c                    | 2 +-
 tcg/aarch64/tcg-target.c.inc | 2 +-
 tcg/arm/tcg-target.c.inc     | 2 +-
 tcg/i386/tcg-target.c.inc    | 4 ++--
 tcg/mips/tcg-target.c.inc    | 6 +++---
 tcg/ppc/tcg-target.c.inc     | 8 ++++----
 tcg/riscv/tcg-target.c.inc   | 6 +++---
 tcg/s390/tcg-target.c.inc    | 2 +-
 tcg/sparc/tcg-target.c.inc   | 4 ++--
 tcg/tci/tcg-target.c.inc     | 2 +-
 10 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 88b13b9321..ddc38b8c50 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -148,7 +148,7 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
                        intptr_t arg2);
 static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
                         TCGReg base, intptr_t ofs);
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *target);
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target);
 static int tcg_target_const_match(tcg_target_long val, TCGType type,
                                   const TCGArgConstraint *arg_ct);
 #ifdef TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 96dc9f4d0b..6d8152c468 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1329,7 +1329,7 @@ static inline void tcg_out_callr(TCGContext *s, TCGReg reg)
     tcg_out_insn(s, 3207, BLR, reg);
 }
 
-static inline void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
+static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *target)
 {
     ptrdiff_t offset = target - s->code_ptr;
     if (offset == sextract64(offset, 0, 26)) {
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 1e32bf42b8..d6dfe2b428 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1033,7 +1033,7 @@ static void tcg_out_goto(TCGContext *s, int cond, tcg_insn_unit *addr)
 
 /* The call case is mostly used for helpers - so it's not unreasonable
  * for them to be beyond branch range */
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *addr)
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *addr)
 {
     intptr_t addri = (intptr_t)addr;
     ptrdiff_t disp = tcg_pcrel_diff(s, addr);
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 424dd1cdcf..095553ce28 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1591,7 +1591,7 @@ static void tcg_out_clz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
     }
 }
 
-static void tcg_out_branch(TCGContext *s, int call, tcg_insn_unit *dest)
+static void tcg_out_branch(TCGContext *s, int call, const tcg_insn_unit *dest)
 {
     intptr_t disp = tcg_pcrel_diff(s, dest) - 5;
 
@@ -1610,7 +1610,7 @@ static void tcg_out_branch(TCGContext *s, int call, tcg_insn_unit *dest)
     }
 }
 
-static inline void tcg_out_call(TCGContext *s, tcg_insn_unit *dest)
+static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *dest)
 {
     tcg_out_branch(s, 1, dest);
 }
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index f641105f9a..064f46fc6d 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -516,7 +516,7 @@ static void tcg_out_opc_sa64(TCGContext *s, MIPSInsn opc1, MIPSInsn opc2,
  * Type jump.
  * Returns true if the branch was in range and the insn was emitted.
  */
-static bool tcg_out_opc_jmp(TCGContext *s, MIPSInsn opc, void *target)
+static bool tcg_out_opc_jmp(TCGContext *s, MIPSInsn opc, const void *target)
 {
     uintptr_t dest = (uintptr_t)target;
     uintptr_t from = (uintptr_t)s->code_ptr + 4;
@@ -1079,7 +1079,7 @@ static void tcg_out_movcond(TCGContext *s, TCGCond cond, TCGReg ret,
     }
 }
 
-static void tcg_out_call_int(TCGContext *s, tcg_insn_unit *arg, bool tail)
+static void tcg_out_call_int(TCGContext *s, const tcg_insn_unit *arg, bool tail)
 {
     /* Note that the ABI requires the called function's address to be
        loaded into T9, even if a direct branch is in range.  */
@@ -1097,7 +1097,7 @@ static void tcg_out_call_int(TCGContext *s, tcg_insn_unit *arg, bool tail)
     }
 }
 
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *arg)
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 {
     tcg_out_call_int(s, arg, false);
     tcg_out_nop(s);
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index be116c6164..513d784a83 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1106,7 +1106,7 @@ static void tcg_out_xori32(TCGContext *s, TCGReg dst, TCGReg src, uint32_t c)
     tcg_out_zori32(s, dst, src, c, XORI, XORIS);
 }
 
-static void tcg_out_b(TCGContext *s, int mask, tcg_insn_unit *target)
+static void tcg_out_b(TCGContext *s, int mask, const tcg_insn_unit *target)
 {
     ptrdiff_t disp = tcg_pcrel_diff(s, target);
     if (in_range_b(disp)) {
@@ -1762,13 +1762,13 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
     }
 }
 
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target)
 {
 #ifdef _CALL_AIX
     /* Look through the descriptor.  If the branch is in range, and we
        don't have to spend too much effort on building the toc.  */
-    void *tgt = ((void **)target)[0];
-    uintptr_t toc = ((uintptr_t *)target)[1];
+    const void *tgt = ((const void * const *)target)[0];
+    uintptr_t toc = ((const uintptr_t *)target)[1];
     intptr_t diff = tcg_pcrel_diff(s, tgt);
 
     if (in_range_b(diff) && toc == (uint32_t)toc) {
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index ab08af7457..4416a93e1f 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -449,7 +449,7 @@ static bool reloc_jimm20(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
     return false;
 }
 
-static bool reloc_call(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static bool reloc_call(tcg_insn_unit *code_ptr, const tcg_insn_unit *target)
 {
     intptr_t offset = (intptr_t)target - (intptr_t)code_ptr;
     int32_t lo = sextreg(offset, 0, 12);
@@ -861,7 +861,7 @@ static inline void tcg_out_goto(TCGContext *s, tcg_insn_unit *target)
     tcg_out_opc_jump(s, OPC_JAL, TCG_REG_ZERO, offset);
 }
 
-static void tcg_out_call_int(TCGContext *s, tcg_insn_unit *arg, bool tail)
+static void tcg_out_call_int(TCGContext *s, const tcg_insn_unit *arg, bool tail)
 {
     TCGReg link = tail ? TCG_REG_ZERO : TCG_REG_RA;
     ptrdiff_t offset = tcg_pcrel_diff(s, arg);
@@ -888,7 +888,7 @@ static void tcg_out_call_int(TCGContext *s, tcg_insn_unit *arg, bool tail)
     }
 }
 
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *arg)
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 {
     tcg_out_call_int(s, arg, false);
 }
diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
index ac99ccea73..7983befd96 100644
--- a/tcg/s390/tcg-target.c.inc
+++ b/tcg/s390/tcg-target.c.inc
@@ -1415,7 +1415,7 @@ static void tgen_brcond(TCGContext *s, TCGType type, TCGCond c,
     tgen_branch(s, cc, l);
 }
 
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *dest)
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *dest)
 {
     ptrdiff_t off = dest - s->code_ptr;
     if (off == (int32_t)off) {
diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 5b3bc91b05..1a40911660 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -840,7 +840,7 @@ static void tcg_out_addsub2_i64(TCGContext *s, TCGReg rl, TCGReg rh,
     tcg_out_mov(s, TCG_TYPE_I64, rl, tmp);
 }
 
-static void tcg_out_call_nodelay(TCGContext *s, tcg_insn_unit *dest,
+static void tcg_out_call_nodelay(TCGContext *s, const tcg_insn_unit *dest,
                                  bool in_prologue)
 {
     ptrdiff_t disp = tcg_pcrel_diff(s, dest);
@@ -855,7 +855,7 @@ static void tcg_out_call_nodelay(TCGContext *s, tcg_insn_unit *dest,
     }
 }
 
-static void tcg_out_call(TCGContext *s, tcg_insn_unit *dest)
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *dest)
 {
     tcg_out_call_nodelay(s, dest, false);
     tcg_out_nop(s);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 231b9b1775..d5a4d9d37c 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -545,7 +545,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
-static inline void tcg_out_call(TCGContext *s, tcg_insn_unit *arg)
+static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 {
     uint8_t *old_code_ptr = s->code_ptr;
     tcg_out_op_t(s, INDEX_op_call);
-- 
2.25.1




* [PATCH v2 06/19] tcg: Adjust tcg_out_label for const
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (4 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 05/19] tcg: Adjust tcg_out_call for const Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 07/19] tcg: Adjust tcg_register_jit " Richard Henderson
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

Simplify the arguments to always use s->code_ptr instead of
taking it as an argument.  That makes it easy to ensure that
the value_ptr is always the rx version.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
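Reviewer note (not part of the commit): a standalone sketch of why dropping
the pointer argument helps.  The types and names below (SketchContext,
SketchLabel, sketch_out_label, sketch_diff) are hypothetical simplifications
of TCGContext, TCGLabel, tcg_out_label and tcg_rx_mirror_diff, not the QEMU
definitions.  Because the label value is derived from the context's write
pointer in exactly one place, a caller cannot accidentally hand in an
address from the wrong view.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down context: only the rw output pointer. */
typedef struct {
    uint8_t *code_ptr;          /* next byte to be written (rw view) */
} SketchContext;

/* Hypothetical, cut-down label: stores the rx address once bound. */
typedef struct {
    int has_value;
    const uint8_t *value_ptr;   /* always an rx-view address */
} SketchLabel;

/* Hypothetical stand-in for the rw -> rx displacement. */
static uintptr_t sketch_diff;

/* Bind the label to the current output position, converted to rx. */
static void sketch_out_label(SketchContext *s, SketchLabel *l)
{
    l->has_value = 1;
    l->value_ptr = (const uint8_t *)((uintptr_t)s->code_ptr + sketch_diff);
}
```

The address stored in the label is only compared or used for displacement
arithmetic in this sketch; it is never dereferenced.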
 tcg/tcg.c                 |  6 +++---
 tcg/i386/tcg-target.c.inc | 10 +++++-----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ddc38b8c50..da16378d1c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -301,11 +301,11 @@ static void tcg_out_reloc(TCGContext *s, tcg_insn_unit *code_ptr, int type,
     QSIMPLEQ_INSERT_TAIL(&l->relocs, r, next);
 }
 
-static void tcg_out_label(TCGContext *s, TCGLabel *l, tcg_insn_unit *ptr)
+static void tcg_out_label(TCGContext *s, TCGLabel *l)
 {
     tcg_debug_assert(!l->has_value);
     l->has_value = 1;
-    l->u.value_ptr = tcg_mirror_rw_to_rx(ptr);
+    l->u.value_ptr = tcg_mirror_rw_to_rx(s->code_ptr);
 }
 
 TCGLabel *gen_new_label(void)
@@ -4270,7 +4270,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
             break;
         case INDEX_op_set_label:
             tcg_reg_alloc_bb_end(s, s->reserved_regs);
-            tcg_out_label(s, arg_label(op->args[0]), s->code_ptr);
+            tcg_out_label(s, arg_label(op->args[0]));
             break;
         case INDEX_op_call:
             tcg_reg_alloc_call(s, op);
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 095553ce28..0ac1ef3d82 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1452,7 +1452,7 @@ static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
     default:
         tcg_abort();
     }
-    tcg_out_label(s, label_next, s->code_ptr);
+    tcg_out_label(s, label_next);
 }
 #endif
 
@@ -1494,10 +1494,10 @@ static void tcg_out_setcond2(TCGContext *s, const TCGArg *args,
 
         tcg_out_movi(s, TCG_TYPE_I32, args[0], 0);
         tcg_out_jxx(s, JCC_JMP, label_over, 1);
-        tcg_out_label(s, label_true, s->code_ptr);
+        tcg_out_label(s, label_true);
 
         tcg_out_movi(s, TCG_TYPE_I32, args[0], 1);
-        tcg_out_label(s, label_over, s->code_ptr);
+        tcg_out_label(s, label_over);
     } else {
         /* When the destination does not overlap one of the arguments,
            clear the destination first, jump if cond false, and emit an
@@ -1511,7 +1511,7 @@ static void tcg_out_setcond2(TCGContext *s, const TCGArg *args,
         tcg_out_brcond2(s, new_args, const_args+1, 1);
 
         tgen_arithi(s, ARITH_ADD, args[0], 1, 0);
-        tcg_out_label(s, label_over, s->code_ptr);
+        tcg_out_label(s, label_over);
     }
 }
 #endif
@@ -1525,7 +1525,7 @@ static void tcg_out_cmov(TCGContext *s, TCGCond cond, int rexw,
         TCGLabel *over = gen_new_label();
         tcg_out_jxx(s, tcg_cond_to_jcc[tcg_invert_cond(cond)], over, 1);
         tcg_out_mov(s, TCG_TYPE_I32, dest, v1);
-        tcg_out_label(s, over, s->code_ptr);
+        tcg_out_label(s, over);
     }
 }
 
-- 
2.25.1




* [PATCH v2 07/19] tcg: Adjust tcg_register_jit for const
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (5 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 06/19] tcg: Adjust tcg_out_label " Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 08/19] tcg: Adjust tb_target_set_jmp_target for split rwx Richard Henderson
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

We must change all targets at once, since all must match
the declaration in tcg.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h            |  2 +-
 tcg/tcg.c                    | 10 +++++-----
 tcg/aarch64/tcg-target.c.inc |  2 +-
 tcg/arm/tcg-target.c.inc     |  2 +-
 tcg/i386/tcg-target.c.inc    |  2 +-
 tcg/mips/tcg-target.c.inc    |  2 +-
 tcg/ppc/tcg-target.c.inc     |  2 +-
 tcg/riscv/tcg-target.c.inc   |  2 +-
 tcg/s390/tcg-target.c.inc    |  2 +-
 tcg/sparc/tcg-target.c.inc   |  2 +-
 10 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index f6f84421b2..76717b358b 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -1242,7 +1242,7 @@ typedef uintptr_t tcg_prologue_fn(CPUArchState *env, const void *tb_ptr);
 extern tcg_prologue_fn *tcg_qemu_tb_exec;
 #endif
 
-void tcg_register_jit(void *buf, size_t buf_size);
+void tcg_register_jit(const void *buf, size_t buf_size);
 
 #if TCG_TARGET_MAYBE_vec
 /* Return zero if the tuple (opc, type, vece) is unsupportable;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index da16378d1c..4d5c95526c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -96,7 +96,7 @@ typedef struct QEMU_PACKED {
     DebugFrameFDEHeader fde;
 } DebugFrameHeader;
 
-static void tcg_register_jit_int(void *buf, size_t size,
+static void tcg_register_jit_int(const void *buf, size_t size,
                                  const void *debug_frame,
                                  size_t debug_frame_size)
     __attribute__((unused));
@@ -1133,7 +1133,7 @@ void tcg_prologue_init(TCGContext *s)
     total_size -= prologue_size;
     s->code_gen_buffer_size = total_size;
 
-    tcg_register_jit(s->code_gen_buffer, total_size);
+    tcg_register_jit(tcg_mirror_rw_to_rx(s->code_gen_buffer), total_size);
 
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM)) {
@@ -4449,7 +4449,7 @@ static int find_string(const char *strtab, const char *str)
     }
 }
 
-static void tcg_register_jit_int(void *buf_ptr, size_t buf_size,
+static void tcg_register_jit_int(const void *buf_ptr, size_t buf_size,
                                  const void *debug_frame,
                                  size_t debug_frame_size)
 {
@@ -4651,13 +4651,13 @@ static void tcg_register_jit_int(void *buf_ptr, size_t buf_size,
 /* No support for the feature.  Provide the entry point expected by exec.c,
    and implement the internal function we declared earlier.  */
 
-static void tcg_register_jit_int(void *buf, size_t size,
+static void tcg_register_jit_int(const void *buf, size_t size,
                                  const void *debug_frame,
                                  size_t debug_frame_size)
 {
 }
 
-void tcg_register_jit(void *buf, size_t buf_size)
+void tcg_register_jit(const void *buf, size_t buf_size)
 {
 }
 #endif /* ELF_HOST_MACHINE */
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 6d8152c468..9ace859db3 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2964,7 +2964,7 @@ static const DebugFrame debug_frame = {
     }
 };
 
-void tcg_register_jit(void *buf, size_t buf_size)
+void tcg_register_jit(const void *buf, size_t buf_size)
 {
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index d6dfe2b428..431af3107c 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -2353,7 +2353,7 @@ static const DebugFrame debug_frame = {
     }
 };
 
-void tcg_register_jit(void *buf, size_t buf_size)
+void tcg_register_jit(const void *buf, size_t buf_size)
 {
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 0ac1ef3d82..7f74c77d7f 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -3998,7 +3998,7 @@ static const DebugFrame debug_frame = {
 #endif
 
 #if defined(ELF_HOST_MACHINE)
-void tcg_register_jit(void *buf, size_t buf_size)
+void tcg_register_jit(const void *buf, size_t buf_size)
 {
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 064f46fc6d..b74dc15b86 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -2702,7 +2702,7 @@ static const DebugFrame debug_frame = {
     }
 };
 
-void tcg_register_jit(void *buf, size_t buf_size)
+void tcg_register_jit(const void *buf, size_t buf_size)
 {
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 513d784a83..bdaffeabb3 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -3847,7 +3847,7 @@ static DebugFrame debug_frame = {
     }
 };
 
-void tcg_register_jit(void *buf, size_t buf_size)
+void tcg_register_jit(const void *buf, size_t buf_size)
 {
     uint8_t *p = &debug_frame.fde_reg_ofs[3];
     int i;
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 4416a93e1f..025e3cd0bb 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -1907,7 +1907,7 @@ static const DebugFrame debug_frame = {
     }
 };
 
-void tcg_register_jit(void *buf, size_t buf_size)
+void tcg_register_jit(const void *buf, size_t buf_size)
 {
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
index 7983befd96..2444511177 100644
--- a/tcg/s390/tcg-target.c.inc
+++ b/tcg/s390/tcg-target.c.inc
@@ -2620,7 +2620,7 @@ static const DebugFrame debug_frame = {
     }
 };
 
-void tcg_register_jit(void *buf, size_t buf_size)
+void tcg_register_jit(const void *buf, size_t buf_size)
 {
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 1a40911660..4c81d5f1c2 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -1816,7 +1816,7 @@ static const DebugFrame debug_frame = {
     .fde_ret_save = { 9, 15, 31 },      /* DW_CFA_register o7, i7 */
 };
 
-void tcg_register_jit(void *buf, size_t buf_size)
+void tcg_register_jit(const void *buf, size_t buf_size)
 {
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
-- 
2.25.1




* [PATCH v2 08/19] tcg: Adjust tb_target_set_jmp_target for split rwx
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (6 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 07/19] tcg: Adjust tcg_register_jit " Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 09/19] tcg: Make DisasContextBase.tb const Richard Henderson
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

Pass both rx and rw addresses to tb_target_set_jmp_target.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
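A minimal sketch of the idea, modelled on the i386 variant in the diff
below: the displacement is computed from the address the CPU will
execute at (rx), while the store goes through the writable alias (rw).
The function names are hypothetical, and a plain memcpy stands in for
the atomic store used by the real code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical patcher: the 32-bit displacement is relative to the end of
   the field as seen by the CPU, so it is derived from the rx address, but
   the bytes are written through the rw alias.  The real code uses an
   atomic store; memcpy keeps this sketch portable. */
static void demo_set_jmp_target(uintptr_t jmp_rx, uintptr_t jmp_rw,
                                uintptr_t addr)
{
    int32_t disp = (int32_t)(addr - (jmp_rx + 4));
    memcpy((void *)jmp_rw, &disp, sizeof(disp));
}

/* Hypothetical caller: derive both views of the patch site from the
   writable buffer and a given rx-minus-rw offset, patch the site, and
   return the displacement that was stored. */
static int32_t demo_patch(uint8_t *rw_buf, size_t len, size_t offset,
                          uintptr_t rx_diff, uintptr_t target_rx)
{
    uintptr_t jmp_rw = (uintptr_t)rw_buf + offset;
    uintptr_t jmp_rx = jmp_rw + rx_diff;
    int32_t stored;

    assert(offset + sizeof(stored) <= len);
    demo_set_jmp_target(jmp_rx, jmp_rw, target_rx);
    memcpy(&stored, rw_buf + offset, sizeof(stored));
    return stored;
}
```

When rx_diff is zero the two addresses coincide and this reduces to the
single-address form that the patch replaces.
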
 tcg/aarch64/tcg-target.h     |  2 +-
 tcg/arm/tcg-target.h         |  2 +-
 tcg/i386/tcg-target.h        |  6 +++---
 tcg/mips/tcg-target.h        |  2 +-
 tcg/ppc/tcg-target.h         |  2 +-
 tcg/riscv/tcg-target.h       |  2 +-
 tcg/s390/tcg-target.h        |  8 ++++----
 tcg/sparc/tcg-target.h       |  2 +-
 tcg/tci/tcg-target.h         |  6 +++---
 accel/tcg/cpu-exec.c         |  4 +++-
 tcg/aarch64/tcg-target.c.inc | 12 ++++++------
 tcg/mips/tcg-target.c.inc    |  8 ++++----
 tcg/ppc/tcg-target.c.inc     | 16 ++++++++--------
 tcg/sparc/tcg-target.c.inc   | 14 +++++++-------
 14 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index d0a6a059b7..91313d93be 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -158,7 +158,7 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
     __builtin___clear_cache((char *)rx, (char *)(rx + len));
 }
 
-void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index fa88b24e43..b21a2fb6a1 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -144,7 +144,7 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 }
 
 /* not defined -- call should be eliminated at compile time */
-void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 8323e72639..f52ba0ffec 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -211,11 +211,11 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
 }
 
-static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
-                                            uintptr_t jmp_addr, uintptr_t addr)
+static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
+                                            uintptr_t jmp_rw, uintptr_t addr)
 {
     /* patch the branch destination */
-    qatomic_set((int32_t *)jmp_addr, addr - (jmp_addr + 4));
+    qatomic_set((int32_t *)jmp_rw, addr - (jmp_rx + 4));
     /* no need to flush icache explicitly */
 }
 
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 47b1226ee9..cd548dacec 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -216,7 +216,7 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
     cacheflush((void *)rx, len, ICACHE);
 }
 
-void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index fbb6dc1b47..8f3e4c924a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -176,7 +176,7 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_cmpsel_vec       0
 
 void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len);
-void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 0fa6ae358e..e03fd17427 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -169,7 +169,7 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 }
 
 /* not defined -- call should be eliminated at compile time */
-void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #define TCG_TARGET_DEFAULT_MO (0)
 
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index c3dc2e8938..c5a749e425 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -150,12 +150,12 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 {
 }
 
-static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
-                                            uintptr_t jmp_addr, uintptr_t addr)
+static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
+                                            uintptr_t jmp_rw, uintptr_t addr)
 {
     /* patch the branch destination */
-    intptr_t disp = addr - (jmp_addr - 2);
-    qatomic_set((int32_t *)jmp_addr, disp / 2);
+    intptr_t disp = addr - (jmp_rx - 2);
+    qatomic_set((int32_t *)jmp_rw, disp / 2);
     /* no need to flush icache explicitly */
 }
 
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index c27c40231e..87e2be61e6 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -178,7 +178,7 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
     }
 }
 
-void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #define TCG_TARGET_NEED_POOL_LABELS
 
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 49f3291f8a..a19a6b06e5 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -201,11 +201,11 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
-static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
-                                            uintptr_t jmp_addr, uintptr_t addr)
+static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
+                                            uintptr_t jmp_rw, uintptr_t addr)
 {
     /* patch the branch destination */
-    qatomic_set((int32_t *)jmp_addr, addr - (jmp_addr + 4));
+    qatomic_set((int32_t *)jmp_rw, addr - (jmp_rx + 4));
     /* no need to flush icache explicitly */
 }
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 1e3cb570f6..4af3faba80 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -354,7 +354,9 @@ void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr)
     if (TCG_TARGET_HAS_direct_jump) {
         uintptr_t offset = tb->jmp_target_arg[n];
         uintptr_t tc_ptr = (uintptr_t)tb->tc.ptr;
-        tb_target_set_jmp_target(tc_ptr, tc_ptr + offset, addr);
+        uintptr_t jmp_rx = tc_ptr + offset;
+        uintptr_t jmp_rw = jmp_rx - tcg_rx_mirror_diff;
+        tb_target_set_jmp_target(tc_ptr, jmp_rx, jmp_rw, addr);
     } else {
         tb->jmp_target_arg[n] = addr;
     }
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 9ace859db3..fea784cf75 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1340,21 +1340,21 @@ static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *target)
     }
 }
 
-void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
-                              uintptr_t addr)
+void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
+                              uintptr_t jmp_rw, uintptr_t addr)
 {
     tcg_insn_unit i1, i2;
     TCGType rt = TCG_TYPE_I64;
     TCGReg  rd = TCG_REG_TMP;
     uint64_t pair;
 
-    ptrdiff_t offset = addr - jmp_addr;
+    ptrdiff_t offset = addr - jmp_rx;
 
     if (offset == sextract64(offset, 0, 26)) {
         i1 = I3206_B | ((offset >> 2) & 0x3ffffff);
         i2 = NOP;
     } else {
-        offset = (addr >> 12) - (jmp_addr >> 12);
+        offset = (addr >> 12) - (jmp_rx >> 12);
 
         /* patch ADRP */
         i1 = I3406_ADRP | (offset & 3) << 29 | (offset & 0x1ffffc) << (5 - 2) | rd;
@@ -1362,8 +1362,8 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
         i2 = I3401_ADDI | rt << 31 | (addr & 0xfff) << 10 | rd << 5 | rd;
     }
     pair = (uint64_t)i2 << 32 | i1;
-    qatomic_set((uint64_t *)jmp_addr, pair);
-    flush_idcache_range(jmp_addr, jmp_addr, 8);
+    qatomic_set((uint64_t *)jmp_rw, pair);
+    flush_idcache_range(jmp_rx, jmp_rw, 8);
 }
 
 static inline void tcg_out_goto_label(TCGContext *s, TCGLabel *l)
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index b74dc15b86..e87a632637 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -2656,11 +2656,11 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_GP);   /* global pointer */
 }
 
-void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
-                              uintptr_t addr)
+void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
+                              uintptr_t jmp_rw, uintptr_t addr)
 {
-    qatomic_set((uint32_t *)jmp_addr, deposit32(OPC_J, 0, 26, addr >> 2));
-    flush_idcache_range(jmp_addr, jmp_addr, 4);
+    qatomic_set((uint32_t *)jmp_rw, deposit32(OPC_J, 0, 26, addr >> 2));
+    flush_idcache_range(jmp_rx, jmp_rw, 4);
 }
 
 typedef struct {
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index bdaffeabb3..6a71b81f4e 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1722,13 +1722,13 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
     tcg_out32(s, insn);
 }
 
-void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
-                              uintptr_t addr)
+void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
+                              uintptr_t jmp_rw, uintptr_t addr)
 {
     if (TCG_TARGET_REG_BITS == 64) {
         tcg_insn_unit i1, i2;
         intptr_t tb_diff = addr - tc_ptr;
-        intptr_t br_diff = addr - (jmp_addr + 4);
+        intptr_t br_diff = addr - (jmp_rx + 4);
         uint64_t pair;
 
         /* This does not exercise the range of the branch, but we do
@@ -1752,13 +1752,13 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
 
         /* As per the enclosing if, this is ppc64.  Avoid the _Static_assert
            within qatomic_set that would fail to build a ppc32 host.  */
-        qatomic_set__nocheck((uint64_t *)jmp_addr, pair);
-        flush_idcache_range(jmp_addr, jmp_addr, 8);
+        qatomic_set__nocheck((uint64_t *)jmp_rw, pair);
+        flush_idcache_range(jmp_rx, jmp_rw, 8);
     } else {
-        intptr_t diff = addr - jmp_addr;
+        intptr_t diff = addr - jmp_rx;
         tcg_debug_assert(in_range_b(diff));
-        qatomic_set((uint32_t *)jmp_addr, B | (diff & 0x3fffffc));
-        flush_idcache_range(jmp_addr, jmp_addr, 4);
+        qatomic_set((uint32_t *)jmp_rw, B | (diff & 0x3fffffc));
+        flush_idcache_range(jmp_rx, jmp_rw, 4);
     }
 }
 
diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
index 4c81d5f1c2..d599ae27b5 100644
--- a/tcg/sparc/tcg-target.c.inc
+++ b/tcg/sparc/tcg-target.c.inc
@@ -1821,11 +1821,11 @@ void tcg_register_jit(const void *buf, size_t buf_size)
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
 
-void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
-                              uintptr_t addr)
+void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
+                              uintptr_t jmp_rw, uintptr_t addr)
 {
     intptr_t tb_disp = addr - tc_ptr;
-    intptr_t br_disp = addr - jmp_addr;
+    intptr_t br_disp = addr - jmp_rx;
     tcg_insn_unit i1, i2;
 
     /* We can reach the entire address space for ILP32.
@@ -1834,9 +1834,9 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
     tcg_debug_assert(br_disp == (int32_t)br_disp);
 
     if (!USE_REG_TB) {
-        qatomic_set((uint32_t *)jmp_addr,
+        qatomic_set((uint32_t *)jmp_rw,
 		    deposit32(CALL, 0, 30, br_disp >> 2));
-        flush_idcache_range(jmp_addr, jmp_addr, 4);
+        flush_idcache_range(jmp_rx, jmp_rw, 4);
         return;
     }
 
@@ -1859,6 +1859,6 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
               | INSN_IMM13((tb_disp & 0x3ff) | -0x400));
     }
 
-    qatomic_set((uint64_t *)jmp_addr, deposit64(i2, 32, 32, i1));
-    flush_idcache_range(jmp_addr, jmp_addr, 8);
+    qatomic_set((uint64_t *)jmp_rw, deposit64(i2, 32, 32, i1));
+    flush_idcache_range(jmp_rx, jmp_rw, 8);
 }
-- 
2.25.1




* [PATCH v2 09/19] tcg: Make DisasContextBase.tb const
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (7 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 08/19] tcg: Adjust tb_target_set_jmp_target for split rwx Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 10/19] tcg: Make tb arg to synchronize_from_tb const Richard Henderson
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

There is nothing within the translators that ought to be
changing the TranslationBlock data, so make it const.

This does not actually use the read-only copy of the
data structure that exists within the rx mirror.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
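A minimal sketch of the ownership split this implies, using heavily
reduced, hypothetical stand-ins (DemoTB, DemoDisasBase) for the real
structures: translators see the block through a const pointer, and
only the generic loop, which still holds a non-const pointer, fills in
the fields that are known once translation has finished.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a translation block. */
typedef struct DemoTB {
    uint64_t pc;
    uint16_t size;
    uint16_t icount;
} DemoTB;

/* Hypothetical, reduced stand-in for the disassembly context. */
typedef struct DemoDisasBase {
    const DemoTB *tb;      /* translators get a read-only view */
    uint64_t pc_first;
    uint64_t pc_next;
    int num_insns;
} DemoDisasBase;

/* The generic loop owns a non-const pointer, so it alone writes the
   size and instruction count back into the block. */
static void demo_translator_loop(DemoDisasBase *db, DemoTB *tb, int insns)
{
    db->tb = tb;
    db->pc_first = tb->pc;
    db->pc_next = tb->pc + 4u * (unsigned)insns;  /* assume 4-byte insns */
    db->num_insns = insns;

    /* Writing via db->tb would not compile any more; use tb instead. */
    tb->size = (uint16_t)(db->pc_next - db->pc_first);
    tb->icount = (uint16_t)db->num_insns;
    assert(db->tb->size == tb->size);
}
```
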
 include/exec/gen-icount.h  | 4 ++--
 include/exec/translator.h  | 2 +-
 include/tcg/tcg-op.h       | 2 +-
 accel/tcg/translator.c     | 4 ++--
 target/arm/translate-a64.c | 2 +-
 tcg/tcg-op.c               | 2 +-
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 822c43cfd3..aa4b44354a 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -32,7 +32,7 @@ static inline void gen_io_end(void)
     tcg_temp_free_i32(tmp);
 }
 
-static inline void gen_tb_start(TranslationBlock *tb)
+static inline void gen_tb_start(const TranslationBlock *tb)
 {
     TCGv_i32 count, imm;
 
@@ -71,7 +71,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
     tcg_temp_free_i32(count);
 }
 
-static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
+static inline void gen_tb_end(const TranslationBlock *tb, int num_insns)
 {
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
         /* Update the num_insn immediate parameter now that we know
diff --git a/include/exec/translator.h b/include/exec/translator.h
index 638e1529c5..24232ead41 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -67,7 +67,7 @@ typedef enum DisasJumpType {
  * Architecture-agnostic disassembly context.
  */
 typedef struct DisasContextBase {
-    TranslationBlock *tb;
+    const TranslationBlock *tb;
     target_ulong pc_first;
     target_ulong pc_next;
     DisasJumpType is_jmp;
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 5abf17fecc..cbe39a3b95 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -805,7 +805,7 @@ static inline void tcg_gen_insn_start(target_ulong pc, target_ulong a1,
  * be NULL and @idx should be 0.  Otherwise, @tb should be valid and
  * @idx should be one of the TB_EXIT_ values.
  */
-void tcg_gen_exit_tb(TranslationBlock *tb, unsigned idx);
+void tcg_gen_exit_tb(const TranslationBlock *tb, unsigned idx);
 
 /**
  * tcg_gen_goto_tb() - output goto_tb TCG operation
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index fb1e19c585..a49a794065 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -133,8 +133,8 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
     }
 
     /* The disas_log hook may use these values rather than recompute.  */
-    db->tb->size = db->pc_next - db->pc_first;
-    db->tb->icount = db->num_insns;
+    tb->size = db->pc_next - db->pc_first;
+    tb->icount = db->num_insns;
 
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 072754fa24..297782e6ef 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -410,7 +410,7 @@ static inline bool use_goto_tb(DisasContext *s, int n, uint64_t dest)
 
 static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
 {
-    TranslationBlock *tb;
+    const TranslationBlock *tb;
 
     tb = s->base.tb;
     if (use_goto_tb(s, n, dest)) {
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 4b8a473fad..e3dc0cb4cb 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2664,7 +2664,7 @@ void tcg_gen_extr32_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i64 arg)
 
 /* QEMU specific operations.  */
 
-void tcg_gen_exit_tb(TranslationBlock *tb, unsigned idx)
+void tcg_gen_exit_tb(const TranslationBlock *tb, unsigned idx)
 {
     uintptr_t val = (uintptr_t)tb + idx;
 
-- 
2.25.1




* [PATCH v2 10/19] tcg: Make tb arg to synchronize_from_tb const
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (8 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 09/19] tcg: Make DisasContextBase.tb const Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 11/19] tcg: Use Error with alloc_code_gen_buffer Richard Henderson
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

There is nothing within the synchronize_from_tb hooks that ought
to be changing the TranslationBlock data, so make the argument const.

This does not actually use the read-only copy of the
data structure that exists within the rx mirror.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
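A minimal sketch of the hook shape after this change, using
hypothetical reduced types (DemoTB, DemoCPU, DemoCPUClass) rather than
the real QEMU ones: the hook only reads from the block to restore cpu
state, so a const pointer is sufficient.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the block and the cpu state. */
typedef struct DemoTB { uint64_t pc; uint32_t flags; } DemoTB;
typedef struct DemoCPU { uint64_t pc; } DemoCPU;

/* Hypothetical class structure whose hook takes the block as const. */
typedef struct DemoCPUClass {
    void (*synchronize_from_tb)(DemoCPU *cpu, const DemoTB *tb);
} DemoCPUClass;

/* The hook copies state out of the block and never modifies it. */
static void demo_cpu_synchronize_from_tb(DemoCPU *cpu, const DemoTB *tb)
{
    assert(tb != NULL);
    cpu->pc = tb->pc;
}

static const DemoCPUClass demo_class = {
    .synchronize_from_tb = demo_cpu_synchronize_from_tb,
};
```

Every implementation has to change together with the function pointer
type, because an implementation taking a non-const pointer no longer
matches the hook's type.
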
 include/hw/core/cpu.h   | 3 ++-
 target/arm/cpu.c        | 3 ++-
 target/avr/cpu.c        | 3 ++-
 target/hppa/cpu.c       | 3 ++-
 target/i386/cpu.c       | 3 ++-
 target/microblaze/cpu.c | 3 ++-
 target/mips/cpu.c       | 3 ++-
 target/riscv/cpu.c      | 3 ++-
 target/rx/cpu.c         | 3 ++-
 target/sh4/cpu.c        | 3 ++-
 target/sparc/cpu.c      | 3 ++-
 target/tricore/cpu.c    | 2 +-
 12 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 9c3a45ad7b..67253e662b 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -189,7 +189,8 @@ struct CPUClass {
     void (*get_memory_mapping)(CPUState *cpu, MemoryMappingList *list,
                                Error **errp);
     void (*set_pc)(CPUState *cpu, vaddr value);
-    void (*synchronize_from_tb)(CPUState *cpu, struct TranslationBlock *tb);
+    void (*synchronize_from_tb)(CPUState *cpu,
+                                const struct TranslationBlock *tb);
     bool (*tlb_fill)(CPUState *cpu, vaddr address, int size,
                      MMUAccessType access_type, int mmu_idx,
                      bool probe, uintptr_t retaddr);
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 07492e9f9a..2f9be1c0ee 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -54,7 +54,8 @@ static void arm_cpu_set_pc(CPUState *cs, vaddr value)
     }
 }
 
-static void arm_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void arm_cpu_synchronize_from_tb(CPUState *cs,
+                                        const TranslationBlock *tb)
 {
     ARMCPU *cpu = ARM_CPU(cs);
     CPUARMState *env = &cpu->env;
diff --git a/target/avr/cpu.c b/target/avr/cpu.c
index 5d9c4ad5bf..6f3d5a9e4a 100644
--- a/target/avr/cpu.c
+++ b/target/avr/cpu.c
@@ -41,7 +41,8 @@ static bool avr_cpu_has_work(CPUState *cs)
             && cpu_interrupts_enabled(env);
 }
 
-static void avr_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void avr_cpu_synchronize_from_tb(CPUState *cs,
+                                        const TranslationBlock *tb)
 {
     AVRCPU *cpu = AVR_CPU(cs);
     CPUAVRState *env = &cpu->env;
diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 71b6aca45d..e28f047d10 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -35,7 +35,8 @@ static void hppa_cpu_set_pc(CPUState *cs, vaddr value)
     cpu->env.iaoq_b = value + 4;
 }
 
-static void hppa_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void hppa_cpu_synchronize_from_tb(CPUState *cs,
+                                         const TranslationBlock *tb)
 {
     HPPACPU *cpu = HPPA_CPU(cs);
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0d8606958e..01a8acafe3 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7012,7 +7012,8 @@ static void x86_cpu_set_pc(CPUState *cs, vaddr value)
     cpu->env.eip = value;
 }
 
-static void x86_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void x86_cpu_synchronize_from_tb(CPUState *cs,
+                                        const TranslationBlock *tb)
 {
     X86CPU *cpu = X86_CPU(cs);
 
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index 9b2482159d..c8e754cfb1 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -83,7 +83,8 @@ static void mb_cpu_set_pc(CPUState *cs, vaddr value)
     cpu->env.iflags = 0;
 }
 
-static void mb_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void mb_cpu_synchronize_from_tb(CPUState *cs,
+                                       const TranslationBlock *tb)
 {
     MicroBlazeCPU *cpu = MICROBLAZE_CPU(cs);
 
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index 76d50b00b4..79eee215cf 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -44,7 +44,8 @@ static void mips_cpu_set_pc(CPUState *cs, vaddr value)
     }
 }
 
-static void mips_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void mips_cpu_synchronize_from_tb(CPUState *cs,
+                                         const TranslationBlock *tb)
 {
     MIPSCPU *cpu = MIPS_CPU(cs);
     CPUMIPSState *env = &cpu->env;
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 0bbfd7f457..faaa9d1e8f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -282,7 +282,8 @@ static void riscv_cpu_set_pc(CPUState *cs, vaddr value)
     env->pc = value;
 }
 
-static void riscv_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void riscv_cpu_synchronize_from_tb(CPUState *cs,
+                                          const TranslationBlock *tb)
 {
     RISCVCPU *cpu = RISCV_CPU(cs);
     CPURISCVState *env = &cpu->env;
diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index 23ee17a701..2bb14144a7 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -33,7 +33,8 @@ static void rx_cpu_set_pc(CPUState *cs, vaddr value)
     cpu->env.pc = value;
 }
 
-static void rx_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void rx_cpu_synchronize_from_tb(CPUState *cs,
+                                       const TranslationBlock *tb)
 {
     RXCPU *cpu = RX_CPU(cs);
 
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index 3c68021c56..1e0f05a15b 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -34,7 +34,8 @@ static void superh_cpu_set_pc(CPUState *cs, vaddr value)
     cpu->env.pc = value;
 }
 
-static void superh_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void superh_cpu_synchronize_from_tb(CPUState *cs,
+                                           const TranslationBlock *tb)
 {
     SuperHCPU *cpu = SUPERH_CPU(cs);
 
diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
index cf21efd85f..b9241208b1 100644
--- a/target/sparc/cpu.c
+++ b/target/sparc/cpu.c
@@ -691,7 +691,8 @@ static void sparc_cpu_set_pc(CPUState *cs, vaddr value)
     cpu->env.npc = value + 4;
 }
 
-static void sparc_cpu_synchronize_from_tb(CPUState *cs, TranslationBlock *tb)
+static void sparc_cpu_synchronize_from_tb(CPUState *cs,
+                                          const TranslationBlock *tb)
 {
     SPARCCPU *cpu = SPARC_CPU(cs);
 
diff --git a/target/tricore/cpu.c b/target/tricore/cpu.c
index 2f2e5b029f..4bff1d4718 100644
--- a/target/tricore/cpu.c
+++ b/target/tricore/cpu.c
@@ -42,7 +42,7 @@ static void tricore_cpu_set_pc(CPUState *cs, vaddr value)
 }
 
 static void tricore_cpu_synchronize_from_tb(CPUState *cs,
-                                            TranslationBlock *tb)
+                                            const TranslationBlock *tb)
 {
     TriCoreCPU *cpu = TRICORE_CPU(cs);
     CPUTriCoreState *env = &cpu->env;
-- 
2.25.1




* [PATCH v2 11/19] tcg: Use Error with alloc_code_gen_buffer
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (9 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 10/19] tcg: Make tb arg to synchronize_from_tb const Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 12/19] tcg: Add --accel tcg,split-rwx property Richard Henderson
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

Report better error messages than just "could not allocate".
Let alloc_code_gen_buffer set tcg_ctx->code_gen_buffer_size
and tcg_ctx->code_gen_buffer itself, and simply return bool
to indicate success.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
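A minimal sketch of the calling convention described above: the
allocator fills in the context itself, returns bool, and reports the
reason for a failure through an out-parameter.  DemoError and DemoCtx
are hypothetical stand-ins; this is not QEMU's Error API, and malloc
stands in for the real mapping calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, minimal error object owned by the caller. */
typedef struct DemoError { char msg[128]; } DemoError;

/* Hypothetical, reduced context holding the buffer and its size. */
typedef struct DemoCtx {
    void *code_gen_buffer;
    size_t code_gen_buffer_size;
} DemoCtx;

/* On success, store the buffer and its size in ctx and return true.
   On failure, leave ctx untouched, describe the problem in *errp
   (if the caller supplied one) and return false. */
static bool demo_alloc_code_gen_buffer(DemoCtx *ctx, size_t size,
                                       DemoError *errp)
{
    void *buf = NULL;

    assert(ctx != NULL);
    if (size != 0) {
        buf = malloc(size);
    }
    if (buf == NULL) {
        if (errp != NULL) {
            snprintf(errp->msg, sizeof(errp->msg),
                     "allocate %zu bytes for jit buffer failed", size);
        }
        return false;
    }

    ctx->code_gen_buffer = buf;
    ctx->code_gen_buffer_size = size;
    return true;
}
```

The caller then decides how to report the message, instead of the
allocator aborting on its own.
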
 accel/tcg/translate-all.c | 60 ++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 26 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c3e35bdee6..fca632eefa 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -59,6 +59,7 @@
 #include "sysemu/cpus.h"
 #include "sysemu/cpu-timers.h"
 #include "sysemu/tcg.h"
+#include "qapi/error.h"
 
 /* #define DEBUG_TB_INVALIDATE */
 /* #define DEBUG_TB_FLUSH */
@@ -973,7 +974,7 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
    ? DEFAULT_CODE_GEN_BUFFER_SIZE_1 : MAX_CODE_GEN_BUFFER_SIZE)
 
-static inline size_t size_code_gen_buffer(size_t tb_size)
+static size_t size_code_gen_buffer(size_t tb_size)
 {
     /* Size the buffer.  */
     if (tb_size == 0) {
@@ -1024,7 +1025,7 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
 static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
     __attribute__((aligned(CODE_GEN_ALIGN)));
 
-static inline void *alloc_code_gen_buffer(void)
+static bool alloc_code_gen_buffer(size_t tb_size, Error **errp)
 {
     void *buf = static_code_gen_buffer;
     void *end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
@@ -1037,9 +1038,8 @@ static inline void *alloc_code_gen_buffer(void)
     size = end - buf;
 
     /* Honor a command-line option limiting the size of the buffer.  */
-    if (size > tcg_ctx->code_gen_buffer_size) {
-        size = QEMU_ALIGN_DOWN(tcg_ctx->code_gen_buffer_size,
-                               qemu_real_host_page_size);
+    if (size > tb_size) {
+        size = QEMU_ALIGN_DOWN(tb_size, qemu_real_host_page_size);
     }
     tcg_ctx->code_gen_buffer_size = size;
 
@@ -1051,31 +1051,43 @@ static inline void *alloc_code_gen_buffer(void)
 #endif
 
     if (qemu_mprotect_rwx(buf, size)) {
-        abort();
+        error_setg_errno(errp, errno, "mprotect of jit buffer");
+        return false;
     }
     qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
-    return buf;
+    tcg_ctx->code_gen_buffer = buf;
+    return true;
 }
 #elif defined(_WIN32)
-static inline void *alloc_code_gen_buffer(void)
+static bool alloc_code_gen_buffer(size_t size, Error **errp)
 {
-    size_t size = tcg_ctx->code_gen_buffer_size;
-    return VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
-                        PAGE_EXECUTE_READWRITE);
+    void *buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
+                             PAGE_EXECUTE_READWRITE);
+    if (buf == NULL) {
+        error_setg_win32(errp, GetLastError(),
+                         "allocate %zu bytes for jit buffer", size);
+        return false;
+    }
+
+    tcg_ctx->code_gen_buffer = buf;
+    tcg_ctx->code_gen_buffer_size = size;
+    return true;
 }
 #else
-static inline void *alloc_code_gen_buffer(void)
+static bool alloc_code_gen_buffer(size_t size, Error **errp)
 {
     int prot = PROT_WRITE | PROT_READ | PROT_EXEC;
     int flags = MAP_PRIVATE | MAP_ANONYMOUS;
-    size_t size = tcg_ctx->code_gen_buffer_size;
     void *buf;
 
     buf = mmap(NULL, size, prot, flags, -1, 0);
     if (buf == MAP_FAILED) {
-        return NULL;
+        error_setg_errno(errp, errno,
+                         "allocate %zu bytes for jit buffer", size);
+        return false;
     }
+    tcg_ctx->code_gen_buffer_size = size;
 
 #ifdef __mips__
     if (cross_256mb(buf, size)) {
@@ -1114,20 +1126,11 @@ static inline void *alloc_code_gen_buffer(void)
     /* Request large pages for the buffer.  */
     qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
-    return buf;
+    tcg_ctx->code_gen_buffer = buf;
+    return true;
 }
 #endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */
 
-static inline void code_gen_alloc(size_t tb_size)
-{
-    tcg_ctx->code_gen_buffer_size = size_code_gen_buffer(tb_size);
-    tcg_ctx->code_gen_buffer = alloc_code_gen_buffer();
-    if (tcg_ctx->code_gen_buffer == NULL) {
-        fprintf(stderr, "Could not allocate dynamic translator buffer\n");
-        exit(1);
-    }
-}
-
 static bool tb_cmp(const void *ap, const void *bp)
 {
     const TranslationBlock *a = ap;
@@ -1154,11 +1157,16 @@ static void tb_htable_init(void)
    size. */
 void tcg_exec_init(unsigned long tb_size)
 {
+    bool ok;
+
     tcg_allowed = true;
     cpu_gen_init();
     page_init();
     tb_htable_init();
-    code_gen_alloc(tb_size);
+
+    ok = alloc_code_gen_buffer(size_code_gen_buffer(tb_size), &error_fatal);
+    assert(ok);
+
 #if defined(CONFIG_SOFTMMU)
     /* There's no guest base to take into account, so go ahead and
        initialize the prologue now.  */
-- 
2.25.1
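
For readers outside QEMU, the calling convention this patch adopts
(return bool, fill in the buffer fields on success, describe the failure
through an out-parameter) can be sketched standalone.  This is only an
illustration: "struct jit_error", "struct jit_buffer" and
"alloc_jit_buffer" are hypothetical stand-ins for QEMU's Error object,
the two tcg_ctx fields and alloc_code_gen_buffer, and the protection is
passed in by the caller rather than fixed to RWX.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for QEMU's Error object: errno plus a message. */
struct jit_error {
    int code;
    char msg[128];
};

/* Hypothetical stand-in for the two tcg_ctx fields the patch fills in. */
struct jit_buffer {
    void *ptr;
    size_t size;
};

/*
 * Map SIZE bytes of anonymous memory with protection PROT.
 * On success, fill in *BUF and return true.
 * On failure, describe the problem in *ERR, leave *BUF untouched,
 * and return false.
 */
static bool alloc_jit_buffer(struct jit_buffer *buf, size_t size, int prot,
                             struct jit_error *err)
{
    void *p;

    assert(buf != NULL && err != NULL);

    p = mmap(NULL, size, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        int saved = errno;  /* later library calls may clobber errno */

        err->code = saved;
        snprintf(err->msg, sizeof(err->msg),
                 "allocate %zu bytes for jit buffer: %s",
                 size, strerror(saved));
        return false;
    }

    buf->ptr = p;
    buf->size = size;
    return true;
}
```

A caller that cannot recover would print err.msg and exit, which is
roughly the role &error_fatal plays in tcg_exec_init above.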




* [PATCH v2 12/19] tcg: Add --accel tcg,split-rwx property
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (10 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 11/19] tcg: Use Error with alloc_code_gen_buffer Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 13/19] accel/tcg: Support split-rwx for linux with memfd Richard Henderson
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

Plumb the value through to alloc_code_gen_buffer.
No OS or TCG backend supports this yet, so for now
enabling it results in an error.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/sysemu/tcg.h      |  2 +-
 tcg/aarch64/tcg-target.h  |  1 +
 tcg/arm/tcg-target.h      |  1 +
 tcg/i386/tcg-target.h     |  1 +
 tcg/mips/tcg-target.h     |  1 +
 tcg/ppc/tcg-target.h      |  1 +
 tcg/riscv/tcg-target.h    |  1 +
 tcg/s390/tcg-target.h     |  1 +
 tcg/sparc/tcg-target.h    |  1 +
 tcg/tci/tcg-target.h      |  1 +
 accel/tcg/tcg-all.c       | 26 +++++++++++++++++++++++++-
 accel/tcg/translate-all.c | 35 +++++++++++++++++++++++++++--------
 bsd-user/main.c           |  2 +-
 linux-user/main.c         |  2 +-
 14 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/include/sysemu/tcg.h b/include/sysemu/tcg.h
index d9d3ca8559..5734dd92dc 100644
--- a/include/sysemu/tcg.h
+++ b/include/sysemu/tcg.h
@@ -8,7 +8,7 @@
 #ifndef SYSEMU_TCG_H
 #define SYSEMU_TCG_H
 
-void tcg_exec_init(unsigned long tb_size);
+void tcg_exec_init(unsigned long tb_size, int mirror);
 #ifdef CONFIG_TCG
 extern bool tcg_allowed;
 #define tcg_enabled() (tcg_allowed)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 91313d93be..fa64058d43 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -164,5 +164,6 @@ void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
+#define TCG_TARGET_SUPPORT_MIRROR       0
 
 #endif /* AARCH64_TCG_TARGET_H */
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index b21a2fb6a1..e355d6a4b2 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -150,5 +150,6 @@ void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
+#define TCG_TARGET_SUPPORT_MIRROR       0
 
 #endif
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index f52ba0ffec..1b9d41bd56 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -236,5 +236,6 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
+#define TCG_TARGET_SUPPORT_MIRROR       0
 
 #endif
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index cd548dacec..d231522dc9 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -206,6 +206,7 @@ extern bool use_mips32r2_instructions;
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
+#define TCG_TARGET_SUPPORT_MIRROR       0
 
 /* Flush the dcache at RW, and the icache at RX, as necessary. */
 static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 8f3e4c924a..78d6a5e96f 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -185,5 +185,6 @@ void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
+#define TCG_TARGET_SUPPORT_MIRROR       0
 
 #endif
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index e03fd17427..3c2e8305b0 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -179,5 +179,6 @@ void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 #define TCG_TARGET_NEED_POOL_LABELS
 
 #define TCG_TARGET_HAS_MEMORY_BSWAP 0
+#define TCG_TARGET_SUPPORT_MIRROR   0
 
 #endif
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index c5a749e425..8324197127 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -163,5 +163,6 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
+#define TCG_TARGET_SUPPORT_MIRROR       0
 
 #endif
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 87e2be61e6..517840705f 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -181,5 +181,6 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #define TCG_TARGET_NEED_POOL_LABELS
+#define TCG_TARGET_SUPPORT_MIRROR       0
 
 #endif
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index a19a6b06e5..3653fef947 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -200,6 +200,7 @@ static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
 #define TCG_TARGET_DEFAULT_MO  (0)
 
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
+#define TCG_TARGET_SUPPORT_MIRROR       0
 
 static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
                                             uintptr_t jmp_rw, uintptr_t addr)
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index fa1208158f..ba4206d507 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -38,6 +38,7 @@ struct TCGState {
     AccelState parent_obj;
 
     bool mttcg_enabled;
+    int mirror_enabled;
     unsigned long tb_size;
 };
 typedef struct TCGState TCGState;
@@ -94,6 +95,13 @@ static void tcg_accel_instance_init(Object *obj)
     TCGState *s = TCG_STATE(obj);
 
     s->mttcg_enabled = default_mttcg_enabled();
+
+    /* If debugging enabled, default "auto on", otherwise off. */
+#ifdef CONFIG_DEBUG_TCG
+    s->mirror_enabled = -1;
+#else
+    s->mirror_enabled = 0;
+#endif
 }
 
 bool mttcg_enabled;
@@ -102,7 +110,7 @@ static int tcg_init(MachineState *ms)
 {
     TCGState *s = TCG_STATE(current_accel());
 
-    tcg_exec_init(s->tb_size * 1024 * 1024);
+    tcg_exec_init(s->tb_size * 1024 * 1024, s->mirror_enabled);
     mttcg_enabled = s->mttcg_enabled;
     cpus_register_accel(&tcg_cpus);
 
@@ -168,6 +176,18 @@ static void tcg_set_tb_size(Object *obj, Visitor *v,
     s->tb_size = value;
 }
 
+static bool tcg_get_split_rwx(Object *obj, Error **errp)
+{
+    TCGState *s = TCG_STATE(obj);
+    return s->mirror_enabled;
+}
+
+static void tcg_set_split_rwx(Object *obj, bool value, Error **errp)
+{
+    TCGState *s = TCG_STATE(obj);
+    s->mirror_enabled = value;
+}
+
 static void tcg_accel_class_init(ObjectClass *oc, void *data)
 {
     AccelClass *ac = ACCEL_CLASS(oc);
@@ -185,6 +205,10 @@ static void tcg_accel_class_init(ObjectClass *oc, void *data)
     object_class_property_set_description(oc, "tb-size",
         "TCG translation block cache size");
 
+    object_class_property_add_bool(oc, "split-rwx",
+        tcg_get_split_rwx, tcg_set_split_rwx);
+    object_class_property_set_description(oc, "split-rwx",
+        "Map jit pages into separate RW and RX regions");
 }
 
 static const TypeInfo tcg_accel_type = {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index fca632eefa..8918a09f10 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1025,13 +1025,19 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
 static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
     __attribute__((aligned(CODE_GEN_ALIGN)));
 
-static bool alloc_code_gen_buffer(size_t tb_size, Error **errp)
+static bool alloc_code_gen_buffer(size_t tb_size, int mirror, Error **errp)
 {
-    void *buf = static_code_gen_buffer;
-    void *end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
+    void *buf, *end;
     size_t size;
 
+    if (mirror > 0) {
+        error_setg(errp, "jit split-rwx not supported");
+        return false;
+    }
+
     /* page-align the beginning and end of the buffer */
+    buf = static_code_gen_buffer;
+    end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
     buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
     end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
 
@@ -1060,9 +1066,16 @@ static bool alloc_code_gen_buffer(size_t tb_size, Error **errp)
     return true;
 }
 #elif defined(_WIN32)
-static bool alloc_code_gen_buffer(size_t size, Error **errp)
+static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
 {
-    void *buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
+    void *buf;
+
+    if (mirror > 0) {
+        error_setg(errp, "jit split-rwx not supported");
+        return false;
+    }
+
+    buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
                              PAGE_EXECUTE_READWRITE);
     if (buf == NULL) {
         error_setg_win32(errp, GetLastError(),
@@ -1075,12 +1088,17 @@ static bool alloc_code_gen_buffer(size_t size, Error **errp)
     return true;
 }
 #else
-static bool alloc_code_gen_buffer(size_t size, Error **errp)
+static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
 {
     int prot = PROT_WRITE | PROT_READ | PROT_EXEC;
     int flags = MAP_PRIVATE | MAP_ANONYMOUS;
     void *buf;
 
+    if (mirror > 0) {
+        error_setg(errp, "jit split-rwx not supported");
+        return false;
+    }
+
     buf = mmap(NULL, size, prot, flags, -1, 0);
     if (buf == MAP_FAILED) {
         error_setg_errno(errp, errno,
@@ -1155,7 +1173,7 @@ static void tb_htable_init(void)
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
    (in bytes) allocated to the translation buffer. Zero means default
    size. */
-void tcg_exec_init(unsigned long tb_size)
+void tcg_exec_init(unsigned long tb_size, int mirror)
 {
     bool ok;
 
@@ -1164,7 +1182,8 @@ void tcg_exec_init(unsigned long tb_size)
     page_init();
     tb_htable_init();
 
-    ok = alloc_code_gen_buffer(size_code_gen_buffer(tb_size), &error_fatal);
+    ok = alloc_code_gen_buffer(size_code_gen_buffer(tb_size),
+                               mirror, &error_fatal);
     assert(ok);
 
 #if defined(CONFIG_SOFTMMU)
diff --git a/bsd-user/main.c b/bsd-user/main.c
index ac40d79bfa..ffd4888a26 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -910,7 +910,7 @@ int main(int argc, char **argv)
     }
 
     /* init tcg before creating CPUs and to get qemu_host_page_size */
-    tcg_exec_init(0);
+    tcg_exec_init(0, false);
 
     cpu_type = parse_cpu_option(cpu_model);
     cpu = cpu_create(cpu_type);
diff --git a/linux-user/main.c b/linux-user/main.c
index 75c9785157..3856b2611d 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -705,7 +705,7 @@ int main(int argc, char **argv, char **envp)
     cpu_type = parse_cpu_option(cpu_model);
 
     /* init tcg before creating CPUs and to get qemu_host_page_size */
-    tcg_exec_init(0);
+    tcg_exec_init(0, false);
 
     cpu = cpu_create(cpu_type);
     env = cpu->env_ptr;
-- 
2.25.1
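
The "mirror" argument is an int rather than a bool because it carries
three states: -1 (default "auto on" in debug builds), 0 (off) and
1 (forced on).  Note that the bool property setter above can only store
0 or 1, so -1 arises solely from the instance_init default.  A minimal
sketch of how such a tri-state might be resolved follows; the enum
names, "resolve_mirror" and the "host_supports" flag are hypothetical.
As posted, this patch behaves as if host_supports were always false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three values of the "mirror" argument. */
enum {
    MIRROR_AUTO = -1,   /* use the split if available, else fall back */
    MIRROR_OFF  = 0,    /* never use the split */
    MIRROR_ON   = 1,    /* use the split or fail */
};

/*
 * Decide whether to use split RW/RX mappings.
 * Returns false only when the split was forced on but is unavailable;
 * in that case *use_mirror is left untouched.
 */
static bool resolve_mirror(int requested, bool host_supports,
                           bool *use_mirror)
{
    assert(use_mirror != NULL);

    if (requested != MIRROR_OFF && host_supports) {
        *use_mirror = true;
        return true;
    }
    if (requested > 0) {
        /* Forced on, but not available: a hard error. */
        return false;
    }
    /* Off, or auto without support: quietly use a single mapping. */
    *use_mirror = false;
    return true;
}
```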




* [PATCH v2 13/19] accel/tcg: Support split-rwx for linux with memfd
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (11 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 12/19] tcg: Add --accel tcg,split-rwx property Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 14/19] RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap Richard Henderson
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

We cannot use a real temp file, because we would need to find
a filesystem that does not have noexec enabled.  However, a
memfd is not associated with any filesystem.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 87 +++++++++++++++++++++++++++++++++++----
 1 file changed, 80 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8918a09f10..3e69ebd1d3 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1088,17 +1088,11 @@ static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
     return true;
 }
 #else
-static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
+static bool alloc_code_gen_buffer_anon(size_t size, int prot, Error **errp)
 {
-    int prot = PROT_WRITE | PROT_READ | PROT_EXEC;
     int flags = MAP_PRIVATE | MAP_ANONYMOUS;
     void *buf;
 
-    if (mirror > 0) {
-        error_setg(errp, "jit split-rwx not supported");
-        return false;
-    }
-
     buf = mmap(NULL, size, prot, flags, -1, 0);
     if (buf == MAP_FAILED) {
         error_setg_errno(errp, errno,
@@ -1147,6 +1141,85 @@ static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
     tcg_ctx->code_gen_buffer = buf;
     return true;
 }
+
+#ifdef CONFIG_LINUX
+#include "qemu/memfd.h"
+
+static bool alloc_code_gen_buffer_mirror_memfd(size_t size, Error **errp)
+{
+    void *buf_rw, *buf_rx;
+    int fd;
+
+    fd = qemu_memfd_create("tcg-jit", size, false, 0, 0, errp);
+    if (fd < 0) {
+        return false;
+    }
+
+    buf_rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+    if (buf_rw == MAP_FAILED) {
+        error_setg_errno(errp, errno,
+                         "allocate %zu bytes for jit buffer", size);
+        close(fd);
+        return false;
+    }
+
+    buf_rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
+    if (buf_rx == MAP_FAILED) {
+        error_setg_errno(errp, errno,
+                         "allocate %zu bytes for jit mirror", size);
+        munmap(buf_rw, size);
+        close(fd);
+        return false;
+    }
+    close(fd);
+
+    tcg_ctx->code_gen_buffer = buf_rw;
+    tcg_ctx->code_gen_buffer_size = size;
+    tcg_rx_mirror_diff = buf_rx - buf_rw;
+
+    /* Request large pages for the buffer and the mirror.  */
+    qemu_madvise(buf_rw, size, QEMU_MADV_HUGEPAGE);
+    qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
+    return true;
+}
+#endif
+
+static bool alloc_code_gen_buffer_mirror(size_t size, Error **errp)
+{
+    if (TCG_TARGET_SUPPORT_MIRROR) {
+#ifdef CONFIG_LINUX
+        return alloc_code_gen_buffer_mirror_memfd(size, errp);
+#endif
+    }
+    error_setg(errp, "jit split-rwx not supported");
+    return false;
+}
+
+static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
+{
+    if (mirror) {
+        Error *local_err = NULL;
+        if (alloc_code_gen_buffer_mirror(size, &local_err)) {
+            return true;
+        }
+        /*
+         * If mirror force-on (1), fail;
+         * if mirror default-on (-1), fall through to mirror off.
+         */
+        if (mirror > 0) {
+            error_propagate(errp, local_err);
+            return false;
+        }
+    }
+
+    int prot = PROT_READ | PROT_WRITE | PROT_EXEC;
+#ifdef CONFIG_TCG_INTERPRETER
+    /* The tcg interpreter does not need execute permission. */
+    prot = PROT_READ | PROT_WRITE;
+#endif
+
+    return alloc_code_gen_buffer_anon(size, prot, errp);
+}
 #endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */
 
 static bool tb_cmp(const void *ap, const void *bp)
-- 
2.25.1
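
The double mapping can be tried outside QEMU with a few lines of
Linux-only C.  The sketch below is an assumption-laden illustration, not
the code from this patch: it calls memfd_create and ftruncate directly
(the patch goes through qemu_memfd_create), and "struct split_map",
"split_map_create" and "split_map_destroy" are hypothetical names.  It
maps the same memfd twice, once RW and once RX, and records the
distance between the two views.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical container for the two views of one memfd. */
struct split_map {
    void *rw;           /* writable view, never executable */
    void *rx;           /* executable view, never writable */
    size_t size;
    intptr_t rx_diff;   /* rx - rw, computed on integers */
};

static bool split_map_create(struct split_map *m, size_t size)
{
    void *rw, *rx;
    int fd;

    assert(m != NULL);

    /* A memfd is not tied to any filesystem, so noexec mounts do not apply. */
    fd = memfd_create("jit-sketch", MFD_CLOEXEC);
    if (fd < 0) {
        return false;
    }
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return false;
    }

    rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (rw == MAP_FAILED) {
        close(fd);
        return false;
    }
    rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);          /* the mappings keep the memory alive */
    if (rx == MAP_FAILED) {
        munmap(rw, size);
        return false;
    }

    m->rw = rw;
    m->rx = rx;
    m->size = size;
    m->rx_diff = (intptr_t)((uintptr_t)rx - (uintptr_t)rw);
    return true;
}

static void split_map_destroy(struct split_map *m)
{
    munmap(m->rx, m->size);
    munmap(m->rw, m->size);
}
```

A byte stored through m.rw becomes visible through m.rx because both
views are MAP_SHARED mappings of the same file offset.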




* [PATCH v2 14/19] RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (12 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 13/19] accel/tcg: Support split-rwx for linux with memfd Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-11-01  1:42   ` Joelle van Dyne
  2020-10-30  0:49 ` [PATCH v2 15/19] tcg: Return the rx mirror of TranslationBlock from exit_tb Richard Henderson
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

Cribbed from code posted by Joelle van Dyne <j@getutm.app>,
and rearranged to a cleaner structure.  Completely untested.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 68 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 3e69ebd1d3..bf8263fdb4 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1093,6 +1093,11 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot, Error **errp)
     int flags = MAP_PRIVATE | MAP_ANONYMOUS;
     void *buf;
 
+#ifdef CONFIG_DARWIN
+    /* Applicable to both iOS and macOS (Apple Silicon). */
+    flags |= MAP_JIT;
+#endif
+
     buf = mmap(NULL, size, prot, flags, -1, 0);
     if (buf == MAP_FAILED) {
         error_setg_errno(errp, errno,
@@ -1182,13 +1187,74 @@ static bool alloc_code_gen_buffer_mirror_memfd(size_t size, Error **errp)
     qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
     return true;
 }
-#endif
+#endif /* CONFIG_LINUX */
+
+#ifdef CONFIG_DARWIN
+#include <mach/mach.h>
+
+extern kern_return_t mach_vm_remap(vm_map_t target_task,
+                                   mach_vm_address_t *target_address,
+                                   mach_vm_size_t size,
+                                   mach_vm_offset_t mask,
+                                   int flags,
+                                   vm_map_t src_task,
+                                   mach_vm_address_t src_address,
+                                   boolean_t copy,
+                                   vm_prot_t *cur_protection,
+                                   vm_prot_t *max_protection,
+                                   vm_inherit_t inheritance);
+
+static bool alloc_code_gen_buffer_mirror_vmremap(size_t size, Error **errp)
+{
+    kern_return_t ret;
+    mach_vm_address_t buf_rw, buf_rx;
+    vm_prot_t cur_prot, max_prot;
+
+    /* Map the read-write portion via normal anon memory. */
+    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE, errp)) {
+        return false;
+    }
+
+    buf_rw = tcg_ctx->code_gen_buffer;
+    buf_rx = 0;
+    ret = mach_vm_remap(mach_task_self(),
+                        &buf_rx,
+                        size,
+                        0,
+                        VM_FLAGS_ANYWHERE | VM_FLAGS_RANDOM_ADDR,
+                        mach_task_self(),
+                        buf_rw,
+                        false,
+                        &cur_prot,
+                        &max_prot,
+                        VM_INHERIT_NONE);
+    if (ret != KERN_SUCCESS) {
+        /* TODO: Convert "ret" to a human readable error message. */
+        error_setg(errp, "vm_remap for jit mirror failed");
+        munmap((void *)buf_rw, size);
+        return false;
+    }
+
+    if (mprotect((void *)buf_rx, size, PROT_READ | PROT_EXEC) != 0) {
+        error_setg_errno(errp, errno, "mprotect for jit mirror");
+        munmap((void *)buf_rx, size);
+        munmap((void *)buf_rw, size);
+        return false;
+    }
+
+    tcg_rx_mirror_diff = buf_rx - buf_rw;
+    return true;
+}
+#endif /* CONFIG_DARWIN */
 
 static bool alloc_code_gen_buffer_mirror(size_t size, Error **errp)
 {
     if (TCG_TARGET_SUPPORT_MIRROR) {
 #ifdef CONFIG_LINUX
         return alloc_code_gen_buffer_mirror_memfd(size, errp);
+#endif
+#ifdef CONFIG_DARWIN
+        return alloc_code_gen_buffer_mirror_vmremap(size, errp);
 #endif
     }
     error_setg(errp, "jit split-rwx not supported");
-- 
2.25.1




* [PATCH v2 15/19] tcg: Return the rx mirror of TranslationBlock from exit_tb
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (13 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 14/19] RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 16/19] tcg/i386: Support split-rwx code generation Richard Henderson
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

This produces a small pc-relative displacement within the
generated code to the TB structure that precedes it.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cpu-exec.c | 35 ++++++++++++++++++++++-------------
 tcg/tcg-op.c         | 13 ++++++++++++-
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 4af3faba80..f3d17f28d0 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -144,12 +144,13 @@ static void init_delay_params(SyncClocks *sc, const CPUState *cpu)
 #endif /* CONFIG USER ONLY */
 
 /* Execute a TB, and fix up the CPU state afterwards if necessary */
-static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
+static inline TranslationBlock *cpu_tb_exec(CPUState *cpu,
+                                            TranslationBlock *itb,
+                                            int *tb_exit)
 {
     CPUArchState *env = cpu->env_ptr;
     uintptr_t ret;
     TranslationBlock *last_tb;
-    int tb_exit;
     const void *tb_ptr = itb->tc.ptr;
 
     qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
@@ -177,11 +178,20 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
 
     ret = tcg_qemu_tb_exec(env, tb_ptr);
     cpu->can_do_io = 1;
-    last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
-    tb_exit = ret & TB_EXIT_MASK;
-    trace_exec_tb_exit(last_tb, tb_exit);
+    /*
+     * TODO: Delay swapping back to the read-write mirror of the TB
+     * until we actually need to modify the TB.  The read-only copy,
+     * coming from the rx mirror, shares the same host TLB entry as
+     * the code that executed the exit_tb opcode that arrived here.
+     * If we insist on touching both the RX and the RW pages, we
+     * double the host TLB pressure.
+     */
+    last_tb = tcg_mirror_rx_to_rw((void *)(ret & ~TB_EXIT_MASK));
+    *tb_exit = ret & TB_EXIT_MASK;
 
-    if (tb_exit > TB_EXIT_IDX1) {
+    trace_exec_tb_exit(last_tb, *tb_exit);
+
+    if (*tb_exit > TB_EXIT_IDX1) {
         /* We didn't start executing this TB (eg because the instruction
          * counter hit zero); we must restore the guest PC to the address
          * of the start of the TB.
@@ -199,7 +209,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
             cc->set_pc(cpu, last_tb->pc);
         }
     }
-    return ret;
+    return last_tb;
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -210,6 +220,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 {
     TranslationBlock *tb;
     uint32_t cflags = curr_cflags() | CF_NOCACHE;
+    int tb_exit;
 
     if (ignore_icount) {
         cflags &= ~CF_USE_ICOUNT;
@@ -227,7 +238,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
-    cpu_tb_exec(cpu, tb);
+    cpu_tb_exec(cpu, tb, &tb_exit);
 
     mmap_lock();
     tb_phys_invalidate(tb, -1);
@@ -244,6 +255,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
     uint32_t flags;
     uint32_t cflags = 1;
     uint32_t cf_mask = cflags & CF_HASH_MASK;
+    int tb_exit;
 
     if (sigsetjmp(cpu->jmp_env, 0) == 0) {
         start_exclusive();
@@ -260,7 +272,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
         cc->cpu_exec_enter(cpu);
         /* execute the generated code */
         trace_exec_tb(tb, pc);
-        cpu_tb_exec(cpu, tb);
+        cpu_tb_exec(cpu, tb, &tb_exit);
         cc->cpu_exec_exit(cpu);
     } else {
         /*
@@ -653,13 +665,10 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
 static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
                                     TranslationBlock **last_tb, int *tb_exit)
 {
-    uintptr_t ret;
     int32_t insns_left;
 
     trace_exec_tb(tb, tb->pc);
-    ret = cpu_tb_exec(cpu, tb);
-    tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
-    *tb_exit = ret & TB_EXIT_MASK;
+    tb = cpu_tb_exec(cpu, tb, tb_exit);
     if (*tb_exit != TB_EXIT_REQUESTED) {
         *last_tb = tb;
         return;
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index e3dc0cb4cb..f0d22de3de 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2666,7 +2666,18 @@ void tcg_gen_extr32_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i64 arg)
 
 void tcg_gen_exit_tb(const TranslationBlock *tb, unsigned idx)
 {
-    uintptr_t val = (uintptr_t)tb + idx;
+    /*
+     * Let the jit code return the read-only version of the
+     * TranslationBlock, so that we minimize the pc-relative
+     * distance of the address of the exit_tb code to TB.
+     * This will improve utilization of pc-relative address loads.
+     *
+     * TODO: Move this to translator_loop, so that all const
+     * TranslationBlock pointers refer to read-only memory.
+     * This requires coordination with targets that do not use
+     * the translator_loop.
+     */
+    uintptr_t val = (uintptr_t)tcg_mirror_rw_to_rx((void *)tb) + idx;
 
     if (tb == NULL) {
         tcg_debug_assert(idx == 0);
-- 
2.25.1




* [PATCH v2 16/19] tcg/i386: Support split-rwx code generation
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (14 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 15/19] tcg: Return the rx mirror of TranslationBlock from exit_tb Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 17/19] tcg/aarch64: Use B not BL for tcg_out_goto_long Richard Henderson
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.c.inc | 20 +++++++++++---------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 1b9d41bd56..bbbd1c2d4a 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -236,6 +236,6 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
-#define TCG_TARGET_SUPPORT_MIRROR       0
+#define TCG_TARGET_SUPPORT_MIRROR       1
 
 #endif
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 7f74c77d7f..e2c85381cd 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -165,7 +165,7 @@ static bool have_lzcnt;
 # define have_lzcnt 0
 #endif
 
-static tcg_insn_unit *tb_ret_addr;
+static const tcg_insn_unit *tb_ret_addr;
 
 static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
@@ -173,7 +173,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
     value += addend;
     switch(type) {
     case R_386_PC32:
-        value -= (uintptr_t)code_ptr;
+        value -= (uintptr_t)tcg_mirror_rw_to_rx(code_ptr);
         if (value != (int32_t)value) {
             return false;
         }
@@ -182,7 +182,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
         tcg_patch32(code_ptr, value);
         break;
     case R_386_PC8:
-        value -= (uintptr_t)code_ptr;
+        value -= (uintptr_t)tcg_mirror_rw_to_rx(code_ptr);
         if (value != (int8_t)value) {
             return false;
         }
@@ -1006,7 +1006,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 
     /* Try a 7 byte pc-relative lea before the 10 byte movq.  */
-    diff = arg - ((uintptr_t)s->code_ptr + 7);
+    diff = tcg_pcrel_diff(s, (const void *)arg) - 7;
     if (diff == (int32_t)diff) {
         tcg_out_opc(s, OPC_LEA | P_REXW, ret, 0, 0);
         tcg_out8(s, (LOWREGMASK(ret) << 3) | 5);
@@ -1615,7 +1615,7 @@ static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *dest)
     tcg_out_branch(s, 1, dest);
 }
 
-static void tcg_out_jmp(TCGContext *s, tcg_insn_unit *dest)
+static void tcg_out_jmp(TCGContext *s, const tcg_insn_unit *dest)
 {
     tcg_out_branch(s, 0, dest);
 }
@@ -1786,7 +1786,8 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, bool is_64,
     label->datahi_reg = datahi;
     label->addrlo_reg = addrlo;
     label->addrhi_reg = addrhi;
-    label->raddr = raddr;
+    /* TODO: Cast goes away when all hosts converted */
+    label->raddr = (void *)tcg_mirror_rw_to_rx(raddr);
     label->label_ptr[0] = label_ptr[0];
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         label->label_ptr[1] = label_ptr[1];
@@ -2280,7 +2281,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             /* jump displacement must be aligned for atomic patching;
              * see if we need to add extra nops before jump
              */
-            gap = tcg_pcrel_diff(s, QEMU_ALIGN_PTR_UP(s->code_ptr + 1, 4));
+            gap = QEMU_ALIGN_PTR_UP(s->code_ptr + 1, 4) - s->code_ptr;
             if (gap != 1) {
                 tcg_out_nopn(s, gap - 1);
             }
@@ -3825,11 +3826,12 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
      * and fall through to the rest of the epilogue.
      */
-    tcg_code_gen_epilogue = s->code_ptr;
+    /* TODO: Cast goes away when all hosts converted */
+    tcg_code_gen_epilogue = (void *)tcg_mirror_rw_to_rx(s->code_ptr);
     tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_EAX, 0);
 
     /* TB epilogue */
-    tb_ret_addr = s->code_ptr;
+    tb_ret_addr = tcg_mirror_rw_to_rx(s->code_ptr);
 
     tcg_out_addi(s, TCG_REG_CALL_STACK, stack_addend);
 
-- 
2.25.1
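
The recurring change in this patch is that pc-relative displacements are
computed against the executable (rx) address of the code being written,
while the bytes themselves are still stored through the writable (rw)
alias. A minimal sketch of that arithmetic follows. It is an illustration
only: mirror_diff, rw_to_rx and pc32_disp are hypothetical stand-ins, not
the QEMU helpers, and the addresses used with them are made-up numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global: rx address minus rw address; 0 when not split. */
static intptr_t mirror_diff;

/* Stand-in for tcg_mirror_rw_to_rx(): map a write alias to its execute alias. */
static uintptr_t rw_to_rx(uintptr_t rw)
{
    return rw + (uintptr_t)mirror_diff;
}

/*
 * Compute a 32-bit pc-relative displacement for a field that is written
 * at field_rw but will be executed at rw_to_rx(field_rw).  The target is
 * an rx address.  Returns false if the result does not fit in int32_t.
 */
static bool pc32_disp(uintptr_t field_rw, uintptr_t target,
                      intptr_t addend, int32_t *out)
{
    intptr_t value;

    assert(out != NULL);
    value = (intptr_t)target + addend;
    value -= (intptr_t)rw_to_rx(field_rw);   /* subtract the rx address, not rw */
    if (value != (int32_t)value) {
        return false;
    }
    *out = (int32_t)value;
    return true;
}
```

With mirror_diff left at 0 the translation is the identity, which matches
the cover letter's remark that a disabled feature just adds and subtracts
a zero held in a global.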




* [PATCH v2 17/19] tcg/aarch64: Use B not BL for tcg_out_goto_long
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (15 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 16/19] tcg/i386: Support split-rwx code generation Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  0:49 ` [PATCH v2 18/19] tcg/aarch64: Implement flush_idcache_range manually Richard Henderson
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

A typo generated a branch-and-link insn instead of plain branch.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index fea784cf75..bd888bc66d 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1317,7 +1317,7 @@ static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target)
 {
     ptrdiff_t offset = target - s->code_ptr;
     if (offset == sextract64(offset, 0, 26)) {
-        tcg_out_insn(s, 3206, BL, offset);
+        tcg_out_insn(s, 3206, B, offset);
     } else {
         tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)target);
         tcg_out_insn(s, 3207, BR, TCG_REG_TMP);
-- 
2.25.1
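
For readers unfamiliar with the two instructions: B and BL share the same
immediate format and differ only in the top opcode bit, but BL also writes
the return address to x30, which a plain goto should not do. The sketch
below encodes both from a byte displacement. The opcode values are the
architectural encodings; the function names are hypothetical and this is
not the QEMU emitter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AArch64 unconditional branch (immediate); imm26 occupies bits [25:0]. */
#define A64_B   0x14000000u   /* branch */
#define A64_BL  0x94000000u   /* branch with link: also sets x30 */

/* True if v is representable as a signed integer of the given width. */
static bool fits_signed(int64_t v, unsigned bits)
{
    int64_t lim;

    assert(bits >= 1 && bits <= 63);
    lim = (int64_t)1 << (bits - 1);
    return v >= -lim && v < lim;
}

/*
 * Encode B (link == false) or BL (link == true) for a displacement in
 * bytes.  Returns 0, which is not a valid B/BL encoding, when the
 * displacement is unaligned or outside the 26-bit instruction range.
 */
static uint32_t a64_encode_branch(bool link, int64_t byte_disp)
{
    int64_t insn_disp;

    if ((byte_disp & 3) != 0) {
        return 0;
    }
    insn_disp = byte_disp / 4;               /* bytes to 4-byte insn units */
    if (!fits_signed(insn_disp, 26)) {
        return 0;
    }
    return (link ? A64_BL : A64_B) | ((uint32_t)insn_disp & 0x03ffffffu);
}
```

The divide by four corresponds to working in tcg_insn_unit steps rather
than bytes, which is why the range check in the patch is against 26 bits.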




* [PATCH v2 18/19] tcg/aarch64: Implement flush_idcache_range manually
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (16 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 17/19] tcg/aarch64: Use B not BL for tcg_out_goto_long Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-11-01  1:25   ` Joelle van Dyne
  2020-10-30  0:49 ` [PATCH v2 19/19] tcg/aarch64: Support split-rwx code generation Richard Henderson
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

Copy the single pointer implementation from libgcc and modify it to
support the double pointer interface we require.  This halves the
number of cache operations required when split-rwx is enabled.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     | 11 +-------
 tcg/aarch64/tcg-target.c.inc | 53 ++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index fa64058d43..e62d38ba55 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -148,16 +148,7 @@ typedef enum {
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
-/* Flush the dcache at RW, and the icache at RX, as necessary. */
-static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
-{
-    /* TODO: Copy this from gcc to avoid 4 loops instead of 2. */
-    if (rw != rx) {
-        __builtin___clear_cache((char *)rw, (char *)(rw + len));
-    }
-    __builtin___clear_cache((char *)rx, (char *)(rx + len));
-}
-
+void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len);
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #ifdef CONFIG_SOFTMMU
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index bd888bc66d..5e8f3faad2 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2968,3 +2968,56 @@ void tcg_register_jit(const void *buf, size_t buf_size)
 {
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
+
+/*
+ * Flush the dcache at RW, and the icache at RX, as necessary.
+ * This is a copy of gcc's __aarch64_sync_cache_range, modified
+ * to fit this three-operand interface.
+ */
+void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
+{
+    const unsigned CTR_IDC = 1u << 28;
+    const unsigned CTR_DIC = 1u << 29;
+    static unsigned int cache_info;
+    uintptr_t icache_lsize, dcache_lsize, p;
+
+    if (!cache_info) {
+        /*
+         * CTR_EL0 [3:0] contains log2 of icache line size in words.
+         * CTR_EL0 [19:16] contains log2 of dcache line size in words.
+         */
+        asm volatile("mrs\t%0, ctr_el0" : "=r"(cache_info));
+    }
+
+    icache_lsize = 4 << extract32(cache_info, 0, 4);
+    dcache_lsize = 4 << extract32(cache_info, 16, 4);
+
+    /*
+     * If CTR_EL0.IDC is enabled, Data cache clean to the Point of Unification
+     * is not required for instruction to data coherence.
+     */
+    if (!(cache_info & CTR_IDC)) {
+        /*
+         * Loop over the address range, clearing one cache line at once.
+         * Data cache must be flushed to unification first to make sure
+         * the instruction cache fetches the updated data.
+         */
+        for (p = rw & -dcache_lsize; p < rw + len; p += dcache_lsize) {
+            asm volatile("dc\tcvau, %0" : : "r" (p) : "memory");
+        }
+        asm volatile("dsb\tish" : : : "memory");
+    }
+
+    /*
+     * If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point
+     * of Unification is not required for instruction to data coherence.
+     */
+    if (!(cache_info & CTR_DIC)) {
+        for (p = rx & -icache_lsize; p < rx + len; p += icache_lsize) {
+            asm volatile("ic\tivau, %0" : : "r"(p) : "memory");
+        }
+        asm volatile ("dsb\tish" : : : "memory");
+    }
+
+    asm volatile("isb" : : : "memory");
+}
-- 
2.25.1
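
To make the "halves the number of cache operations" claim concrete, here
is a host-independent sketch that decodes the line sizes from a CTR_EL0
value and counts how many maintenance instructions the two loops above
would issue. It assumes the field layout quoted in the patch comments
(bits [3:0] and [19:16], IDC at bit 28, DIC at bit 29). The helper names
are hypothetical and the register value is passed in rather than read
with mrs, so the code runs on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTR_IDC (1u << 28)
#define CTR_DIC (1u << 29)

/* Line sizes in bytes: the fields hold log2 of the size in 4-byte words. */
static unsigned ctr_icache_line_bytes(uint32_t ctr)
{
    return 4u << (ctr & 0xf);
}

static unsigned ctr_dcache_line_bytes(uint32_t ctr)
{
    return 4u << ((ctr >> 16) & 0xf);
}

/*
 * Number of iterations of
 *     for (p = start & -line; p < start + len; p += line)
 * i.e. the number of cache lines overlapping [start, start + len).
 */
static size_t lines_touched(uintptr_t start, size_t len, unsigned line)
{
    uintptr_t first, end;

    assert(line != 0 && (line & (line - 1)) == 0);   /* power of two */
    first = start & ~(uintptr_t)(line - 1);
    end = start + len;
    return (size_t)((end - first + line - 1) / line);
}

/* Total dc cvau plus ic ivau instructions for one rw/rx flush. */
static size_t maintenance_ops(uint32_t ctr, uintptr_t rx, uintptr_t rw,
                              size_t len)
{
    size_t n = 0;

    if (!(ctr & CTR_IDC)) {
        n += lines_touched(rw, len, ctr_dcache_line_bytes(ctr));
    }
    if (!(ctr & CTR_DIC)) {
        n += lines_touched(rx, len, ctr_icache_line_bytes(ctr));
    }
    return n;
}
```

The removed fallback called __builtin___clear_cache once on rw and once
on rx, so it would run a dcache loop and an icache loop over each alias,
roughly twice the count that maintenance_ops reports.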




* [PATCH v2 19/19] tcg/aarch64: Support split-rwx code generation
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (17 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 18/19] tcg/aarch64: Implement flush_idcache_range manually Richard Henderson
@ 2020-10-30  0:49 ` Richard Henderson
  2020-10-30  1:27 ` [PATCH v2 00/19] Mirror map JIT memory for TCG no-reply
  2020-10-30 18:26 ` Paolo Bonzini
  20 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30  0:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, j, laurent

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |  2 +-
 tcg/aarch64/tcg-target.c.inc | 57 ++++++++++++++++++++----------------
 tcg/tcg-pool.c.inc           |  6 +++-
 3 files changed, 38 insertions(+), 27 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index e62d38ba55..abb94f9458 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -155,6 +155,6 @@ void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
 #define TCG_TARGET_NEED_POOL_LABELS
-#define TCG_TARGET_SUPPORT_MIRROR       0
+#define TCG_TARGET_SUPPORT_MIRROR       1
 
 #endif /* AARCH64_TCG_TARGET_H */
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 5e8f3faad2..c082a06152 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -78,38 +78,42 @@ static const int tcg_target_call_oarg_regs[1] = {
 #define TCG_REG_GUEST_BASE TCG_REG_X28
 #endif
 
-static inline bool reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static bool reloc_pc26(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
 {
-    ptrdiff_t offset = target - code_ptr;
+    const tcg_insn_unit *src_rx = tcg_mirror_rw_to_rx(src_rw);
+    ptrdiff_t offset = target - src_rx;
+
     if (offset == sextract64(offset, 0, 26)) {
         /* read instruction, mask away previous PC_REL26 parameter contents,
            set the proper offset, then write back the instruction. */
-        *code_ptr = deposit32(*code_ptr, 0, 26, offset);
+        *src_rw = deposit32(*src_rw, 0, 26, offset);
         return true;
     }
     return false;
 }
 
-static inline bool reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static bool reloc_pc19(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
 {
-    ptrdiff_t offset = target - code_ptr;
+    const tcg_insn_unit *src_rx = tcg_mirror_rw_to_rx(src_rw);
+    ptrdiff_t offset = target - src_rx;
+
     if (offset == sextract64(offset, 0, 19)) {
-        *code_ptr = deposit32(*code_ptr, 5, 19, offset);
+        *src_rw = deposit32(*src_rw, 5, 19, offset);
         return true;
     }
     return false;
 }
 
-static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
-                               intptr_t value, intptr_t addend)
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
+                        intptr_t value, intptr_t addend)
 {
     tcg_debug_assert(addend == 0);
     switch (type) {
     case R_AARCH64_JUMP26:
     case R_AARCH64_CALL26:
-        return reloc_pc26(code_ptr, (tcg_insn_unit *)value);
+        return reloc_pc26(code_ptr, (const tcg_insn_unit *)value);
     case R_AARCH64_CONDBR19:
-        return reloc_pc19(code_ptr, (tcg_insn_unit *)value);
+        return reloc_pc19(code_ptr, (const tcg_insn_unit *)value);
     default:
         g_assert_not_reached();
     }
@@ -1050,12 +1054,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
     /* Look for host pointer values within 4G of the PC.  This happens
        often when loading pointers to QEMU's own data structures.  */
     if (type == TCG_TYPE_I64) {
-        tcg_target_long disp = value - (intptr_t)s->code_ptr;
+        intptr_t src_rx = (intptr_t)tcg_mirror_rw_to_rx(s->code_ptr);
+        tcg_target_long disp = value - src_rx;
         if (disp == sextract64(disp, 0, 21)) {
             tcg_out_insn(s, 3406, ADR, rd, disp);
             return;
         }
-        disp = (value >> 12) - ((intptr_t)s->code_ptr >> 12);
+        disp = (value >> 12) - (src_rx >> 12);
         if (disp == sextract64(disp, 0, 21)) {
             tcg_out_insn(s, 3406, ADRP, rd, disp);
             if (value & 0xfff) {
@@ -1308,14 +1313,14 @@ static void tcg_out_cmp(TCGContext *s, TCGType ext, TCGReg a,
 
 static void tcg_out_goto(TCGContext *s, const tcg_insn_unit *target)
 {
-    ptrdiff_t offset = target - s->code_ptr;
+    ptrdiff_t offset = tcg_pcrel_diff(s, target) >> 2;
     tcg_debug_assert(offset == sextract64(offset, 0, 26));
     tcg_out_insn(s, 3206, B, offset);
 }
 
-static inline void tcg_out_goto_long(TCGContext *s, tcg_insn_unit *target)
+static void tcg_out_goto_long(TCGContext *s, const tcg_insn_unit *target)
 {
-    ptrdiff_t offset = target - s->code_ptr;
+    ptrdiff_t offset = tcg_pcrel_diff(s, target) >> 2;
     if (offset == sextract64(offset, 0, 26)) {
         tcg_out_insn(s, 3206, B, offset);
     } else {
@@ -1329,9 +1334,9 @@ static inline void tcg_out_callr(TCGContext *s, TCGReg reg)
     tcg_out_insn(s, 3207, BLR, reg);
 }
 
-static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *target)
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target)
 {
-    ptrdiff_t offset = target - s->code_ptr;
+    ptrdiff_t offset = tcg_pcrel_diff(s, target) >> 2;
     if (offset == sextract64(offset, 0, 26)) {
         tcg_out_insn(s, 3206, BL, offset);
     } else {
@@ -1393,7 +1398,7 @@ static void tcg_out_brcond(TCGContext *s, TCGType ext, TCGCond c, TCGArg a,
         tcg_out_reloc(s, s->code_ptr, R_AARCH64_CONDBR19, l, 0);
         offset = tcg_in32(s) >> 5;
     } else {
-        offset = l->u.value_ptr - s->code_ptr;
+        offset = tcg_pcrel_diff(s, l->u.value_ptr) >> 2;
         tcg_debug_assert(offset == sextract64(offset, 0, 19));
     }
 
@@ -1568,7 +1573,7 @@ static void * const qemu_st_helpers[16] = {
     [MO_BEQ]  = helper_be_stq_mmu,
 };
 
-static inline void tcg_out_adr(TCGContext *s, TCGReg rd, void *target)
+static inline void tcg_out_adr(TCGContext *s, TCGReg rd, const void *target)
 {
     ptrdiff_t offset = tcg_pcrel_diff(s, target);
     tcg_debug_assert(offset == sextract64(offset, 0, 21));
@@ -1581,7 +1586,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     MemOp opc = get_memop(oi);
     MemOp size = opc & MO_SIZE;
 
-    if (!reloc_pc19(lb->label_ptr[0], s->code_ptr)) {
+    if (!reloc_pc19(lb->label_ptr[0], tcg_mirror_rw_to_rx(s->code_ptr))) {
         return false;
     }
 
@@ -1606,7 +1611,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     MemOp opc = get_memop(oi);
     MemOp size = opc & MO_SIZE;
 
-    if (!reloc_pc19(lb->label_ptr[0], s->code_ptr)) {
+    if (!reloc_pc19(lb->label_ptr[0], tcg_mirror_rw_to_rx(s->code_ptr))) {
         return false;
     }
 
@@ -1631,7 +1636,8 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
     label->type = ext;
     label->datalo_reg = data_reg;
     label->addrlo_reg = addr_reg;
-    label->raddr = raddr;
+    /* TODO: Cast goes away when all hosts converted */
+    label->raddr = (void *)tcg_mirror_rw_to_rx(raddr);
     label->label_ptr[0] = label_ptr;
 }
 
@@ -1849,7 +1855,7 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 #endif /* CONFIG_SOFTMMU */
 }
 
-static tcg_insn_unit *tb_ret_addr;
+static const tcg_insn_unit *tb_ret_addr;
 
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        const TCGArg args[TCG_MAX_OP_ARGS],
@@ -2894,11 +2900,12 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
      * and fall through to the rest of the epilogue.
      */
-    tcg_code_gen_epilogue = s->code_ptr;
+    /* TODO: Cast goes away when all hosts converted */
+    tcg_code_gen_epilogue = (void *)tcg_mirror_rw_to_rx(s->code_ptr);
     tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_X0, 0);
 
     /* TB epilogue */
-    tb_ret_addr = s->code_ptr;
+    tb_ret_addr = tcg_mirror_rw_to_rx(s->code_ptr);
 
     /* Remove TCG locals stack space.  */
     tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_SP, TCG_REG_SP,
diff --git a/tcg/tcg-pool.c.inc b/tcg/tcg-pool.c.inc
index 82cbcc89bd..8d92833d7e 100644
--- a/tcg/tcg-pool.c.inc
+++ b/tcg/tcg-pool.c.inc
@@ -140,6 +140,8 @@ static int tcg_out_pool_finalize(TCGContext *s)
 
     for (; p != NULL; p = p->next) {
         size_t size = sizeof(tcg_target_ulong) * p->nlong;
+        uintptr_t value;
+
         if (!l || l->nlong != p->nlong || memcmp(l->data, p->data, size)) {
             if (unlikely(a > s->code_gen_highwater)) {
                 return -1;
@@ -148,7 +150,9 @@ static int tcg_out_pool_finalize(TCGContext *s)
             a += size;
             l = p;
         }
-        if (!patch_reloc(p->label, p->rtype, (intptr_t)a - size, p->addend)) {
+
+        value = (uintptr_t)tcg_mirror_rw_to_rx(a) - size;
+        if (!patch_reloc(p->label, p->rtype, value, p->addend)) {
             return -2;
         }
     }
-- 
2.25.1




* Re: [PATCH v2 00/19] Mirror map JIT memory for TCG
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (18 preceding siblings ...)
  2020-10-30  0:49 ` [PATCH v2 19/19] tcg/aarch64: Support split-rwx code generation Richard Henderson
@ 2020-10-30  1:27 ` no-reply
  2020-10-30 18:26 ` Paolo Bonzini
  20 siblings, 0 replies; 30+ messages in thread
From: no-reply @ 2020-10-30  1:27 UTC (permalink / raw)
  To: richard.henderson; +Cc: pbonzini, laurent, qemu-devel, j

Patchew URL: https://patchew.org/QEMU/20201030004921.721096-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20201030004921.721096-1-richard.henderson@linaro.org
Subject: [PATCH v2 00/19] Mirror map JIT memory for TCG

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20201030004921.721096-1-richard.henderson@linaro.org -> patchew/20201030004921.721096-1-richard.henderson@linaro.org
Switched to a new branch 'test'
1cae0aa tcg/aarch64: Support split-rwx code generation
2a8bb6f tcg/aarch64: Implement flush_idcache_range manually
5aac937 tcg/aarch64: Use B not BL for tcg_out_goto_long
9f81275 tcg/i386: Support split-rwx code generation
5a11dd0 tcg: Return the rx mirror of TranslationBlock from exit_tb
af71330 RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap
bf60965 accel/tcg: Support split-rwx for linux with memfd
90c1e77 tcg: Add --accel tcg,split-rwx property
d4805b8 tcg: Use Error with alloc_code_gen_buffer
9bfafc6 tcg: Make tb arg to synchronize_from_tb const
27ebfc9 tcg: Make DisasContextBase.tb const
5a7bc51 tcg: Adjust tb_target_set_jmp_target for split rwx
6a7c2c6 tcg: Adjust tcg_register_jit for const
51a0884 tcg: Adjust tcg_out_label for const
79f0c8e tcg: Adjust tcg_out_call for const
cd13bcf tcg: Introduce tcg_mirror_rw_to_rx/tcg_mirror_rx_to_rw
43a4727 tcg: Move tcg epilogue pointer out of TCGContext
5c5b85a tcg: Move tcg prologue pointer out of TCGContext
e320c51 tcg: Enhance flush_icache_range with separate data pointer

=== OUTPUT BEGIN ===
1/19 Checking commit e320c51e3e4a (tcg: Enhance flush_icache_range with separate data pointer)
2/19 Checking commit 5c5b85a1a024 (tcg: Move tcg prologue pointer out of TCGContext)
3/19 Checking commit 43a47275e60e (tcg: Move tcg epilogue pointer out of TCGContext)
4/19 Checking commit cd13bcf48f36 (tcg: Introduce tcg_mirror_rw_to_rx/tcg_mirror_rx_to_rw)
5/19 Checking commit 79f0c8e8dd04 (tcg: Adjust tcg_out_call for const)
6/19 Checking commit 51a088446659 (tcg: Adjust tcg_out_label for const)
7/19 Checking commit 6a7c2c61ac7c (tcg: Adjust tcg_register_jit for const)
8/19 Checking commit 5a7bc5100680 (tcg: Adjust tb_target_set_jmp_target for split rwx)
9/19 Checking commit 27ebfc9c7710 (tcg: Make DisasContextBase.tb const)
10/19 Checking commit 9bfafc6a1e66 (tcg: Make tb arg to synchronize_from_tb const)
11/19 Checking commit d4805b8c7ba9 (tcg: Use Error with alloc_code_gen_buffer)
12/19 Checking commit 90c1e773778d (tcg: Add --accel tcg,split-rwx property)
13/19 Checking commit bf60965b715c (accel/tcg: Support split-rwx for linux with memfd)
14/19 Checking commit af71330a9b77 (RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap)
ERROR: externs should be avoided in .c files
#39: FILE: accel/tcg/translate-all.c:1195:
+extern kern_return_t mach_vm_remap(vm_map_t target_task,

total: 1 errors, 0 warnings, 86 lines checked

Patch 14/19 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

15/19 Checking commit 5a11dd063165 (tcg: Return the rx mirror of TranslationBlock from exit_tb)
16/19 Checking commit 9f812754c6c2 (tcg/i386: Support split-rwx code generation)
17/19 Checking commit 5aac937754cc (tcg/aarch64: Use B not BL for tcg_out_goto_long)
18/19 Checking commit 2a8bb6f3e71e (tcg/aarch64: Implement flush_idcache_range manually)
19/19 Checking commit 1cae0aa42840 (tcg/aarch64: Support split-rwx code generation)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20201030004921.721096-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH v2 00/19] Mirror map JIT memory for TCG
  2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
                   ` (19 preceding siblings ...)
  2020-10-30  1:27 ` [PATCH v2 00/19] Mirror map JIT memory for TCG no-reply
@ 2020-10-30 18:26 ` Paolo Bonzini
  2020-10-30 18:57   ` Richard Henderson
  20 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2020-10-30 18:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: j, laurent

On 30/10/20 01:49, Richard Henderson wrote:
> Fourth, I have renamed the command-line parameter to "split-rwx".

Stupid observation, but wouldn't it be "split-wx"?

Thanks,

Paolo

> I don't think this is perfect, and I'm not even sure if it's better
> than "mirror-jit".  What this has done, though, is left the code
> with inconsistent language -- "mirror" in some places, "split" in
> others.  I'll clean that up once we decide on naming.




* Re: [PATCH v2 00/19] Mirror map JIT memory for TCG
  2020-10-30 18:26 ` Paolo Bonzini
@ 2020-10-30 18:57   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-30 18:57 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: j, laurent

On 10/30/20 11:26 AM, Paolo Bonzini wrote:
> On 30/10/20 01:49, Richard Henderson wrote:
>> Fourth, I have renamed the command-line parameter to "split-rwx".
> 
> Stupid observation, but wouldn't it be "split-wx"?

Um, yes.  ;-)


r~



* Re: [PATCH v2 18/19] tcg/aarch64: Implement flush_idcache_range manually
  2020-10-30  0:49 ` [PATCH v2 18/19] tcg/aarch64: Implement flush_idcache_range manually Richard Henderson
@ 2020-11-01  1:25   ` Joelle van Dyne
  2020-11-01 15:09     ` Richard Henderson
  2020-11-03 23:08     ` Richard Henderson
  0 siblings, 2 replies; 30+ messages in thread
From: Joelle van Dyne @ 2020-11-01  1:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

Unfortunately this crashes on iOS/Apple Silicon macOS.

(lldb) bt
* thread #19, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0xd53b002a)
  * frame #0: 0x00000001169501e0 libqemu-x86_64-softmmu.utm.dylib`tcg_prologue_init + 760
...
(lldb) x/i 0x00000001169501e0
->  0x1169501e0: 0xd53b002a   mrs    x10, CTR_EL0

I was able to fix it by adding

#ifdef CONFIG_DARWIN
extern void sys_icache_invalidate(void *start, size_t len);
extern void sys_dcache_flush(void *start, size_t len);

void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
{
    sys_dcache_flush((void *)rw, len);
    sys_icache_invalidate((void *)rx, len);
}
#else
...
#endif
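
A slightly fuller sketch of the same idea, written so that it builds on
both Darwin and other hosts. It assumes the SDK ships
libkern/OSCacheControl.h, which declares the two sys_* functions and
would avoid the extern declarations in the .c file. The non-Darwin branch
reuses the __builtin___clear_cache fallback that the patch removes. The
_sketch suffix marks the function as illustrative, not the proposed code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__APPLE__)
#include <libkern/OSCacheControl.h>  /* sys_dcache_flush, sys_icache_invalidate */
#endif

/* Flush the dcache at rw and invalidate the icache at rx. */
static void flush_idcache_range_sketch(uintptr_t rx, uintptr_t rw, size_t len)
{
    assert(len == 0 || (rx != 0 && rw != 0));
#if defined(__APPLE__)
    /* Avoids reading CTR_EL0, which is where the crash above happened. */
    sys_dcache_flush((void *)rw, len);
    sys_icache_invalidate((void *)rx, len);
#else
    /* Generic fallback: same shape as the code the patch replaces. */
    if (rw != rx) {
        __builtin___clear_cache((char *)rw, (char *)(rw + len));
    }
    __builtin___clear_cache((char *)rx, (char *)(rx + len));
#endif
}
```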

One more thing: on x86 (and maybe other archs) the icache is coherent
with the dcache, but does that still hold when the same memory is
aliased at a second virtual address? I think that case is more like
DMA, so we would still need to flush and invalidate?

-j

On Thu, Oct 29, 2020 at 5:49 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Copy the single pointer implementation from libgcc and modify it to
> support the double pointer interface we require.  This halves the
> number of cache operations required when split-rwx is enabled.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/aarch64/tcg-target.h     | 11 +-------
>  tcg/aarch64/tcg-target.c.inc | 53 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 54 insertions(+), 10 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index fa64058d43..e62d38ba55 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -148,16 +148,7 @@ typedef enum {
>  #define TCG_TARGET_DEFAULT_MO (0)
>  #define TCG_TARGET_HAS_MEMORY_BSWAP     1
>
> -/* Flush the dcache at RW, and the icache at RX, as necessary. */
> -static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
> -{
> -    /* TODO: Copy this from gcc to avoid 4 loops instead of 2. */
> -    if (rw != rx) {
> -        __builtin___clear_cache((char *)rw, (char *)(rw + len));
> -    }
> -    __builtin___clear_cache((char *)rx, (char *)(rx + len));
> -}
> -
> +void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len);
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
>
>  #ifdef CONFIG_SOFTMMU
> diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
> index bd888bc66d..5e8f3faad2 100644
> --- a/tcg/aarch64/tcg-target.c.inc
> +++ b/tcg/aarch64/tcg-target.c.inc
> @@ -2968,3 +2968,56 @@ void tcg_register_jit(const void *buf, size_t buf_size)
>  {
>      tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
>  }
> +
> +/*
> + * Flush the dcache at RW, and the icache at RX, as necessary.
> + * This is a copy of gcc's __aarch64_sync_cache_range, modified
> + * to fit this three-operand interface.
> + */
> +void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
> +{
> +    const unsigned CTR_IDC = 1u << 28;
> +    const unsigned CTR_DIC = 1u << 29;
> +    static unsigned int cache_info;
> +    uintptr_t icache_lsize, dcache_lsize, p;
> +
> +    if (!cache_info) {
> +        /*
> +         * CTR_EL0 [3:0] contains log2 of icache line size in words.
> +         * CTR_EL0 [19:16] contains log2 of dcache line size in words.
> +         */
> +        asm volatile("mrs\t%0, ctr_el0" : "=r"(cache_info));
> +    }
> +
> +    icache_lsize = 4 << extract32(cache_info, 0, 4);
> +    dcache_lsize = 4 << extract32(cache_info, 16, 4);
> +
> +    /*
> +     * If CTR_EL0.IDC is enabled, Data cache clean to the Point of Unification
> +     * is not required for instruction to data coherence.
> +     */
> +    if (!(cache_info & CTR_IDC)) {
> +        /*
> +         * Loop over the address range, clearing one cache line at once.
> +         * Data cache must be flushed to unification first to make sure
> +         * the instruction cache fetches the updated data.
> +         */
> +        for (p = rw & -dcache_lsize; p < rw + len; p += dcache_lsize) {
> +            asm volatile("dc\tcvau, %0" : : "r" (p) : "memory");
> +        }
> +        asm volatile("dsb\tish" : : : "memory");
> +    }
> +
> +    /*
> +     * If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point
> +     * of Unification is not required for instruction to data coherence.
> +     */
> +    if (!(cache_info & CTR_DIC)) {
> +        for (p = rx & -icache_lsize; p < rx + len; p += icache_lsize) {
> +            asm volatile("ic\tivau, %0" : : "r"(p) : "memory");
> +        }
> +        asm volatile ("dsb\tish" : : : "memory");
> +    }
> +
> +    asm volatile("isb" : : : "memory");
> +}
> --
> 2.25.1
>



* Re: [PATCH v2 14/19] RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap
  2020-10-30  0:49 ` [PATCH v2 14/19] RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap Richard Henderson
@ 2020-11-01  1:42   ` Joelle van Dyne
  2020-11-01 21:11     ` Joelle van Dyne
  0 siblings, 1 reply; 30+ messages in thread
From: Joelle van Dyne @ 2020-11-01  1:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

There's a compiler warning:

warning: incompatible pointer to integer conversion assigning to
'mach_vm_address_t' (aka 'unsigned long long') from 'void *'
[-Wint-conversion]
    buf_rw = tcg_ctx->code_gen_buffer;

I changed it to
    buf_rw = (mach_vm_address_t)tcg_ctx->code_gen_buffer;

Also, MAP_JIT doesn't work with the split mapping (it needs the same
entitlements that allow RWX mapping), so I made the following
changes:

@@ -1088,15 +1094,11 @@ static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
     return true;
 }
 #else
-static bool alloc_code_gen_buffer_anon(size_t size, int prot, Error **errp)
+static bool alloc_code_gen_buffer_anon(size_t size, int prot, int flags, Error **errp)
 {
-    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
     void *buf;

-#ifdef CONFIG_DARWIN
-    /* Applicable to both iOS and macOS (Apple Silicon). */
-    flags |= MAP_JIT;
-#endif
+    flags |= MAP_PRIVATE | MAP_ANONYMOUS;

     buf = mmap(NULL, size, prot, flags, -1, 0);
     if (buf == MAP_FAILED) {
@@ -1211,7 +1213,7 @@ static bool alloc_code_gen_buffer_mirror_vmremap(size_t size, Error **errp)
     vm_prot_t cur_prot, max_prot;

     /* Map the read-write portion via normal anon memory. */
-    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE, errp)) {
+    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE, 0, errp)) {
         return false;
     }

@@ -1263,6 +1265,8 @@ static bool alloc_code_gen_buffer_mirror(size_t size, Error **errp)

 static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
 {
+    int flags = 0;
+
     if (mirror) {
         Error *local_err = NULL;
         if (alloc_code_gen_buffer_mirror(size, &local_err)) {
@@ -1283,8 +1287,11 @@ static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
     /* The tcg interpreter does not need execute permission. */
     prot = PROT_READ | PROT_WRITE;
 #endif
+#ifdef CONFIG_DARWIN
+    flags |= MAP_JIT;
+#endif

-    return alloc_code_gen_buffer_anon(size, prot, errp);
+    return alloc_code_gen_buffer_anon(size, prot, flags, errp);
 }
 #endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */

With this in addition to the iOS host patches, I was able to run it on
the iPad but am getting random crashes that I am continuing to debug.

-j

On Thu, Oct 29, 2020 at 5:49 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Cribbed from code posted by Joelle van Dyne <j@getutm.app>,
> and rearranged to a cleaner structure.  Completely untested.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/translate-all.c | 68 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 3e69ebd1d3..bf8263fdb4 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1093,6 +1093,11 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot, Error **errp)
>      int flags = MAP_PRIVATE | MAP_ANONYMOUS;
>      void *buf;
>
> +#ifdef CONFIG_DARWIN
> +    /* Applicable to both iOS and macOS (Apple Silicon). */
> +    flags |= MAP_JIT;
> +#endif
> +
>      buf = mmap(NULL, size, prot, flags, -1, 0);
>      if (buf == MAP_FAILED) {
>          error_setg_errno(errp, errno,
> @@ -1182,13 +1187,74 @@ static bool alloc_code_gen_buffer_mirror_memfd(size_t size, Error **errp)
>      qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
>      return true;
>  }
> -#endif
> +#endif /* CONFIG_LINUX */
> +
> +#ifdef CONFIG_DARWIN
> +#include <mach/mach.h>
> +
> +extern kern_return_t mach_vm_remap(vm_map_t target_task,
> +                                   mach_vm_address_t *target_address,
> +                                   mach_vm_size_t size,
> +                                   mach_vm_offset_t mask,
> +                                   int flags,
> +                                   vm_map_t src_task,
> +                                   mach_vm_address_t src_address,
> +                                   boolean_t copy,
> +                                   vm_prot_t *cur_protection,
> +                                   vm_prot_t *max_protection,
> +                                   vm_inherit_t inheritance);
> +
> +static bool alloc_code_gen_buffer_mirror_vmremap(size_t size, Error **errp)
> +{
> +    kern_return_t ret;
> +    mach_vm_address_t buf_rw, buf_rx;
> +    vm_prot_t cur_prot, max_prot;
> +
> +    /* Map the read-write portion via normal anon memory. */
> +    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE, errp)) {
> +        return false;
> +    }
> +
> +    buf_rw = tcg_ctx->code_gen_buffer;
> +    buf_rx = 0;
> +    ret = mach_vm_remap(mach_task_self(),
> +                        &buf_rx,
> +                        size,
> +                        0,
> +                        VM_FLAGS_ANYWHERE | VM_FLAGS_RANDOM_ADDR,
> +                        mach_task_self(),
> +                        buf_rw,
> +                        false,
> +                        &cur_prot,
> +                        &max_prot,
> +                        VM_INHERIT_NONE);
> +    if (ret != KERN_SUCCESS) {
> +        /* TODO: Convert "ret" to a human readable error message. */
> +        error_setg(errp, "vm_remap for jit mirror failed");
> +        munmap((void *)buf_rw, size);
> +        return false;
> +    }
> +
> +    if (mprotect((void *)buf_rx, size, PROT_READ | PROT_EXEC) != 0) {
> +        error_setg_errno(errp, errno, "mprotect for jit mirror");
> +        munmap((void *)buf_rx, size);
> +        munmap((void *)buf_rw, size);
> +        return false;
> +    }
> +
> +    tcg_rx_mirror_diff = buf_rx - buf_rw;
> +    return true;
> +}
> +#endif /* CONFIG_DARWIN */
>
>  static bool alloc_code_gen_buffer_mirror(size_t size, Error **errp)
>  {
>      if (TCG_TARGET_SUPPORT_MIRROR) {
>  #ifdef CONFIG_LINUX
>          return alloc_code_gen_buffer_mirror_memfd(size, errp);
> +#endif
> +#ifdef CONFIG_DARWIN
> +        return alloc_code_gen_buffer_mirror_vmremap(size, errp);
>  #endif
>      }
>      error_setg(errp, "jit split-rwx not supported");
> --
> 2.25.1
>



* Re: [PATCH v2 01/19] tcg: Enhance flush_icache_range with separate data pointer
  2020-10-30  0:49 ` [PATCH v2 01/19] tcg: Enhance flush_icache_range with separate data pointer Richard Henderson
@ 2020-11-01  6:54   ` Joelle van Dyne
  2020-11-03 23:02     ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Joelle van Dyne @ 2020-11-01  6:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

s->code_ptr and s->code_buf point to 4-byte instruction units on
aarch64, so their difference counts units rather than bytes, and the
cache flush length is too small by a factor of 4.

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 44b923f5fe..2c4b66965b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -4325,7 +4325,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)

     /* flush instruction cache */
     flush_idcache_range((uintptr_t)tcg_mirror_rw_to_rx(s->code_buf),
-                        (uintptr_t)s->code_buf, s->code_ptr - s->code_buf);
+                        (uintptr_t)s->code_buf,
+                        (uintptr_t)s->code_ptr - (uintptr_t)s->code_buf);

     return tcg_current_code_size(s);
 }

With this and the other changes, split JIT works fine on iOS.
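
To illustrate the factor of 4, here is a minimal standalone sketch. It is
not QEMU code: insn_unit, demo_code and both helpers are names made up for
this example, and it assumes a host whose code pointers address 4-byte
units, as on aarch64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: code pointers address 4-byte instruction units. */
typedef uint32_t insn_unit;

/* Hypothetical stand-in for a small region of generated code. */
static insn_unit demo_code[16];

/* Pointer subtraction yields a count of elements, not bytes. */
static size_t len_in_units(const insn_unit *buf, const insn_unit *ptr)
{
    assert(ptr >= buf);
    return (size_t)(ptr - buf);
}

/* Subtracting the addresses as integers yields the length in bytes. */
static size_t len_in_bytes(const insn_unit *buf, const insn_unit *ptr)
{
    assert(ptr >= buf);
    return (size_t)((uintptr_t)ptr - (uintptr_t)buf);
}
```

For the 16-unit array above, len_in_units() gives 16 and len_in_bytes()
gives 64; a flush routine that expects a byte length needs the latter.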

-j

On Thu, Oct 29, 2020 at 5:49 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We are shortly going to have a split rw/rx jit buffer.  Depending
> on the host, we need to flush the dcache at the rw data pointer and
> flush the icache at the rx code pointer.
>
> For now, the two passed pointers are identical, so there is no
> effective change in behaviour.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/aarch64/tcg-target.h     |  9 +++++++--
>  tcg/arm/tcg-target.h         |  8 ++++++--
>  tcg/i386/tcg-target.h        |  3 ++-
>  tcg/mips/tcg-target.h        |  8 ++++++--
>  tcg/ppc/tcg-target.h         |  2 +-
>  tcg/riscv/tcg-target.h       |  8 ++++++--
>  tcg/s390/tcg-target.h        |  3 ++-
>  tcg/sparc/tcg-target.h       |  8 +++++---
>  tcg/tci/tcg-target.h         |  3 ++-
>  softmmu/physmem.c            |  9 ++++++++-
>  tcg/tcg.c                    |  5 +++--
>  tcg/aarch64/tcg-target.c.inc |  2 +-
>  tcg/mips/tcg-target.c.inc    |  2 +-
>  tcg/ppc/tcg-target.c.inc     | 21 +++++++++++----------
>  tcg/sparc/tcg-target.c.inc   |  4 ++--
>  15 files changed, 63 insertions(+), 32 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 663dd0b95e..d0a6a059b7 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -148,9 +148,14 @@ typedef enum {
>  #define TCG_TARGET_DEFAULT_MO (0)
>  #define TCG_TARGET_HAS_MEMORY_BSWAP     1
>
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
> -    __builtin___clear_cache((char *)start, (char *)stop);
> +    /* TODO: Copy this from gcc to avoid 4 loops instead of 2. */
> +    if (rw != rx) {
> +        __builtin___clear_cache((char *)rw, (char *)(rw + len));
> +    }
> +    __builtin___clear_cache((char *)rx, (char *)(rx + len));
>  }
>
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index 17e771374d..fa88b24e43 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -134,9 +134,13 @@ enum {
>  #define TCG_TARGET_DEFAULT_MO (0)
>  #define TCG_TARGET_HAS_MEMORY_BSWAP     1
>
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
> -    __builtin___clear_cache((char *) start, (char *) stop);
> +    if (rw != rx) {
> +        __builtin___clear_cache((char *)rw, (char *)(rw + len));
> +    }
> +    __builtin___clear_cache((char *)rx, (char *)(rx + len));
>  }
>
>  /* not defined -- call should be eliminated at compile time */
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index abd4ac7fc0..8323e72639 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -206,7 +206,8 @@ extern bool have_avx2;
>  #define TCG_TARGET_extract_i64_valid(ofs, len) \
>      (((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)
>
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
>  }
>
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index c6b091d849..47b1226ee9 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -207,9 +207,13 @@ extern bool use_mips32r2_instructions;
>  #define TCG_TARGET_DEFAULT_MO (0)
>  #define TCG_TARGET_HAS_MEMORY_BSWAP     1
>
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
> -    cacheflush ((void *)start, stop-start, ICACHE);
> +    if (rx != rw) {
> +        cacheflush((void *)rw, len, DCACHE);
> +    }
> +    cacheflush((void *)rx, len, ICACHE);
>  }
>
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index be10363956..fbb6dc1b47 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -175,7 +175,7 @@ extern bool have_vsx;
>  #define TCG_TARGET_HAS_bitsel_vec       have_vsx
>  #define TCG_TARGET_HAS_cmpsel_vec       0
>
> -void flush_icache_range(uintptr_t start, uintptr_t stop);
> +void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len);
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>
>  #define TCG_TARGET_DEFAULT_MO (0)
> diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
> index 032439d806..0fa6ae358e 100644
> --- a/tcg/riscv/tcg-target.h
> +++ b/tcg/riscv/tcg-target.h
> @@ -159,9 +159,13 @@ typedef enum {
>  #define TCG_TARGET_HAS_mulsh_i64        1
>  #endif
>
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
> -    __builtin___clear_cache((char *)start, (char *)stop);
> +    if (rx != rw) {
> +        __builtin___clear_cache((char *)rw, (char *)(rw + len));
> +    }
> +    __builtin___clear_cache((char *)rx, (char *)(rx + len));
>  }
>
>  /* not defined -- call should be eliminated at compile time */
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index 63c8797bd3..c3dc2e8938 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -145,7 +145,8 @@ enum {
>      TCG_AREG0 = TCG_REG_R10,
>  };
>
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
>  }
>
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index 633841ebf2..c27c40231e 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -168,10 +168,12 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_DEFAULT_MO (0)
>  #define TCG_TARGET_HAS_MEMORY_BSWAP     1
>
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
> -    uintptr_t p;
> -    for (p = start & -8; p < ((stop + 7) & -8); p += 8) {
> +    /* No additional data flush to the RW virtual address required. */
> +    uintptr_t p, end = (rx + len + 7) & -8;
> +    for (p = rx & -8; p < end; p += 8) {
>          __asm__ __volatile__("flush\t%0" : : "r" (p));
>      }
>  }
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index 8c1c1d265d..6460449719 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -191,7 +191,8 @@ void tci_disas(uint8_t opc);
>
>  #define HAVE_TCG_QEMU_TB_EXEC
>
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
>  }
>
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index a9adedb9f8..b23c1fef54 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -2954,7 +2954,14 @@ static inline MemTxResult address_space_write_rom_internal(AddressSpace *as,
>                  invalidate_and_set_dirty(mr, addr1, l);
>                  break;
>              case FLUSH_CACHE:
> -                flush_icache_range((uintptr_t)ram_ptr, (uintptr_t)ram_ptr + l);
> +                /*
> +                 * FIXME: This function is currently located in tcg/host/,
> +                 * but we never come here when tcg is enabled; only for
> +                 * real hardware acceleration.  This can actively fail
> +                 * when TCI is configured, since that function is a nop.
> +                 * We should move this to util/ or something.
> +                 */
> +                flush_idcache_range((uintptr_t)ram_ptr, (uintptr_t)ram_ptr, l);
>                  break;
>              }
>          }
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index a8c28440e2..3bf36e0cfe 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1076,7 +1076,7 @@ void tcg_prologue_init(TCGContext *s)
>  #endif
>
>      buf1 = s->code_ptr;
> -    flush_icache_range((uintptr_t)buf0, (uintptr_t)buf1);
> +    flush_idcache_range((uintptr_t)buf0, (uintptr_t)buf0, buf1 - buf0);
>
>      /* Deduct the prologue from the buffer.  */
>      prologue_size = tcg_current_code_size(s);
> @@ -4268,7 +4268,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      }
>
>      /* flush instruction cache */
> -    flush_icache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_ptr);
> +    flush_idcache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_buf,
> +                        s->code_ptr - s->code_buf);
>
>      return tcg_current_code_size(s);
>  }
> diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
> index 26f71cb599..83af3108a4 100644
> --- a/tcg/aarch64/tcg-target.c.inc
> +++ b/tcg/aarch64/tcg-target.c.inc
> @@ -1363,7 +1363,7 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
>      }
>      pair = (uint64_t)i2 << 32 | i1;
>      qatomic_set((uint64_t *)jmp_addr, pair);
> -    flush_icache_range(jmp_addr, jmp_addr + 8);
> +    flush_idcache_range(jmp_addr, jmp_addr, 8);
>  }
>
>  static inline void tcg_out_goto_label(TCGContext *s, TCGLabel *l)
> diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
> index 41be574e89..c255ecb444 100644
> --- a/tcg/mips/tcg-target.c.inc
> +++ b/tcg/mips/tcg-target.c.inc
> @@ -2660,7 +2660,7 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
>                                uintptr_t addr)
>  {
>      qatomic_set((uint32_t *)jmp_addr, deposit32(OPC_J, 0, 26, addr >> 2));
> -    flush_icache_range(jmp_addr, jmp_addr + 4);
> +    flush_idcache_range(jmp_addr, jmp_addr, 4);
>  }
>
>  typedef struct {
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index 18ee989f95..a848e98383 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -1753,12 +1753,12 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
>          /* As per the enclosing if, this is ppc64.  Avoid the _Static_assert
>             within qatomic_set that would fail to build a ppc32 host.  */
>          qatomic_set__nocheck((uint64_t *)jmp_addr, pair);
> -        flush_icache_range(jmp_addr, jmp_addr + 8);
> +        flush_idcache_range(jmp_addr, jmp_addr, 8);
>      } else {
>          intptr_t diff = addr - jmp_addr;
>          tcg_debug_assert(in_range_b(diff));
>          qatomic_set((uint32_t *)jmp_addr, B | (diff & 0x3fffffc));
> -        flush_icache_range(jmp_addr, jmp_addr + 4);
> +        flush_idcache_range(jmp_addr, jmp_addr, 4);
>      }
>  }
>
> @@ -3864,22 +3864,23 @@ void tcg_register_jit(void *buf, size_t buf_size)
>  }
>  #endif /* __ELF__ */
>
> -void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +void flush_idcache_range(uintptr_t rx, uintptr_t rw, uintptr_t len)
>  {
> -    uintptr_t p, start1, stop1;
> +    uintptr_t p, start, stop;
>      size_t dsize = qemu_dcache_linesize;
>      size_t isize = qemu_icache_linesize;
>
> -    start1 = start & ~(dsize - 1);
> -    stop1 = (stop + dsize - 1) & ~(dsize - 1);
> -    for (p = start1; p < stop1; p += dsize) {
> +    start = rw & ~(dsize - 1);
> +    stop = (rw + len + dsize - 1) & ~(dsize - 1);
> +    for (p = start; p < stop; p += dsize) {
>          asm volatile ("dcbst 0,%0" : : "r"(p) : "memory");
>      }
>      asm volatile ("sync" : : : "memory");
>
> -    start &= start & ~(isize - 1);
> -    stop1 = (stop + isize - 1) & ~(isize - 1);
> -    for (p = start1; p < stop1; p += isize) {
> +    start = rx & ~(isize - 1);
> +    stop = (rx + len + isize - 1) & ~(isize - 1);
> +    for (p = start; p < stop; p += isize) {
>          asm volatile ("icbi 0,%0" : : "r"(p) : "memory");
>      }
>      asm volatile ("sync" : : : "memory");
> diff --git a/tcg/sparc/tcg-target.c.inc b/tcg/sparc/tcg-target.c.inc
> index 6775bd30fc..6e2d755f6a 100644
> --- a/tcg/sparc/tcg-target.c.inc
> +++ b/tcg/sparc/tcg-target.c.inc
> @@ -1836,7 +1836,7 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
>      if (!USE_REG_TB) {
>          qatomic_set((uint32_t *)jmp_addr,
>                     deposit32(CALL, 0, 30, br_disp >> 2));
> -        flush_icache_range(jmp_addr, jmp_addr + 4);
> +        flush_idcache_range(jmp_addr, jmp_addr, 4);
>          return;
>      }
>
> @@ -1860,5 +1860,5 @@ void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
>      }
>
>      qatomic_set((uint64_t *)jmp_addr, deposit64(i2, 32, 32, i1));
> -    flush_icache_range(jmp_addr, jmp_addr + 8);
> +    flush_idcache_range(jmp_addr, jmp_addr, 8);
>  }
> --
> 2.25.1
>



* Re: [PATCH v2 18/19] tcg/aarch64: Implement flush_idcache_range manually
  2020-11-01  1:25   ` Joelle van Dyne
@ 2020-11-01 15:09     ` Richard Henderson
  2020-11-03 23:08     ` Richard Henderson
  1 sibling, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-11-01 15:09 UTC (permalink / raw)
  To: Joelle van Dyne; +Cc: QEMU Developers

On 10/31/20 6:25 PM, Joelle van Dyne wrote:
> Another thing, for x86 (and maybe other archs), the icache is cache
> coherent but does it apply if we are aliasing the memory address? I
> think in that case, it's like we're doing a DMA right and still need
> to do flushing+invalidating?

No, it is not like dma.  The x86 caches are physically tagged, so virtual
aliasing does not matter.
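
As a rough userspace illustration of what a virtual alias is, the sketch
below maps one page of a temporary file at two addresses and checks that a
store through one mapping is seen through the other. The function name is
made up for this example, and it only exercises data accesses; it does not
by itself demonstrate anything about instruction fetch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the same file page at two virtual addresses, store through the
 * writable alias and load through the read-only one.
 * Returns 1 if the store is visible, 0 if not, -1 on setup failure.
 */
static int alias_store_visible(void)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    char *rw = MAP_FAILED, *ro = MAP_FAILED;
    int result = -1;

    if (f == NULL) {
        return -1;
    }
    if (page > 0 && ftruncate(fileno(f), page) == 0) {
        size_t len = (size_t)page;

        rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
        ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fileno(f), 0);
        if (rw != MAP_FAILED && ro != MAP_FAILED) {
            /* Two distinct virtual addresses backed by the same page. */
            assert(rw != ro);
            memcpy(rw, "alias", 6);
            result = memcmp(ro, "alias", 6) == 0;
        }
        if (rw != MAP_FAILED) {
            munmap(rw, len);
        }
        if (ro != MAP_FAILED) {
            munmap(ro, len);
        }
    }
    fclose(f);
    return result;
}
```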


r~



* Re: [PATCH v2 14/19] RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap
  2020-11-01  1:42   ` Joelle van Dyne
@ 2020-11-01 21:11     ` Joelle van Dyne
  0 siblings, 0 replies; 30+ messages in thread
From: Joelle van Dyne @ 2020-11-01 21:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

Another change I made in alloc_code_gen_buffer_mirror_vmremap (in my
patch as well) is to remove VM_FLAGS_RANDOM_ADDR. This was causing a
rare out of memory error whenever the random address it chooses is too
high.

-j

On Sat, Oct 31, 2020 at 6:42 PM Joelle van Dyne <j@getutm.app> wrote:
>
> There's a compiler warning:
>
> warning: incompatible pointer to integer conversion assigning to
> 'mach_vm_address_t' (aka 'unsigned long long') from 'void *'
> [-Wint-conversion]
>     buf_rw = tcg_ctx->code_gen_buffer;
>
> I changed it to
>     buf_rw = (mach_vm_address_t)tcg_ctx->code_gen_buffer;
>
> Also, MAP_JIT doesn't work with the split mapping (it needs the same
> entitlements that allow RWX mappings), so I made the following
> changes:
>
> @@ -1088,15 +1094,11 @@ static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
>      return true;
>  }
>  #else
> -static bool alloc_code_gen_buffer_anon(size_t size, int prot, Error **errp)
> +static bool alloc_code_gen_buffer_anon(size_t size, int prot, int flags, Error **errp)
>  {
> -    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
>      void *buf;
>
> -#ifdef CONFIG_DARWIN
> -    /* Applicable to both iOS and macOS (Apple Silicon). */
> -    flags |= MAP_JIT;
> -#endif
> +    flags |= MAP_PRIVATE | MAP_ANONYMOUS;
>
>      buf = mmap(NULL, size, prot, flags, -1, 0);
>      if (buf == MAP_FAILED) {
> @@ -1211,7 +1213,7 @@ static bool alloc_code_gen_buffer_mirror_vmremap(size_t size, Error **errp)
>      vm_prot_t cur_prot, max_prot;
>
>      /* Map the read-write portion via normal anon memory. */
> -    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE, errp)) {
> +    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE, 0, errp)) {
>          return false;
>      }
>
> @@ -1263,6 +1265,8 @@ static bool alloc_code_gen_buffer_mirror(size_t size, Error **errp)
>
>  static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
>  {
> +    int flags = 0;
> +
>      if (mirror) {
>          Error *local_err = NULL;
>          if (alloc_code_gen_buffer_mirror(size, &local_err)) {
> @@ -1283,8 +1287,11 @@ static bool alloc_code_gen_buffer(size_t size, int mirror, Error **errp)
>      /* The tcg interpreter does not need execute permission. */
>      prot = PROT_READ | PROT_WRITE;
>  #endif
> +#ifdef CONFIG_DARWIN
> +    flags |= MAP_JIT;
> +#endif
>
> -    return alloc_code_gen_buffer_anon(size, prot, errp);
> +    return alloc_code_gen_buffer_anon(size, prot, flags, errp);
>  }
>  #endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */
>
> With this in addition to the iOS host patches, I was able to run it on
> the iPad but am getting random crashes that I am continuing to debug.
>
> -j
>
> On Thu, Oct 29, 2020 at 5:49 PM Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > Cribbed from code posted by Joelle van Dyne <j@getutm.app>,
> > and rearranged to a cleaner structure.  Completely untested.
> >
> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > ---
> >  accel/tcg/translate-all.c | 68 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 67 insertions(+), 1 deletion(-)
> >
> > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> > index 3e69ebd1d3..bf8263fdb4 100644
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -1093,6 +1093,11 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot, Error **errp)
> >      int flags = MAP_PRIVATE | MAP_ANONYMOUS;
> >      void *buf;
> >
> > +#ifdef CONFIG_DARWIN
> > +    /* Applicable to both iOS and macOS (Apple Silicon). */
> > +    flags |= MAP_JIT;
> > +#endif
> > +
> >      buf = mmap(NULL, size, prot, flags, -1, 0);
> >      if (buf == MAP_FAILED) {
> >          error_setg_errno(errp, errno,
> > @@ -1182,13 +1187,74 @@ static bool alloc_code_gen_buffer_mirror_memfd(size_t size, Error **errp)
> >      qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
> >      return true;
> >  }
> > -#endif
> > +#endif /* CONFIG_LINUX */
> > +
> > +#ifdef CONFIG_DARWIN
> > +#include <mach/mach.h>
> > +
> > +extern kern_return_t mach_vm_remap(vm_map_t target_task,
> > +                                   mach_vm_address_t *target_address,
> > +                                   mach_vm_size_t size,
> > +                                   mach_vm_offset_t mask,
> > +                                   int flags,
> > +                                   vm_map_t src_task,
> > +                                   mach_vm_address_t src_address,
> > +                                   boolean_t copy,
> > +                                   vm_prot_t *cur_protection,
> > +                                   vm_prot_t *max_protection,
> > +                                   vm_inherit_t inheritance);
> > +
> > +static bool alloc_code_gen_buffer_mirror_vmremap(size_t size, Error **errp)
> > +{
> > +    kern_return_t ret;
> > +    mach_vm_address_t buf_rw, buf_rx;
> > +    vm_prot_t cur_prot, max_prot;
> > +
> > +    /* Map the read-write portion via normal anon memory. */
> > +    if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE, errp)) {
> > +        return false;
> > +    }
> > +
> > +    buf_rw = tcg_ctx->code_gen_buffer;
> > +    buf_rx = 0;
> > +    ret = mach_vm_remap(mach_task_self(),
> > +                        &buf_rx,
> > +                        size,
> > +                        0,
> > +                        VM_FLAGS_ANYWHERE | VM_FLAGS_RANDOM_ADDR,
> > +                        mach_task_self(),
> > +                        buf_rw,
> > +                        false,
> > +                        &cur_prot,
> > +                        &max_prot,
> > +                        VM_INHERIT_NONE);
> > +    if (ret != KERN_SUCCESS) {
> > +        /* TODO: Convert "ret" to a human readable error message. */
> > +        error_setg(errp, "vm_remap for jit mirror failed");
> > +        munmap((void *)buf_rw, size);
> > +        return false;
> > +    }
> > +
> > +    if (mprotect((void *)buf_rx, size, PROT_READ | PROT_EXEC) != 0) {
> > +        error_setg_errno(errp, errno, "mprotect for jit mirror");
> > +        munmap((void *)buf_rx, size);
> > +        munmap((void *)buf_rw, size);
> > +        return false;
> > +    }
> > +
> > +    tcg_rx_mirror_diff = buf_rx - buf_rw;
> > +    return true;
> > +}
> > +#endif /* CONFIG_DARWIN */
> >
> >  static bool alloc_code_gen_buffer_mirror(size_t size, Error **errp)
> >  {
> >      if (TCG_TARGET_SUPPORT_MIRROR) {
> >  #ifdef CONFIG_LINUX
> >          return alloc_code_gen_buffer_mirror_memfd(size, errp);
> > +#endif
> > +#ifdef CONFIG_DARWIN
> > +        return alloc_code_gen_buffer_mirror_vmremap(size, errp);
> >  #endif
> >      }
> >      error_setg(errp, "jit split-rwx not supported");
> > --
> > 2.25.1
> >



* Re: [PATCH v2 01/19] tcg: Enhance flush_icache_range with separate data pointer
  2020-11-01  6:54   ` Joelle van Dyne
@ 2020-11-03 23:02     ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-11-03 23:02 UTC (permalink / raw)
  To: Joelle van Dyne; +Cc: QEMU Developers

On 10/31/20 11:54 PM, Joelle van Dyne wrote:
> s->code_ptr and s->code_buf point to 4-byte instruction units on
> aarch64, so their difference counts units rather than bytes, and the
> cache flush length is too small by a factor of 4.
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 44b923f5fe..2c4b66965b 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -4325,7 +4325,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
> 
>      /* flush instruction cache */
>      flush_idcache_range((uintptr_t)tcg_mirror_rw_to_rx(s->code_buf),
> -                        (uintptr_t)s->code_buf, s->code_ptr - s->code_buf);
> +                        (uintptr_t)s->code_buf,
> +                        (uintptr_t)s->code_ptr - (uintptr_t)s->code_buf);

Yep, thanks.


r~



* Re: [PATCH v2 18/19] tcg/aarch64: Implement flush_idcache_range manually
  2020-11-01  1:25   ` Joelle van Dyne
  2020-11-01 15:09     ` Richard Henderson
@ 2020-11-03 23:08     ` Richard Henderson
  1 sibling, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-11-03 23:08 UTC (permalink / raw)
  To: Joelle van Dyne; +Cc: QEMU Developers

On 10/31/20 6:25 PM, Joelle van Dyne wrote:
> Unfortunately this crashes on iOS/Apple Silicon macOS.
> 
> (lldb) bt
> * thread #19, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0xd53b002a)
>   * frame #0: 0x00000001169501e0
> libqemu-x86_64-softmmu.utm.dylib`tcg_prologue_init + 760
> ...
> (lldb) x/i 0x00000001169501e0
> ->  0x1169501e0: 0xd53b002a   mrs    x10, CTR_EL0

That is *really* annoying.  Why in the world would Apple not set SCTLR_ELx.UCT?
There's nothing that the OS can do better than the application for ARMv8.0-A.

Oh well.  I'll paste your code in.
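
For context on what the trapping instruction was for: a manual flush loop
needs the minimum cache line sizes, which CTR_EL0 encodes in its IminLine
and DminLine fields. The sketch below decodes those fields from a value
passed in as a plain integer, so it runs on any host. The helper names are
made up for this example, and this is not the code from the patch; where the
register cannot be read from userspace, the line sizes or the flush itself
have to come from the OS instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * CTR_EL0.IminLine, bits [3:0]: log2 of the number of 4-byte words in the
 * smallest instruction cache line.  Returns the line size in bytes.
 */
static unsigned ctr_icache_linesize(uint64_t ctr)
{
    return 4u << (ctr & 0xf);
}

/* CTR_EL0.DminLine, bits [19:16]: same encoding for the data cache. */
static unsigned ctr_dcache_linesize(uint64_t ctr)
{
    return 4u << ((ctr >> 16) & 0xf);
}

/* Round an address down to the start of its cache line. */
static uintptr_t align_down(uintptr_t addr, unsigned linesize)
{
    /* Line sizes are powers of two by construction. */
    assert(linesize != 0 && (linesize & (linesize - 1)) == 0);
    return addr & ~((uintptr_t)linesize - 1);
}
```

With an illustrative register value of 0x8444c004, both helpers return 64.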


r~



Thread overview: 30+ messages
2020-10-30  0:49 [PATCH v2 00/19] Mirror map JIT memory for TCG Richard Henderson
2020-10-30  0:49 ` [PATCH v2 01/19] tcg: Enhance flush_icache_range with separate data pointer Richard Henderson
2020-11-01  6:54   ` Joelle van Dyne
2020-11-03 23:02     ` Richard Henderson
2020-10-30  0:49 ` [PATCH v2 02/19] tcg: Move tcg prologue pointer out of TCGContext Richard Henderson
2020-10-30  0:49 ` [PATCH v2 03/19] tcg: Move tcg epilogue " Richard Henderson
2020-10-30  0:49 ` [PATCH v2 04/19] tcg: Introduce tcg_mirror_rw_to_rx/tcg_mirror_rx_to_rw Richard Henderson
2020-10-30  0:49 ` [PATCH v2 05/19] tcg: Adjust tcg_out_call for const Richard Henderson
2020-10-30  0:49 ` [PATCH v2 06/19] tcg: Adjust tcg_out_label " Richard Henderson
2020-10-30  0:49 ` [PATCH v2 07/19] tcg: Adjust tcg_register_jit " Richard Henderson
2020-10-30  0:49 ` [PATCH v2 08/19] tcg: Adjust tb_target_set_jmp_target for split rwx Richard Henderson
2020-10-30  0:49 ` [PATCH v2 09/19] tcg: Make DisasContextBase.tb const Richard Henderson
2020-10-30  0:49 ` [PATCH v2 10/19] tcg: Make tb arg to synchronize_from_tb const Richard Henderson
2020-10-30  0:49 ` [PATCH v2 11/19] tcg: Use Error with alloc_code_gen_buffer Richard Henderson
2020-10-30  0:49 ` [PATCH v2 12/19] tcg: Add --accel tcg,split-rwx property Richard Henderson
2020-10-30  0:49 ` [PATCH v2 13/19] accel/tcg: Support split-rwx for linux with memfd Richard Henderson
2020-10-30  0:49 ` [PATCH v2 14/19] RFC: accel/tcg: Support split-rwx for darwin/iOS with vm_remap Richard Henderson
2020-11-01  1:42   ` Joelle van Dyne
2020-11-01 21:11     ` Joelle van Dyne
2020-10-30  0:49 ` [PATCH v2 15/19] tcg: Return the rx mirror of TranslationBlock from exit_tb Richard Henderson
2020-10-30  0:49 ` [PATCH v2 16/19] tcg/i386: Support split-rwx code generation Richard Henderson
2020-10-30  0:49 ` [PATCH v2 17/19] tcg/aarch64: Use B not BL for tcg_out_goto_long Richard Henderson
2020-10-30  0:49 ` [PATCH v2 18/19] tcg/aarch64: Implement flush_idcache_range manually Richard Henderson
2020-11-01  1:25   ` Joelle van Dyne
2020-11-01 15:09     ` Richard Henderson
2020-11-03 23:08     ` Richard Henderson
2020-10-30  0:49 ` [PATCH v2 19/19] tcg/aarch64: Support split-rwx code generation Richard Henderson
2020-10-30  1:27 ` [PATCH v2 00/19] Mirror map JIT memory for TCG no-reply
2020-10-30 18:26 ` Paolo Bonzini
2020-10-30 18:57   ` Richard Henderson
