* [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel

Smooshing these two patch sets back together to conserve review
bandwidth.  I hope to make this the next tcg-next pull.

Three patches from the first half, tlbentryfull, are new.  They
follow a hallway conversation with Peter about bits in MemTxAttrs
that are not actually related to memory transactions, and add
infrastructure to address a to-do in an Arm patch set.

A few patches from the second half, pcrel, also remain unreviewed.
Together with the new ones above, the patches still needing review are:

  07-target-sparc-Use-tlb_set_page_full.patch
  08-accel-tcg-Move-byte_swap-from-MemTxAttrs-to-CPUTL.patch
  09-accel-tcg-Add-force_aligned-to-CPUTLBEntryFull.patch
  10-accel-tcg-Remove-PageDesc-code_bitmap.patch
  13-accel-tcg-Do-not-align-tb-page_addr-0.patch
  15-accel-tcg-Introduce-tb_pc-and-tb_pc_log.patch
  16-accel-tcg-Introduce-TARGET_TB_PCREL.patch


r~


Richard Henderson (17):
  accel/tcg: Rename CPUIOTLBEntry to CPUTLBEntryFull
  accel/tcg: Drop addr member from SavedIOTLB
  accel/tcg: Suppress auto-invalidate in probe_access_internal
  accel/tcg: Introduce probe_access_full
  accel/tcg: Introduce tlb_set_page_full
  include/exec: Introduce TARGET_PAGE_ENTRY_EXTRA
  target/sparc: Use tlb_set_page_full
  accel/tcg: Move byte_swap from MemTxAttrs to CPUTLBEntryFull
  accel/tcg: Add force_aligned to CPUTLBEntryFull
  accel/tcg: Remove PageDesc code_bitmap
  accel/tcg: Use bool for page_find_alloc
  accel/tcg: Use DisasContextBase in plugin_gen_tb_start
  accel/tcg: Do not align tb->page_addr[0]
  include/hw/core: Create struct CPUJumpCache
  accel/tcg: Introduce tb_pc and tb_pc_log
  accel/tcg: Introduce TARGET_TB_PCREL
  accel/tcg: Split log_cpu_exec into inline and slow path

 include/exec/cpu-all.h                  |   6 +-
 include/exec/cpu-defs.h                 |  54 ++++--
 include/exec/exec-all.h                 |  84 ++++++++-
 include/exec/memattrs.h                 |   2 -
 include/exec/plugin-gen.h               |   7 +-
 include/hw/core/cpu.h                   |  10 +-
 accel/tcg/cpu-exec.c                    | 108 +++++++----
 accel/tcg/cputlb.c                      | 228 ++++++++++++++----------
 accel/tcg/plugin-gen.c                  |  22 +--
 accel/tcg/translate-all.c               | 168 ++++++-----------
 accel/tcg/translator.c                  |   2 +-
 target/arm/cpu.c                        |   4 +-
 target/arm/mte_helper.c                 |  14 +-
 target/arm/sve_helper.c                 |   4 +-
 target/arm/translate-a64.c              |   2 +-
 target/avr/cpu.c                        |   2 +-
 target/hexagon/cpu.c                    |   2 +-
 target/hppa/cpu.c                       |   4 +-
 target/i386/tcg/tcg-cpu.c               |   2 +-
 target/loongarch/cpu.c                  |   2 +-
 target/microblaze/cpu.c                 |   2 +-
 target/mips/tcg/exception.c             |   2 +-
 target/mips/tcg/sysemu/special_helper.c |   2 +-
 target/openrisc/cpu.c                   |   2 +-
 target/riscv/cpu.c                      |   4 +-
 target/rx/cpu.c                         |   2 +-
 target/s390x/tcg/mem_helper.c           |   4 -
 target/sh4/cpu.c                        |   4 +-
 target/sparc/cpu.c                      |   2 +-
 target/sparc/mmu_helper.c               | 123 ++++++-------
 target/tricore/cpu.c                    |   2 +-
 tcg/tcg.c                               |   6 +-
 32 files changed, 501 insertions(+), 381 deletions(-)

-- 
2.34.1




* [PATCH v5 01/17] accel/tcg: Rename CPUIOTLBEntry to CPUTLBEntryFull
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Philippe Mathieu-Daudé

This structure will shortly contain more than just
data for accessing MMIO.  Rename the 'addr' member
to 'xlat_section' to more clearly indicate its purpose.
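
As a reading aid (not part of the patch): a minimal sketch of how the
renamed field is consumed, following the io_readx()/io_writex() hunks
below, where 'addr' is the access address:

    /* Low TARGET_PAGE_BITS hold the physical section number. */
    section = iotlb_to_section(cpu, full->xlat_section, full->attrs);
    /* The remaining bits are an offset which, added to the virtual
       address, yields the offset within the target MemoryRegion. */
    mr_offset = (full->xlat_section & TARGET_PAGE_MASK) + addr;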

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-defs.h    |  22 ++++----
 accel/tcg/cputlb.c         | 102 +++++++++++++++++++------------------
 target/arm/mte_helper.c    |  14 ++---
 target/arm/sve_helper.c    |   4 +-
 target/arm/translate-a64.c |   2 +-
 5 files changed, 73 insertions(+), 71 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index ba3cd32a1e..f70f54d850 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -108,6 +108,7 @@ typedef uint64_t target_ulong;
 #  endif
 # endif
 
+/* Minimalized TLB entry for use by TCG fast path. */
 typedef struct CPUTLBEntry {
     /* bit TARGET_LONG_BITS to TARGET_PAGE_BITS : virtual address
        bit TARGET_PAGE_BITS-1..4  : Nonzero for accesses that should not
@@ -131,14 +132,14 @@ typedef struct CPUTLBEntry {
 
 QEMU_BUILD_BUG_ON(sizeof(CPUTLBEntry) != (1 << CPU_TLB_ENTRY_BITS));
 
-/* The IOTLB is not accessed directly inline by generated TCG code,
- * so the CPUIOTLBEntry layout is not as critical as that of the
- * CPUTLBEntry. (This is also why we don't want to combine the two
- * structs into one.)
+/*
+ * The full TLB entry, which is not accessed by generated TCG code,
+ * so the layout is not as critical as that of CPUTLBEntry. This is
+ * also why we don't want to combine the two structs.
  */
-typedef struct CPUIOTLBEntry {
+typedef struct CPUTLBEntryFull {
     /*
-     * @addr contains:
+     * @xlat_section contains:
      *  - in the lower TARGET_PAGE_BITS, a physical section number
      *  - with the lower TARGET_PAGE_BITS masked off, an offset which
      *    must be added to the virtual address to obtain:
@@ -146,9 +147,9 @@ typedef struct CPUIOTLBEntry {
      *       number is PHYS_SECTION_NOTDIRTY or PHYS_SECTION_ROM)
      *     + the offset within the target MemoryRegion (otherwise)
      */
-    hwaddr addr;
+    hwaddr xlat_section;
     MemTxAttrs attrs;
-} CPUIOTLBEntry;
+} CPUTLBEntryFull;
 
 /*
  * Data elements that are per MMU mode, minus the bits accessed by
@@ -172,9 +173,8 @@ typedef struct CPUTLBDesc {
     size_t vindex;
     /* The tlb victim table, in two parts.  */
     CPUTLBEntry vtable[CPU_VTLB_SIZE];
-    CPUIOTLBEntry viotlb[CPU_VTLB_SIZE];
-    /* The iotlb.  */
-    CPUIOTLBEntry *iotlb;
+    CPUTLBEntryFull vfulltlb[CPU_VTLB_SIZE];
+    CPUTLBEntryFull *fulltlb;
 } CPUTLBDesc;
 
 /*
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 8fad2d9b83..4585d7c015 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -200,13 +200,13 @@ static void tlb_mmu_resize_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast,
     }
 
     g_free(fast->table);
-    g_free(desc->iotlb);
+    g_free(desc->fulltlb);
 
     tlb_window_reset(desc, now, 0);
     /* desc->n_used_entries is cleared by the caller */
     fast->mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
     fast->table = g_try_new(CPUTLBEntry, new_size);
-    desc->iotlb = g_try_new(CPUIOTLBEntry, new_size);
+    desc->fulltlb = g_try_new(CPUTLBEntryFull, new_size);
 
     /*
      * If the allocations fail, try smaller sizes. We just freed some
@@ -215,7 +215,7 @@ static void tlb_mmu_resize_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast,
      * allocations to fail though, so we progressively reduce the allocation
      * size, aborting if we cannot even allocate the smallest TLB we support.
      */
-    while (fast->table == NULL || desc->iotlb == NULL) {
+    while (fast->table == NULL || desc->fulltlb == NULL) {
         if (new_size == (1 << CPU_TLB_DYN_MIN_BITS)) {
             error_report("%s: %s", __func__, strerror(errno));
             abort();
@@ -224,9 +224,9 @@ static void tlb_mmu_resize_locked(CPUTLBDesc *desc, CPUTLBDescFast *fast,
         fast->mask = (new_size - 1) << CPU_TLB_ENTRY_BITS;
 
         g_free(fast->table);
-        g_free(desc->iotlb);
+        g_free(desc->fulltlb);
         fast->table = g_try_new(CPUTLBEntry, new_size);
-        desc->iotlb = g_try_new(CPUIOTLBEntry, new_size);
+        desc->fulltlb = g_try_new(CPUTLBEntryFull, new_size);
     }
 }
 
@@ -258,7 +258,7 @@ static void tlb_mmu_init(CPUTLBDesc *desc, CPUTLBDescFast *fast, int64_t now)
     desc->n_used_entries = 0;
     fast->mask = (n_entries - 1) << CPU_TLB_ENTRY_BITS;
     fast->table = g_new(CPUTLBEntry, n_entries);
-    desc->iotlb = g_new(CPUIOTLBEntry, n_entries);
+    desc->fulltlb = g_new(CPUTLBEntryFull, n_entries);
     tlb_mmu_flush_locked(desc, fast);
 }
 
@@ -299,7 +299,7 @@ void tlb_destroy(CPUState *cpu)
         CPUTLBDescFast *fast = &env_tlb(env)->f[i];
 
         g_free(fast->table);
-        g_free(desc->iotlb);
+        g_free(desc->fulltlb);
     }
 }
 
@@ -1219,7 +1219,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 
         /* Evict the old entry into the victim tlb.  */
         copy_tlb_helper_locked(tv, te);
-        desc->viotlb[vidx] = desc->iotlb[index];
+        desc->vfulltlb[vidx] = desc->fulltlb[index];
         tlb_n_used_entries_dec(env, mmu_idx);
     }
 
@@ -1236,8 +1236,8 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
      * subtract here is that of the page base, and not the same as the
      * vaddr we add back in io_readx()/io_writex()/get_page_addr_code().
      */
-    desc->iotlb[index].addr = iotlb - vaddr_page;
-    desc->iotlb[index].attrs = attrs;
+    desc->fulltlb[index].xlat_section = iotlb - vaddr_page;
+    desc->fulltlb[index].attrs = attrs;
 
     /* Now calculate the new entry */
     tn.addend = addend - vaddr_page;
@@ -1329,7 +1329,7 @@ static inline void cpu_transaction_failed(CPUState *cpu, hwaddr physaddr,
     }
 }
 
-static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
+static uint64_t io_readx(CPUArchState *env, CPUTLBEntryFull *full,
                          int mmu_idx, target_ulong addr, uintptr_t retaddr,
                          MMUAccessType access_type, MemOp op)
 {
@@ -1341,9 +1341,9 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     bool locked = false;
     MemTxResult r;
 
-    section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
+    section = iotlb_to_section(cpu, full->xlat_section, full->attrs);
     mr = section->mr;
-    mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
+    mr_offset = (full->xlat_section & TARGET_PAGE_MASK) + addr;
     cpu->mem_io_pc = retaddr;
     if (!cpu->can_do_io) {
         cpu_io_recompile(cpu, retaddr);
@@ -1353,14 +1353,14 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
         qemu_mutex_lock_iothread();
         locked = true;
     }
-    r = memory_region_dispatch_read(mr, mr_offset, &val, op, iotlbentry->attrs);
+    r = memory_region_dispatch_read(mr, mr_offset, &val, op, full->attrs);
     if (r != MEMTX_OK) {
         hwaddr physaddr = mr_offset +
             section->offset_within_address_space -
             section->offset_within_region;
 
         cpu_transaction_failed(cpu, physaddr, addr, memop_size(op), access_type,
-                               mmu_idx, iotlbentry->attrs, r, retaddr);
+                               mmu_idx, full->attrs, r, retaddr);
     }
     if (locked) {
         qemu_mutex_unlock_iothread();
@@ -1370,8 +1370,8 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 }
 
 /*
- * Save a potentially trashed IOTLB entry for later lookup by plugin.
- * This is read by tlb_plugin_lookup if the iotlb entry doesn't match
+ * Save a potentially trashed CPUTLBEntryFull for later lookup by plugin.
+ * This is read by tlb_plugin_lookup if the fulltlb entry doesn't match
  * because of the side effect of io_writex changing memory layout.
  */
 static void save_iotlb_data(CPUState *cs, hwaddr addr,
@@ -1385,7 +1385,7 @@ static void save_iotlb_data(CPUState *cs, hwaddr addr,
 #endif
 }
 
-static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
+static void io_writex(CPUArchState *env, CPUTLBEntryFull *full,
                       int mmu_idx, uint64_t val, target_ulong addr,
                       uintptr_t retaddr, MemOp op)
 {
@@ -1396,9 +1396,9 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     bool locked = false;
     MemTxResult r;
 
-    section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
+    section = iotlb_to_section(cpu, full->xlat_section, full->attrs);
     mr = section->mr;
-    mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
+    mr_offset = (full->xlat_section & TARGET_PAGE_MASK) + addr;
     if (!cpu->can_do_io) {
         cpu_io_recompile(cpu, retaddr);
     }
@@ -1408,20 +1408,20 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
      * The memory_region_dispatch may trigger a flush/resize
      * so for plugins we save the iotlb_data just in case.
      */
-    save_iotlb_data(cpu, iotlbentry->addr, section, mr_offset);
+    save_iotlb_data(cpu, full->xlat_section, section, mr_offset);
 
     if (!qemu_mutex_iothread_locked()) {
         qemu_mutex_lock_iothread();
         locked = true;
     }
-    r = memory_region_dispatch_write(mr, mr_offset, val, op, iotlbentry->attrs);
+    r = memory_region_dispatch_write(mr, mr_offset, val, op, full->attrs);
     if (r != MEMTX_OK) {
         hwaddr physaddr = mr_offset +
             section->offset_within_address_space -
             section->offset_within_region;
 
         cpu_transaction_failed(cpu, physaddr, addr, memop_size(op),
-                               MMU_DATA_STORE, mmu_idx, iotlbentry->attrs, r,
+                               MMU_DATA_STORE, mmu_idx, full->attrs, r,
                                retaddr);
     }
     if (locked) {
@@ -1468,9 +1468,10 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
             copy_tlb_helper_locked(vtlb, &tmptlb);
             qemu_spin_unlock(&env_tlb(env)->c.lock);
 
-            CPUIOTLBEntry tmpio, *io = &env_tlb(env)->d[mmu_idx].iotlb[index];
-            CPUIOTLBEntry *vio = &env_tlb(env)->d[mmu_idx].viotlb[vidx];
-            tmpio = *io; *io = *vio; *vio = tmpio;
+            CPUTLBEntryFull *f1 = &env_tlb(env)->d[mmu_idx].fulltlb[index];
+            CPUTLBEntryFull *f2 = &env_tlb(env)->d[mmu_idx].vfulltlb[vidx];
+            CPUTLBEntryFull tmpf;
+            tmpf = *f1; *f1 = *f2; *f2 = tmpf;
             return true;
         }
     }
@@ -1483,9 +1484,9 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
                  (ADDR) & TARGET_PAGE_MASK)
 
 static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
-                           CPUIOTLBEntry *iotlbentry, uintptr_t retaddr)
+                           CPUTLBEntryFull *full, uintptr_t retaddr)
 {
-    ram_addr_t ram_addr = mem_vaddr + iotlbentry->addr;
+    ram_addr_t ram_addr = mem_vaddr + full->xlat_section;
 
     trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
 
@@ -1578,9 +1579,9 @@ int probe_access_flags(CPUArchState *env, target_ulong addr,
     /* Handle clean RAM pages.  */
     if (unlikely(flags & TLB_NOTDIRTY)) {
         uintptr_t index = tlb_index(env, mmu_idx, addr);
-        CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
+        CPUTLBEntryFull *full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
 
-        notdirty_write(env_cpu(env), addr, 1, iotlbentry, retaddr);
+        notdirty_write(env_cpu(env), addr, 1, full, retaddr);
         flags &= ~TLB_NOTDIRTY;
     }
 
@@ -1605,19 +1606,19 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
 
     if (unlikely(flags & (TLB_NOTDIRTY | TLB_WATCHPOINT))) {
         uintptr_t index = tlb_index(env, mmu_idx, addr);
-        CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
+        CPUTLBEntryFull *full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
 
         /* Handle watchpoints.  */
         if (flags & TLB_WATCHPOINT) {
             int wp_access = (access_type == MMU_DATA_STORE
                              ? BP_MEM_WRITE : BP_MEM_READ);
             cpu_check_watchpoint(env_cpu(env), addr, size,
-                                 iotlbentry->attrs, wp_access, retaddr);
+                                 full->attrs, wp_access, retaddr);
         }
 
         /* Handle clean RAM pages.  */
         if (flags & TLB_NOTDIRTY) {
-            notdirty_write(env_cpu(env), addr, 1, iotlbentry, retaddr);
+            notdirty_write(env_cpu(env), addr, 1, full, retaddr);
         }
     }
 
@@ -1674,7 +1675,7 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
  * should have just filled the TLB. The one corner case is io_writex
  * which can cause TLB flushes and potential resizing of the TLBs
  * losing the information we need. In those cases we need to recover
- * data from a copy of the iotlbentry. As long as this always occurs
+ * data from a copy of the CPUTLBEntryFull. As long as this always occurs
  * from the same thread (which a mem callback will be) this is safe.
  */
 
@@ -1689,11 +1690,12 @@ bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
     if (likely(tlb_hit(tlb_addr, addr))) {
         /* We must have an iotlb entry for MMIO */
         if (tlb_addr & TLB_MMIO) {
-            CPUIOTLBEntry *iotlbentry;
-            iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
+            CPUTLBEntryFull *full;
+            full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
             data->is_io = true;
-            data->v.io.section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
-            data->v.io.offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
+            data->v.io.section =
+                iotlb_to_section(cpu, full->xlat_section, full->attrs);
+            data->v.io.offset = (full->xlat_section & TARGET_PAGE_MASK) + addr;
         } else {
             data->is_io = false;
             data->v.ram.hostaddr = (void *)((uintptr_t)addr + tlbe->addend);
@@ -1801,7 +1803,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 
     if (unlikely(tlb_addr & TLB_NOTDIRTY)) {
         notdirty_write(env_cpu(env), addr, size,
-                       &env_tlb(env)->d[mmu_idx].iotlb[index], retaddr);
+                       &env_tlb(env)->d[mmu_idx].fulltlb[index], retaddr);
     }
 
     return hostaddr;
@@ -1909,7 +1911,7 @@ load_helper(CPUArchState *env, target_ulong addr, MemOpIdx oi,
 
     /* Handle anything that isn't just a straight memory access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
+        CPUTLBEntryFull *full;
         bool need_swap;
 
         /* For anything that is unaligned, recurse through full_load.  */
@@ -1917,20 +1919,20 @@ load_helper(CPUArchState *env, target_ulong addr, MemOpIdx oi,
             goto do_unaligned_access;
         }
 
-        iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
+        full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
 
         /* Handle watchpoints.  */
         if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
             /* On watchpoint hit, this will longjmp out.  */
             cpu_check_watchpoint(env_cpu(env), addr, size,
-                                 iotlbentry->attrs, BP_MEM_READ, retaddr);
+                                 full->attrs, BP_MEM_READ, retaddr);
         }
 
         need_swap = size > 1 && (tlb_addr & TLB_BSWAP);
 
         /* Handle I/O access.  */
         if (likely(tlb_addr & TLB_MMIO)) {
-            return io_readx(env, iotlbentry, mmu_idx, addr, retaddr,
+            return io_readx(env, full, mmu_idx, addr, retaddr,
                             access_type, op ^ (need_swap * MO_BSWAP));
         }
 
@@ -2245,12 +2247,12 @@ store_helper_unaligned(CPUArchState *env, target_ulong addr, uint64_t val,
      */
     if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
         cpu_check_watchpoint(env_cpu(env), addr, size - size2,
-                             env_tlb(env)->d[mmu_idx].iotlb[index].attrs,
+                             env_tlb(env)->d[mmu_idx].fulltlb[index].attrs,
                              BP_MEM_WRITE, retaddr);
     }
     if (unlikely(tlb_addr2 & TLB_WATCHPOINT)) {
         cpu_check_watchpoint(env_cpu(env), page2, size2,
-                             env_tlb(env)->d[mmu_idx].iotlb[index2].attrs,
+                             env_tlb(env)->d[mmu_idx].fulltlb[index2].attrs,
                              BP_MEM_WRITE, retaddr);
     }
 
@@ -2314,7 +2316,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 
     /* Handle anything that isn't just a straight memory access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUIOTLBEntry *iotlbentry;
+        CPUTLBEntryFull *full;
         bool need_swap;
 
         /* For anything that is unaligned, recurse through byte stores.  */
@@ -2322,20 +2324,20 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
             goto do_unaligned_access;
         }
 
-        iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
+        full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
 
         /* Handle watchpoints.  */
         if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
             /* On watchpoint hit, this will longjmp out.  */
             cpu_check_watchpoint(env_cpu(env), addr, size,
-                                 iotlbentry->attrs, BP_MEM_WRITE, retaddr);
+                                 full->attrs, BP_MEM_WRITE, retaddr);
         }
 
         need_swap = size > 1 && (tlb_addr & TLB_BSWAP);
 
         /* Handle I/O access.  */
         if (tlb_addr & TLB_MMIO) {
-            io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
+            io_writex(env, full, mmu_idx, val, addr, retaddr,
                       op ^ (need_swap * MO_BSWAP));
             return;
         }
@@ -2347,7 +2349,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 
         /* Handle clean RAM pages.  */
         if (tlb_addr & TLB_NOTDIRTY) {
-            notdirty_write(env_cpu(env), addr, size, iotlbentry, retaddr);
+            notdirty_write(env_cpu(env), addr, size, full, retaddr);
         }
 
         haddr = (void *)((uintptr_t)addr + entry->addend);
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index d11a8c70d0..fdd23ab3f8 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -106,7 +106,7 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
     return tags + index;
 #else
     uintptr_t index;
-    CPUIOTLBEntry *iotlbentry;
+    CPUTLBEntryFull *full;
     int in_page, flags;
     ram_addr_t ptr_ra;
     hwaddr ptr_paddr, tag_paddr, xlat;
@@ -129,7 +129,7 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
     assert(!(flags & TLB_INVALID_MASK));
 
     /*
-     * Find the iotlbentry for ptr.  This *must* be present in the TLB
+     * Find the CPUTLBEntryFull for ptr.  This *must* be present in the TLB
      * because we just found the mapping.
      * TODO: Perhaps there should be a cputlb helper that returns a
      * matching tlb entry + iotlb entry.
@@ -144,10 +144,10 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
         g_assert(tlb_hit(comparator, ptr));
     }
 # endif
-    iotlbentry = &env_tlb(env)->d[ptr_mmu_idx].iotlb[index];
+    full = &env_tlb(env)->d[ptr_mmu_idx].fulltlb[index];
 
     /* If the virtual page MemAttr != Tagged, access unchecked. */
-    if (!arm_tlb_mte_tagged(&iotlbentry->attrs)) {
+    if (!arm_tlb_mte_tagged(&full->attrs)) {
         return NULL;
     }
 
@@ -181,7 +181,7 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
         int wp = ptr_access == MMU_DATA_LOAD ? BP_MEM_READ : BP_MEM_WRITE;
         assert(ra != 0);
         cpu_check_watchpoint(env_cpu(env), ptr, ptr_size,
-                             iotlbentry->attrs, wp, ra);
+                             full->attrs, wp, ra);
     }
 
     /*
@@ -202,11 +202,11 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
     tag_paddr = ptr_paddr >> (LOG2_TAG_GRANULE + 1);
 
     /* Look up the address in tag space. */
-    tag_asi = iotlbentry->attrs.secure ? ARMASIdx_TagS : ARMASIdx_TagNS;
+    tag_asi = full->attrs.secure ? ARMASIdx_TagS : ARMASIdx_TagNS;
     tag_as = cpu_get_address_space(env_cpu(env), tag_asi);
     mr = address_space_translate(tag_as, tag_paddr, &xlat, NULL,
                                  tag_access == MMU_DATA_STORE,
-                                 iotlbentry->attrs);
+                                 full->attrs);
 
     /*
      * Note that @mr will never be NULL.  If there is nothing in the address
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index d6f7ef94fe..9cae8fd352 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -5384,8 +5384,8 @@ bool sve_probe_page(SVEHostPage *info, bool nofault, CPUARMState *env,
         g_assert(tlb_hit(comparator, addr));
 # endif
 
-        CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
-        info->attrs = iotlbentry->attrs;
+        CPUTLBEntryFull *full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
+        info->attrs = full->attrs;
     }
 #endif
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 9bed336b47..78b2d91ed4 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14624,7 +14624,7 @@ static bool is_guarded_page(CPUARMState *env, DisasContext *s)
      * table entry even for that case.
      */
     return (tlb_hit(entry->addr_code, addr) &&
-            arm_tlb_bti_gp(&env_tlb(env)->d[mmu_idx].iotlb[index].attrs));
+            arm_tlb_bti_gp(&env_tlb(env)->d[mmu_idx].fulltlb[index].attrs));
 #endif
 }
 
-- 
2.34.1




* [PATCH v5 02/17] accel/tcg: Drop addr member from SavedIOTLB
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Philippe Mathieu-Daudé

This field is only written, not read; remove it.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/hw/core/cpu.h | 1 -
 accel/tcg/cputlb.c    | 7 +++----
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 500503da13..9e47184513 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -218,7 +218,6 @@ struct CPUWatchpoint {
  * the memory regions get moved around  by io_writex.
  */
 typedef struct SavedIOTLB {
-    hwaddr addr;
     MemoryRegionSection *section;
     hwaddr mr_offset;
 } SavedIOTLB;
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 4585d7c015..03395e725d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1374,12 +1374,11 @@ static uint64_t io_readx(CPUArchState *env, CPUTLBEntryFull *full,
  * This is read by tlb_plugin_lookup if the fulltlb entry doesn't match
  * because of the side effect of io_writex changing memory layout.
  */
-static void save_iotlb_data(CPUState *cs, hwaddr addr,
-                            MemoryRegionSection *section, hwaddr mr_offset)
+static void save_iotlb_data(CPUState *cs, MemoryRegionSection *section,
+                            hwaddr mr_offset)
 {
 #ifdef CONFIG_PLUGIN
     SavedIOTLB *saved = &cs->saved_iotlb;
-    saved->addr = addr;
     saved->section = section;
     saved->mr_offset = mr_offset;
 #endif
@@ -1408,7 +1407,7 @@ static void io_writex(CPUArchState *env, CPUTLBEntryFull *full,
      * The memory_region_dispatch may trigger a flush/resize
      * so for plugins we save the iotlb_data just in case.
      */
-    save_iotlb_data(cpu, full->xlat_section, section, mr_offset);
+    save_iotlb_data(cpu, section, mr_offset);
 
     if (!qemu_mutex_iothread_locked()) {
         qemu_mutex_lock_iothread();
-- 
2.34.1




* [PATCH v5 03/17] accel/tcg: Suppress auto-invalidate in probe_access_internal
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: David Hildenbrand, Peter Maydell

When PAGE_WRITE_INV is set in the call to tlb_set_page, we
immediately set TLB_INVALID_MASK in order to force tlb_fill to be
called on the next lookup.  Here in probe_access_internal, we have
just called tlb_fill and eliminated true misses, so the lookup must
be valid.

This allows us to remove a warning comment from s390x.  There doesn't
seem to be a reason to change the code itself, though.
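
In outline (a paraphrase of the cputlb.c hunk below, not extra code),
the flag computation in probe_access_internal() becomes:

    flags = TLB_FLAGS_MASK;          /* assume any flag bit may be set */
    if (!tlb_hit_page(tlb_addr, page_addr)) {
        if (!victim_tlb_hit(...)) {
            /* tlb_fill() succeeded, so the entry cannot be invalid */
            flags &= ~TLB_INVALID_MASK;
        }
        tlb_addr = tlb_read_ofs(entry, elt_ofs);
    }
    flags &= tlb_addr;               /* keep only bits set in the entry */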

Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c            | 10 +++++++++-
 target/s390x/tcg/mem_helper.c |  4 ----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 03395e725d..91f2b53142 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1535,6 +1535,7 @@ static int probe_access_internal(CPUArchState *env, target_ulong addr,
     }
     tlb_addr = tlb_read_ofs(entry, elt_ofs);
 
+    flags = TLB_FLAGS_MASK;
     page_addr = addr & TARGET_PAGE_MASK;
     if (!tlb_hit_page(tlb_addr, page_addr)) {
         if (!victim_tlb_hit(env, mmu_idx, index, elt_ofs, page_addr)) {
@@ -1550,10 +1551,17 @@ static int probe_access_internal(CPUArchState *env, target_ulong addr,
 
             /* TLB resize via tlb_fill may have moved the entry.  */
             entry = tlb_entry(env, mmu_idx, addr);
+
+            /*
+             * With PAGE_WRITE_INV, we set TLB_INVALID_MASK immediately,
+             * to force the next access through tlb_fill.  We've just
+             * called tlb_fill, so we know that this entry *is* valid.
+             */
+            flags &= ~TLB_INVALID_MASK;
         }
         tlb_addr = tlb_read_ofs(entry, elt_ofs);
     }
-    flags = tlb_addr & TLB_FLAGS_MASK;
+    flags &= tlb_addr;
 
     /* Fold all "mmio-like" bits into TLB_MMIO.  This is not RAM.  */
     if (unlikely(flags & ~(TLB_WATCHPOINT | TLB_NOTDIRTY))) {
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index fc52aa128b..3758b9e688 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -148,10 +148,6 @@ static int s390_probe_access(CPUArchState *env, target_ulong addr, int size,
 #else
     int flags;
 
-    /*
-     * For !CONFIG_USER_ONLY, we cannot rely on TLB_INVALID_MASK or haddr==NULL
-     * to detect if there was an exception during tlb_fill().
-     */
     env->tlb_fill_exc = 0;
     flags = probe_access_flags(env, addr, access_type, mmu_idx, nonfault, phost,
                                ra);
-- 
2.34.1




* [PATCH v5 04/17] accel/tcg: Introduce probe_access_full
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Philippe Mathieu-Daudé

Add an interface to return the CPUTLBEntryFull struct
that goes with the lookup.  The result is not intended
to remain valid across multiple lookups, so the caller
must use it immediately.
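
A hypothetical non-faulting caller (illustrative only; env, addr,
mmu_idx and retaddr are assumed to be in scope) might look like:

    CPUTLBEntryFull *full;
    void *host;
    int flags = probe_access_full(env, addr, MMU_DATA_LOAD, mmu_idx,
                                  true, &host, &full, retaddr);
    if (!(flags & TLB_INVALID_MASK)) {
        /* Consume @full now; a later lookup may resize the TLB and
           leave this pointer dangling. */
        MemTxAttrs attrs = full->attrs;
    }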

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/exec-all.h | 11 ++++++++++
 accel/tcg/cputlb.c      | 47 +++++++++++++++++++++++++----------------
 2 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bcad607c4e..758cf6bcc7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -434,6 +434,17 @@ int probe_access_flags(CPUArchState *env, target_ulong addr,
                        MMUAccessType access_type, int mmu_idx,
                        bool nonfault, void **phost, uintptr_t retaddr);
 
+#ifndef CONFIG_USER_ONLY
+/**
+ * probe_access_full:
+ * Like probe_access_flags, except also return into @pfull.
+ */
+int probe_access_full(CPUArchState *env, target_ulong addr,
+                      MMUAccessType access_type, int mmu_idx,
+                      bool nonfault, void **phost,
+                      CPUTLBEntryFull **pfull, uintptr_t retaddr);
+#endif
+
 #define CODE_GEN_ALIGN           16 /* must be >= of the size of a icache line */
 
 /* Estimated block size for TB allocation.  */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 91f2b53142..62159549f6 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1512,7 +1512,8 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
 static int probe_access_internal(CPUArchState *env, target_ulong addr,
                                  int fault_size, MMUAccessType access_type,
                                  int mmu_idx, bool nonfault,
-                                 void **phost, uintptr_t retaddr)
+                                 void **phost, CPUTLBEntryFull **pfull,
+                                 uintptr_t retaddr)
 {
     uintptr_t index = tlb_index(env, mmu_idx, addr);
     CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
@@ -1546,10 +1547,12 @@ static int probe_access_internal(CPUArchState *env, target_ulong addr,
                                        mmu_idx, nonfault, retaddr)) {
                 /* Non-faulting page table read failed.  */
                 *phost = NULL;
+                *pfull = NULL;
                 return TLB_INVALID_MASK;
             }
 
             /* TLB resize via tlb_fill may have moved the entry.  */
+            index = tlb_index(env, mmu_idx, addr);
             entry = tlb_entry(env, mmu_idx, addr);
 
             /*
@@ -1563,6 +1566,8 @@ static int probe_access_internal(CPUArchState *env, target_ulong addr,
     }
     flags &= tlb_addr;
 
+    *pfull = &env_tlb(env)->d[mmu_idx].fulltlb[index];
+
     /* Fold all "mmio-like" bits into TLB_MMIO.  This is not RAM.  */
     if (unlikely(flags & ~(TLB_WATCHPOINT | TLB_NOTDIRTY))) {
         *phost = NULL;
@@ -1574,37 +1579,44 @@ static int probe_access_internal(CPUArchState *env, target_ulong addr,
     return flags;
 }
 
-int probe_access_flags(CPUArchState *env, target_ulong addr,
-                       MMUAccessType access_type, int mmu_idx,
-                       bool nonfault, void **phost, uintptr_t retaddr)
+int probe_access_full(CPUArchState *env, target_ulong addr,
+                      MMUAccessType access_type, int mmu_idx,
+                      bool nonfault, void **phost, CPUTLBEntryFull **pfull,
+                      uintptr_t retaddr)
 {
-    int flags;
-
-    flags = probe_access_internal(env, addr, 0, access_type, mmu_idx,
-                                  nonfault, phost, retaddr);
+    int flags = probe_access_internal(env, addr, 0, access_type, mmu_idx,
+                                      nonfault, phost, pfull, retaddr);
 
     /* Handle clean RAM pages.  */
     if (unlikely(flags & TLB_NOTDIRTY)) {
-        uintptr_t index = tlb_index(env, mmu_idx, addr);
-        CPUTLBEntryFull *full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
-
-        notdirty_write(env_cpu(env), addr, 1, full, retaddr);
+        notdirty_write(env_cpu(env), addr, 1, *pfull, retaddr);
         flags &= ~TLB_NOTDIRTY;
     }
 
     return flags;
 }
 
+int probe_access_flags(CPUArchState *env, target_ulong addr,
+                       MMUAccessType access_type, int mmu_idx,
+                       bool nonfault, void **phost, uintptr_t retaddr)
+{
+    CPUTLBEntryFull *full;
+
+    return probe_access_full(env, addr, access_type, mmu_idx,
+                             nonfault, phost, &full, retaddr);
+}
+
 void *probe_access(CPUArchState *env, target_ulong addr, int size,
                    MMUAccessType access_type, int mmu_idx, uintptr_t retaddr)
 {
+    CPUTLBEntryFull *full;
     void *host;
     int flags;
 
     g_assert(-(addr | TARGET_PAGE_MASK) >= size);
 
     flags = probe_access_internal(env, addr, size, access_type, mmu_idx,
-                                  false, &host, retaddr);
+                                  false, &host, &full, retaddr);
 
     /* Per the interface, size == 0 merely faults the access. */
     if (size == 0) {
@@ -1612,9 +1624,6 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
     }
 
     if (unlikely(flags & (TLB_NOTDIRTY | TLB_WATCHPOINT))) {
-        uintptr_t index = tlb_index(env, mmu_idx, addr);
-        CPUTLBEntryFull *full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
-
         /* Handle watchpoints.  */
         if (flags & TLB_WATCHPOINT) {
             int wp_access = (access_type == MMU_DATA_STORE
@@ -1635,11 +1644,12 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
 void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
                         MMUAccessType access_type, int mmu_idx)
 {
+    CPUTLBEntryFull *full;
     void *host;
     int flags;
 
     flags = probe_access_internal(env, addr, 0, access_type,
-                                  mmu_idx, true, &host, 0);
+                                  mmu_idx, true, &host, &full, 0);
 
     /* No combination of flags are expected by the caller. */
     return flags ? NULL : host;
@@ -1658,10 +1668,11 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
 tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
                                         void **hostp)
 {
+    CPUTLBEntryFull *full;
     void *p;
 
     (void)probe_access_internal(env, addr, 1, MMU_INST_FETCH,
-                                cpu_mmu_index(env, true), false, &p, 0);
+                                cpu_mmu_index(env, true), false, &p, &full, 0);
     if (p == NULL) {
         return -1;
     }
-- 
2.34.1




* [PATCH v5 05/17] accel/tcg: Introduce tlb_set_page_full
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Philippe Mathieu-Daudé

Now that we have collected all of the page data into
CPUTLBEntryFull, provide an interface to record it all
in one go, instead of passing four separate arguments.
This also allows CPUTLBEntryFull to be extended without
changing the number of arguments.
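
For instance, a target's tlb_fill can now do something like the
following (a sketch along the lines of the sparc conversion in patch
07; the field values are placeholders):

    CPUTLBEntryFull full = {
        .phys_addr    = paddr,
        .attrs        = MEMTXATTRS_UNSPECIFIED,
        .prot         = PAGE_READ | PAGE_WRITE | PAGE_EXEC,
        .lg_page_size = TARGET_PAGE_BITS,
    };
    tlb_set_page_full(cs, mmu_idx, vaddr, &full);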

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-defs.h | 14 +++++++++++
 include/exec/exec-all.h | 22 ++++++++++++++++++
 accel/tcg/cputlb.c      | 51 ++++++++++++++++++++++++++---------------
 3 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index f70f54d850..5e12cc1854 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -148,7 +148,21 @@ typedef struct CPUTLBEntryFull {
      *     + the offset within the target MemoryRegion (otherwise)
      */
     hwaddr xlat_section;
+
+    /*
+     * @phys_addr contains the physical address in the address space
+     * given by cpu_asidx_from_attrs(cpu, @attrs).
+     */
+    hwaddr phys_addr;
+
+    /* @attrs contains the memory transaction attributes for the page. */
     MemTxAttrs attrs;
+
+    /* @prot contains the complete protections for the page. */
+    uint8_t prot;
+
+    /* @lg_page_size contains the log2 of the page size. */
+    uint8_t lg_page_size;
 } CPUTLBEntryFull;
 
 /*
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 758cf6bcc7..0f25a1ba85 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -257,6 +257,28 @@ void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *cpu,
                                                uint16_t idxmap,
                                                unsigned bits);
 
+/**
+ * tlb_set_page_full:
+ * @cpu: CPU context
+ * @mmu_idx: mmu index of the tlb to modify
+ * @vaddr: virtual address of the entry to add
+ * @full: the details of the tlb entry
+ *
+ * Add an entry to @cpu tlb index @mmu_idx.  All of the fields of
+ * @full must be filled, except for xlat_section, and constitute
+ * the complete description of the translated page.
+ *
+ * This is generally called by the target tlb_fill function after
+ * having performed a successful page table walk to find the physical
+ * address and attributes for the translation.
+ *
+ * At most one entry for a given virtual address is permitted. Only a
+ * single TARGET_PAGE_SIZE region is mapped; @full->lg_page_size is only
+ * used by tlb_flush_page.
+ */
+void tlb_set_page_full(CPUState *cpu, int mmu_idx, target_ulong vaddr,
+                       CPUTLBEntryFull *full);
+
 /**
  * tlb_set_page_with_attrs:
  * @cpu: CPU to add this TLB entry for
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 62159549f6..382c5d3109 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1095,16 +1095,16 @@ static void tlb_add_large_page(CPUArchState *env, int mmu_idx,
     env_tlb(env)->d[mmu_idx].large_page_mask = lp_mask;
 }
 
-/* Add a new TLB entry. At most one entry for a given virtual address
+/*
+ * Add a new TLB entry. At most one entry for a given virtual address
  * is permitted. Only a single TARGET_PAGE_SIZE region is mapped, the
  * supplied size is only used by tlb_flush_page.
  *
  * Called from TCG-generated code, which is under an RCU read-side
  * critical section.
  */
-void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
-                             hwaddr paddr, MemTxAttrs attrs, int prot,
-                             int mmu_idx, target_ulong size)
+void tlb_set_page_full(CPUState *cpu, int mmu_idx,
+                       target_ulong vaddr, CPUTLBEntryFull *full)
 {
     CPUArchState *env = cpu->env_ptr;
     CPUTLB *tlb = env_tlb(env);
@@ -1117,35 +1117,36 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     CPUTLBEntry *te, tn;
     hwaddr iotlb, xlat, sz, paddr_page;
     target_ulong vaddr_page;
-    int asidx = cpu_asidx_from_attrs(cpu, attrs);
-    int wp_flags;
+    int asidx, wp_flags, prot;
     bool is_ram, is_romd;
 
     assert_cpu_is_self(cpu);
 
-    if (size <= TARGET_PAGE_SIZE) {
+    if (full->lg_page_size <= TARGET_PAGE_BITS) {
         sz = TARGET_PAGE_SIZE;
     } else {
-        tlb_add_large_page(env, mmu_idx, vaddr, size);
-        sz = size;
+        sz = (hwaddr)1 << full->lg_page_size;
+        tlb_add_large_page(env, mmu_idx, vaddr, sz);
     }
     vaddr_page = vaddr & TARGET_PAGE_MASK;
-    paddr_page = paddr & TARGET_PAGE_MASK;
+    paddr_page = full->phys_addr & TARGET_PAGE_MASK;
 
+    prot = full->prot;
+    asidx = cpu_asidx_from_attrs(cpu, full->attrs);
     section = address_space_translate_for_iotlb(cpu, asidx, paddr_page,
-                                                &xlat, &sz, attrs, &prot);
+                                                &xlat, &sz, full->attrs, &prot);
     assert(sz >= TARGET_PAGE_SIZE);
 
     tlb_debug("vaddr=" TARGET_FMT_lx " paddr=0x" TARGET_FMT_plx
               " prot=%x idx=%d\n",
-              vaddr, paddr, prot, mmu_idx);
+              vaddr, full->phys_addr, prot, mmu_idx);
 
     address = vaddr_page;
-    if (size < TARGET_PAGE_SIZE) {
+    if (full->lg_page_size < TARGET_PAGE_BITS) {
         /* Repeat the MMU check and TLB fill on every access.  */
         address |= TLB_INVALID_MASK;
     }
-    if (attrs.byte_swap) {
+    if (full->attrs.byte_swap) {
         address |= TLB_BSWAP;
     }
 
@@ -1236,8 +1237,10 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
      * subtract here is that of the page base, and not the same as the
      * vaddr we add back in io_readx()/io_writex()/get_page_addr_code().
      */
+    desc->fulltlb[index] = *full;
     desc->fulltlb[index].xlat_section = iotlb - vaddr_page;
-    desc->fulltlb[index].attrs = attrs;
+    desc->fulltlb[index].phys_addr = paddr_page;
+    desc->fulltlb[index].prot = prot;
 
     /* Now calculate the new entry */
     tn.addend = addend - vaddr_page;
@@ -1272,9 +1275,21 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     qemu_spin_unlock(&tlb->c.lock);
 }
 
-/* Add a new TLB entry, but without specifying the memory
- * transaction attributes to be used.
- */
+void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
+                             hwaddr paddr, MemTxAttrs attrs, int prot,
+                             int mmu_idx, target_ulong size)
+{
+    CPUTLBEntryFull full = {
+        .phys_addr = paddr,
+        .attrs = attrs,
+        .prot = prot,
+        .lg_page_size = ctz64(size)
+    };
+
+    assert(is_power_of_2(size));
+    tlb_set_page_full(cpu, mmu_idx, vaddr, &full);
+}
+
 void tlb_set_page(CPUState *cpu, target_ulong vaddr,
                   hwaddr paddr, int prot,
                   int mmu_idx, target_ulong size)
-- 
2.34.1




* [PATCH v5 06/17] include/exec: Introduce TARGET_PAGE_ENTRY_EXTRA
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Philippe Mathieu-Daudé

Allow the target to cache items from the guest page tables.
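
A target would opt in by defining the macro before cpu-defs.h is
included, e.g. in its cpu-param.h (hypothetical fields, for
illustration only):

    #define TARGET_PAGE_ENTRY_EXTRA \
        bool guarded;               \
        uint64_t pte;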

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-defs.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 5e12cc1854..67239b4e5e 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -163,6 +163,15 @@ typedef struct CPUTLBEntryFull {
 
     /* @lg_page_size contains the log2 of the page size. */
     uint8_t lg_page_size;
+
+    /*
+     * Allow target-specific additions to this structure.
+     * This may be used to cache items from the guest cpu
+     * page tables for later use by the implementation.
+     */
+#ifdef TARGET_PAGE_ENTRY_EXTRA
+    TARGET_PAGE_ENTRY_EXTRA
+#endif
 } CPUTLBEntryFull;
 
 /*
-- 
2.34.1




* [PATCH v5 07/17] target/sparc: Use tlb_set_page_full
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Mark Cave-Ayland

Convert get_physical_address and all of its subroutines to use
CPUTLBEntryFull, consolidating four pointer arguments and providing
the larger structure to the lower layers.  The latter will be
important for the next patch.

Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sparc/mmu_helper.c | 123 ++++++++++++++++----------------------
 1 file changed, 53 insertions(+), 70 deletions(-)

diff --git a/target/sparc/mmu_helper.c b/target/sparc/mmu_helper.c
index 919448a494..08f656cbb6 100644
--- a/target/sparc/mmu_helper.c
+++ b/target/sparc/mmu_helper.c
@@ -64,10 +64,9 @@ static const int perm_table[2][8] = {
     }
 };
 
-static int get_physical_address(CPUSPARCState *env, hwaddr *physical,
-                                int *prot, int *access_index, MemTxAttrs *attrs,
-                                target_ulong address, int rw, int mmu_idx,
-                                target_ulong *page_size)
+static int get_physical_address(CPUSPARCState *env, CPUTLBEntryFull *full,
+                                int *access_index, target_ulong address,
+                                int rw, int mmu_idx)
 {
     int access_perms = 0;
     hwaddr pde_ptr;
@@ -80,20 +79,20 @@ static int get_physical_address(CPUSPARCState *env, hwaddr *physical,
     is_user = mmu_idx == MMU_USER_IDX;
 
     if (mmu_idx == MMU_PHYS_IDX) {
-        *page_size = TARGET_PAGE_SIZE;
+        full->lg_page_size = TARGET_PAGE_BITS;
         /* Boot mode: instruction fetches are taken from PROM */
         if (rw == 2 && (env->mmuregs[0] & env->def.mmu_bm)) {
-            *physical = env->prom_addr | (address & 0x7ffffULL);
-            *prot = PAGE_READ | PAGE_EXEC;
+            full->phys_addr = env->prom_addr | (address & 0x7ffffULL);
+            full->prot = PAGE_READ | PAGE_EXEC;
             return 0;
         }
-        *physical = address;
-        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+        full->phys_addr = address;
+        full->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
         return 0;
     }
 
     *access_index = ((rw & 1) << 2) | (rw & 2) | (is_user ? 0 : 1);
-    *physical = 0xffffffffffff0000ULL;
+    full->phys_addr = 0xffffffffffff0000ULL;
 
     /* SPARC reference MMU table walk: Context table->L1->L2->PTE */
     /* Context base + context number */
@@ -157,16 +156,16 @@ static int get_physical_address(CPUSPARCState *env, hwaddr *physical,
                 case 2: /* L3 PTE */
                     page_offset = 0;
                 }
-                *page_size = TARGET_PAGE_SIZE;
+                full->lg_page_size = TARGET_PAGE_BITS;
                 break;
             case 2: /* L2 PTE */
                 page_offset = address & 0x3f000;
-                *page_size = 0x40000;
+                full->lg_page_size = ctz32(0x40000);
             }
             break;
         case 2: /* L1 PTE */
             page_offset = address & 0xfff000;
-            *page_size = 0x1000000;
+            full->lg_page_size = ctz32(0x1000000);
         }
     }
 
@@ -188,16 +187,16 @@ static int get_physical_address(CPUSPARCState *env, hwaddr *physical,
     }
 
     /* the page can be put in the TLB */
-    *prot = perm_table[is_user][access_perms];
+    full->prot = perm_table[is_user][access_perms];
     if (!(pde & PG_MODIFIED_MASK)) {
         /* only set write access if already dirty... otherwise wait
            for dirty access */
-        *prot &= ~PAGE_WRITE;
+        full->prot &= ~PAGE_WRITE;
     }
 
     /* Even if large ptes, we map only one 4KB page in the cache to
        avoid filling it too fast */
-    *physical = ((hwaddr)(pde & PTE_ADDR_MASK) << 4) + page_offset;
+    full->phys_addr = ((hwaddr)(pde & PTE_ADDR_MASK) << 4) + page_offset;
     return error_code;
 }
 
@@ -208,11 +207,8 @@ bool sparc_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 {
     SPARCCPU *cpu = SPARC_CPU(cs);
     CPUSPARCState *env = &cpu->env;
-    hwaddr paddr;
-    target_ulong vaddr;
-    target_ulong page_size;
-    int error_code = 0, prot, access_index;
-    MemTxAttrs attrs = {};
+    CPUTLBEntryFull full = { };
+    int error_code = 0, access_index;
 
     /*
      * TODO: If we ever need tlb_vaddr_to_host for this target,
@@ -223,16 +219,13 @@ bool sparc_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
     assert(!probe);
 
     address &= TARGET_PAGE_MASK;
-    error_code = get_physical_address(env, &paddr, &prot, &access_index, &attrs,
-                                      address, access_type,
-                                      mmu_idx, &page_size);
-    vaddr = address;
+    error_code = get_physical_address(env, &full, &access_index,
+                                      address, access_type, mmu_idx);
     if (likely(error_code == 0)) {
         qemu_log_mask(CPU_LOG_MMU,
-                      "Translate at %" VADDR_PRIx " -> "
-                      TARGET_FMT_plx ", vaddr " TARGET_FMT_lx "\n",
-                      address, paddr, vaddr);
-        tlb_set_page(cs, vaddr, paddr, prot, mmu_idx, page_size);
+                      "Translate at %" VADDR_PRIx " -> " TARGET_FMT_plx "\n",
+                      address, full.phys_addr);
+        tlb_set_page_full(cs, mmu_idx, address, &full);
         return true;
     }
 
@@ -247,8 +240,9 @@ bool sparc_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
            permissions. If no mapping is available, redirect accesses to
            neverland. Fake/overridden mappings will be flushed when
            switching to normal mode. */
-        prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
-        tlb_set_page(cs, vaddr, paddr, prot, mmu_idx, TARGET_PAGE_SIZE);
+        full.prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+        full.lg_page_size = TARGET_PAGE_BITS;
+        tlb_set_page_full(cs, mmu_idx, address, &full);
         return true;
     } else {
         if (access_type == MMU_INST_FETCH) {
@@ -545,8 +539,7 @@ static uint64_t build_sfsr(CPUSPARCState *env, int mmu_idx, int rw)
     return sfsr;
 }
 
-static int get_physical_address_data(CPUSPARCState *env, hwaddr *physical,
-                                     int *prot, MemTxAttrs *attrs,
+static int get_physical_address_data(CPUSPARCState *env, CPUTLBEntryFull *full,
                                      target_ulong address, int rw, int mmu_idx)
 {
     CPUState *cs = env_cpu(env);
@@ -579,11 +572,12 @@ static int get_physical_address_data(CPUSPARCState *env, hwaddr *physical,
 
     for (i = 0; i < 64; i++) {
         /* ctx match, vaddr match, valid? */
-        if (ultrasparc_tag_match(&env->dtlb[i], address, context, physical)) {
+        if (ultrasparc_tag_match(&env->dtlb[i], address, context,
+                                 &full->phys_addr)) {
             int do_fault = 0;
 
             if (TTE_IS_IE(env->dtlb[i].tte)) {
-                attrs->byte_swap = true;
+                full->attrs.byte_swap = true;
             }
 
             /* access ok? */
@@ -616,9 +610,9 @@ static int get_physical_address_data(CPUSPARCState *env, hwaddr *physical,
             }
 
             if (!do_fault) {
-                *prot = PAGE_READ;
+                full->prot = PAGE_READ;
                 if (TTE_IS_W_OK(env->dtlb[i].tte)) {
-                    *prot |= PAGE_WRITE;
+                    full->prot |= PAGE_WRITE;
                 }
 
                 TTE_SET_USED(env->dtlb[i].tte);
@@ -645,8 +639,7 @@ static int get_physical_address_data(CPUSPARCState *env, hwaddr *physical,
     return 1;
 }
 
-static int get_physical_address_code(CPUSPARCState *env, hwaddr *physical,
-                                     int *prot, MemTxAttrs *attrs,
+static int get_physical_address_code(CPUSPARCState *env, CPUTLBEntryFull *full,
                                      target_ulong address, int mmu_idx)
 {
     CPUState *cs = env_cpu(env);
@@ -681,7 +674,7 @@ static int get_physical_address_code(CPUSPARCState *env, hwaddr *physical,
     for (i = 0; i < 64; i++) {
         /* ctx match, vaddr match, valid? */
         if (ultrasparc_tag_match(&env->itlb[i],
-                                 address, context, physical)) {
+                                 address, context, &full->phys_addr)) {
             /* access ok? */
             if (TTE_IS_PRIV(env->itlb[i].tte) && is_user) {
                 /* Fault status register */
@@ -708,7 +701,7 @@ static int get_physical_address_code(CPUSPARCState *env, hwaddr *physical,
 
                 return 1;
             }
-            *prot = PAGE_EXEC;
+            full->prot = PAGE_EXEC;
             TTE_SET_USED(env->itlb[i].tte);
             return 0;
         }
@@ -722,14 +715,13 @@ static int get_physical_address_code(CPUSPARCState *env, hwaddr *physical,
     return 1;
 }
 
-static int get_physical_address(CPUSPARCState *env, hwaddr *physical,
-                                int *prot, int *access_index, MemTxAttrs *attrs,
-                                target_ulong address, int rw, int mmu_idx,
-                                target_ulong *page_size)
+static int get_physical_address(CPUSPARCState *env, CPUTLBEntryFull *full,
+                                int *access_index, target_ulong address,
+                                int rw, int mmu_idx)
 {
     /* ??? We treat everything as a small page, then explicitly flush
        everything when an entry is evicted.  */
-    *page_size = TARGET_PAGE_SIZE;
+    full->lg_page_size = TARGET_PAGE_BITS;
 
     /* safety net to catch wrong softmmu index use from dynamic code */
     if (env->tl > 0 && mmu_idx != MMU_NUCLEUS_IDX) {
@@ -747,17 +739,15 @@ static int get_physical_address(CPUSPARCState *env, hwaddr *physical,
     }
 
     if (mmu_idx == MMU_PHYS_IDX) {
-        *physical = ultrasparc_truncate_physical(address);
-        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+        full->phys_addr = ultrasparc_truncate_physical(address);
+        full->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
         return 0;
     }
 
     if (rw == 2) {
-        return get_physical_address_code(env, physical, prot, attrs, address,
-                                         mmu_idx);
+        return get_physical_address_code(env, full, address, mmu_idx);
     } else {
-        return get_physical_address_data(env, physical, prot, attrs, address,
-                                         rw, mmu_idx);
+        return get_physical_address_data(env, full, address, rw, mmu_idx);
     }
 }
 
@@ -768,25 +758,18 @@ bool sparc_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 {
     SPARCCPU *cpu = SPARC_CPU(cs);
     CPUSPARCState *env = &cpu->env;
-    target_ulong vaddr;
-    hwaddr paddr;
-    target_ulong page_size;
-    MemTxAttrs attrs = {};
-    int error_code = 0, prot, access_index;
+    CPUTLBEntryFull full = { };
+    int error_code = 0, access_index;
 
     address &= TARGET_PAGE_MASK;
-    error_code = get_physical_address(env, &paddr, &prot, &access_index, &attrs,
-                                      address, access_type,
-                                      mmu_idx, &page_size);
+    error_code = get_physical_address(env, &full, &access_index,
+                                      address, access_type, mmu_idx);
     if (likely(error_code == 0)) {
-        vaddr = address;
-
-        trace_mmu_helper_mmu_fault(address, paddr, mmu_idx, env->tl,
+        trace_mmu_helper_mmu_fault(address, full.phys_addr, mmu_idx, env->tl,
                                    env->dmmu.mmu_primary_context,
                                    env->dmmu.mmu_secondary_context);
 
-        tlb_set_page_with_attrs(cs, vaddr, paddr, attrs, prot, mmu_idx,
-                                page_size);
+        tlb_set_page_full(cs, mmu_idx, address, &full);
         return true;
     }
     if (probe) {
@@ -888,12 +871,12 @@ void dump_mmu(CPUSPARCState *env)
 static int cpu_sparc_get_phys_page(CPUSPARCState *env, hwaddr *phys,
                                    target_ulong addr, int rw, int mmu_idx)
 {
-    target_ulong page_size;
-    int prot, access_index;
-    MemTxAttrs attrs = {};
+    CPUTLBEntryFull full = {};
+    int access_index, ret;
 
-    return get_physical_address(env, phys, &prot, &access_index, &attrs, addr,
-                                rw, mmu_idx, &page_size);
+    ret = get_physical_address(env, &full, &access_index, addr, rw, mmu_idx);
+    *phys = full.phys_addr;
+    return ret;
 }
 
 #if defined(TARGET_SPARC64)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 08/17] accel/tcg: Move byte_swap from MemTxAttrs to CPUTLBEntryFull
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (6 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 07/17] target/sparc: Use tlb_set_page_full Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-29 12:27   ` Alex Bennée
  2022-09-25 10:51 ` [PATCH v5 09/17] accel/tcg: Add force_aligned " Richard Henderson
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Mark Cave-Ayland

We had previously placed this bit in MemTxAttrs because we had
no other way to communicate that information to tlb_set_page*.
The bit is not relevant to memory transactions, only to page table
entries, and tlb_set_page_full now gives us a way to pass it in.
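
For illustration, a minimal sketch of how a target's tlb_fill hook
can now communicate inverted endianness (the function name and the
identity mapping are invented for this example; the real consumer is
the sparc hunk below):

    bool foo_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
                          MMUAccessType access_type, int mmu_idx,
                          bool probe, uintptr_t retaddr)
    {
        CPUTLBEntryFull full = { };

        full.phys_addr = address;             /* identity map, e.g. */
        full.prot = PAGE_READ | PAGE_WRITE;
        full.lg_page_size = TARGET_PAGE_BITS;
        full.byte_swap = true;                /* formerly attrs.byte_swap */

        tlb_set_page_full(cs, mmu_idx, address & TARGET_PAGE_MASK, &full);
        return true;
    }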

Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h    | 6 +++---
 include/exec/cpu-defs.h   | 3 +++
 include/exec/memattrs.h   | 2 --
 accel/tcg/cputlb.c        | 8 ++++----
 target/sparc/mmu_helper.c | 2 +-
 5 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 491629b9ba..064aa5aee8 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -384,8 +384,8 @@ CPUArchState *cpu_copy(CPUArchState *env);
 #define TLB_MMIO            (1 << (TARGET_PAGE_BITS_MIN - 3))
 /* Set if TLB entry contains a watchpoint.  */
 #define TLB_WATCHPOINT      (1 << (TARGET_PAGE_BITS_MIN - 4))
-/* Set if TLB entry requires byte swap.  */
-#define TLB_BSWAP           (1 << (TARGET_PAGE_BITS_MIN - 5))
+/* Set if TLB entry requires slow path handling.  */
+#define TLB_SLOW_PATH       (1 << (TARGET_PAGE_BITS_MIN - 5))
 /* Set if TLB entry writes ignored.  */
 #define TLB_DISCARD_WRITE   (1 << (TARGET_PAGE_BITS_MIN - 6))
 
@@ -394,7 +394,7 @@ CPUArchState *cpu_copy(CPUArchState *env);
  */
 #define TLB_FLAGS_MASK \
     (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \
-    | TLB_WATCHPOINT | TLB_BSWAP | TLB_DISCARD_WRITE)
+    | TLB_WATCHPOINT | TLB_SLOW_PATH | TLB_DISCARD_WRITE)
 
 /**
  * tlb_hit_page: return true if page aligned @addr is a hit against the
diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 67239b4e5e..7c0ba93826 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -164,6 +164,9 @@ typedef struct CPUTLBEntryFull {
     /* @lg_page_size contains the log2 of the page size. */
     uint8_t lg_page_size;
 
+    /* @byte_swap indicates that all accesses use inverted endianness. */
+    bool byte_swap;
+
     /*
      * Allow target-specific additions to this structure.
      * This may be used to cache items from the guest cpu
diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
index 9fb98bc1ef..570e73c06f 100644
--- a/include/exec/memattrs.h
+++ b/include/exec/memattrs.h
@@ -45,8 +45,6 @@ typedef struct MemTxAttrs {
     unsigned int memory:1;
     /* Requester ID (for MSI for example) */
     unsigned int requester_id:16;
-    /* Invert endianness for this page */
-    unsigned int byte_swap:1;
     /*
      * The following are target-specific page-table bits.  These are not
      * related to actual memory transactions at all.  However, this structure
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 382c5d3109..1a5a6bd98b 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1146,8 +1146,8 @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
         /* Repeat the MMU check and TLB fill on every access.  */
         address |= TLB_INVALID_MASK;
     }
-    if (full->attrs.byte_swap) {
-        address |= TLB_BSWAP;
+    if (full->byte_swap) {
+        address |= TLB_SLOW_PATH;
     }
 
     is_ram = memory_region_is_ram(section->mr);
@@ -1961,7 +1961,7 @@ load_helper(CPUArchState *env, target_ulong addr, MemOpIdx oi,
                                  full->attrs, BP_MEM_READ, retaddr);
         }
 
-        need_swap = size > 1 && (tlb_addr & TLB_BSWAP);
+        need_swap = size > 1 && full->byte_swap;
 
         /* Handle I/O access.  */
         if (likely(tlb_addr & TLB_MMIO)) {
@@ -2366,7 +2366,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
                                  full->attrs, BP_MEM_WRITE, retaddr);
         }
 
-        need_swap = size > 1 && (tlb_addr & TLB_BSWAP);
+        need_swap = size > 1 && full->byte_swap;
 
         /* Handle I/O access.  */
         if (tlb_addr & TLB_MMIO) {
diff --git a/target/sparc/mmu_helper.c b/target/sparc/mmu_helper.c
index 08f656cbb6..a857bd9569 100644
--- a/target/sparc/mmu_helper.c
+++ b/target/sparc/mmu_helper.c
@@ -577,7 +577,7 @@ static int get_physical_address_data(CPUSPARCState *env, CPUTLBEntryFull *full,
             int do_fault = 0;
 
             if (TTE_IS_IE(env->dtlb[i].tte)) {
-                full->attrs.byte_swap = true;
+                full->byte_swap = true;
             }
 
             /* access ok? */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 09/17] accel/tcg: Add force_aligned to CPUTLBEntryFull
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (7 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 08/17] accel/tcg: Move byte_swap from MemTxAttrs to CPUTLBEntryFull Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-25 10:51 ` [PATCH v5 10/17] accel/tcg: Remove PageDesc code_bitmap Richard Henderson
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

Support per-page natural alignment checking.  This will be
used by Arm for pages mapped with memory type Device.
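
As a sketch only (the Arm wiring is not part of this patch, and the
variables below are placeholders), an MMU helper would request strict
alignment for a Device-type page like so:

    CPUTLBEntryFull full = { };

    full.phys_addr = pa;
    full.prot = PAGE_READ | PAGE_WRITE;
    full.lg_page_size = TARGET_PAGE_BITS;
    full.force_aligned = true;    /* unaligned accesses now trap via
                                   * cpu_unaligned_access() */
    tlb_set_page_full(cs, mmu_idx, va, &full);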

Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-defs.h |  3 +++
 accel/tcg/cputlb.c      | 20 +++++++++++++-------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 7c0ba93826..d0acbb4d35 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -167,6 +167,9 @@ typedef struct CPUTLBEntryFull {
     /* @byte_swap indicates that all accesses use inverted endianness. */
     bool byte_swap;
 
+    /* @force_aligned indicates that all accesses must be aligned. */
+    bool force_aligned;
+
     /*
      * Allow target-specific additions to this structure.
      * This may be used to cache items from the guest cpu
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 1a5a6bd98b..01a89b4a1f 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1146,7 +1146,7 @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
         /* Repeat the MMU check and TLB fill on every access.  */
         address |= TLB_INVALID_MASK;
     }
-    if (full->byte_swap) {
+    if (full->byte_swap || full->force_aligned) {
         address |= TLB_SLOW_PATH;
     }
 
@@ -1944,16 +1944,19 @@ load_helper(CPUArchState *env, target_ulong addr, MemOpIdx oi,
 
     /* Handle anything that isn't just a straight memory access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUTLBEntryFull *full;
+        CPUTLBEntryFull *full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
         bool need_swap;
 
         /* For anything that is unaligned, recurse through full_load.  */
         if ((addr & (size - 1)) != 0) {
+            /* Honor per-page alignment requirements. */
+            if (full->force_aligned) {
+                cpu_unaligned_access(env_cpu(env), addr, access_type,
+                                     mmu_idx, retaddr);
+            }
             goto do_unaligned_access;
         }
 
-        full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
-
         /* Handle watchpoints.  */
         if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
             /* On watchpoint hit, this will longjmp out.  */
@@ -2349,16 +2352,19 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 
     /* Handle anything that isn't just a straight memory access.  */
     if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        CPUTLBEntryFull *full;
+        CPUTLBEntryFull *full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
         bool need_swap;
 
         /* For anything that is unaligned, recurse through byte stores.  */
         if ((addr & (size - 1)) != 0) {
+            /* Honor per-page alignment requirements. */
+            if (full->force_aligned) {
+                cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_STORE,
+                                     mmu_idx, retaddr);
+            }
             goto do_unaligned_access;
         }
 
-        full = &env_tlb(env)->d[mmu_idx].fulltlb[index];
-
         /* Handle watchpoints.  */
         if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
             /* On watchpoint hit, this will longjmp out.  */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 10/17] accel/tcg: Remove PageDesc code_bitmap
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (8 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 09/17] accel/tcg: Add force_aligned " Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-29 12:27   ` Alex Bennée
  2022-09-25 10:51 ` [PATCH v5 11/17] accel/tcg: Use bool for page_find_alloc Richard Henderson
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel

This bitmap is created and discarded immediately.
We gain nothing by its existence.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220822232338.1727934-2-richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 78 ++-------------------------------------
 1 file changed, 4 insertions(+), 74 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f5e8592d4a..9046ea80f7 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -102,21 +102,14 @@
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
-#define SMC_BITMAP_USE_THRESHOLD 10
-
 typedef struct PageDesc {
     /* list of TBs intersecting this ram page */
     uintptr_t first_tb;
-#ifdef CONFIG_SOFTMMU
-    /* in order to optimize self modifying code, we count the number
-       of lookups we do to a given page to use a bitmap */
-    unsigned long *code_bitmap;
-    unsigned int code_write_count;
-#else
+#ifdef CONFIG_USER_ONLY
     unsigned long flags;
     void *target_data;
 #endif
-#ifndef CONFIG_USER_ONLY
+#ifdef CONFIG_SOFTMMU
     QemuSpin lock;
 #endif
 } PageDesc;
@@ -907,17 +900,6 @@ void tb_htable_init(void)
     qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
 }
 
-/* call with @p->lock held */
-static inline void invalidate_page_bitmap(PageDesc *p)
-{
-    assert_page_locked(p);
-#ifdef CONFIG_SOFTMMU
-    g_free(p->code_bitmap);
-    p->code_bitmap = NULL;
-    p->code_write_count = 0;
-#endif
-}
-
 /* Set to NULL all the 'first_tb' fields in all PageDescs. */
 static void page_flush_tb_1(int level, void **lp)
 {
@@ -932,7 +914,6 @@ static void page_flush_tb_1(int level, void **lp)
         for (i = 0; i < V_L2_SIZE; ++i) {
             page_lock(&pd[i]);
             pd[i].first_tb = (uintptr_t)NULL;
-            invalidate_page_bitmap(pd + i);
             page_unlock(&pd[i]);
         }
     } else {
@@ -1197,11 +1178,9 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
     if (rm_from_page_list) {
         p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
         tb_page_remove(p, tb);
-        invalidate_page_bitmap(p);
         if (tb->page_addr[1] != -1) {
             p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
             tb_page_remove(p, tb);
-            invalidate_page_bitmap(p);
         }
     }
 
@@ -1246,35 +1225,6 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     }
 }
 
-#ifdef CONFIG_SOFTMMU
-/* call with @p->lock held */
-static void build_page_bitmap(PageDesc *p)
-{
-    int n, tb_start, tb_end;
-    TranslationBlock *tb;
-
-    assert_page_locked(p);
-    p->code_bitmap = bitmap_new(TARGET_PAGE_SIZE);
-
-    PAGE_FOR_EACH_TB(p, tb, n) {
-        /* NOTE: this is subtle as a TB may span two physical pages */
-        if (n == 0) {
-            /* NOTE: tb_end may be after the end of the page, but
-               it is not a problem */
-            tb_start = tb->pc & ~TARGET_PAGE_MASK;
-            tb_end = tb_start + tb->size;
-            if (tb_end > TARGET_PAGE_SIZE) {
-                tb_end = TARGET_PAGE_SIZE;
-             }
-        } else {
-            tb_start = 0;
-            tb_end = ((tb->pc + tb->size) & ~TARGET_PAGE_MASK);
-        }
-        bitmap_set(p->code_bitmap, tb_start, tb_end - tb_start);
-    }
-}
-#endif
-
 /* add the tb in the target page and protect it if necessary
  *
  * Called with mmap_lock held for user-mode emulation.
@@ -1295,7 +1245,6 @@ static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
     page_already_protected = p->first_tb != (uintptr_t)NULL;
 #endif
     p->first_tb = (uintptr_t)tb | n;
-    invalidate_page_bitmap(p);
 
 #if defined(CONFIG_USER_ONLY)
     /* translator_loop() must have made all TB pages non-writable */
@@ -1357,10 +1306,8 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     /* remove TB from the page(s) if we couldn't insert it */
     if (unlikely(existing_tb)) {
         tb_page_remove(p, tb);
-        invalidate_page_bitmap(p);
         if (p2) {
             tb_page_remove(p2, tb);
-            invalidate_page_bitmap(p2);
         }
         tb = existing_tb;
     }
@@ -1731,7 +1678,6 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
 #if !defined(CONFIG_USER_ONLY)
     /* if no code remaining, no need to continue to use slow writes */
     if (!p->first_tb) {
-        invalidate_page_bitmap(p);
         tlb_unprotect_code(start);
     }
 #endif
@@ -1827,24 +1773,8 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
     }
 
     assert_page_locked(p);
-    if (!p->code_bitmap &&
-        ++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
-        build_page_bitmap(p);
-    }
-    if (p->code_bitmap) {
-        unsigned int nr;
-        unsigned long b;
-
-        nr = start & ~TARGET_PAGE_MASK;
-        b = p->code_bitmap[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1));
-        if (b & ((1 << len) - 1)) {
-            goto do_invalidate;
-        }
-    } else {
-    do_invalidate:
-        tb_invalidate_phys_page_range__locked(pages, p, start, start + len,
-                                              retaddr);
-    }
+    tb_invalidate_phys_page_range__locked(pages, p, start, start + len,
+                                          retaddr);
 }
 #else
 /* Called with mmap_lock held. If pc is not 0 then it indicates the
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 11/17] accel/tcg: Use bool for page_find_alloc
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (9 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 10/17] accel/tcg: Remove PageDesc code_bitmap Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-25 10:51 ` [PATCH v5 12/17] accel/tcg: Use DisasContextBase in plugin_gen_tb_start Richard Henderson
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Philippe Mathieu-Daudé

Bool is a more appropriate type for the alloc parameter.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 9046ea80f7..69f39d669d 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -465,7 +465,7 @@ void page_init(void)
 #endif
 }
 
-static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
+static PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
 {
     PageDesc *pd;
     void **lp;
@@ -533,11 +533,11 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
 
 static inline PageDesc *page_find(tb_page_addr_t index)
 {
-    return page_find_alloc(index, 0);
+    return page_find_alloc(index, false);
 }
 
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-                           PageDesc **ret_p2, tb_page_addr_t phys2, int alloc);
+                           PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc);
 
 /* In user-mode page locks aren't used; mmap_lock is enough */
 #ifdef CONFIG_USER_ONLY
@@ -651,7 +651,7 @@ static inline void page_unlock(PageDesc *pd)
 /* lock the page(s) of a TB in the correct acquisition order */
 static inline void page_lock_tb(const TranslationBlock *tb)
 {
-    page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], 0);
+    page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], false);
 }
 
 static inline void page_unlock_tb(const TranslationBlock *tb)
@@ -840,7 +840,7 @@ void page_collection_unlock(struct page_collection *set)
 #endif /* !CONFIG_USER_ONLY */
 
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-                           PageDesc **ret_p2, tb_page_addr_t phys2, int alloc)
+                           PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc)
 {
     PageDesc *p1, *p2;
     tb_page_addr_t page1;
@@ -1290,7 +1290,7 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
      * Note that inserting into the hash table first isn't an option, since
      * we can only insert TBs that are fully initialized.
      */
-    page_lock_pair(&p, phys_pc, &p2, phys_page2, 1);
+    page_lock_pair(&p, phys_pc, &p2, phys_page2, true);
     tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
     if (p2) {
         tb_page_add(p2, tb, 1, phys_page2);
@@ -2219,7 +2219,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
     for (addr = start, len = end - start;
          len != 0;
          len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-        PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+        PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, true);
 
         /* If the write protection bit is set, then we invalidate
            the code inside.  */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 12/17] accel/tcg: Use DisasContextBase in plugin_gen_tb_start
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (10 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 11/17] accel/tcg: Use bool for page_find_alloc Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-25 10:51 ` [PATCH v5 13/17] accel/tcg: Do not align tb->page_addr[0] Richard Henderson
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée

Use the pc coming from db->pc_first rather than the TB.

Use the cached host_addr rather than re-computing it for the
first page.  We still need a separate lookup for the second
page because it won't be computed for DisasContextBase until
the translator actually performs a read from the page.
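
For reference, a condensed sketch of the DisasContextBase fields used
here; the real struct (include/exec/translator.h) has more members,
so treat this trimmed copy as illustration only:

    typedef struct DisasContextBase {
        const TranslationBlock *tb;  /* TB being translated */
        target_ulong pc_first;       /* pc of the TB's first insn */
        target_ulong pc_next;        /* pc of the next insn to decode */
        void *host_addr[2];          /* cached host pointers for the
                                        (up to) two guest pages */
    } DisasContextBase;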

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/plugin-gen.h |  7 ++++---
 accel/tcg/plugin-gen.c    | 22 +++++++++++-----------
 accel/tcg/translator.c    |  2 +-
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
index f92f169739..5004728c61 100644
--- a/include/exec/plugin-gen.h
+++ b/include/exec/plugin-gen.h
@@ -19,7 +19,8 @@ struct DisasContextBase;
 
 #ifdef CONFIG_PLUGIN
 
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress);
+bool plugin_gen_tb_start(CPUState *cpu, const struct DisasContextBase *db,
+                         bool supress);
 void plugin_gen_tb_end(CPUState *cpu);
 void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
 void plugin_gen_insn_end(void);
@@ -48,8 +49,8 @@ static inline void plugin_insn_append(abi_ptr pc, const void *from, size_t size)
 
 #else /* !CONFIG_PLUGIN */
 
-static inline
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool supress)
+static inline bool
+plugin_gen_tb_start(CPUState *cpu, const struct DisasContextBase *db, bool sup)
 {
     return false;
 }
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 3d0b101e34..80dff68934 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -852,7 +852,8 @@ static void plugin_gen_inject(const struct qemu_plugin_tb *plugin_tb)
     pr_ops();
 }
 
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool mem_only)
+bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
+                         bool mem_only)
 {
     bool ret = false;
 
@@ -870,9 +871,9 @@ bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool mem_onl
 
         ret = true;
 
-        ptb->vaddr = tb->pc;
+        ptb->vaddr = db->pc_first;
         ptb->vaddr2 = -1;
-        get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
+        ptb->haddr1 = db->host_addr[0];
         ptb->haddr2 = NULL;
         ptb->mem_only = mem_only;
 
@@ -898,16 +899,15 @@ void plugin_gen_insn_start(CPUState *cpu, const DisasContextBase *db)
      * Note that we skip this when haddr1 == NULL, e.g. when we're
      * fetching instructions from a region not backed by RAM.
      */
-    if (likely(ptb->haddr1 != NULL && ptb->vaddr2 == -1) &&
-        unlikely((db->pc_next & TARGET_PAGE_MASK) !=
-                 (db->pc_first & TARGET_PAGE_MASK))) {
-        get_page_addr_code_hostp(cpu->env_ptr, db->pc_next,
-                                 &ptb->haddr2);
-        ptb->vaddr2 = db->pc_next;
-    }
-    if (likely(ptb->vaddr2 == -1)) {
+    if (ptb->haddr1 == NULL) {
+        pinsn->haddr = NULL;
+    } else if (is_same_page(db, db->pc_next)) {
         pinsn->haddr = ptb->haddr1 + pinsn->vaddr - ptb->vaddr;
     } else {
+        if (ptb->vaddr2 == -1) {
+            ptb->vaddr2 = TARGET_PAGE_ALIGN(db->pc_first);
+            get_page_addr_code_hostp(cpu->env_ptr, ptb->vaddr2, &ptb->haddr2);
+        }
         pinsn->haddr = ptb->haddr2 + pinsn->vaddr - ptb->vaddr2;
     }
 }
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index ca8a5f2d83..8e78fd7a9c 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -75,7 +75,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int max_insns,
     ops->tb_start(db, cpu);
     tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
-    plugin_enabled = plugin_gen_tb_start(cpu, tb, cflags & CF_MEMI_ONLY);
+    plugin_enabled = plugin_gen_tb_start(cpu, db, cflags & CF_MEMI_ONLY);
 
     while (true) {
         db->num_insns++;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 13/17] accel/tcg: Do not align tb->page_addr[0]
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (11 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 12/17] accel/tcg: Use DisasContextBase in plugin_gen_tb_start Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-25 10:51 ` [PATCH v5 14/17] include/hw/core: Create struct CPUJumpCache Richard Henderson
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel

Let tb->page_addr[0] contain the full address of the first byte of
the translation block, i.e. keep the offset within the page rather
than masking it off.  We need to recover this value anyway at
various points, and it is easier to discard the page offset when it
is not needed, which happens naturally via the existing page_find
shift.
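
In other words (an illustrative recap, not new code in this patch):

    tb_page_addr_t phys_pc = tb->page_addr[0];        /* full address */
    /* page index, as consumed by page_find(): */
    tb_page_addr_t page = phys_pc >> TARGET_PAGE_BITS;
    /* offset of the TB's first byte within that page: */
    int offset = phys_pc & ~TARGET_PAGE_MASK;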

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cpu-exec.c      | 16 ++++++++--------
 accel/tcg/cputlb.c        |  3 ++-
 accel/tcg/translate-all.c |  9 +++++----
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 5f43b9769a..dd58a144a8 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -174,7 +174,7 @@ struct tb_desc {
     target_ulong pc;
     target_ulong cs_base;
     CPUArchState *env;
-    tb_page_addr_t phys_page1;
+    tb_page_addr_t page_addr0;
     uint32_t flags;
     uint32_t cflags;
     uint32_t trace_vcpu_dstate;
@@ -186,7 +186,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
     const struct tb_desc *desc = d;
 
     if (tb->pc == desc->pc &&
-        tb->page_addr[0] == desc->phys_page1 &&
+        tb->page_addr[0] == desc->page_addr0 &&
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
         tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
@@ -195,8 +195,8 @@ static bool tb_lookup_cmp(const void *p, const void *d)
         if (tb->page_addr[1] == -1) {
             return true;
         } else {
-            tb_page_addr_t phys_page2;
-            target_ulong virt_page2;
+            tb_page_addr_t phys_page1;
+            target_ulong virt_page1;
 
             /*
              * We know that the first page matched, and an otherwise valid TB
@@ -207,9 +207,9 @@ static bool tb_lookup_cmp(const void *p, const void *d)
              * is different for the new TB.  Therefore any exception raised
              * here by the faulting lookup is not premature.
              */
-            virt_page2 = TARGET_PAGE_ALIGN(desc->pc);
-            phys_page2 = get_page_addr_code(desc->env, virt_page2);
-            if (tb->page_addr[1] == phys_page2) {
+            virt_page1 = TARGET_PAGE_ALIGN(desc->pc);
+            phys_page1 = get_page_addr_code(desc->env, virt_page1);
+            if (tb->page_addr[1] == phys_page1) {
                 return true;
             }
         }
@@ -235,7 +235,7 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
     if (phys_pc == -1) {
         return NULL;
     }
-    desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
+    desc.page_addr0 = phys_pc;
     h = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
     return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
 }
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 01a89b4a1f..f5e6ca2da2 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -951,7 +951,8 @@ void tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
 {
-    cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
+    cpu_physical_memory_test_and_clear_dirty(ram_addr & TARGET_PAGE_MASK,
+                                             TARGET_PAGE_SIZE,
                                              DIRTY_MEMORY_CODE);
 }
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 69f39d669d..f429d33981 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1167,7 +1167,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
     qemu_spin_unlock(&tb->jmp_lock);
 
     /* remove the TB from the hash list */
-    phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
+    phys_pc = tb->page_addr[0];
     h = tb_hash_func(phys_pc, tb->pc, tb->flags, orig_cflags,
                      tb->trace_vcpu_dstate);
     if (!qht_remove(&tb_ctx.htable, tb, h)) {
@@ -1291,7 +1291,7 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
      * we can only insert TBs that are fully initialized.
      */
     page_lock_pair(&p, phys_pc, &p2, phys_page2, true);
-    tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
+    tb_page_add(p, tb, 0, phys_pc);
     if (p2) {
         tb_page_add(p2, tb, 1, phys_page2);
     } else {
@@ -1644,11 +1644,12 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
         if (n == 0) {
             /* NOTE: tb_end may be after the end of the page, but
                it is not a problem */
-            tb_start = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
+            tb_start = tb->page_addr[0];
             tb_end = tb_start + tb->size;
         } else {
             tb_start = tb->page_addr[1];
-            tb_end = tb_start + ((tb->pc + tb->size) & ~TARGET_PAGE_MASK);
+            tb_end = tb_start + ((tb->page_addr[0] + tb->size)
+                                 & ~TARGET_PAGE_MASK);
         }
         if (!(tb_end <= start || tb_start >= end)) {
 #ifdef TARGET_HAS_PRECISE_SMC
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 14/17] include/hw/core: Create struct CPUJumpCache
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (12 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 13/17] accel/tcg: Do not align tb->page_addr[0] Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-29 13:46   ` Alex Bennée
  2022-09-25 10:51 ` [PATCH v5 15/17] accel/tcg: Introduce tb_pc and tb_pc_log Richard Henderson
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Wrap the bare TranslationBlock pointer into a structure.
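
The wrapper also leaves room to grow: the TARGET_TB_PCREL patch later
in this series adds a field alongside the bare pointer.  A sketch of
that eventual shape (an assumption based on the diffstat there, not
something this patch does):

    typedef struct {
        TranslationBlock *tb;
        target_ulong pc;    /* added once TARGET_TB_PCREL exists */
    } CPUJumpCache;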

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/hw/core/cpu.h     | 8 ++++++--
 accel/tcg/cpu-exec.c      | 9 ++++++---
 accel/tcg/cputlb.c        | 2 +-
 accel/tcg/translate-all.c | 4 ++--
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 9e47184513..ee5b75dea0 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -232,6 +232,10 @@ struct hvf_vcpu_state;
 #define TB_JMP_CACHE_BITS 12
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
 
+typedef struct {
+    TranslationBlock *tb;
+} CPUJumpCache;
+
 /* work queue */
 
 /* The union type allows passing of 64 bit target pointers on 32 bit
@@ -361,7 +365,7 @@ struct CPUState {
     IcountDecr *icount_decr_ptr;
 
     /* Accessed in parallel; all accesses must be atomic */
-    TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
+    CPUJumpCache tb_jmp_cache[TB_JMP_CACHE_SIZE];
 
     struct GDBRegisterState *gdb_regs;
     int gdb_num_regs;
@@ -452,7 +456,7 @@ static inline void cpu_tb_jmp_cache_clear(CPUState *cpu)
     unsigned int i;
 
     for (i = 0; i < TB_JMP_CACHE_SIZE; i++) {
-        qatomic_set(&cpu->tb_jmp_cache[i], NULL);
+        qatomic_set(&cpu->tb_jmp_cache[i].tb, NULL);
     }
 }
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index dd58a144a8..c6283d5798 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -252,7 +252,7 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
     tcg_debug_assert(!(cflags & CF_INVALID));
 
     hash = tb_jmp_cache_hash_func(pc);
-    tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash]);
+    tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash].tb);
 
     if (likely(tb &&
                tb->pc == pc &&
@@ -266,7 +266,7 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
     if (tb == NULL) {
         return NULL;
     }
-    qatomic_set(&cpu->tb_jmp_cache[hash], tb);
+    qatomic_set(&cpu->tb_jmp_cache[hash].tb, tb);
     return tb;
 }
 
@@ -987,6 +987,8 @@ int cpu_exec(CPUState *cpu)
 
             tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
             if (tb == NULL) {
+                uint32_t h;
+
                 mmap_lock();
                 tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
                 mmap_unlock();
@@ -994,7 +996,8 @@ int cpu_exec(CPUState *cpu)
                  * We add the TB in the virtual pc hash table
                  * for the fast lookup
                  */
-                qatomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
+                h = tb_jmp_cache_hash_func(pc);
+                qatomic_set(&cpu->tb_jmp_cache[h].tb, tb);
             }
 
 #ifndef CONFIG_USER_ONLY
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index f5e6ca2da2..fb8f3087f1 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -103,7 +103,7 @@ static void tb_jmp_cache_clear_page(CPUState *cpu, target_ulong page_addr)
     unsigned int i, i0 = tb_jmp_cache_hash_page(page_addr);
 
     for (i = 0; i < TB_JMP_PAGE_SIZE; i++) {
-        qatomic_set(&cpu->tb_jmp_cache[i0 + i], NULL);
+        qatomic_set(&cpu->tb_jmp_cache[i0 + i].tb, NULL);
     }
 }
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f429d33981..efa479ccf3 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1187,8 +1187,8 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
-        if (qatomic_read(&cpu->tb_jmp_cache[h]) == tb) {
-            qatomic_set(&cpu->tb_jmp_cache[h], NULL);
+        if (qatomic_read(&cpu->tb_jmp_cache[h].tb) == tb) {
+            qatomic_set(&cpu->tb_jmp_cache[h].tb, NULL);
         }
     }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 15/17] accel/tcg: Introduce tb_pc and tb_pc_log
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (13 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 14/17] include/hw/core: Create struct CPUJumpCache Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-25 10:51 ` [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL Richard Henderson
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel

The availability of tb->pc will shortly be conditional.
Introduce accessor functions to minimize ifdefs.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/exec-all.h                 | 12 ++++++++++
 accel/tcg/cpu-exec.c                    | 20 ++++++++---------
 accel/tcg/translate-all.c               | 29 +++++++++++++------------
 target/arm/cpu.c                        |  4 ++--
 target/avr/cpu.c                        |  2 +-
 target/hexagon/cpu.c                    |  2 +-
 target/hppa/cpu.c                       |  4 ++--
 target/i386/tcg/tcg-cpu.c               |  2 +-
 target/loongarch/cpu.c                  |  2 +-
 target/microblaze/cpu.c                 |  2 +-
 target/mips/tcg/exception.c             |  2 +-
 target/mips/tcg/sysemu/special_helper.c |  2 +-
 target/openrisc/cpu.c                   |  2 +-
 target/riscv/cpu.c                      |  4 ++--
 target/rx/cpu.c                         |  2 +-
 target/sh4/cpu.c                        |  4 ++--
 target/sparc/cpu.c                      |  2 +-
 target/tricore/cpu.c                    |  2 +-
 tcg/tcg.c                               |  6 ++---
 19 files changed, 59 insertions(+), 46 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 0f25a1ba85..dc72615ad4 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -566,6 +566,18 @@ struct TranslationBlock {
     uintptr_t jmp_dest[2];
 };
 
+/* Hide the read to avoid ifdefs for TARGET_TB_PCREL. */
+static inline target_ulong tb_pc(const TranslationBlock *tb)
+{
+    return tb->pc;
+}
+
+/* Similarly, but for logs. */
+static inline target_ulong tb_pc_log(const TranslationBlock *tb)
+{
+    return tb->pc;
+}
+
 /* Hide the qatomic_read to make code a little easier on the eyes */
 static inline uint32_t tb_cflags(const TranslationBlock *tb)
 {
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c6283d5798..2cf84952e1 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -185,7 +185,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
     const TranslationBlock *tb = p;
     const struct tb_desc *desc = d;
 
-    if (tb->pc == desc->pc &&
+    if (tb_pc(tb) == desc->pc &&
         tb->page_addr[0] == desc->page_addr0 &&
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
@@ -422,7 +422,7 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
     TranslationBlock *last_tb;
     const void *tb_ptr = itb->tc.ptr;
 
-    log_cpu_exec(itb->pc, cpu, itb);
+    log_cpu_exec(tb_pc_log(itb), cpu, itb);
 
     qemu_thread_jit_execute();
     ret = tcg_qemu_tb_exec(env, tb_ptr);
@@ -446,16 +446,16 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
          * of the start of the TB.
          */
         CPUClass *cc = CPU_GET_CLASS(cpu);
-        qemu_log_mask_and_addr(CPU_LOG_EXEC, last_tb->pc,
+        qemu_log_mask_and_addr(CPU_LOG_EXEC, tb_pc_log(last_tb),
                                "Stopped execution of TB chain before %p ["
                                TARGET_FMT_lx "] %s\n",
-                               last_tb->tc.ptr, last_tb->pc,
-                               lookup_symbol(last_tb->pc));
+                               last_tb->tc.ptr, tb_pc_log(last_tb),
+                               lookup_symbol(tb_pc_log(last_tb)));
         if (cc->tcg_ops->synchronize_from_tb) {
             cc->tcg_ops->synchronize_from_tb(cpu, last_tb);
         } else {
             assert(cc->set_pc);
-            cc->set_pc(cpu, last_tb->pc);
+            cc->set_pc(cpu, tb_pc(last_tb));
         }
     }
 
@@ -597,11 +597,11 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
 
     qemu_spin_unlock(&tb_next->jmp_lock);
 
-    qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
+    qemu_log_mask_and_addr(CPU_LOG_EXEC, tb_pc_log(tb),
                            "Linking TBs %p [" TARGET_FMT_lx
                            "] index %d -> %p [" TARGET_FMT_lx "]\n",
-                           tb->tc.ptr, tb->pc, n,
-                           tb_next->tc.ptr, tb_next->pc);
+                           tb->tc.ptr, tb_pc_log(tb), n,
+                           tb_next->tc.ptr, tb_pc_log(tb_next));
     return;
 
  out_unlock_next:
@@ -851,7 +851,7 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
 {
     int32_t insns_left;
 
-    trace_exec_tb(tb, tb->pc);
+    trace_exec_tb(tb, tb_pc_log(tb));
     tb = cpu_tb_exec(cpu, tb, tb_exit);
     if (*tb_exit != TB_EXIT_REQUESTED) {
         *last_tb = tb;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index efa479ccf3..9e57822b44 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -298,7 +298,7 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
 
         for (j = 0; j < TARGET_INSN_START_WORDS; ++j) {
             if (i == 0) {
-                prev = (j == 0 ? tb->pc : 0);
+                prev = (j == 0 ? tb_pc(tb) : 0);
             } else {
                 prev = tcg_ctx->gen_insn_data[i - 1][j];
             }
@@ -326,7 +326,7 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
 static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
                                      uintptr_t searched_pc, bool reset_icount)
 {
-    target_ulong data[TARGET_INSN_START_WORDS] = { tb->pc };
+    target_ulong data[TARGET_INSN_START_WORDS] = { tb_pc(tb) };
     uintptr_t host_pc = (uintptr_t)tb->tc.ptr;
     CPUArchState *env = cpu->env_ptr;
     const uint8_t *p = tb->tc.ptr + tb->tc.size;
@@ -884,7 +884,7 @@ static bool tb_cmp(const void *ap, const void *bp)
     const TranslationBlock *a = ap;
     const TranslationBlock *b = bp;
 
-    return a->pc == b->pc &&
+    return tb_pc(a) == tb_pc(b) &&
         a->cs_base == b->cs_base &&
         a->flags == b->flags &&
         (tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
@@ -1012,9 +1012,10 @@ static void do_tb_invalidate_check(void *p, uint32_t hash, void *userp)
     TranslationBlock *tb = p;
     target_ulong addr = *(target_ulong *)userp;
 
-    if (!(addr + TARGET_PAGE_SIZE <= tb->pc || addr >= tb->pc + tb->size)) {
+    if (!(addr + TARGET_PAGE_SIZE <= tb_pc(tb) ||
+          addr >= tb_pc(tb) + tb->size)) {
         printf("ERROR invalidate: address=" TARGET_FMT_lx
-               " PC=%08lx size=%04x\n", addr, (long)tb->pc, tb->size);
+               " PC=%08lx size=%04x\n", addr, (long)tb_pc(tb), tb->size);
     }
 }
 
@@ -1033,11 +1034,11 @@ static void do_tb_page_check(void *p, uint32_t hash, void *userp)
     TranslationBlock *tb = p;
     int flags1, flags2;
 
-    flags1 = page_get_flags(tb->pc);
-    flags2 = page_get_flags(tb->pc + tb->size - 1);
+    flags1 = page_get_flags(tb_pc(tb));
+    flags2 = page_get_flags(tb_pc(tb) + tb->size - 1);
     if ((flags1 & PAGE_WRITE) || (flags2 & PAGE_WRITE)) {
         printf("ERROR page flags: PC=%08lx size=%04x f1=%x f2=%x\n",
-               (long)tb->pc, tb->size, flags1, flags2);
+               (long)tb_pc(tb), tb->size, flags1, flags2);
     }
 }
 
@@ -1168,7 +1169,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
 
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0];
-    h = tb_hash_func(phys_pc, tb->pc, tb->flags, orig_cflags,
+    h = tb_hash_func(phys_pc, tb_pc(tb), tb->flags, orig_cflags,
                      tb->trace_vcpu_dstate);
     if (!qht_remove(&tb_ctx.htable, tb, h)) {
         return;
@@ -1299,7 +1300,7 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     }
 
     /* add in the hash table */
-    h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags,
+    h = tb_hash_func(phys_pc, tb_pc(tb), tb->flags, tb->cflags,
                      tb->trace_vcpu_dstate);
     qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
 
@@ -1399,7 +1400,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_ctx->cpu = NULL;
     max_insns = tb->icount;
 
-    trace_translate_block(tb, tb->pc, tb->tc.ptr);
+    trace_translate_block(tb, tb_pc_log(tb), tb->tc.ptr);
 
     /* generate machine code */
     tb->jmp_reset_offset[0] = TB_JMP_RESET_OFFSET_INVALID;
@@ -1476,7 +1477,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 #ifdef DEBUG_DISAS
     if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
-        qemu_log_in_addr_range(tb->pc)) {
+        qemu_log_in_addr_range(tb_pc_log(tb))) {
         FILE *logfile = qemu_log_trylock();
         if (logfile) {
             int code_size, data_size;
@@ -1916,9 +1917,9 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
      */
     cpu->cflags_next_tb = curr_cflags(cpu) | CF_MEMI_ONLY | CF_LAST_IO | n;
 
-    qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
+    qemu_log_mask_and_addr(CPU_LOG_EXEC, tb_pc_log(tb),
                            "cpu_io_recompile: rewound execution of TB to "
-                           TARGET_FMT_lx "\n", tb->pc);
+                           TARGET_FMT_lx "\n", tb_pc_log(tb));
 
     cpu_loop_exit_noexc(cpu);
 }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 7ec3281da9..047bf3f4ab 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -72,9 +72,9 @@ void arm_cpu_synchronize_from_tb(CPUState *cs,
      * never possible for an AArch64 TB to chain to an AArch32 TB.
      */
     if (is_a64(env)) {
-        env->pc = tb->pc;
+        env->pc = tb_pc(tb);
     } else {
-        env->regs[15] = tb->pc;
+        env->regs[15] = tb_pc(tb);
     }
 }
 #endif /* CONFIG_TCG */
diff --git a/target/avr/cpu.c b/target/avr/cpu.c
index 05b992ff73..6ebef62b4c 100644
--- a/target/avr/cpu.c
+++ b/target/avr/cpu.c
@@ -47,7 +47,7 @@ static void avr_cpu_synchronize_from_tb(CPUState *cs,
     AVRCPU *cpu = AVR_CPU(cs);
     CPUAVRState *env = &cpu->env;
 
-    env->pc_w = tb->pc / 2; /* internally PC points to words */
+    env->pc_w = tb_pc(tb) / 2; /* internally PC points to words */
 }
 
 static void avr_cpu_reset(DeviceState *ds)
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index fa9bd702d6..6289a6e64a 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -256,7 +256,7 @@ static void hexagon_cpu_synchronize_from_tb(CPUState *cs,
 {
     HexagonCPU *cpu = HEXAGON_CPU(cs);
     CPUHexagonState *env = &cpu->env;
-    env->gpr[HEX_REG_PC] = tb->pc;
+    env->gpr[HEX_REG_PC] = tb_pc(tb);
 }
 
 static bool hexagon_cpu_has_work(CPUState *cs)
diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index a6f52caf14..fc9d43f620 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -42,7 +42,7 @@ static void hppa_cpu_synchronize_from_tb(CPUState *cs,
     HPPACPU *cpu = HPPA_CPU(cs);
 
 #ifdef CONFIG_USER_ONLY
-    cpu->env.iaoq_f = tb->pc;
+    cpu->env.iaoq_f = tb_pc(tb);
     cpu->env.iaoq_b = tb->cs_base;
 #else
     /* Recover the IAOQ values from the GVA + PRIV.  */
@@ -52,7 +52,7 @@ static void hppa_cpu_synchronize_from_tb(CPUState *cs,
     int32_t diff = cs_base;
 
     cpu->env.iasq_f = iasq_f;
-    cpu->env.iaoq_f = (tb->pc & ~iasq_f) + priv;
+    cpu->env.iaoq_f = (tb_pc(tb) & ~iasq_f) + priv;
     if (diff) {
         cpu->env.iaoq_b = cpu->env.iaoq_f + diff;
     }
diff --git a/target/i386/tcg/tcg-cpu.c b/target/i386/tcg/tcg-cpu.c
index d3c2b8fb49..6cf14c83ff 100644
--- a/target/i386/tcg/tcg-cpu.c
+++ b/target/i386/tcg/tcg-cpu.c
@@ -51,7 +51,7 @@ static void x86_cpu_synchronize_from_tb(CPUState *cs,
 {
     X86CPU *cpu = X86_CPU(cs);
 
-    cpu->env.eip = tb->pc - tb->cs_base;
+    cpu->env.eip = tb_pc(tb) - tb->cs_base;
 }
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 941e2772bc..262ddfb51c 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -309,7 +309,7 @@ static void loongarch_cpu_synchronize_from_tb(CPUState *cs,
     LoongArchCPU *cpu = LOONGARCH_CPU(cs);
     CPULoongArchState *env = &cpu->env;
 
-    env->pc = tb->pc;
+    env->pc = tb_pc(tb);
 }
 #endif /* CONFIG_TCG */
 
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index aed200dcff..5a642db285 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -89,7 +89,7 @@ static void mb_cpu_synchronize_from_tb(CPUState *cs,
 {
     MicroBlazeCPU *cpu = MICROBLAZE_CPU(cs);
 
-    cpu->env.pc = tb->pc;
+    cpu->env.pc = tb_pc(tb);
     cpu->env.iflags = tb->flags & IFLAGS_TB_MASK;
 }
 
diff --git a/target/mips/tcg/exception.c b/target/mips/tcg/exception.c
index 2bd77a61de..96e61170e6 100644
--- a/target/mips/tcg/exception.c
+++ b/target/mips/tcg/exception.c
@@ -82,7 +82,7 @@ void mips_cpu_synchronize_from_tb(CPUState *cs, const TranslationBlock *tb)
     MIPSCPU *cpu = MIPS_CPU(cs);
     CPUMIPSState *env = &cpu->env;
 
-    env->active_tc.PC = tb->pc;
+    env->active_tc.PC = tb_pc(tb);
     env->hflags &= ~MIPS_HFLAG_BMASK;
     env->hflags |= tb->flags & MIPS_HFLAG_BMASK;
 }
diff --git a/target/mips/tcg/sysemu/special_helper.c b/target/mips/tcg/sysemu/special_helper.c
index f4f8fe8afc..3c5f35c759 100644
--- a/target/mips/tcg/sysemu/special_helper.c
+++ b/target/mips/tcg/sysemu/special_helper.c
@@ -94,7 +94,7 @@ bool mips_io_recompile_replay_branch(CPUState *cs, const TranslationBlock *tb)
     CPUMIPSState *env = &cpu->env;
 
     if ((env->hflags & MIPS_HFLAG_BMASK) != 0
-        && env->active_tc.PC != tb->pc) {
+        && env->active_tc.PC != tb_pc(tb)) {
         env->active_tc.PC -= (env->hflags & MIPS_HFLAG_B16 ? 2 : 4);
         env->hflags &= ~MIPS_HFLAG_BMASK;
         return true;
diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
index cb9f35f408..7bba181420 100644
--- a/target/openrisc/cpu.c
+++ b/target/openrisc/cpu.c
@@ -36,7 +36,7 @@ static void openrisc_cpu_synchronize_from_tb(CPUState *cs,
 {
     OpenRISCCPU *cpu = OPENRISC_CPU(cs);
 
-    cpu->env.pc = tb->pc;
+    cpu->env.pc = tb_pc(tb);
 }
 
 
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index aee14a239a..103aedefcd 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -478,9 +478,9 @@ static void riscv_cpu_synchronize_from_tb(CPUState *cs,
     RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
 
     if (xl == MXL_RV32) {
-        env->pc = (int32_t)tb->pc;
+        env->pc = (int32_t)tb_pc(tb);
     } else {
-        env->pc = tb->pc;
+        env->pc = tb_pc(tb);
     }
 }
 
diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index fb30080ac4..f1e0008e04 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -37,7 +37,7 @@ static void rx_cpu_synchronize_from_tb(CPUState *cs,
 {
     RXCPU *cpu = RX_CPU(cs);
 
-    cpu->env.pc = tb->pc;
+    cpu->env.pc = tb_pc(tb);
 }
 
 static bool rx_cpu_has_work(CPUState *cs)
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index 06b2691dc4..6948c8fa33 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -39,7 +39,7 @@ static void superh_cpu_synchronize_from_tb(CPUState *cs,
 {
     SuperHCPU *cpu = SUPERH_CPU(cs);
 
-    cpu->env.pc = tb->pc;
+    cpu->env.pc = tb_pc(tb);
     cpu->env.flags = tb->flags & TB_FLAG_ENVFLAGS_MASK;
 }
 
@@ -51,7 +51,7 @@ static bool superh_io_recompile_replay_branch(CPUState *cs,
     CPUSH4State *env = &cpu->env;
 
     if ((env->flags & ((DELAY_SLOT | DELAY_SLOT_CONDITIONAL))) != 0
-        && env->pc != tb->pc) {
+        && env->pc != tb_pc(tb)) {
         env->pc -= 2;
         env->flags &= ~(DELAY_SLOT | DELAY_SLOT_CONDITIONAL);
         return true;
diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
index 55268ed2a1..0471c2fe5a 100644
--- a/target/sparc/cpu.c
+++ b/target/sparc/cpu.c
@@ -698,7 +698,7 @@ static void sparc_cpu_synchronize_from_tb(CPUState *cs,
 {
     SPARCCPU *cpu = SPARC_CPU(cs);
 
-    cpu->env.pc = tb->pc;
+    cpu->env.pc = tb_pc(tb);
     cpu->env.npc = tb->cs_base;
 }
 
diff --git a/target/tricore/cpu.c b/target/tricore/cpu.c
index b95682b7f0..35f3347add 100644
--- a/target/tricore/cpu.c
+++ b/target/tricore/cpu.c
@@ -47,7 +47,7 @@ static void tricore_cpu_synchronize_from_tb(CPUState *cs,
     TriCoreCPU *cpu = TRICORE_CPU(cs);
     CPUTriCoreState *env = &cpu->env;
 
-    env->PC = tb->pc;
+    env->PC = tb_pc(tb);
 }
 
 static void tricore_cpu_reset(DeviceState *dev)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0f9cfe96f2..11bdb96dd1 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -4218,7 +4218,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 
 #ifdef DEBUG_DISAS
     if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)
-                 && qemu_log_in_addr_range(tb->pc))) {
+                 && qemu_log_in_addr_range(tb_pc_log(tb)))) {
         FILE *logfile = qemu_log_trylock();
         if (logfile) {
             fprintf(logfile, "OP:\n");
@@ -4265,7 +4265,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     if (s->nb_indirects > 0) {
 #ifdef DEBUG_DISAS
         if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_IND)
-                     && qemu_log_in_addr_range(tb->pc))) {
+                     && qemu_log_in_addr_range(tb_pc_log(tb)))) {
             FILE *logfile = qemu_log_trylock();
             if (logfile) {
                 fprintf(logfile, "OP before indirect lowering:\n");
@@ -4288,7 +4288,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 
 #ifdef DEBUG_DISAS
     if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_OPT)
-                 && qemu_log_in_addr_range(tb->pc))) {
+                 && qemu_log_in_addr_range(tb_pc_log(tb)))) {
         FILE *logfile = qemu_log_trylock();
         if (logfile) {
             fprintf(logfile, "OP after optimization and liveness analysis:\n");
-- 
2.34.1



* [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (14 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 15/17] accel/tcg: Introduce tb_pc and tb_pc_log Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-30 12:02   ` Peter Maydell
  2022-09-25 10:51 ` [PATCH v5 17/17] accel/tcg: Split log_cpu_exec into inline and slow path Richard Henderson
  2022-09-29  2:16 ` [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
  17 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel

Prepare for targets to be able to produce TBs that can
run in more than one virtual context.
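
A target opts in by defining TARGET_TB_PCREL to 1 in its cpu-param.h,
sketched here with a hypothetical target "foo":

    /* target/foo/cpu-param.h -- "foo" stands in for a real target */
    #define TARGET_TB_PCREL 1

All other targets fall through to the TARGET_TB_PCREL 0 default that
cpu-defs.h gains below.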

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-defs.h   |  3 +++
 include/exec/exec-all.h   | 41 ++++++++++++++++++++++++++---
 include/hw/core/cpu.h     |  1 +
 accel/tcg/cpu-exec.c      | 55 ++++++++++++++++++++++++++++++---------
 accel/tcg/translate-all.c | 48 ++++++++++++++++++++++------------
 5 files changed, 115 insertions(+), 33 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index d0acbb4d35..ef950b05c4 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -54,6 +54,9 @@
 #  error TARGET_PAGE_BITS must be defined in cpu-param.h
 # endif
 #endif
+#ifndef TARGET_TB_PCREL
+# define TARGET_TB_PCREL 0
+#endif
 
 #define TARGET_LONG_SIZE (TARGET_LONG_BITS / 8)
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index dc72615ad4..0edd1b1ab4 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -492,8 +492,32 @@ struct tb_tc {
 };
 
 struct TranslationBlock {
-    target_ulong pc;   /* simulated PC corresponding to this block (EIP + CS base) */
-    target_ulong cs_base; /* CS base for this block */
+#if !TARGET_TB_PCREL
+    /*
+     * Guest PC corresponding to this block.  This must be the true
+     * virtual address.  Therefore e.g. x86 stores EIP + CS_BASE, and
+     * targets like Arm, MIPS, HP-PA, which reuse low bits for ISA or
+     * privilege, must store those bits elsewhere.
+     *
+     * If TARGET_TB_PCREL, the opcodes for the TranslationBlock are
+     * written such that the TB is associated only with the physical
+     * page and may be run in any virtual address context.  In this case,
+     * PC must always be taken from ENV in a target-specific manner.
+     * Unwind information is taken as offsets from the page, to be
+     * deposited into the "current" PC.
+     */
+    target_ulong pc;
+#endif
+
+    /*
+     * Target-specific data associated with the TranslationBlock, e.g.:
+     * x86: the original user, the Code Segment virtual base,
+     * arm: an extension of tb->flags,
+     * s390x: instruction data for EXECUTE,
+     * sparc: the next pc of the instruction queue (for delay slots).
+     */
+    target_ulong cs_base;
+
     uint32_t flags; /* flags defining in which context the code was generated */
     uint32_t cflags;    /* compile flags */
 
@@ -569,13 +593,24 @@ struct TranslationBlock {
 /* Hide the read to avoid ifdefs for TARGET_TB_PCREL. */
 static inline target_ulong tb_pc(const TranslationBlock *tb)
 {
+#if TARGET_TB_PCREL
+    qemu_build_not_reached();
+#else
     return tb->pc;
+#endif
 }
 
-/* Similarly, but for logs. */
+/*
+ * Similarly, but for logs. In this case, when the virtual pc
+ * is not available, use the physical address.
+ */
 static inline target_ulong tb_pc_log(const TranslationBlock *tb)
 {
+#if TARGET_TB_PCREL
+    return tb->page_addr[0];
+#else
     return tb->pc;
+#endif
 }
 
 /* Hide the qatomic_read to make code a little easier on the eyes */
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index ee5b75dea0..b73dd31495 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -234,6 +234,7 @@ struct hvf_vcpu_state;
 
 typedef struct {
     TranslationBlock *tb;
+    vaddr pc;
 } CPUJumpCache;
 
 /* work queue */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2cf84952e1..7fe42269ea 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -185,7 +185,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
     const TranslationBlock *tb = p;
     const struct tb_desc *desc = d;
 
-    if (tb_pc(tb) == desc->pc &&
+    if ((TARGET_TB_PCREL || tb_pc(tb) == desc->pc) &&
         tb->page_addr[0] == desc->page_addr0 &&
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
@@ -236,7 +236,8 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
         return NULL;
     }
     desc.page_addr0 = phys_pc;
-    h = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
+    h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : pc),
+                     flags, cflags, *cpu->trace_dstate);
     return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
 }
 
@@ -252,21 +253,42 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
     tcg_debug_assert(!(cflags & CF_INVALID));
 
     hash = tb_jmp_cache_hash_func(pc);
-    tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash].tb);
-
-    if (likely(tb &&
-               tb->pc == pc &&
-               tb->cs_base == cs_base &&
-               tb->flags == flags &&
-               tb->trace_vcpu_dstate == *cpu->trace_dstate &&
-               tb_cflags(tb) == cflags)) {
-        return tb;
+    if (TARGET_TB_PCREL) {
+        /* Use acquire to ensure current load of pc from tb_jmp_cache[]. */
+        tb = qatomic_load_acquire(&cpu->tb_jmp_cache[hash].tb);
+    } else {
+        /* Use rcu_read to ensure current load of pc from *tb. */
+        tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash].tb);
     }
+    if (likely(tb)) {
+        target_ulong jmp_pc;
+
+        if (TARGET_TB_PCREL) {
+            jmp_pc = cpu->tb_jmp_cache[hash].pc;
+        } else {
+            jmp_pc = tb_pc(tb);
+        }
+        if (jmp_pc == pc &&
+            tb->cs_base == cs_base &&
+            tb->flags == flags &&
+            tb->trace_vcpu_dstate == *cpu->trace_dstate &&
+            tb_cflags(tb) == cflags) {
+            return tb;
+        }
+    }
+
     tb = tb_htable_lookup(cpu, pc, cs_base, flags, cflags);
     if (tb == NULL) {
         return NULL;
     }
-    qatomic_set(&cpu->tb_jmp_cache[hash].tb, tb);
+
+    if (TARGET_TB_PCREL) {
+        cpu->tb_jmp_cache[hash].pc = pc;
+        /* Use store_release on tb to ensure pc is current. */
+        qatomic_store_release(&cpu->tb_jmp_cache[hash].tb, tb);
+    } else {
+        qatomic_set(&cpu->tb_jmp_cache[hash].tb, tb);
+    }
     return tb;
 }
 
@@ -454,6 +476,7 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
         if (cc->tcg_ops->synchronize_from_tb) {
             cc->tcg_ops->synchronize_from_tb(cpu, last_tb);
         } else {
+            assert(!TARGET_TB_PCREL);
             assert(cc->set_pc);
             cc->set_pc(cpu, tb_pc(last_tb));
         }
@@ -997,7 +1020,13 @@ int cpu_exec(CPUState *cpu)
                  * for the fast lookup
                  */
                 h = tb_jmp_cache_hash_func(pc);
-                qatomic_set(&cpu->tb_jmp_cache[h].tb, tb);
+                if (TARGET_TB_PCREL) {
+                    cpu->tb_jmp_cache[h].pc = pc;
+                    /* Use store_release on tb to ensure pc is current. */
+                    qatomic_store_release(&cpu->tb_jmp_cache[h].tb, tb);
+                } else {
+                    qatomic_set(&cpu->tb_jmp_cache[h].tb, tb);
+                }
             }
 
 #ifndef CONFIG_USER_ONLY
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 9e57822b44..42e897b285 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -298,7 +298,7 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
 
         for (j = 0; j < TARGET_INSN_START_WORDS; ++j) {
             if (i == 0) {
-                prev = (j == 0 ? tb_pc(tb) : 0);
+                prev = (!TARGET_TB_PCREL && j == 0 ? tb_pc(tb) : 0);
             } else {
                 prev = tcg_ctx->gen_insn_data[i - 1][j];
             }
@@ -326,7 +326,7 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
 static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
                                      uintptr_t searched_pc, bool reset_icount)
 {
-    target_ulong data[TARGET_INSN_START_WORDS] = { tb_pc(tb) };
+    target_ulong data[TARGET_INSN_START_WORDS];
     uintptr_t host_pc = (uintptr_t)tb->tc.ptr;
     CPUArchState *env = cpu->env_ptr;
     const uint8_t *p = tb->tc.ptr + tb->tc.size;
@@ -342,6 +342,11 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
         return -1;
     }
 
+    memset(data, 0, sizeof(data));
+    if (!TARGET_TB_PCREL) {
+        data[0] = tb_pc(tb);
+    }
+
     /* Reconstruct the stored insn data while looking for the point at
        which the end of the insn exceeds the searched_pc.  */
     for (i = 0; i < num_insns; ++i) {
@@ -884,13 +889,13 @@ static bool tb_cmp(const void *ap, const void *bp)
     const TranslationBlock *a = ap;
     const TranslationBlock *b = bp;
 
-    return tb_pc(a) == tb_pc(b) &&
-        a->cs_base == b->cs_base &&
-        a->flags == b->flags &&
-        (tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
-        a->trace_vcpu_dstate == b->trace_vcpu_dstate &&
-        a->page_addr[0] == b->page_addr[0] &&
-        a->page_addr[1] == b->page_addr[1];
+    return ((TARGET_TB_PCREL || tb_pc(a) == tb_pc(b)) &&
+            a->cs_base == b->cs_base &&
+            a->flags == b->flags &&
+            (tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
+            a->trace_vcpu_dstate == b->trace_vcpu_dstate &&
+            a->page_addr[0] == b->page_addr[0] &&
+            a->page_addr[1] == b->page_addr[1]);
 }
 
 void tb_htable_init(void)
@@ -1169,8 +1174,8 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
 
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0];
-    h = tb_hash_func(phys_pc, tb_pc(tb), tb->flags, orig_cflags,
-                     tb->trace_vcpu_dstate);
+    h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
+                     tb->flags, orig_cflags, tb->trace_vcpu_dstate);
     if (!qht_remove(&tb_ctx.htable, tb, h)) {
         return;
     }
@@ -1186,10 +1191,17 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
     }
 
     /* remove the TB from the hash list */
-    h = tb_jmp_cache_hash_func(tb->pc);
-    CPU_FOREACH(cpu) {
-        if (qatomic_read(&cpu->tb_jmp_cache[h].tb) == tb) {
-            qatomic_set(&cpu->tb_jmp_cache[h].tb, NULL);
+    if (TARGET_TB_PCREL) {
+        /* Any TB may be at any virtual address */
+        CPU_FOREACH(cpu) {
+            cpu_tb_jmp_cache_clear(cpu);
+        }
+    } else {
+        h = tb_jmp_cache_hash_func(tb_pc(tb));
+        CPU_FOREACH(cpu) {
+            if (qatomic_read(&cpu->tb_jmp_cache[h].tb) == tb) {
+                qatomic_set(&cpu->tb_jmp_cache[h].tb, NULL);
+            }
         }
     }
 
@@ -1300,8 +1312,8 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     }
 
     /* add in the hash table */
-    h = tb_hash_func(phys_pc, tb_pc(tb), tb->flags, tb->cflags,
-                     tb->trace_vcpu_dstate);
+    h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
+                     tb->flags, tb->cflags, tb->trace_vcpu_dstate);
     qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
 
     /* remove TB from the page(s) if we couldn't insert it */
@@ -1371,7 +1383,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
     gen_code_buf = tcg_ctx->code_gen_ptr;
     tb->tc.ptr = tcg_splitwx_to_rx(gen_code_buf);
+#if !TARGET_TB_PCREL
     tb->pc = pc;
+#endif
     tb->cs_base = cs_base;
     tb->flags = flags;
     tb->cflags = cflags;
-- 
2.34.1



* [PATCH v5 17/17] accel/tcg: Split log_cpu_exec into inline and slow path
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (15 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL Richard Henderson
@ 2022-09-25 10:51 ` Richard Henderson
  2022-09-29  2:16 ` [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
  17 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2022-09-25 10:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cpu-exec.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 7fe42269ea..318dbc1a2c 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -292,12 +292,11 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
     return tb;
 }
 
-static inline void log_cpu_exec(target_ulong pc, CPUState *cpu,
-                                const TranslationBlock *tb)
+static void log_cpu_exec_slow(CPUState *cpu, const TranslationBlock *tb)
 {
-    if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_CPU | CPU_LOG_EXEC))
-        && qemu_log_in_addr_range(pc)) {
+    target_ulong pc = tb_pc_log(tb);
 
+    if (qemu_log_in_addr_range(pc)) {
         qemu_log_mask(CPU_LOG_EXEC,
                       "Trace %d: %p [" TARGET_FMT_lx
                       "/" TARGET_FMT_lx "/%08x/%08x] %s\n",
@@ -324,6 +323,13 @@ static inline void log_cpu_exec(target_ulong pc, CPUState *cpu,
     }
 }
 
+static inline void log_cpu_exec(CPUState *cpu, const TranslationBlock *tb)
+{
+    if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_CPU | CPU_LOG_EXEC))) {
+        log_cpu_exec_slow(cpu, tb);
+    }
+}
+
 static bool check_for_breakpoints(CPUState *cpu, target_ulong pc,
                                   uint32_t *cflags)
 {
@@ -421,7 +427,7 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
         return tcg_code_gen_epilogue;
     }
 
-    log_cpu_exec(pc, cpu, tb);
+    log_cpu_exec(cpu, tb);
 
     return tb->tc.ptr;
 }
@@ -444,7 +450,7 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
     TranslationBlock *last_tb;
     const void *tb_ptr = itb->tc.ptr;
 
-    log_cpu_exec(tb_pc_log(itb), cpu, itb);
+    log_cpu_exec(cpu, itb);
 
     qemu_thread_jit_execute();
     ret = tcg_qemu_tb_exec(env, tb_ptr);
-- 
2.34.1



* Re: [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL
  2022-09-25 10:51 [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
                   ` (16 preceding siblings ...)
  2022-09-25 10:51 ` [PATCH v5 17/17] accel/tcg: Split log_cpu_exec into inline and slow path Richard Henderson
@ 2022-09-29  2:16 ` Richard Henderson
  2022-09-29  6:53   ` Mark Cave-Ayland
  17 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2022-09-29  2:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Philippe Mathieu-Daudé

On 9/25/22 03:51, Richard Henderson wrote:
> Smooshing these two patch sets back together for review bandwidth.
> I hope to make this the next tcg-next pull.
> 
> There are three from the first half, tlbentryfull, which are new.
> These are following a hallway conversation with Peter about bits
> in MemTxAttrs that are not actually related to memory transactions,
> and infrastructure to address a to-do in an Arm patch set.
> 
> There are a few patches from the second half, pcrel, that have not
> been reviewed.
> 
>    07-target-sparc-Use-tlb_set_page_full.patch
>    08-accel-tcg-Move-byte_swap-from-MemTxAttrs-to-CPUTL.patch
>    09-accel-tcg-Add-force_aligned-to-CPUTLBEntryFull.patch
>    10-accel-tcg-Remove-PageDesc-code_bitmap.patch
>    13-accel-tcg-Do-not-align-tb-page_addr-0.patch
>    15-accel-tcg-Introduce-tb_pc-and-tb_pc_log.patch
>    16-accel-tcg-Introduce-TARGET_TB_PCREL.patch

FWIW, the target/sparc patch fails (the peril of insufficiently tested airport updates), 
so I'm going to drop 7+8 until I have time to investigate.  I'm also going to drop patch 9 
for now, and present it alongside the Arm patch that will use it.

But otherwise, gentle ping.


r~


* Re: [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL
  2022-09-29  2:16 ` [PATCH v5 00/17] tcg: CPUTLBEntryFull and TARGET_TB_PCREL Richard Henderson
@ 2022-09-29  6:53   ` Mark Cave-Ayland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Cave-Ayland @ 2022-09-29  6:53 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé

On 29/09/2022 03:16, Richard Henderson wrote:

> On 9/25/22 03:51, Richard Henderson wrote:
>> Smooshing these two patch sets back together for review bandwidth.
>> I hope to make this the next tcg-next pull.
>>
>> There are three from the first half, tlbentryfull, which are new.
>> These are following a hallway conversation with Peter about bits
>> in MemTxAttrs that are not actually related to memory transactions,
>> and infrastructure to address a to-do in an Arm patch set.
>>
>> There are a few patches from the second half, pcrel, that have not
>> been reviewed.
>>
>>    07-target-sparc-Use-tlb_set_page_full.patch
>>    08-accel-tcg-Move-byte_swap-from-MemTxAttrs-to-CPUTL.patch
>>    09-accel-tcg-Add-force_aligned-to-CPUTLBEntryFull.patch
>>    10-accel-tcg-Remove-PageDesc-code_bitmap.patch
>>    13-accel-tcg-Do-not-align-tb-page_addr-0.patch
>>    15-accel-tcg-Introduce-tb_pc-and-tb_pc_log.patch
>>    16-accel-tcg-Introduce-TARGET_TB_PCREL.patch
> 
> FWIW, the target/sparc patch fails (the peril of insufficiently tested airport 
> updates), so I'm going to drop 7+8 until I have time to investigate.  I'm also going 
> to drop patch 9 for now, and present it alongside the Arm patch that will use it.

If it helps, the test case that exercises the IE (Invert Endian) page bit in
target/sparc is Milax (OpenSolaris), which uses it to map PCI devices. Grab a
suitable copy of milax032sparc.iso and test with:

     qemu-system-sparc64 -m 256 -cdrom milax032sparc.iso -nographic -boot d

You should see the IE pages being accessed at the point where it starts poking
the CDROM device, and see timeouts if it isn't working.


ATB,

Mark.


* Re: [PATCH v5 01/17] accel/tcg: Rename CPUIOTLBEntry to CPUTLBEntryFull
  2022-09-25 10:51 ` [PATCH v5 01/17] accel/tcg: Rename CPUIOTLBEntry to CPUTLBEntryFull Richard Henderson
@ 2022-09-29 11:45   ` Alex Bennée
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 11:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, Philippe Mathieu-Daudé, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> This structure will shortly contain more than just
> data for accessing MMIO.  Rename the 'addr' member
> to 'xlat_section' to more clearly indicate its purpose.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


* Re: [PATCH v5 02/17] accel/tcg: Drop addr member from SavedIOTLB
  2022-09-25 10:51 ` [PATCH v5 02/17] accel/tcg: Drop addr member from SavedIOTLB Richard Henderson
@ 2022-09-29 11:46   ` Alex Bennée
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 11:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, Philippe Mathieu-Daudé, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> This field is only written, not read; remove it.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


* Re: [PATCH v5 03/17] accel/tcg: Suppress auto-invalidate in probe_access_internal
  2022-09-25 10:51 ` [PATCH v5 03/17] accel/tcg: Suppress auto-invalidate in probe_access_internal Richard Henderson
@ 2022-09-29 11:49   ` Alex Bennée
  2022-09-29 11:50   ` David Hildenbrand
  1 sibling, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 11:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: David Hildenbrand, Peter Maydell, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> When PAGE_WRITE_INV is set when calling tlb_set_page,
> we immediately set TLB_INVALID_MASK in order to force
> tlb_fill to be called on the next lookup.  Here in
> probe_access_internal, we have just called tlb_fill
> and eliminated true misses, thus the lookup must be valid.
>
> This allows us to remove a warning comment from s390x.
> There doesn't seem to be a reason to change the code though.
>
> Cc: David Hildenbrand <david@redhat.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


* Re: [PATCH v5 03/17] accel/tcg: Suppress auto-invalidate in probe_access_internal
  2022-09-25 10:51 ` [PATCH v5 03/17] accel/tcg: Suppress auto-invalidate in probe_access_internal Richard Henderson
  2022-09-29 11:49   ` Alex Bennée
@ 2022-09-29 11:50   ` David Hildenbrand
  1 sibling, 0 replies; 38+ messages in thread
From: David Hildenbrand @ 2022-09-29 11:50 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Peter Maydell

On 25.09.22 12:51, Richard Henderson wrote:
> When PAGE_WRITE_INV is set when calling tlb_set_page,
> we immediately set TLB_INVALID_MASK in order to force
> tlb_fill to be called on the next lookup.  Here in
> probe_access_internal, we have just called tlb_fill
> and eliminated true misses, thus the lookup must be valid.
> 
> This allows us to remove a warning comment from s390x.
> There doesn't seem to be a reason to change the code though.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v5 04/17] accel/tcg: Introduce probe_access_full
  2022-09-25 10:51 ` [PATCH v5 04/17] accel/tcg: Introduce probe_access_full Richard Henderson
@ 2022-09-29 11:51   ` Alex Bennée
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 11:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, Philippe Mathieu-Daudé, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Add an interface to return the CPUTLBEntryFull struct
> that goes with the lookup.  The result is not intended
> to be valid across multiple lookups, so the user must
> use the results immediately.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/exec-all.h | 11 ++++++++++
>  accel/tcg/cputlb.c      | 47 +++++++++++++++++++++++++----------------
>  2 files changed, 40 insertions(+), 18 deletions(-)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index bcad607c4e..758cf6bcc7 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -434,6 +434,17 @@ int probe_access_flags(CPUArchState *env, target_ulong addr,
>                         MMUAccessType access_type, int mmu_idx,
>                         bool nonfault, void **phost, uintptr_t retaddr);
>  
> +#ifndef CONFIG_USER_ONLY
> +/**
> + * probe_access_full:
> + * Like probe_access_flags, except also return into @pfull.
> + */

That lifetime requirement on @pfull really should be documented here
as well as in the commit message.
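
Something along these lines would do, reusing the wording from the
commit message (just a sketch of the comment text):

    /**
     * probe_access_full:
     * Like probe_access_flags, except also return into @pfull.
     *
     * The CPUTLBEntryFull structure returned via @pfull is transient:
     * it is not valid across multiple lookups, so the caller must use
     * the results immediately.
     */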

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


* Re: [PATCH v5 05/17] accel/tcg: Introduce tlb_set_page_full
  2022-09-25 10:51 ` [PATCH v5 05/17] accel/tcg: Introduce tlb_set_page_full Richard Henderson
@ 2022-09-29 12:00   ` Alex Bennée
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 12:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, Philippe Mathieu-Daudé, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Now that we have collected all of the page data into
> CPUTLBEntryFull, provide an interface to record that
> all in one go, instead of using 4 arguments.  This interface
> allows CPUTLBEntryFull to be extended without having to
> change the number of arguments.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
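
(Going by the target/sparc conversion in patch 7, the call shape ends
up roughly as

    CPUTLBEntryFull full = { 0 };
    full.phys_addr    = paddr;
    full.attrs        = MEMTXATTRS_UNSPECIFIED;
    full.prot         = prot;
    full.lg_page_size = TARGET_PAGE_BITS;
    tlb_set_page_full(cs, mmu_idx, vaddr, &full);

replacing the old multi-argument tlb_set_page_with_attrs() call; the
field names here are my reading of the series and may differ in
detail.)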

-- 
Alex Bennée


* Re: [PATCH v5 06/17] include/exec: Introduce TARGET_PAGE_ENTRY_EXTRA
  2022-09-25 10:51 ` [PATCH v5 06/17] include/exec: Introduce TARGET_PAGE_ENTRY_EXTRA Richard Henderson
@ 2022-09-29 12:00   ` Alex Bennée
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 12:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, Philippe Mathieu-Daudé, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Allow the target to cache items from the guest page tables.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


* Re: [PATCH v5 08/17] accel/tcg: Move byte_swap from MemTxAttrs to CPUTLBEntryFull
  2022-09-25 10:51 ` [PATCH v5 08/17] accel/tcg: Move byte_swap from MemTxAttrs to CPUTLBEntryFull Richard Henderson
@ 2022-09-29 12:27   ` Alex Bennée
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 12:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Mark Cave-Ayland, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> We had previously placed this bit in MemTxAttrs because we had
> no other way to communicate that information to tlb_set_page*.
> The bit is not relevant to memory transactions, only page table
> entries, and now we do have a way to pass in the bit.
>
> Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


* Re: [PATCH v5 10/17] accel/tcg: Remove PageDesc code_bitmap
  2022-09-25 10:51 ` [PATCH v5 10/17] accel/tcg: Remove PageDesc code_bitmap Richard Henderson
@ 2022-09-29 12:27   ` Alex Bennée
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 12:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> This bitmap is created and discarded immediately.
> We gain nothing by its existence.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Message-Id: <20220822232338.1727934-2-richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


* Re: [PATCH v5 14/17] include/hw/core: Create struct CPUJumpCache
  2022-09-25 10:51 ` [PATCH v5 14/17] include/hw/core: Create struct CPUJumpCache Richard Henderson
@ 2022-09-29 13:46   ` Alex Bennée
  2022-09-29 16:22     ` Richard Henderson
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 13:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Philippe Mathieu-Daudé, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Wrap the bare TranslationBlock pointer into a structure.
>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/hw/core/cpu.h     | 8 ++++++--
>  accel/tcg/cpu-exec.c      | 9 ++++++---
>  accel/tcg/cputlb.c        | 2 +-
>  accel/tcg/translate-all.c | 4 ++--
>  4 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 9e47184513..ee5b75dea0 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -232,6 +232,10 @@ struct hvf_vcpu_state;
>  #define TB_JMP_CACHE_BITS 12
>  #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
>  
> +typedef struct {
> +    TranslationBlock *tb;
> +} CPUJumpCache;
> +

I don't quite follow what's going on here. I see we add vaddr pc in a
later patch, but I don't quite see why a cache for looking up TBs gets
a side-channel value added later.

Is this because the vaddr will no longer match the tb->pc? Maybe a
comment on the structure is needed?

>  /* work queue */
>  
>  /* The union type allows passing of 64 bit target pointers on 32 bit
> @@ -361,7 +365,7 @@ struct CPUState {
>      IcountDecr *icount_decr_ptr;
>  
>      /* Accessed in parallel; all accesses must be atomic */
> -    TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
> +    CPUJumpCache tb_jmp_cache[TB_JMP_CACHE_SIZE];
>  
>      struct GDBRegisterState *gdb_regs;
>      int gdb_num_regs;
> @@ -452,7 +456,7 @@ static inline void cpu_tb_jmp_cache_clear(CPUState *cpu)
>      unsigned int i;
>  
>      for (i = 0; i < TB_JMP_CACHE_SIZE; i++) {
> -        qatomic_set(&cpu->tb_jmp_cache[i], NULL);
> +        qatomic_set(&cpu->tb_jmp_cache[i].tb, NULL);
>      }
>  }
>  
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index dd58a144a8..c6283d5798 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -252,7 +252,7 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
>      tcg_debug_assert(!(cflags & CF_INVALID));
>  
>      hash = tb_jmp_cache_hash_func(pc);
> -    tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash]);
> +    tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash].tb);
>  
>      if (likely(tb &&
>                 tb->pc == pc &&
> @@ -266,7 +266,7 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
>      if (tb == NULL) {
>          return NULL;
>      }
> -    qatomic_set(&cpu->tb_jmp_cache[hash], tb);
> +    qatomic_set(&cpu->tb_jmp_cache[hash].tb, tb);
>      return tb;
>  }
>  
> @@ -987,6 +987,8 @@ int cpu_exec(CPUState *cpu)
>  
>              tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
>              if (tb == NULL) {
> +                uint32_t h;
> +
>                  mmap_lock();
>                  tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
>                  mmap_unlock();
> @@ -994,7 +996,8 @@ int cpu_exec(CPUState *cpu)
>                   * We add the TB in the virtual pc hash table
>                   * for the fast lookup
>                   */
> -                qatomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
> +                h = tb_jmp_cache_hash_func(pc);
> +                qatomic_set(&cpu->tb_jmp_cache[h].tb, tb);
>              }
>  
>  #ifndef CONFIG_USER_ONLY
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index f5e6ca2da2..fb8f3087f1 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -103,7 +103,7 @@ static void tb_jmp_cache_clear_page(CPUState *cpu, target_ulong page_addr)
>      unsigned int i, i0 = tb_jmp_cache_hash_page(page_addr);
>  
>      for (i = 0; i < TB_JMP_PAGE_SIZE; i++) {
> -        qatomic_set(&cpu->tb_jmp_cache[i0 + i], NULL);
> +        qatomic_set(&cpu->tb_jmp_cache[i0 + i].tb, NULL);
>      }
>  }
>  
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index f429d33981..efa479ccf3 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1187,8 +1187,8 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
>      /* remove the TB from the hash list */
>      h = tb_jmp_cache_hash_func(tb->pc);
>      CPU_FOREACH(cpu) {
> -        if (qatomic_read(&cpu->tb_jmp_cache[h]) == tb) {
> -            qatomic_set(&cpu->tb_jmp_cache[h], NULL);
> +        if (qatomic_read(&cpu->tb_jmp_cache[h].tb) == tb) {
> +            qatomic_set(&cpu->tb_jmp_cache[h].tb, NULL);
>          }
>      }


-- 
Alex Bennée


* Re: [PATCH v5 14/17] include/hw/core: Create struct CPUJumpCache
  2022-09-29 13:46   ` Alex Bennée
@ 2022-09-29 16:22     ` Richard Henderson
  2022-09-29 17:01       ` Alex Bennée
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2022-09-29 16:22 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Philippe Mathieu-Daudé, qemu-devel

On 9/29/22 06:46, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Wrap the bare TranslationBlock pointer into a structure.
>>
>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   include/hw/core/cpu.h     | 8 ++++++--
>>   accel/tcg/cpu-exec.c      | 9 ++++++---
>>   accel/tcg/cputlb.c        | 2 +-
>>   accel/tcg/translate-all.c | 4 ++--
>>   4 files changed, 15 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>> index 9e47184513..ee5b75dea0 100644
>> --- a/include/hw/core/cpu.h
>> +++ b/include/hw/core/cpu.h
>> @@ -232,6 +232,10 @@ struct hvf_vcpu_state;
>>   #define TB_JMP_CACHE_BITS 12
>>   #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
>>   
>> +typedef struct {
>> +    TranslationBlock *tb;
>> +} CPUJumpCache;
>> +
> 
> I don't quite follow what's going on here. I see we add vaddr pc in a
> later patch, but I don't quite see why a cache for looking up TBs gets
> a side-channel value added later.
> 
> Is this because the vaddr will no longer match the tb->pc? Maybe a
> comment on the structure is needed?

Correct, there will be no tb->pc, so the cpu has to remember the virtual address itself.

This patch only wraps the current pointer into a structure.
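
Concretely, the later patch extends it to:

    typedef struct {
        TranslationBlock *tb;
        vaddr pc;    /* virtual PC the cached tb was looked up with */
    } CPUJumpCache;

so each jump-cache entry carries the lookup address alongside the TB.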


r~




* Re: [PATCH v5 14/17] include/hw/core: Create struct CPUJumpCache
  2022-09-29 16:22     ` Richard Henderson
@ 2022-09-29 17:01       ` Alex Bennée
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2022-09-29 17:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Philippe Mathieu-Daudé, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> On 9/29/22 06:46, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> Wrap the bare TranslationBlock pointer into a structure.
>>>
>>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>>   include/hw/core/cpu.h     | 8 ++++++--
>>>   accel/tcg/cpu-exec.c      | 9 ++++++---
>>>   accel/tcg/cputlb.c        | 2 +-
>>>   accel/tcg/translate-all.c | 4 ++--
>>>   4 files changed, 15 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>>> index 9e47184513..ee5b75dea0 100644
>>> --- a/include/hw/core/cpu.h
>>> +++ b/include/hw/core/cpu.h
>>> @@ -232,6 +232,10 @@ struct hvf_vcpu_state;
>>>   #define TB_JMP_CACHE_BITS 12
>>>   #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
>>>   +typedef struct {
>>> +    TranslationBlock *tb;
>>> +} CPUJumpCache;
>>> +
>> I don't quite follow what's going on here. I see we add vaddr pc in a
>> later patch, but I don't quite see why a cache for looking up TBs gets
>> a side-channel value added later.
>> Is this because the vaddr will no longer match the tb->pc? Maybe a
>> comment on the structure is needed?
>
> Correct, there will be no tb->pc, so the cpu has to remember the virtual address itself.
>
> This patch only wraps the current pointer into a structure.

Sure - however we don't expand the comment when vaddr is added later.
I'm also concerned that now that we have two fields there is scope for
them to skew. I guess the acquire/release semantics are there to ensure
we can fail safe but never get a false positive?
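
i.e., as the patch has it, roughly:

    /* writer: pc is stored before the release-store of tb */
    cpu->tb_jmp_cache[hash].pc = pc;
    qatomic_store_release(&cpu->tb_jmp_cache[hash].tb, tb);

    /* reader: acquire tb first, then check the cached pc */
    tb = qatomic_load_acquire(&cpu->tb_jmp_cache[hash].tb);

where a reader that acquires the new tb must also observe the pc that
was stored with it, and (the cache being per-vCPU, so single-writer)
the worst a racing invalidate can cause is a failed compare and a
slow-path lookup?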



-- 
Alex Bennée


* Re: [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL
  2022-09-25 10:51 ` [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL Richard Henderson
@ 2022-09-30 12:02   ` Peter Maydell
  2022-09-30 12:59     ` Alex Bennée
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Maydell @ 2022-09-30 12:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sun, 25 Sept 2022 at 12:15, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Prepare for targets to be able to produce TBs that can
> run in more than one virtual context.

> -/* Similarly, but for logs. */
> +/*
> + * Similarly, but for logs. In this case, when the virtual pc
> + * is not available, use the physical address.
> + */
>  static inline target_ulong tb_pc_log(const TranslationBlock *tb)
>  {
> +#if TARGET_TB_PCREL
> +    return tb->page_addr[0];
> +#else
>      return tb->pc;
> +#endif
>  }

This is going to break previously working setups involving
the "filter logging to a particular address range" and also
anybody post-processing logfiles and expecting to see
the virtual address in -d exec logging, I think.

For the exec logging, we surely must know the actual
virtual PC at the point of TB execution -- we were
previously just using tb->pc as a convenient architecture
independent place to get that from, but should now do
something else.

For places where logging a virtual PC becomes meaningless,
we should at least indicate whether we're logging a
physaddr or a vaddr, because now depending on the config
we might do either.

For the range-filter stuff, I'm not sure what to do.
Alex, any ideas?

(I see the -dfilter option documentation doesn't say
whether it's intending to work on physical or virtual
addresses...)

thanks
-- PMM


* Re: [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL
  2022-09-30 12:02   ` Peter Maydell
@ 2022-09-30 12:59     ` Alex Bennée
  2022-09-30 13:25       ` Peter Maydell
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Bennée @ 2022-09-30 12:59 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Richard Henderson, qemu-devel


Peter Maydell <peter.maydell@linaro.org> writes:

> On Sun, 25 Sept 2022 at 12:15, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Prepare for targets to be able to produce TBs that can
>> run in more than one virtual context.
>
>> -/* Similarly, but for logs. */
>> +/*
>> + * Similarly, but for logs. In this case, when the virtual pc
>> + * is not available, use the physical address.
>> + */
>>  static inline target_ulong tb_pc_log(const TranslationBlock *tb)
>>  {
>> +#if TARGET_TB_PCREL
>> +    return tb->page_addr[0];
>> +#else
>>      return tb->pc;
>> +#endif
>>  }
>
> This is going to break previously working setups involving
> the "filter logging to a particular address range" and also
> anybody post-processing logfiles and expecting to see
> the virtual address in -d exec logging, I think.

To be honest I've never found -d exec logging that useful for system
emulation (beyond check-tcg tests) because it just generates so much
data.

> For the exec logging, we surely must know the actual
> virtual PC at the point of TB execution -- we were
> previously just using tb->pc as a convenient architecture
> independent place to get that from, but should now do
> something else.
>
> For places where logging a virtual PC becomes meaningless,
> we should at least indicate whether we're logging a
> physaddr or a vaddr, because now depending on the config
> we might do either.

Yes, we should extend the logging to say phys-pc or virt-pc.

> For the range-filter stuff, I'm not sure what to do.
> Alex, any ideas?
>
> (I see the -dfilter option documentation doesn't say
> whether it's intending to work on physical or virtual
> addresses...)

I have a feeling that for system emulation phys-pc is the most natural,
but we could extend the filter spec to be explicit.

>
> thanks
> -- PMM


-- 
Alex Bennée


* Re: [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL
  2022-09-30 12:59     ` Alex Bennée
@ 2022-09-30 13:25       ` Peter Maydell
  2022-09-30 14:57         ` Alex Bennée
  2022-09-30 17:35         ` Richard Henderson
  0 siblings, 2 replies; 38+ messages in thread
From: Peter Maydell @ 2022-09-30 13:25 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Richard Henderson, qemu-devel

On Fri, 30 Sept 2022 at 14:23, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Peter Maydell <peter.maydell@linaro.org> writes:
> > This is going to break previously working setups involving
> > the "filter logging to a particular address range" and also
> > anybody post-processing logfiles and expecting to see
> > the virtual address in -d exec logging, I think.
>
> To be honest I've never found -exec logging that useful for system
> emulation (beyond check-tcg tests) because it just generates so much
> data.

It can be very useful for "give me a list of all the
PC values where we executed an instruction", for shorter
test cases. You can then (given several of these) look at
where two runs diverge, and similar things. I use it,
so please don't break it :-)

> > For the range-filter stuff, I'm not sure what to do.
> > Alex, any ideas?
> >
> > (I see the -dfilter option documentation doesn't say
> > whether it's intending to work on physical or virtual
> > addresses...)
>
> I have a feeling for system emulation phys-pc is the most natural but we
> could extend the filter spec to be explicit.

...isn't it currently based on virtual addresses, though?

-- PMM


* Re: [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL
  2022-09-30 13:25       ` Peter Maydell
@ 2022-09-30 14:57         ` Alex Bennée
  2022-09-30 15:08           ` Peter Maydell
  2022-09-30 17:35         ` Richard Henderson
  1 sibling, 1 reply; 38+ messages in thread
From: Alex Bennée @ 2022-09-30 14:57 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Richard Henderson, qemu-devel


Peter Maydell <peter.maydell@linaro.org> writes:

> On Fri, 30 Sept 2022 at 14:23, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>>
>> Peter Maydell <peter.maydell@linaro.org> writes:
>> > This is going to break previously working setups involving
>> > the "filter logging to a particular address range" and also
>> > anybody post-processing logfiles and expecting to see
>> > the virtual address in -d exec logging, I think.
>>
>> To be honest I've never found -d exec logging that useful for system
>> emulation (beyond check-tcg tests) because it just generates so much
>> data.
>
> It can be very useful for "give me a list of all the
> PC values where we executed an instruction", for shorter
> test cases. You can then (given several of these) look at
> where two runs diverge, and similar things. I use it,
> so please don't break it :-)

ack.

FWIW you can also do that with:

   -plugin ./contrib/plugins/libexeclog.so,ifilter="instruction"

and avoid having to reduce a bunch of massive logs.

>
>> > For the range-filter stuff, I'm not sure what to do.
>> > Alex, any ideas?
>> >
>> > (I see the -dfilter option documentation doesn't say
>> > whether it's intending to work on physical or virtual
>> > addresses...)
>>
>> I have a feeling that for system emulation phys-pc is the most natural,
>> but we could extend the filter spec to be explicit.
>
> ...isn't it currently based on virtual addresses, though?

Yes - or rather it only ever considered whatever was in tb->pc.

>
> -- PMM


-- 
Alex Bennée


* Re: [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL
  2022-09-30 14:57         ` Alex Bennée
@ 2022-09-30 15:08           ` Peter Maydell
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Maydell @ 2022-09-30 15:08 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Richard Henderson, qemu-devel

On Fri, 30 Sept 2022 at 15:59, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Peter Maydell <peter.maydell@linaro.org> writes:
> > It can be very useful for "give me a list of all the
> > PC values where we executed an instruction", for shorter
> > test cases. You can then (given several of these) look at
> > where two runs diverge, and similar things. I use it,
> > so please don't break it :-)
>
> ack.
>
> FWIW you can also do that with:
>
>    -plugin ./contrib/plugins/libexeclog.so,ifilter="instruction"
>
> and avoid having to reduce a bunch of massive logs.

Yes, but if you needed the logs anyway (for the -d cpu output,
for instance) then it's simpler to use the exec logging and filter.

thanks
-- PMM


* Re: [PATCH v5 16/17] accel/tcg: Introduce TARGET_TB_PCREL
  2022-09-30 13:25       ` Peter Maydell
  2022-09-30 14:57         ` Alex Bennée
@ 2022-09-30 17:35         ` Richard Henderson
  1 sibling, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2022-09-30 17:35 UTC (permalink / raw)
  To: Peter Maydell, Alex Bennée; +Cc: qemu-devel

On 9/30/22 06:25, Peter Maydell wrote:
> On Fri, 30 Sept 2022 at 14:23, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>>
>> Peter Maydell <peter.maydell@linaro.org> writes:
>>> This is going to break previously working setups involving
>>> the "filter logging to a particular address range" and also
>>> anybody post-processing logfiles and expecting to see
>>> the virtual address in -d exec logging, I think.
>>
>> To be honest I've never found -d exec logging that useful for system
>> emulation (beyond check-tcg tests) because it just generates so much
>> data.
> 
> It can be very useful for "give me a list of all the
> PC values where we executed an instruction", for shorter
> test cases. You can then (given several of these) look at
> where two runs diverge, and similar things. I use it,
> so please don't break it :-)

Ok, I'm reworking the patchset to always have the proper virtual pc for logging.


r~

