* [Qemu-devel] [PATCH  v3 0/3] softmmu demacro
@ 2019-02-15 14:31 Alex Bennée
  2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 1/3] accel/tcg: demacro cputlb Alex Bennée
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Alex Bennée @ 2019-02-15 14:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, mark.cave-ayland, Alex Bennée

Hi,

This is hopefully the final version of the softmmu demacro series. I
tracked down the remaining failures to:

  - not always using the correct victim tlb
  - not masking reads when we unrolled the unaligned helpers

Other than that I've rolled in the changes that were made to support
dynamic TLB code and the follow-up fixes. All patches need some review
before we can merge them.
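
To expand on the second point: when the unaligned path is unrolled, the two
partial loads come back widened to tcg_target_ulong, so the half that only
contributes its low-order bytes has to be masked to the access size before
the combine. Roughly (sketch only, with r1/r2 being the two partial loads;
the actual change is the size_mask added to unaligned_load_helper in patch
3/3):

  const tcg_target_ulong size_mask = MAKE_64BIT_MASK(0, size * 8);
  unsigned shift = (addr & (size - 1)) * 8;

  if (big_endian) {
      /* Big-endian combine: r2 is right-shifted, so mask off any stale
       * bits above the access size first.  */
      r2 &= size_mask;
      res = (r1 << shift) | (r2 >> ((size * 8) - shift));
  } else {
      /* Little-endian combine: r1 is right-shifted, so mask it.  */
      r1 &= size_mask;
      res = (r1 >> shift) | (r2 << ((size * 8) - shift));
  }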

Alex Bennée (3):
  accel/tcg: demacro cputlb
  accel/tcg: remove softmmu_template.h
  accel/tcg: move unaligned helpers out of core helpers

 accel/tcg/cputlb.c           | 501 +++++++++++++++++++++++++++++++++--
 accel/tcg/softmmu_template.h | 454 -------------------------------
 2 files changed, 475 insertions(+), 480 deletions(-)
 delete mode 100644 accel/tcg/softmmu_template.h

-- 
2.20.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH  v3 1/3] accel/tcg: demacro cputlb
  2019-02-15 14:31 [Qemu-devel] [PATCH v3 0/3] softmmu demacro Alex Bennée
@ 2019-02-15 14:31 ` Alex Bennée
  2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 2/3] accel/tcg: remove softmmu_template.h Alex Bennée
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Alex Bennée @ 2019-02-15 14:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, mark.cave-ayland, Alex Bennée

Instead of expanding a series of macros to generate the load/store
helpers, we move the code into common functions and rely on the compiler
to eliminate the dead code for each variant.
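
In other words, the softmmu_template.h expansion is replaced by a single
size/endianness-parameterised function plus trivial wrappers; because the
wrappers pass compile-time constants, the compiler can drop the branches
for the other sizes. A minimal sketch of the shape (the real code is in
the diff below):

  static tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
                                      TCGMemOpIdx oi, uintptr_t retaddr,
                                      size_t size, bool big_endian,
                                      bool code_read);

  /* Thin entry point: constant size/endianness lets the dead paths fold. */
  tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
                                      TCGMemOpIdx oi, uintptr_t retaddr)
  {
      return load_helper(env, addr, oi, retaddr, 2, true, false);
  }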

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v3
  - rebase, apply tlb_fill fixes from 20190209162745.12668-3-cota@braap.org
  - ensure load_helper honours code/read in the victim_tlb access
  - convert comments to proper block style
---
 accel/tcg/cputlb.c | 483 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 457 insertions(+), 26 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 88cc8389e9..351f579fed 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1167,26 +1167,426 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 }
 
 #ifdef TARGET_WORDS_BIGENDIAN
-# define TGT_BE(X)  (X)
-# define TGT_LE(X)  BSWAP(X)
+#define NEED_BE_BSWAP 0
+#define NEED_LE_BSWAP 1
 #else
-# define TGT_BE(X)  BSWAP(X)
-# define TGT_LE(X)  (X)
+#define NEED_BE_BSWAP 1
+#define NEED_LE_BSWAP 0
 #endif
 
-#define MMUSUFFIX _mmu
+/*
+ * Byte Swap Helper
+ *
+ * All of this should be optimized away as dead code depending on the
+ * build host and access type.
+ */
 
-#define DATA_SIZE 1
-#include "softmmu_template.h"
+static inline uint64_t handle_bswap(uint64_t val, int size, bool big_endian)
+{
+    if ((big_endian && NEED_BE_BSWAP) || (!big_endian && NEED_LE_BSWAP)) {
+        switch (size) {
+        case 1: return val;
+        case 2: return bswap16(val);
+        case 4: return bswap32(val);
+        case 8: return bswap64(val);
+        default:
+            g_assert_not_reached();
+        }
+    } else {
+        return val;
+    }
+}
 
-#define DATA_SIZE 2
-#include "softmmu_template.h"
+/*
+ * Load Helpers
+ *
+ * We support two different access types. SOFTMMU_CODE_ACCESS is
+ * specifically for reading instructions from system memory. It is
+ * called by the translation loop and in some helpers where the code
+ * is disassembled. It shouldn't be called directly by guest code.
+ */
 
-#define DATA_SIZE 4
-#include "softmmu_template.h"
+static tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
+                                    TCGMemOpIdx oi, uintptr_t retaddr,
+                                    size_t size, bool big_endian,
+                                    bool code_read)
+{
+    uintptr_t mmu_idx = get_mmuidx(oi);
+    uintptr_t index = tlb_index(env, mmu_idx, addr);
+    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
+    target_ulong tlb_addr = code_read ? entry->addr_code : entry->addr_read;
+    const size_t tlb_off = code_read ?
+        offsetof(CPUTLBEntry, addr_code) : offsetof(CPUTLBEntry, addr_read);
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
+    uintptr_t haddr;
+    tcg_target_ulong res;
+
+    /* Handle CPU specific unaligned behaviour */
+    if (addr & ((1 << a_bits) - 1)) {
+        cpu_unaligned_access(ENV_GET_CPU(env), addr,
+                             code_read ? MMU_INST_FETCH : MMU_DATA_LOAD,
+                             mmu_idx, retaddr);
+    }
 
-#define DATA_SIZE 8
-#include "softmmu_template.h"
+    /* If the TLB entry is for a different page, reload and try again.  */
+    if (!tlb_hit(tlb_addr, addr)) {
+        if (!victim_tlb_hit(env, mmu_idx, index, tlb_off,
+                            addr & TARGET_PAGE_MASK)) {
+            tlb_fill(ENV_GET_CPU(env), addr, size,
+                     code_read ? MMU_INST_FETCH : MMU_DATA_LOAD,
+                     mmu_idx, retaddr);
+            index = tlb_index(env, mmu_idx, addr);
+            entry = tlb_entry(env, mmu_idx, addr);
+        }
+        tlb_addr = code_read ? entry->addr_code : entry->addr_read;
+    }
+
+    /* Handle an IO access.  */
+    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
+        CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+        uint64_t tmp;
+
+        if ((addr & (size - 1)) != 0) {
+            goto do_unaligned_access;
+        }
+
+        tmp = io_readx(env, iotlbentry, mmu_idx, addr, retaddr,
+                       addr & tlb_addr & TLB_RECHECK,
+                       code_read ? MMU_INST_FETCH : MMU_DATA_LOAD, size);
+        return handle_bswap(tmp, size, big_endian);
+    }
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (size > 1
+        && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
+                    >= TARGET_PAGE_SIZE)) {
+        target_ulong addr1, addr2;
+        tcg_target_ulong r1, r2;
+        unsigned shift;
+    do_unaligned_access:
+        addr1 = addr & ~(size - 1);
+        addr2 = addr1 + size;
+        r1 = load_helper(env, addr1, oi, retaddr, size, big_endian, code_read);
+        r2 = load_helper(env, addr2, oi, retaddr, size, big_endian, code_read);
+        shift = (addr & (size - 1)) * 8;
+
+        if (big_endian) {
+            /* Big-endian combine.  */
+            res = (r1 << shift) | (r2 >> ((size * 8) - shift));
+        } else {
+            /* Little-endian combine.  */
+            res = (r1 >> shift) | (r2 << ((size * 8) - shift));
+        }
+        return res;
+    }
+
+    haddr = addr + entry->addend;
+
+    switch (size) {
+    case 1:
+        res = ldub_p((uint8_t *)haddr);
+        break;
+    case 2:
+        if (big_endian) {
+            res = lduw_be_p((uint8_t *)haddr);
+        } else {
+            res = lduw_le_p((uint8_t *)haddr);
+        }
+        break;
+    case 4:
+        if (big_endian) {
+            res = ldl_be_p((uint8_t *)haddr);
+        } else {
+            res = ldl_le_p((uint8_t *)haddr);
+        }
+        break;
+    case 8:
+        if (big_endian) {
+            res = ldq_be_p((uint8_t *)haddr);
+        } else {
+            res = ldq_le_p((uint8_t *)haddr);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+        break;
+    }
+
+    return res;
+}
+
+/*
+ * For the benefit of TCG generated code, we want to avoid the
+ * complication of ABI-specific return type promotion and always
+ * return a value extended to the register size of the host. This is
+ * tcg_target_long, except in the case of a 32-bit host and 64-bit
+ * data, and for that we always have uint64_t.
+ *
+ * We don't bother with this widened value for SOFTMMU_CODE_ACCESS.
+ */
+
+tcg_target_ulong __attribute__((flatten))
+helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                    uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 1, false, false);
+}
+
+tcg_target_ulong __attribute__((flatten))
+helper_le_lduw_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 2, false, false);
+}
+
+tcg_target_ulong __attribute__((flatten))
+helper_be_lduw_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 2, true, false);
+}
+
+tcg_target_ulong __attribute__((flatten))
+helper_le_ldul_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 4, false, false);
+}
+
+tcg_target_ulong __attribute__((flatten))
+helper_be_ldul_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 4, true, false);
+}
+
+uint64_t __attribute__((flatten))
+helper_le_ldq_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                  uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 8, false, false);
+}
+
+uint64_t __attribute__((flatten))
+helper_be_ldq_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                  uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 8, true, false);
+}
+
+/*
+ * Provide signed versions of the load routines as well.  We can of course
+ * avoid this for 64-bit data, or for 32-bit data on 32-bit host.
+ */
+
+
+tcg_target_ulong __attribute__((flatten))
+helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                    uintptr_t retaddr)
+{
+    return (int8_t)helper_ret_ldub_mmu(env, addr, oi, retaddr);
+}
+
+tcg_target_ulong __attribute__((flatten))
+helper_le_ldsw_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return (int16_t)helper_le_lduw_mmu(env, addr, oi, retaddr);
+}
+
+tcg_target_ulong __attribute__((flatten))
+helper_be_ldsw_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return (int16_t)helper_be_lduw_mmu(env, addr, oi, retaddr);
+}
+
+tcg_target_ulong
+helper_le_ldsl_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return (int32_t)helper_le_ldul_mmu(env, addr, oi, retaddr);
+}
+
+tcg_target_ulong
+helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return (int32_t)helper_be_ldul_mmu(env, addr, oi, retaddr);
+}
+
+/*
+ * Store Helpers
+ */
+
+static void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
+                         TCGMemOpIdx oi, uintptr_t retaddr, size_t size,
+                         bool big_endian)
+{
+    uintptr_t mmu_idx = get_mmuidx(oi);
+    uintptr_t index = tlb_index(env, mmu_idx, addr);
+    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
+    target_ulong tlb_addr = tlb_addr_write(entry);
+    unsigned a_bits = get_alignment_bits(get_memop(oi));
+    uintptr_t haddr;
+
+    /* Handle CPU specific unaligned behaviour */
+    if (addr & ((1 << a_bits) - 1)) {
+        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                             mmu_idx, retaddr);
+    }
+
+    /* If the TLB entry is for a different page, reload and try again.  */
+    if (!tlb_hit(tlb_addr, addr)) {
+        if (!VICTIM_TLB_HIT(addr_write, addr)) {
+            tlb_fill(ENV_GET_CPU(env), addr, size, MMU_DATA_STORE,
+                     mmu_idx, retaddr);
+            index = tlb_index(env, mmu_idx, addr);
+            entry = tlb_entry(env, mmu_idx, addr);
+        }
+        tlb_addr = tlb_addr_write(entry) & ~TLB_INVALID_MASK;
+    }
+
+    /* Handle an IO access.  */
+    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
+        CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
+
+        if ((addr & (size - 1)) != 0) {
+            goto do_unaligned_access;
+        }
+
+        io_writex(env, iotlbentry, mmu_idx,
+                  handle_bswap(val, size, big_endian),
+                  addr, retaddr, tlb_addr & TLB_RECHECK, size);
+        return;
+    }
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (size > 1
+        && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
+                     >= TARGET_PAGE_SIZE)) {
+        int i;
+        uintptr_t index2;
+        CPUTLBEntry *entry2;
+        target_ulong page2, tlb_addr2;
+    do_unaligned_access:
+        /*
+         * Ensure the second page is in the TLB.  Note that the first page
+         * is already guaranteed to be filled, and that the second page
+         * cannot evict the first.
+         */
+        page2 = (addr + size) & TARGET_PAGE_MASK;
+        index2 = tlb_index(env, mmu_idx, page2);
+        entry2 = tlb_entry(env, mmu_idx, index2);
+        tlb_addr2 = tlb_addr_write(entry2);
+        if (!tlb_hit_page(tlb_addr2, page2)
+            && !VICTIM_TLB_HIT(addr_write, page2)) {
+            tlb_fill(ENV_GET_CPU(env), page2, size, MMU_DATA_STORE,
+                     mmu_idx, retaddr);
+            index2 = tlb_index(env, mmu_idx, page2);
+            entry2 = tlb_entry(env, mmu_idx, index2);
+        }
+
+        /*
+         * XXX: not efficient, but simple.
+         * This loop must go in the forward direction to avoid issues
+         * with self-modifying code in Windows 64-bit.
+         */
+        for (i = 0; i < size; ++i) {
+            uint8_t val8;
+            if (big_endian) {
+                /* Big-endian extract.  */
+                val8 = val >> (((size - 1) * 8) - (i * 8));
+            } else {
+                /* Little-endian extract.  */
+                val8 = val >> (i * 8);
+            }
+            store_helper(env, addr + i, val8, oi, retaddr, 1, big_endian);
+        }
+        return;
+    }
+
+    haddr = addr + entry->addend;
+
+    switch (size) {
+    case 1:
+        stb_p((uint8_t *)haddr, val);
+        break;
+    case 2:
+        if (big_endian) {
+            stw_be_p((uint8_t *)haddr, val);
+        } else {
+            stw_le_p((uint8_t *)haddr, val);
+        }
+        break;
+    case 4:
+        if (big_endian) {
+            stl_be_p((uint8_t *)haddr, val);
+        } else {
+            stl_le_p((uint8_t *)haddr, val);
+        }
+        break;
+    case 8:
+        if (big_endian) {
+            stq_be_p((uint8_t *)haddr, val);
+        } else {
+            stq_le_p((uint8_t *)haddr, val);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+        break;
+    }
+}
+
+void __attribute__((flatten))
+helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
+                   TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    store_helper(env, addr, val, oi, retaddr, 1, false);
+}
+
+void __attribute__((flatten))
+helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                  TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    store_helper(env, addr, val, oi, retaddr, 2, false);
+}
+
+void __attribute__((flatten))
+helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                  TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    store_helper(env, addr, val, oi, retaddr, 2, true);
+}
+
+void __attribute__((flatten))
+helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                  TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    store_helper(env, addr, val, oi, retaddr, 4, false);
+}
+
+void __attribute__((flatten))
+helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                  TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    store_helper(env, addr, val, oi, retaddr, 4, true);
+}
+
+void __attribute__((flatten))
+helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                  TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    store_helper(env, addr, val, oi, retaddr, 8, false);
+}
+
+void __attribute__((flatten))
+helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                  TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    store_helper(env, addr, val, oi, retaddr, 8, true);
+}
 
 /* First set of helpers allows passing in of OI and RETADDR.  This makes
    them callable from other helpers.  */
@@ -1247,20 +1647,51 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 
 /* Code access functions.  */
 
-#undef MMUSUFFIX
-#define MMUSUFFIX _cmmu
-#undef GETPC
-#define GETPC() ((uintptr_t)0)
-#define SOFTMMU_CODE_ACCESS
+uint8_t __attribute__((flatten))
+helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                    uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 1, false, true);
+}
 
-#define DATA_SIZE 1
-#include "softmmu_template.h"
+uint16_t __attribute__((flatten))
+helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 2, false, true);
+}
 
-#define DATA_SIZE 2
-#include "softmmu_template.h"
+uint16_t __attribute__((flatten))
+helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 2, true, true);
+}
 
-#define DATA_SIZE 4
-#include "softmmu_template.h"
+uint32_t __attribute__((flatten))
+helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 4, false, true);
+}
 
-#define DATA_SIZE 8
-#include "softmmu_template.h"
+uint32_t __attribute__((flatten))
+helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 4, true, true);
+}
+
+uint64_t __attribute__((flatten))
+helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 8, false, true);
+}
+
+uint64_t __attribute__((flatten))
+helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                   uintptr_t retaddr)
+{
+    return load_helper(env, addr, oi, retaddr, 8, true, true);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH  v3 2/3] accel/tcg: remove softmmu_template.h
  2019-02-15 14:31 [Qemu-devel] [PATCH v3 0/3] softmmu demacro Alex Bennée
  2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 1/3] accel/tcg: demacro cputlb Alex Bennée
@ 2019-02-15 14:31 ` Alex Bennée
  2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 3/3] accel/tcg: move unaligned helpers out of core helpers Alex Bennée
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Alex Bennée @ 2019-02-15 14:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, mark.cave-ayland, Alex Bennée

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 accel/tcg/softmmu_template.h | 454 -----------------------------------
 1 file changed, 454 deletions(-)
 delete mode 100644 accel/tcg/softmmu_template.h

diff --git a/accel/tcg/softmmu_template.h b/accel/tcg/softmmu_template.h
deleted file mode 100644
index e970a8b378..0000000000
--- a/accel/tcg/softmmu_template.h
+++ /dev/null
@@ -1,454 +0,0 @@
-/*
- *  Software MMU support
- *
- * Generate helpers used by TCG for qemu_ld/st ops and code load
- * functions.
- *
- * Included from target op helpers and exec.c.
- *
- *  Copyright (c) 2003 Fabrice Bellard
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; if not, see <http://www.gnu.org/licenses/>.
- */
-#if DATA_SIZE == 8
-#define SUFFIX q
-#define LSUFFIX q
-#define SDATA_TYPE  int64_t
-#define DATA_TYPE  uint64_t
-#elif DATA_SIZE == 4
-#define SUFFIX l
-#define LSUFFIX l
-#define SDATA_TYPE  int32_t
-#define DATA_TYPE  uint32_t
-#elif DATA_SIZE == 2
-#define SUFFIX w
-#define LSUFFIX uw
-#define SDATA_TYPE  int16_t
-#define DATA_TYPE  uint16_t
-#elif DATA_SIZE == 1
-#define SUFFIX b
-#define LSUFFIX ub
-#define SDATA_TYPE  int8_t
-#define DATA_TYPE  uint8_t
-#else
-#error unsupported data size
-#endif
-
-
-/* For the benefit of TCG generated code, we want to avoid the complication
-   of ABI-specific return type promotion and always return a value extended
-   to the register size of the host.  This is tcg_target_long, except in the
-   case of a 32-bit host and 64-bit data, and for that we always have
-   uint64_t.  Don't bother with this widened value for SOFTMMU_CODE_ACCESS.  */
-#if defined(SOFTMMU_CODE_ACCESS) || DATA_SIZE == 8
-# define WORD_TYPE  DATA_TYPE
-# define USUFFIX    SUFFIX
-#else
-# define WORD_TYPE  tcg_target_ulong
-# define USUFFIX    glue(u, SUFFIX)
-# define SSUFFIX    glue(s, SUFFIX)
-#endif
-
-#ifdef SOFTMMU_CODE_ACCESS
-#define READ_ACCESS_TYPE MMU_INST_FETCH
-#define ADDR_READ addr_code
-#else
-#define READ_ACCESS_TYPE MMU_DATA_LOAD
-#define ADDR_READ addr_read
-#endif
-
-#if DATA_SIZE == 8
-# define BSWAP(X)  bswap64(X)
-#elif DATA_SIZE == 4
-# define BSWAP(X)  bswap32(X)
-#elif DATA_SIZE == 2
-# define BSWAP(X)  bswap16(X)
-#else
-# define BSWAP(X)  (X)
-#endif
-
-#if DATA_SIZE == 1
-# define helper_le_ld_name  glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
-# define helper_be_ld_name  helper_le_ld_name
-# define helper_le_lds_name glue(glue(helper_ret_ld, SSUFFIX), MMUSUFFIX)
-# define helper_be_lds_name helper_le_lds_name
-# define helper_le_st_name  glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)
-# define helper_be_st_name  helper_le_st_name
-#else
-# define helper_le_ld_name  glue(glue(helper_le_ld, USUFFIX), MMUSUFFIX)
-# define helper_be_ld_name  glue(glue(helper_be_ld, USUFFIX), MMUSUFFIX)
-# define helper_le_lds_name glue(glue(helper_le_ld, SSUFFIX), MMUSUFFIX)
-# define helper_be_lds_name glue(glue(helper_be_ld, SSUFFIX), MMUSUFFIX)
-# define helper_le_st_name  glue(glue(helper_le_st, SUFFIX), MMUSUFFIX)
-# define helper_be_st_name  glue(glue(helper_be_st, SUFFIX), MMUSUFFIX)
-#endif
-
-#ifndef SOFTMMU_CODE_ACCESS
-static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
-                                              size_t mmu_idx, size_t index,
-                                              target_ulong addr,
-                                              uintptr_t retaddr,
-                                              bool recheck,
-                                              MMUAccessType access_type)
-{
-    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
-    return io_readx(env, iotlbentry, mmu_idx, addr, retaddr, recheck,
-                    access_type, DATA_SIZE);
-}
-#endif
-
-WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
-                            TCGMemOpIdx oi, uintptr_t retaddr)
-{
-    uintptr_t mmu_idx = get_mmuidx(oi);
-    uintptr_t index = tlb_index(env, mmu_idx, addr);
-    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-    target_ulong tlb_addr = entry->ADDR_READ;
-    unsigned a_bits = get_alignment_bits(get_memop(oi));
-    uintptr_t haddr;
-    DATA_TYPE res;
-
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
-                             mmu_idx, retaddr);
-    }
-
-    /* If the TLB entry is for a different page, reload and try again.  */
-    if (!tlb_hit(tlb_addr, addr)) {
-        if (!VICTIM_TLB_HIT(ADDR_READ, addr)) {
-            tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, READ_ACCESS_TYPE,
-                     mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-        }
-        tlb_addr = entry->ADDR_READ;
-    }
-
-    /* Handle an IO access.  */
-    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        if ((addr & (DATA_SIZE - 1)) != 0) {
-            goto do_unaligned_access;
-        }
-
-        /* ??? Note that the io helpers always read data in the target
-           byte ordering.  We should push the LE/BE request down into io.  */
-        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr,
-                                    tlb_addr & TLB_RECHECK,
-                                    READ_ACCESS_TYPE);
-        res = TGT_LE(res);
-        return res;
-    }
-
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (DATA_SIZE > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
-                    >= TARGET_PAGE_SIZE)) {
-        target_ulong addr1, addr2;
-        DATA_TYPE res1, res2;
-        unsigned shift;
-    do_unaligned_access:
-        addr1 = addr & ~(DATA_SIZE - 1);
-        addr2 = addr1 + DATA_SIZE;
-        res1 = helper_le_ld_name(env, addr1, oi, retaddr);
-        res2 = helper_le_ld_name(env, addr2, oi, retaddr);
-        shift = (addr & (DATA_SIZE - 1)) * 8;
-
-        /* Little-endian combine.  */
-        res = (res1 >> shift) | (res2 << ((DATA_SIZE * 8) - shift));
-        return res;
-    }
-
-    haddr = addr + entry->addend;
-#if DATA_SIZE == 1
-    res = glue(glue(ld, LSUFFIX), _p)((uint8_t *)haddr);
-#else
-    res = glue(glue(ld, LSUFFIX), _le_p)((uint8_t *)haddr);
-#endif
-    return res;
-}
-
-#if DATA_SIZE > 1
-WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
-                            TCGMemOpIdx oi, uintptr_t retaddr)
-{
-    uintptr_t mmu_idx = get_mmuidx(oi);
-    uintptr_t index = tlb_index(env, mmu_idx, addr);
-    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-    target_ulong tlb_addr = entry->ADDR_READ;
-    unsigned a_bits = get_alignment_bits(get_memop(oi));
-    uintptr_t haddr;
-    DATA_TYPE res;
-
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
-                             mmu_idx, retaddr);
-    }
-
-    /* If the TLB entry is for a different page, reload and try again.  */
-    if (!tlb_hit(tlb_addr, addr)) {
-        if (!VICTIM_TLB_HIT(ADDR_READ, addr)) {
-            tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, READ_ACCESS_TYPE,
-                     mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-        }
-        tlb_addr = entry->ADDR_READ;
-    }
-
-    /* Handle an IO access.  */
-    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        if ((addr & (DATA_SIZE - 1)) != 0) {
-            goto do_unaligned_access;
-        }
-
-        /* ??? Note that the io helpers always read data in the target
-           byte ordering.  We should push the LE/BE request down into io.  */
-        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr,
-                                    tlb_addr & TLB_RECHECK,
-                                    READ_ACCESS_TYPE);
-        res = TGT_BE(res);
-        return res;
-    }
-
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (DATA_SIZE > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
-                    >= TARGET_PAGE_SIZE)) {
-        target_ulong addr1, addr2;
-        DATA_TYPE res1, res2;
-        unsigned shift;
-    do_unaligned_access:
-        addr1 = addr & ~(DATA_SIZE - 1);
-        addr2 = addr1 + DATA_SIZE;
-        res1 = helper_be_ld_name(env, addr1, oi, retaddr);
-        res2 = helper_be_ld_name(env, addr2, oi, retaddr);
-        shift = (addr & (DATA_SIZE - 1)) * 8;
-
-        /* Big-endian combine.  */
-        res = (res1 << shift) | (res2 >> ((DATA_SIZE * 8) - shift));
-        return res;
-    }
-
-    haddr = addr + entry->addend;
-    res = glue(glue(ld, LSUFFIX), _be_p)((uint8_t *)haddr);
-    return res;
-}
-#endif /* DATA_SIZE > 1 */
-
-#ifndef SOFTMMU_CODE_ACCESS
-
-/* Provide signed versions of the load routines as well.  We can of course
-   avoid this for 64-bit data, or for 32-bit data on 32-bit host.  */
-#if DATA_SIZE * 8 < TCG_TARGET_REG_BITS
-WORD_TYPE helper_le_lds_name(CPUArchState *env, target_ulong addr,
-                             TCGMemOpIdx oi, uintptr_t retaddr)
-{
-    return (SDATA_TYPE)helper_le_ld_name(env, addr, oi, retaddr);
-}
-
-# if DATA_SIZE > 1
-WORD_TYPE helper_be_lds_name(CPUArchState *env, target_ulong addr,
-                             TCGMemOpIdx oi, uintptr_t retaddr)
-{
-    return (SDATA_TYPE)helper_be_ld_name(env, addr, oi, retaddr);
-}
-# endif
-#endif
-
-static inline void glue(io_write, SUFFIX)(CPUArchState *env,
-                                          size_t mmu_idx, size_t index,
-                                          DATA_TYPE val,
-                                          target_ulong addr,
-                                          uintptr_t retaddr,
-                                          bool recheck)
-{
-    CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
-    return io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
-                     recheck, DATA_SIZE);
-}
-
-void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
-                       TCGMemOpIdx oi, uintptr_t retaddr)
-{
-    uintptr_t mmu_idx = get_mmuidx(oi);
-    uintptr_t index = tlb_index(env, mmu_idx, addr);
-    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-    target_ulong tlb_addr = tlb_addr_write(entry);
-    unsigned a_bits = get_alignment_bits(get_memop(oi));
-    uintptr_t haddr;
-
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
-                             mmu_idx, retaddr);
-    }
-
-    /* If the TLB entry is for a different page, reload and try again.  */
-    if (!tlb_hit(tlb_addr, addr)) {
-        if (!VICTIM_TLB_HIT(addr_write, addr)) {
-            tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, MMU_DATA_STORE,
-                     mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-        }
-        tlb_addr = tlb_addr_write(entry) & ~TLB_INVALID_MASK;
-    }
-
-    /* Handle an IO access.  */
-    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        if ((addr & (DATA_SIZE - 1)) != 0) {
-            goto do_unaligned_access;
-        }
-
-        /* ??? Note that the io helpers always read data in the target
-           byte ordering.  We should push the LE/BE request down into io.  */
-        val = TGT_LE(val);
-        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr,
-                               retaddr, tlb_addr & TLB_RECHECK);
-        return;
-    }
-
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (DATA_SIZE > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
-                     >= TARGET_PAGE_SIZE)) {
-        int i;
-        target_ulong page2;
-        CPUTLBEntry *entry2;
-    do_unaligned_access:
-        /* Ensure the second page is in the TLB.  Note that the first page
-           is already guaranteed to be filled, and that the second page
-           cannot evict the first.  */
-        page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;
-        entry2 = tlb_entry(env, mmu_idx, page2);
-        if (!tlb_hit_page(tlb_addr_write(entry2), page2)
-            && !VICTIM_TLB_HIT(addr_write, page2)) {
-            tlb_fill(ENV_GET_CPU(env), page2, DATA_SIZE, MMU_DATA_STORE,
-                     mmu_idx, retaddr);
-        }
-
-        /* XXX: not efficient, but simple.  */
-        /* This loop must go in the forward direction to avoid issues
-           with self-modifying code in Windows 64-bit.  */
-        for (i = 0; i < DATA_SIZE; ++i) {
-            /* Little-endian extract.  */
-            uint8_t val8 = val >> (i * 8);
-            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr);
-        }
-        return;
-    }
-
-    haddr = addr + entry->addend;
-#if DATA_SIZE == 1
-    glue(glue(st, SUFFIX), _p)((uint8_t *)haddr, val);
-#else
-    glue(glue(st, SUFFIX), _le_p)((uint8_t *)haddr, val);
-#endif
-}
-
-#if DATA_SIZE > 1
-void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
-                       TCGMemOpIdx oi, uintptr_t retaddr)
-{
-    uintptr_t mmu_idx = get_mmuidx(oi);
-    uintptr_t index = tlb_index(env, mmu_idx, addr);
-    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-    target_ulong tlb_addr = tlb_addr_write(entry);
-    unsigned a_bits = get_alignment_bits(get_memop(oi));
-    uintptr_t haddr;
-
-    if (addr & ((1 << a_bits) - 1)) {
-        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
-                             mmu_idx, retaddr);
-    }
-
-    /* If the TLB entry is for a different page, reload and try again.  */
-    if (!tlb_hit(tlb_addr, addr)) {
-        if (!VICTIM_TLB_HIT(addr_write, addr)) {
-            tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, MMU_DATA_STORE,
-                     mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-        }
-        tlb_addr = tlb_addr_write(entry) & ~TLB_INVALID_MASK;
-    }
-
-    /* Handle an IO access.  */
-    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
-        if ((addr & (DATA_SIZE - 1)) != 0) {
-            goto do_unaligned_access;
-        }
-
-        /* ??? Note that the io helpers always read data in the target
-           byte ordering.  We should push the LE/BE request down into io.  */
-        val = TGT_BE(val);
-        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr, retaddr,
-                               tlb_addr & TLB_RECHECK);
-        return;
-    }
-
-    /* Handle slow unaligned access (it spans two pages or IO).  */
-    if (DATA_SIZE > 1
-        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
-                     >= TARGET_PAGE_SIZE)) {
-        int i;
-        target_ulong page2;
-        CPUTLBEntry *entry2;
-    do_unaligned_access:
-        /* Ensure the second page is in the TLB.  Note that the first page
-           is already guaranteed to be filled, and that the second page
-           cannot evict the first.  */
-        page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;
-        entry2 = tlb_entry(env, mmu_idx, page2);
-        if (!tlb_hit_page(tlb_addr_write(entry2), page2)
-            && !VICTIM_TLB_HIT(addr_write, page2)) {
-            tlb_fill(ENV_GET_CPU(env), page2, DATA_SIZE, MMU_DATA_STORE,
-                     mmu_idx, retaddr);
-        }
-
-        /* XXX: not efficient, but simple */
-        /* This loop must go in the forward direction to avoid issues
-           with self-modifying code.  */
-        for (i = 0; i < DATA_SIZE; ++i) {
-            /* Big-endian extract.  */
-            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
-            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
-                                            oi, retaddr);
-        }
-        return;
-    }
-
-    haddr = addr + entry->addend;
-    glue(glue(st, SUFFIX), _be_p)((uint8_t *)haddr, val);
-}
-#endif /* DATA_SIZE > 1 */
-#endif /* !defined(SOFTMMU_CODE_ACCESS) */
-
-#undef READ_ACCESS_TYPE
-#undef DATA_TYPE
-#undef SUFFIX
-#undef LSUFFIX
-#undef DATA_SIZE
-#undef ADDR_READ
-#undef WORD_TYPE
-#undef SDATA_TYPE
-#undef USUFFIX
-#undef SSUFFIX
-#undef BSWAP
-#undef helper_le_ld_name
-#undef helper_be_ld_name
-#undef helper_le_lds_name
-#undef helper_be_lds_name
-#undef helper_le_st_name
-#undef helper_be_st_name
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v3 3/3] accel/tcg: move unaligned helpers out of core helpers
  2019-02-15 14:31 [Qemu-devel] [PATCH v3 0/3] softmmu demacro Alex Bennée
  2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 1/3] accel/tcg: demacro cputlb Alex Bennée
  2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 2/3] accel/tcg: remove softmmu_template.h Alex Bennée
@ 2019-02-15 14:31 ` Alex Bennée
  2019-02-15 15:03 ` [Qemu-devel] [PATCH v3 0/3] softmmu demacro no-reply
  2019-02-19 14:22 ` Alex Bennée
  4 siblings, 0 replies; 11+ messages in thread
From: Alex Bennée @ 2019-02-15 14:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, mark.cave-ayland, Alex Bennée

Using __flatten__ ends up with fairly large functions. Attempt to
ameliorate this by moving the unaligned handling out of the main
helper. We can now use __always_inline__ instead of __flatten__ and
keep the main functions relatively small.
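
The shape of the change (sketch only, simplified from the diff below): the
fast path is forced inline into each public entry point with
__always_inline__, while the unaligned slow path becomes a real out-of-line
function that calls back into it one aligned piece at a time:

  /* Fast path: always inlined into every helper_*_mmu wrapper. */
  static inline __attribute__((__always_inline__))
  tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
                               TCGMemOpIdx oi, uintptr_t retaddr,
                               size_t size, bool big_endian, bool code_read);

  /* Slow path: stays out of line, so it is emitted only once. */
  static tcg_target_ulong unaligned_load_helper(CPUArchState *env,
                                                target_ulong addr,
                                                TCGMemOpIdx oi,
                                                uintptr_t retaddr,
                                                size_t size, bool big_endian,
                                                bool code_read);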

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v3
  - take care to mask high bits in unaligned_load_helper
---
 accel/tcg/cputlb.c | 200 ++++++++++++++++++++++++---------------------
 1 file changed, 109 insertions(+), 91 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 351f579fed..5f225e3623 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1206,10 +1206,47 @@ static inline uint64_t handle_bswap(uint64_t val, int size, bool big_endian)
  * is disassembled. It shouldn't be called directly by guest code.
  */
 
-static tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
-                                    TCGMemOpIdx oi, uintptr_t retaddr,
-                                    size_t size, bool big_endian,
-                                    bool code_read)
+static inline __attribute__((__always_inline__))
+tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
+                             TCGMemOpIdx oi, uintptr_t retaddr,
+                             size_t size, bool big_endian,
+                             bool code_read);
+
+static tcg_target_ulong unaligned_load_helper(CPUArchState *env, target_ulong addr,
+                                              TCGMemOpIdx oi, uintptr_t retaddr,
+                                              size_t size, bool big_endian,
+                                              bool code_read)
+{
+    target_ulong addr1, addr2;
+    tcg_target_ulong res, r1, r2;
+    const tcg_target_ulong size_mask = MAKE_64BIT_MASK(0, size * 8);
+    unsigned shift;
+
+    addr1 = addr & ~(size - 1);
+    addr2 = addr1 + size;
+
+    r1 = load_helper(env, addr1, oi, retaddr, size, big_endian, code_read);
+    r2 = load_helper(env, addr2, oi, retaddr, size, big_endian, code_read);
+    shift = (addr & (size - 1)) * 8;
+
+    if (big_endian) {
+        /* Big-endian combine.  */
+        r2 &= size_mask;
+        res = (r1 << shift) | (r2 >> ((size * 8) - shift));
+    } else {
+        /* Little-endian combine.  */
+        r1 &= size_mask;
+        res = (r1 >> shift) | (r2 << ((size * 8) - shift));
+    }
+    return res;
+}
+
+
+static inline __attribute__((__always_inline__))
+tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
+                             TCGMemOpIdx oi, uintptr_t retaddr,
+                             size_t size, bool big_endian,
+                             bool code_read)
 {
     uintptr_t mmu_idx = get_mmuidx(oi);
     uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1247,7 +1284,8 @@ static tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
         uint64_t tmp;
 
         if ((addr & (size - 1)) != 0) {
-            goto do_unaligned_access;
+            return unaligned_load_helper(env, addr, oi, retaddr,
+                                         size, big_endian, code_read);
         }
 
         tmp = io_readx(env, iotlbentry, mmu_idx, addr, retaddr,
@@ -1260,24 +1298,9 @@ static tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
     if (size > 1
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                     >= TARGET_PAGE_SIZE)) {
-        target_ulong addr1, addr2;
-        tcg_target_ulong r1, r2;
-        unsigned shift;
-    do_unaligned_access:
-        addr1 = addr & ~(size - 1);
-        addr2 = addr1 + size;
-        r1 = load_helper(env, addr1, oi, retaddr, size, big_endian, code_read);
-        r2 = load_helper(env, addr2, oi, retaddr, size, big_endian, code_read);
-        shift = (addr & (size - 1)) * 8;
 
-        if (big_endian) {
-            /* Big-endian combine.  */
-            res = (r1 << shift) | (r2 >> ((size * 8) - shift));
-        } else {
-            /* Little-endian combine.  */
-            res = (r1 >> shift) | (r2 << ((size * 8) - shift));
-        }
-        return res;
+        return unaligned_load_helper(env, addr, oi, retaddr,
+                                     size, big_endian, code_read);
     }
 
     haddr = addr + entry->addend;
@@ -1325,50 +1348,49 @@ static tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
  * We don't bother with this widened value for SOFTMMU_CODE_ACCESS.
  */
 
-tcg_target_ulong __attribute__((flatten))
+tcg_target_ulong
 helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                     uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 1, false, false);
 }
 
-tcg_target_ulong __attribute__((flatten))
+tcg_target_ulong
 helper_le_lduw_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 2, false, false);
 }
 
-tcg_target_ulong __attribute__((flatten))
+tcg_target_ulong
 helper_be_lduw_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 2, true, false);
 }
 
-tcg_target_ulong __attribute__((flatten))
+tcg_target_ulong
 helper_le_ldul_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 4, false, false);
 }
 
-tcg_target_ulong __attribute__((flatten))
+tcg_target_ulong
 helper_be_ldul_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 4, true, false);
 }
 
-uint64_t __attribute__((flatten))
-helper_le_ldq_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+
+uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                   uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 8, false, false);
 }
 
-uint64_t __attribute__((flatten))
-helper_be_ldq_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                   uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 8, true, false);
@@ -1379,22 +1401,21 @@ helper_be_ldq_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
  * avoid this for 64-bit data, or for 32-bit data on 32-bit host.
  */
 
-
-tcg_target_ulong __attribute__((flatten))
+tcg_target_ulong
 helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                     uintptr_t retaddr)
 {
     return (int8_t)helper_ret_ldub_mmu(env, addr, oi, retaddr);
 }
 
-tcg_target_ulong __attribute__((flatten))
+tcg_target_ulong
 helper_le_ldsw_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return (int16_t)helper_le_lduw_mmu(env, addr, oi, retaddr);
 }
 
-tcg_target_ulong __attribute__((flatten))
+tcg_target_ulong
 helper_be_ldsw_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
@@ -1418,10 +1439,43 @@ helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 /*
  * Store Helpers
  */
+static inline __attribute__((__always_inline__))
+void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
+                  TCGMemOpIdx oi, uintptr_t retaddr, size_t size,
+                  bool big_endian);
 
-static void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
-                         TCGMemOpIdx oi, uintptr_t retaddr, size_t size,
-                         bool big_endian)
+
+static void
+unaligned_store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
+                       TCGMemOpIdx oi, uintptr_t retaddr, size_t size,
+                       bool big_endian)
+{
+    int i;
+
+    /*
+     * This is inefficient but simple and hopefully on the slow path
+     * anyway. The loop must go in the forward direction to avoid
+     * issues with self-modifying code in Windows 64-bit.
+     */
+    for (i = 0; i < size; ++i) {
+        uint8_t val8;
+        if (big_endian) {
+            /* Big-endian extract.  */
+            val8 = val >> (((size - 1) * 8) - (i * 8));
+        } else {
+            /* Little-endian extract.  */
+            val8 = val >> (i * 8);
+        }
+        store_helper(env, addr + i, val8, oi, retaddr, 1, big_endian);
+    }
+    return;
+}
+
+
+static inline __attribute__((__always_inline__))
+void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
+                  TCGMemOpIdx oi, uintptr_t retaddr, size_t size,
+                  bool big_endian)
 {
     uintptr_t mmu_idx = get_mmuidx(oi);
     uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1452,7 +1506,8 @@ static void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
         CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
 
         if ((addr & (size - 1)) != 0) {
-            goto do_unaligned_access;
+            unaligned_store_helper(env, addr, val, oi, retaddr, size, big_endian);
+            return;
         }
 
         io_writex(env, iotlbentry, mmu_idx,
@@ -1465,44 +1520,7 @@ static void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
     if (size > 1
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                      >= TARGET_PAGE_SIZE)) {
-        int i;
-        uintptr_t index2;
-        CPUTLBEntry *entry2;
-        target_ulong page2, tlb_addr2;
-    do_unaligned_access:
-        /*
-         * Ensure the second page is in the TLB.  Note that the first page
-         * is already guaranteed to be filled, and that the second page
-         * cannot evict the first.
-         */
-        page2 = (addr + size) & TARGET_PAGE_MASK;
-        index2 = tlb_index(env, mmu_idx, page2);
-        entry2 = tlb_entry(env, mmu_idx, index2);
-        tlb_addr2 = tlb_addr_write(entry2);
-        if (!tlb_hit_page(tlb_addr2, page2)
-            && !VICTIM_TLB_HIT(addr_write, page2)) {
-            tlb_fill(ENV_GET_CPU(env), page2, size, MMU_DATA_STORE,
-                     mmu_idx, retaddr);
-            index2 = tlb_index(env, mmu_idx, page2);
-            entry2 = tlb_entry(env, mmu_idx, index2);
-        }
-
-        /*
-         * XXX: not efficient, but simple.
-         * This loop must go in the forward direction to avoid issues
-         * with self-modifying code in Windows 64-bit.
-         */
-        for (i = 0; i < size; ++i) {
-            uint8_t val8;
-            if (big_endian) {
-                /* Big-endian extract.  */
-                val8 = val >> (((size - 1) * 8) - (i * 8));
-            } else {
-                /* Little-endian extract.  */
-                val8 = val >> (i * 8);
-            }
-            store_helper(env, addr + i, val8, oi, retaddr, 1, big_endian);
-        }
+        unaligned_store_helper(env, addr, val, oi, retaddr, size, big_endian);
         return;
     }
 
@@ -1539,49 +1557,49 @@ static void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
     }
 }
 
-void __attribute__((flatten))
+void
 helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
                    TCGMemOpIdx oi, uintptr_t retaddr)
 {
     store_helper(env, addr, val, oi, retaddr, 1, false);
 }
 
-void __attribute__((flatten))
+void
 helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     store_helper(env, addr, val, oi, retaddr, 2, false);
 }
 
-void __attribute__((flatten))
+void
 helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     store_helper(env, addr, val, oi, retaddr, 2, true);
 }
 
-void __attribute__((flatten))
+void
 helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     store_helper(env, addr, val, oi, retaddr, 4, false);
 }
 
-void __attribute__((flatten))
+void
 helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     store_helper(env, addr, val, oi, retaddr, 4, true);
 }
 
-void __attribute__((flatten))
+void
 helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     store_helper(env, addr, val, oi, retaddr, 8, false);
 }
 
-void __attribute__((flatten))
+void
 helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
@@ -1647,49 +1665,49 @@ helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
 
 /* Code access functions.  */
 
-uint8_t __attribute__((flatten))
+uint8_t
 helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                     uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 1, false, true);
 }
 
-uint16_t __attribute__((flatten))
+uint16_t
 helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 2, false, true);
 }
 
-uint16_t __attribute__((flatten))
+uint16_t
 helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 2, true, true);
 }
 
-uint32_t __attribute__((flatten))
+uint32_t
 helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 4, false, true);
 }
 
-uint32_t __attribute__((flatten))
+uint32_t
 helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 4, true, true);
 }
 
-uint64_t __attribute__((flatten))
+uint64_t
 helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, 8, false, true);
 }
 
-uint64_t __attribute__((flatten))
+uint64_t
 helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH  v3 0/3] softmmu demacro
  2019-02-15 14:31 [Qemu-devel] [PATCH v3 0/3] softmmu demacro Alex Bennée
                   ` (2 preceding siblings ...)
  2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 3/3] accel/tcg: move unaligned helpers out of core helpers Alex Bennée
@ 2019-02-15 15:03 ` no-reply
  2019-02-19 14:22 ` Alex Bennée
  4 siblings, 0 replies; 11+ messages in thread
From: no-reply @ 2019-02-15 15:03 UTC (permalink / raw)
  To: alex.bennee; +Cc: fam, qemu-devel, cota, mark.cave-ayland

Patchew URL: https://patchew.org/QEMU/20190215143115.28777-1-alex.bennee@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20190215143115.28777-1-alex.bennee@linaro.org
Subject: [Qemu-devel] [PATCH  v3 0/3] softmmu demacro
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 t [tag update]            patchew/20190215133005.15955-1-david@redhat.com -> patchew/20190215133005.15955-1-david@redhat.com
 * [new tag]               patchew/20190215143115.28777-1-alex.bennee@linaro.org -> patchew/20190215143115.28777-1-alex.bennee@linaro.org
Switched to a new branch 'test'
e26497c4c1 accel/tcg: move unaligned helpers out of core helpers
bb7a57bffb accel/tcg: remove softmmu_template.h
3a3cfc0d92 accel/tcg: demacro cputlb

=== OUTPUT BEGIN ===
1/3 Checking commit 3a3cfc0d9223 (accel/tcg: demacro cputlb)
2/3 Checking commit bb7a57bffbe0 (accel/tcg: remove softmmu_template.h)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#11: 
deleted file mode 100644

total: 0 errors, 1 warnings, 0 lines checked

Patch 2/3 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/3 Checking commit e26497c4c1e5 (accel/tcg: move unaligned helpers out of core helpers)
ERROR: externs should be avoided in .c files
#29: FILE: accel/tcg/cputlb.c:1210:
+tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,

WARNING: line over 80 characters
#34: FILE: accel/tcg/cputlb.c:1215:
+static tcg_target_ulong unaligned_load_helper(CPUArchState *env, target_ulong addr,

ERROR: externs should be avoided in .c files
#199: FILE: accel/tcg/cputlb.c:1443:
+void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,

WARNING: line over 80 characters
#246: FILE: accel/tcg/cputlb.c:1509:
+            unaligned_store_helper(env, addr, val, oi, retaddr, size, big_endian);

total: 2 errors, 2 warnings, 381 lines checked

Patch 3/3 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190215143115.28777-1-alex.bennee@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH  v3 0/3] softmmu demacro
  2019-02-15 14:31 [Qemu-devel] [PATCH v3 0/3] softmmu demacro Alex Bennée
                   ` (3 preceding siblings ...)
  2019-02-15 15:03 ` [Qemu-devel] [PATCH v3 0/3] softmmu demacro no-reply
@ 2019-02-19 14:22 ` Alex Bennée
  2019-02-19 18:25   ` Emilio G. Cota
  4 siblings, 1 reply; 11+ messages in thread
From: Alex Bennée @ 2019-02-19 14:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota, mark.cave-ayland


Alex Bennée <alex.bennee@linaro.org> writes:

> Hi,
>
> This is hopefully the final version of the softmmu demacro series. I
> tracked down the remaining failures to:
>
>   - not always using the correct victim tlb
>   - not masking reads when we unrolled the unaligned helpers
>
> Other than that I've rolled in the changes that were made to support
> dynamic TLB code and the follow-up fixes. All patches need some review
> before we can merge them.

Measurements:

Before

hyperfine -w 2 './aarch64-softmmu/qemu-system-aarch64 -machine virt,graphics=on,gic-version=3,virtualization=on -cpu cortex-a53 -serial mon:stdio -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-scsi-pci -device scsi-hd,drive=hd0 -blockdev driver=raw,node-name=hd0,discard=unmap,file.driver=host_device,file.filename=/dev/zvol/hackpool-0/debian-buster-arm64 -kernel ../linux.git/builds/arm64.nopreempt/arch/arm64/boot/Image -append "console=ttyAMA0 root=/dev/sda2 systemd.unit=benchmark-pigz.service" -display none -m 8192 -smp 4 --snapshot'
Benchmark #1: ./aarch64-softmmu/qemu-system-aarch64 -machine virt,graphics=on,gic-version=3,virtualization=on -cpu cortex-a53 -serial mon:stdio -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-scsi-pci -device scsi-hd,drive=hd0 -blockdev driver=raw,node-name=hd0,discard=unmap,file.driver=host_device,file.filename=/dev/zvol/hackpool-0/debian-buster-arm64 -kernel ../linux.git/builds/arm64.nopreempt/arch/arm64/boot/Image -append "console=ttyAMA0 root=/dev/sda2 systemd.unit=benchmark-pigz.service" -display none -m 8192 -smp 4 --snapshot
  Time (mean ± σ):     66.035 s ±  1.425 s    [User: 241.495 s, System: 2.327 s]
  Range (min … max):   64.718 s … 68.944 s    10 runs

After

hyperfine -w 2 './aarch64-softmmu/qemu-system-aarch64 -machine virt,graphics=on,gic-version=3,virtualization=on -cpu cortex-a53 -serial mon:stdio -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-scsi-pci -device scsi-hd,drive=hd0 -blockdev driver=raw,node-name=hd0,discard=unmap,file.driver=host_device,file.filename=/dev/zvol/hackpool-0/debian-buster-arm64 -kernel ../linux.git/builds/arm64.nopreempt/arch/arm64/boot/Image -append "console=ttyAMA0 root=/dev/sda2 systemd.unit=benchmark-pigz.service" -display none -m 8192 -smp 4 --snapshot'
Benchmark #1: ./aarch64-softmmu/qemu-system-aarch64 -machine virt,graphics=on,gic-version=3,virtualization=on -cpu cortex-a53 -serial mon:stdio -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-scsi-pci -device scsi-hd,drive=hd0 -blockdev driver=raw,node-name=hd0,discard=unmap,file.driver=host_device,file.filename=/dev/zvol/hackpool-0/debian-buster-arm64 -kernel ../linux.git/builds/arm64.nopreempt/arch/arm64/boot/Image -append "console=ttyAMA0 root=/dev/sda2 systemd.unit=benchmark-pigz.service" -display none -m 8192 -smp 4 --snapshot
  Time (mean ± σ):     65.331 s ±  0.684 s    [User: 240.528 s, System: 2.306 s]
  Range (min … max):   64.624 s … 66.914 s    10 runs

I'm not going to claim a performance boost, just that any difference is
within the noise.

Emilio,

Any chance you could run it through your benchmark suite?

>
> Alex Bennée (3):
>   accel/tcg: demacro cputlb
>   accel/tcg: remove softmmu_template.h
>   accel/tcg: move unaligned helpers out of core helpers
>
>  accel/tcg/cputlb.c           | 501 +++++++++++++++++++++++++++++++++--
>  accel/tcg/softmmu_template.h | 454 -------------------------------
>  2 files changed, 475 insertions(+), 480 deletions(-)
>  delete mode 100644 accel/tcg/softmmu_template.h


--
Alex Bennée

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH  v3 0/3] softmmu demacro
  2019-02-19 14:22 ` Alex Bennée
@ 2019-02-19 18:25   ` Emilio G. Cota
  2019-02-25 22:27     ` Mark Cave-Ayland
  0 siblings, 1 reply; 11+ messages in thread
From: Emilio G. Cota @ 2019-02-19 18:25 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, mark.cave-ayland

On Tue, Feb 19, 2019 at 14:22:37 +0000, Alex Bennée wrote:
> Emilio,
> 
> Any chance you could run it through your benchmark suite?

Something isn't quite right. For instance, gcc in SPEC doesn't
complete; it fails after 2s with some errors about the syntax of
the source being compiled. Before the patches the error isn't
there (the test completes in ~7s, as usual), so I suspect
some guest memory accesses aren't going to the right place.

I am too busy right now to try to debug this, but if you have
patches to test, I can run them. The patches I tested are in
your v3 branch on github, i.e. with 430f28154b5 at the head.

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/3] softmmu demacro
  2019-02-19 18:25   ` Emilio G. Cota
@ 2019-02-25 22:27     ` Mark Cave-Ayland
  2019-02-26  9:24       ` Alex Bennée
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Cave-Ayland @ 2019-02-25 22:27 UTC (permalink / raw)
  To: Emilio G. Cota, Alex Bennée; +Cc: qemu-devel

On 19/02/2019 18:25, Emilio G. Cota wrote:

> On Tue, Feb 19, 2019 at 14:22:37 +0000, Alex Bennée wrote:
>> Emilio,
>>
>> Any chance you could run it through your benchmark suite?
> 
> Something isn't quite right. For instance, gcc in SPEC doesn't
> complete; it fails after 2s with some errors about the syntax of
> the source being compiled. Before the patches the error isn't
> there (the test completes in ~7s, as usual), so I suspect
> some guest memory accesses aren't going to the right place.
> 
> I am too busy right now to try to debug this, but if you have
> patches to test, I can run them. The patches I tested are in
> your v3 branch on github, i.e. with 430f28154b5 at the head.

I spent a few hours yesterday and today testing this patchset against my OpenBIOS
test images and found that all of them were able to boot, except for one MacOS image
which started exhibiting flashing blocks on a progress bar during boot. Tracing this
back, I found the problem was still present when just the first patch "accel/tcg:
demacro cputlb" was applied.

First of all there were a couple of typos I found in load_helper() and store_helper()
which can be fixed up with the following diff:


diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 351f579fed..f3cc2dd0d9 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1251,7 +1251,7 @@ static tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
         }

         tmp = io_readx(env, iotlbentry, mmu_idx, addr, retaddr,
-                       addr & tlb_addr & TLB_RECHECK,
+                       tlb_addr & TLB_RECHECK,
                        code_read ? MMU_INST_FETCH : MMU_DATA_LOAD, size);
         return handle_bswap(tmp, size, big_endian);
     }
@@ -1405,14 +1405,14 @@ tcg_target_ulong
 helper_le_ldsl_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
-    return (int32_t)helper_le_lduw_mmu(env, addr, oi, retaddr);
+    return (int32_t)helper_le_ldul_mmu(env, addr, oi, retaddr);
 }

 tcg_target_ulong
 helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                    uintptr_t retaddr)
 {
-    return (int32_t)helper_be_lduw_mmu(env, addr, oi, retaddr);
+    return (int32_t)helper_be_ldul_mmu(env, addr, oi, retaddr);
 }

 /*
@@ -1427,6 +1427,7 @@ static void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
     uintptr_t index = tlb_index(env, mmu_idx, addr);
     CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
     target_ulong tlb_addr = tlb_addr_write(entry);
+    const size_t tlb_off = offsetof(CPUTLBEntry, addr_write);
     unsigned a_bits = get_alignment_bits(get_memop(oi));
     uintptr_t haddr;

@@ -1438,7 +1439,8 @@ static void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,

     /* If the TLB entry is for a different page, reload and try again.  */
     if (!tlb_hit(tlb_addr, addr)) {
-        if (!VICTIM_TLB_HIT(addr_write, addr)) {
+        if (!victim_tlb_hit(env, mmu_idx, index, tlb_off,
+            addr & TARGET_PAGE_MASK)) {
             tlb_fill(ENV_GET_CPU(env), addr, size, MMU_DATA_STORE,
                      mmu_idx, retaddr);
             index = tlb_index(env, mmu_idx, addr);
@@ -1466,7 +1468,6 @@ static void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                      >= TARGET_PAGE_SIZE)) {
         int i;
-        uintptr_t index2;
         CPUTLBEntry *entry2;
         target_ulong page2, tlb_addr2;
     do_unaligned_access:
@@ -1476,15 +1477,13 @@ static void store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
          * cannot evict the first.
          */
         page2 = (addr + size) & TARGET_PAGE_MASK;
-        index2 = tlb_index(env, mmu_idx, page2);
-        entry2 = tlb_entry(env, mmu_idx, index2);
+        entry2 = tlb_entry(env, mmu_idx, page2);
         tlb_addr2 = tlb_addr_write(entry2);
         if (!tlb_hit_page(tlb_addr2, page2)
-            && !VICTIM_TLB_HIT(addr_write, page2)) {
+            && !victim_tlb_hit(env, mmu_idx, index, tlb_off,
+                               page2 & TARGET_PAGE_MASK)) {
             tlb_fill(ENV_GET_CPU(env), page2, size, MMU_DATA_STORE,
                      mmu_idx, retaddr);
-            index2 = tlb_index(env, mmu_idx, page2);
-            entry2 = tlb_entry(env, mmu_idx, index2);
         }

         /*


Even with this patch applied I was still seeing issues with the progress bar, so I
ended up manually unrolling softmmu_template.h myself until I could see at what point
things were breaking.

The culprit turned out to be the change in type of res in load_helper() from the size
-related type of uint8_t/uint16_t/uint32_t/uint64_t to tcg_target_ulong in the slow
path as seen from my locally unrolled version:


WORKING:

tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr)
{
    ....

    /* Handle slow unaligned access (it spans two pages or IO).  */
    if (4 > 1
        && unlikely((addr & ~TARGET_PAGE_MASK) + 4 - 1
                    >= TARGET_PAGE_SIZE)) {
        target_ulong addr1, addr2;
        uint32_t res1, res2;
        ^^^^^^^^
        unsigned shift;
    do_unaligned_access:
        addr1 = addr & ~(4 - 1);
        addr2 = addr1 + 4;
        res1 = helper_le_ldul_mmu(env, addr1, oi, retaddr);
        res2 = helper_le_ldul_mmu(env, addr2, oi, retaddr);
        shift = (addr & (4 - 1)) * 8;

        /* Little-endian combine.  */
        res = (res1 >> shift) | (res2 << ((4 * 8) - shift));
        return res;
    }

}

tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr)
{
    ....

    /* Handle slow unaligned access (it spans two pages or IO).  */
    if (4 > 1
        && unlikely((addr & ~TARGET_PAGE_MASK) + 4 - 1
                    >= TARGET_PAGE_SIZE)) {
        target_ulong addr1, addr2;
        uint32_t res1, res2;
        ^^^^^^^^
        unsigned shift;
    do_unaligned_access:
        addr1 = addr & ~(4 - 1);
        addr2 = addr1 + 4;
        res1 = helper_be_ldul_mmu(env, addr1, oi, retaddr);
        res2 = helper_be_ldul_mmu(env, addr2, oi, retaddr);
        shift = (addr & (4 - 1)) * 8;

        /* Big-endian combine.  */
        res = (res1 << shift) | (res2 >> ((4 * 8) - shift));
        return res;
    }
}


CORRUPTED:

tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr)
{
    ....

    /* Handle slow unaligned access (it spans two pages or IO).  */
    if (4 > 1
        && unlikely((addr & ~TARGET_PAGE_MASK) + 4 - 1
                    >= TARGET_PAGE_SIZE)) {
        target_ulong addr1, addr2;
        tcg_target_ulong res1, res2;
        ^^^^^^^^^^^^^^^^
        unsigned shift;
    do_unaligned_access:
        addr1 = addr & ~(4 - 1);
        addr2 = addr1 + 4;
        res1 = helper_le_ldul_mmu(env, addr1, oi, retaddr);
        res2 = helper_le_ldul_mmu(env, addr2, oi, retaddr);
        shift = (addr & (4 - 1)) * 8;

        /* Little-endian combine.  */
        res = (res1 >> shift) | (res2 << ((4 * 8) - shift));
        return res;
    }

}

tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr)
{
    ....

    /* Handle slow unaligned access (it spans two pages or IO).  */
    if (4 > 1
        && unlikely((addr & ~TARGET_PAGE_MASK) + 4 - 1
                    >= TARGET_PAGE_SIZE)) {
        target_ulong addr1, addr2;
        tcg_target_ulong res1, res2;
        ^^^^^^^^^^^^^^^^
        unsigned shift;
    do_unaligned_access:
        addr1 = addr & ~(4 - 1);
        addr2 = addr1 + 4;
        res1 = helper_be_ldul_mmu(env, addr1, oi, retaddr);
        res2 = helper_be_ldul_mmu(env, addr2, oi, retaddr);
        shift = (addr & (4 - 1)) * 8;

        /* Big-endian combine.  */
        res = (res1 << shift) | (res2 >> ((4 * 8) - shift));
        return res;
    }
}


Presumably the issue here is somehow related to the compiler incorrectly
extending/reducing the shift when the larger type is involved? Also during my tests
the visual corruption was only present for 32-bit accesses, but presumably all the
access sizes will need a similar fix.
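
One way to read what the wider type changes is a standalone sketch (not
QEMU code; the values and the shift below are made up). The shifts
themselves are well defined either way; the difference is that uint32_t
temporaries discard everything above bit 31 for free, whereas with
tcg_target_ulong on a 64-bit host those bits survive the combine and get
handed back to the generated code, which presumably expects the upper
half of a 32-bit load result to be zero:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t r1 = 0xaabbccdd;   /* first aligned 32-bit sub-load */
    uint64_t r2 = 0x11223344;   /* second aligned 32-bit sub-load */
    unsigned shift = 16;        /* addr two bytes into the first word */

    /* Little-endian combine, as done with 64-bit temporaries. */
    uint64_t res = (r1 >> shift) | (r2 << (32 - shift));

    /* prints 0x000011223344aabb - the top half of r2 ends up above bit 31 */
    printf("unmasked: 0x%016" PRIx64 "\n", res);
    /* prints 0x000000003344aabb - the value the uint32_t version produced */
    printf("masked:   0x%016" PRIx64 "\n", res & 0xffffffffULL);
    return 0;
}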


ATB,

Mark.

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/3] softmmu demacro
  2019-02-25 22:27     ` Mark Cave-Ayland
@ 2019-02-26  9:24       ` Alex Bennée
  2019-02-26 19:03         ` Mark Cave-Ayland
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Bennée @ 2019-02-26  9:24 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: Emilio G. Cota, qemu-devel


Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:

> On 19/02/2019 18:25, Emilio G. Cota wrote:
>
>> On Tue, Feb 19, 2019 at 14:22:37 +0000, Alex Bennée wrote:
>>> Emilio,
>>>
>>> Any chance you could run it through your benchmark suite?
>>
>> Something isn't quite right. For instance, gcc in SPEC doesn't
>> complete; it fails after 2s with some errors about the syntax of
>> the source being compiled. Before the patches the error isn't
>> there (the test completes in ~7s, as usual), so I suspect
>> some guest memory accesses aren't going to the right place.
>>
>> I am too busy right now to try to debug this, but if you have
>> patches to test, I can run them. The patches I tested are in
>> your v3 branch on github, i.e. with 430f28154b5 at the head.
>
> I spent a few hours yesterday and today testing this patchset against my OpenBIOS
> test images and found that all of them were able to boot, except for one MacOS image
> which started exhibiting flashing blocks on a progress bar during boot. Tracing this
> back, I found the problem was still present when just the first patch "accel/tcg:
> demacro cputlb" was applied.

Yeah in the original version of the patch I de-macroed each element one
by one which made bisecting errors easier. This is more of a big bang
approach which has made it harder to find which bit of the conversion
broke.

That said, I did test it fairly hard on ARM, but I guess we see fewer
unaligned and page-crossing accesses there.

>
> First of all there were a couple of typos I found in load_helper() and store_helper()
> which can be fixed up with the following diff:
>
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 351f579fed..f3cc2dd0d9 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1251,7 +1251,7 @@ static tcg_target_ulong load_helper(CPUArchState *env,
> target_ulong addr,
>          }
>
>          tmp = io_readx(env, iotlbentry, mmu_idx, addr, retaddr,
> -                       addr & tlb_addr & TLB_RECHECK,
> +                       tlb_addr & TLB_RECHECK,
>                         code_read ? MMU_INST_FETCH : MMU_DATA_LOAD, size);
>          return handle_bswap(tmp, size, big_endian);
>      }
> @@ -1405,14 +1405,14 @@ tcg_target_ulong
>  helper_le_ldsl_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>                     uintptr_t retaddr)
>  {
> -    return (int32_t)helper_le_lduw_mmu(env, addr, oi, retaddr);
> +    return (int32_t)helper_le_ldul_mmu(env, addr, oi, retaddr);
>  }
>
>  tcg_target_ulong
>  helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>                     uintptr_t retaddr)
>  {
> -    return (int32_t)helper_be_lduw_mmu(env, addr, oi, retaddr);
> +    return (int32_t)helper_be_ldul_mmu(env, addr, oi, retaddr);
>  }

Awesome, I shall apply those.

>
>  /*
> @@ -1427,6 +1427,7 @@ static void store_helper(CPUArchState *env, target_ulong addr,
> uint64_t val,
>      uintptr_t index = tlb_index(env, mmu_idx, addr);
>      CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
>      target_ulong tlb_addr = tlb_addr_write(entry);
> +    const size_t tlb_off = offsetof(CPUTLBEntry, addr_write);
>      unsigned a_bits = get_alignment_bits(get_memop(oi));
>      uintptr_t haddr;
>
> @@ -1438,7 +1439,8 @@ static void store_helper(CPUArchState *env, target_ulong addr,
> uint64_t val,
>
>      /* If the TLB entry is for a different page, reload and try again.  */
>      if (!tlb_hit(tlb_addr, addr)) {
> -        if (!VICTIM_TLB_HIT(addr_write, addr)) {
> +        if (!victim_tlb_hit(env, mmu_idx, index, tlb_off,
> +            addr & TARGET_PAGE_MASK)) {
>              tlb_fill(ENV_GET_CPU(env), addr, size, MMU_DATA_STORE,
>                       mmu_idx, retaddr);
>              index = tlb_index(env, mmu_idx, addr);
> @@ -1466,7 +1468,6 @@ static void store_helper(CPUArchState *env, target_ulong addr,
> uint64_t val,
>          && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
>                       >= TARGET_PAGE_SIZE)) {
>          int i;
> -        uintptr_t index2;
>          CPUTLBEntry *entry2;
>          target_ulong page2, tlb_addr2;
>      do_unaligned_access:
> @@ -1476,15 +1477,13 @@ static void store_helper(CPUArchState *env, target_ulong
> addr, uint64_t val,
>           * cannot evict the first.
>           */
>          page2 = (addr + size) & TARGET_PAGE_MASK;
> -        index2 = tlb_index(env, mmu_idx, page2);
> -        entry2 = tlb_entry(env, mmu_idx, index2);
> +        entry2 = tlb_entry(env, mmu_idx, page2);
>          tlb_addr2 = tlb_addr_write(entry2);
>          if (!tlb_hit_page(tlb_addr2, page2)
> -            && !VICTIM_TLB_HIT(addr_write, page2)) {
> +            && !victim_tlb_hit(env, mmu_idx, index, tlb_off,
> +                               page2 & TARGET_PAGE_MASK)) {
>              tlb_fill(ENV_GET_CPU(env), page2, size, MMU_DATA_STORE,
>                       mmu_idx, retaddr);
> -            index2 = tlb_index(env, mmu_idx, page2);
> -            entry2 = tlb_entry(env, mmu_idx, index2);
>          }

Oops, I thought I'd applied the victim fix to both loads and stores.

>
>          /*
>
>
> Even with this patch applied I was still seeing issues with the progress bar, so I
> ended up manually unrolling softmmu_template.h myself until I could see at what point
> things were breaking.
>
> The culprit turned out to be the change in type of res in load_helper() from the size
> -related type of uint8_t/uint16_t/uint32_t/uint64_t to tcg_target_ulong in the slow
> path as seen from my locally unrolled version:
>
>
> WORKING:
>
> tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
> {
>     ....
>
>     /* Handle slow unaligned access (it spans two pages or IO).  */
>     if (4 > 1
>         && unlikely((addr & ~TARGET_PAGE_MASK) + 4 - 1
>                     >= TARGET_PAGE_SIZE)) {
>         target_ulong addr1, addr2;
>         uint32_t res1, res2;
>         ^^^^^^^^
>         unsigned shift;
>     do_unaligned_access:
>         addr1 = addr & ~(4 - 1);
>         addr2 = addr1 + 4;
>         res1 = helper_le_ldul_mmu(env, addr1, oi, retaddr);
>         res2 = helper_le_ldul_mmu(env, addr2, oi, retaddr);
>         shift = (addr & (4 - 1)) * 8;
>
>         /* Little-endian combine.  */
>         res = (res1 >> shift) | (res2 << ((4 * 8) - shift));
>         return res;
>     }
>
> }
>
> tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
> {
>     ....
>
>     /* Handle slow unaligned access (it spans two pages or IO).  */
>     if (4 > 1
>         && unlikely((addr & ~TARGET_PAGE_MASK) + 4 - 1
>                     >= TARGET_PAGE_SIZE)) {
>         target_ulong addr1, addr2;
>         uint32_t res1, res2;
>         ^^^^^^^^
>         unsigned shift;
>     do_unaligned_access:
>         addr1 = addr & ~(4 - 1);
>         addr2 = addr1 + 4;
>         res1 = helper_be_ldul_mmu(env, addr1, oi, retaddr);
>         res2 = helper_be_ldul_mmu(env, addr2, oi, retaddr);
>         shift = (addr & (4 - 1)) * 8;
>
>         /* Big-endian combine.  */
>         res = (res1 << shift) | (res2 >> ((4 * 8) - shift));
>         return res;
>     }
> }
>
>
> CORRUPTED:
>
> tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
> {
>     ....
>
>     /* Handle slow unaligned access (it spans two pages or IO).  */
>     if (4 > 1
>         && unlikely((addr & ~TARGET_PAGE_MASK) + 4 - 1
>                     >= TARGET_PAGE_SIZE)) {
>         target_ulong addr1, addr2;
>         tcg_target_ulong res1, res2;
>         ^^^^^^^^^^^^^^^^
>         unsigned shift;
>     do_unaligned_access:
>         addr1 = addr & ~(4 - 1);
>         addr2 = addr1 + 4;
>         res1 = helper_le_ldul_mmu(env, addr1, oi, retaddr);
>         res2 = helper_le_ldul_mmu(env, addr2, oi, retaddr);
>         shift = (addr & (4 - 1)) * 8;
>
>         /* Little-endian combine.  */
>         res = (res1 >> shift) | (res2 << ((4 * 8) - shift));
>         return res;
>     }
>
> }
>
> tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
> {
>     ....
>
>     /* Handle slow unaligned access (it spans two pages or IO).  */
>     if (4 > 1
>         && unlikely((addr & ~TARGET_PAGE_MASK) + 4 - 1
>                     >= TARGET_PAGE_SIZE)) {
>         target_ulong addr1, addr2;
>         tcg_target_ulong res1, res2;
>         ^^^^^^^^^^^^^^^^
>         unsigned shift;
>     do_unaligned_access:
>         addr1 = addr & ~(4 - 1);
>         addr2 = addr1 + 4;
>         res1 = helper_be_ldul_mmu(env, addr1, oi, retaddr);
>         res2 = helper_be_ldul_mmu(env, addr2, oi, retaddr);
>         shift = (addr & (4 - 1)) * 8;
>
>         /* Big-endian combine.  */
>         res = (res1 << shift) | (res2 >> ((4 * 8) - shift));
>         return res;
>     }
> }
>
>
> Presumably the issue here is somehow related to the compiler incorrectly
> extending/reducing the shift when the larger type is involved? Also during my tests
> the visual corruption was only present for 32-bit accesses, but presumably all the
> access sizes will need a similar fix.

So I did fix this in the third patch when I out-of-lined the unaligned
helpers, like so:

     const tcg_target_ulong size_mask = MAKE_64BIT_MASK(0, size * 8);

and

     /* Big-endian combine.  */
     r2 &= size_mask;

or

     /* Little-endian combine.  */
     r1 &= size_mask;

I guess I should also apply that to patch 1 for bisectability.

Thanks for digging into the failures. It does make me think that we
really could do with some system mode tests to properly exercise all
softmmu code paths:

  * each size access, load and store
  * unaligned accesses, load and store (+ potential arch specific handling)
  * page striding accesses faulting and non-faulting

I'm not sure how much work it would be to get a minimal bare metal
kernel that would be able to do these and report success/fail. When I
did the MTTCG work I used kvm-unit-tests as a fairly complete mini-os
but maybe that's overkill for what we need here. After all we already
have migration kernels that just do a simple tick-tock for testing
migration. It would be nice to have the tests in C with maybe a minimal
per-arch assembly so we can share the tests across various platforms.
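
As a rough sketch of what the guest side of such a test might look like
(all of the names below are made up, and it assumes a little-endian guest
with 4K pages on an architecture that tolerates unaligned accesses), a
minimal check for page-straddling loads could be something like:

#include <stdint.h>

/* Two virtually contiguous pages; the interesting accesses straddle the
 * boundary in the middle. */
static uint8_t two_pages[2 * 4096] __attribute__((aligned(4096)));

static int check_cross_page_loads(void)
{
    uint8_t *boundary = two_pages + 4096;
    int size, off, i;

    /* Recognisable byte pattern either side of the boundary. */
    for (i = -8; i < 8; i++) {
        boundary[i] = (uint8_t)(0xa0 + i + 8);
    }

    for (size = 2; size <= 8; size <<= 1) {
        for (off = 1 - size; off < 0; off++) {
            uint64_t expect = 0, got = 0;

            /* Expected value rebuilt from aligned byte loads (LE). */
            for (i = 0; i < size; i++) {
                expect |= (uint64_t)boundary[off + i] << (8 * i);
            }

            /* The page-crossing access we actually want to exercise;
             * volatile stops the compiler splitting or caching it. */
            switch (size) {
            case 2: got = *(volatile uint16_t *)(boundary + off); break;
            case 4: got = *(volatile uint32_t *)(boundary + off); break;
            case 8: got = *(volatile uint64_t *)(boundary + off); break;
            }

            if (got != expect) {
                return -1;  /* report size/off however the mini-OS allows */
            }
        }
    }
    return 0;
}

A store-side check would mirror this (unaligned store, then byte-wise
readback), and the faulting/non-faulting cases would need some per-arch
page table or MMU setup on top.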

--
Alex Bennée

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/3] softmmu demacro
  2019-02-26  9:24       ` Alex Bennée
@ 2019-02-26 19:03         ` Mark Cave-Ayland
  2019-02-26 19:57           ` Mark Cave-Ayland
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Cave-Ayland @ 2019-02-26 19:03 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Emilio G. Cota, qemu-devel

On 26/02/2019 09:24, Alex Bennée wrote:

>> Presumably the issue here is somehow related to the compiler incorrectly
>> extending/reducing the shift when the larger type is involved? Also during my tests
>> the visual corruption was only present for 32-bit accesses, but presumably all the
>> access sizes will need a similar fix.
> 
> So I did fix this in the third patch when I out of lined the unaligned
> helpers so:
> 
>      const tcg_target_ulong size_mask = MAKE_64BIT_MASK(0, size * 8);
> 
> and
> 
>      /* Big-endian combine.  */
>      r2 &= size_mask;
> 
> or
> 
>      /* Little-endian combine.  */
>      r1 &= size_mask;
> 
> I guess I should also apply that to patch 1 for bisectability.

I've done that locally, and while things have improved with progress bars I'm still
seeing some strange blank fills appearing on the display in MacOS. I'll have another
dig further and see what's going on...

> Thanks for digging into the failures. It does make me think that we
> really could do with some system mode tests to properly exercise all
> softmmu code paths:
> 
>   * each size access, load and store
>   * unaligned accesses, load and store (+ potential arch specific handling)
>   * page striding accesses faulting and non-faulting
> 
> I'm not sure how much work it would be to get a minimal bare metal
> kernel that would be able to do these and report success/fail. When I
> did the MTTCG work I used kvm-unit-tests as a fairly complete mini-os
> but maybe that's overkill for what we need here. After all we already
> have migration kernels that just do a simple tick-tock for testing
> migration. It would be nice to have the tests in C with maybe a minimal
> per-arch assembly so we can share the tests across various platforms.

Right. Having unrolled the macros myself, I can see that whilst
softmmu_template.h appears deceptively simple on the surface, it hides quite
a few subtle implementation details, which is why I think your work is so
valuable here.


ATB,

Mark.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/3] softmmu demacro
  2019-02-26 19:03         ` Mark Cave-Ayland
@ 2019-02-26 19:57           ` Mark Cave-Ayland
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Cave-Ayland @ 2019-02-26 19:57 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Emilio G. Cota, qemu-devel

On 26/02/2019 19:03, Mark Cave-Ayland wrote:

> On 26/02/2019 09:24, Alex Bennée wrote:
> 
>>> Presumably the issue here is somehow related to the compiler incorrectly
>>> extending/reducing the shift when the larger type is involved? Also during my tests
>>> the visual corruption was only present for 32-bit accesses, but presumably all the
>>> access sizes will need a similar fix.
>>
>> So I did fix this in the third patch when I out of lined the unaligned
>> helpers so:
>>
>>      const tcg_target_ulong size_mask = MAKE_64BIT_MASK(0, size * 8);
>>
>> and
>>
>>      /* Big-endian combine.  */
>>      r2 &= size_mask;
>>
>> or
>>
>>      /* Little-endian combine.  */
>>      r1 &= size_mask;
>>
>> I guess I should also apply that to patch 1 for bisectability.
> 
> I've done that locally, and while things have improved with progress bars I'm still
> seeing some strange blank fills appearing on the display in MacOS. I'll have another
> dig further and see what's going on...

Okay I've found it: looks like you also need to apply size_mask to the final result
to keep within the number of bits represented by size:


diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 7729254424..73710f9b9c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1274,11 +1274,11 @@ static tcg_target_ulong load_helper(CPUArchState *env, target_ulong addr,
         if (big_endian) {
             /* Big-endian combine.  */
             r2 &= size_mask;
-            res = (r1 << shift) | (r2 >> ((size * 8) - shift));
+            res = ((r1 << shift) | (r2 >> ((size * 8) - shift))) & size_mask;
         } else {
             /* Little-endian combine.  */
             r1 &= size_mask;
-            res = (r1 >> shift) | (r2 << ((size * 8) - shift));
+            res = ((r1 >> shift) | (r2 << ((size * 8) - shift))) & size_mask;
         }
         return res;
     }
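
For illustration, folding the two input masks from the out-of-lined helpers
together with this final mask, the combine ends up with roughly the shape
below (a sketch only; in the actual patches the logic stays inline rather
than in a helper of its own):

#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for QEMU's MAKE_64BIT_MASK(0, len). */
#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* r1/r2 are the two aligned sub-loads; shift is (addr & (size - 1)) * 8
 * and is non-zero on the page-crossing path. */
static inline uint64_t combine_unaligned(uint64_t r1, uint64_t r2,
                                         unsigned shift, unsigned size,
                                         bool big_endian)
{
    const uint64_t size_mask = MAKE_64BIT_MASK(0, size * 8);
    uint64_t res;

    if (big_endian) {
        /* Big-endian combine: only r2's top 'shift' bits are wanted. */
        r2 &= size_mask;
        res = (r1 << shift) | (r2 >> ((size * 8) - shift));
    } else {
        /* Little-endian combine: only r1's top bits are wanted. */
        r1 &= size_mask;
        res = (r1 >> shift) | (r2 << ((size * 8) - shift));
    }
    /* The fix above: keep the result within 'size' bytes as well. */
    return res & size_mask;
}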


I've now incorporated this into your original patchset, rebased and pushed the result
to https://github.com/mcayland/qemu/tree/alex-softmmu for you to grab and test.
Hopefully this version will now pass Emilio's tests too: I don't have a benchmark
setup in place, so it's worth testing to make sure that my fixes haven't introduced
any performance regressions.

Something else I noticed is that patch 3 removes the extra victim TLB fill from the
unaligned access path in store_helper() but doesn't mention it in the commit message
- is this deliberate?


ATB,

Mark.

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-02-26 19:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-15 14:31 [Qemu-devel] [PATCH v3 0/3] softmmu demacro Alex Bennée
2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 1/3] accel/tcg: demacro cputlb Alex Bennée
2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 2/3] accel/tcg: remove softmmu_template.h Alex Bennée
2019-02-15 14:31 ` [Qemu-devel] [PATCH v3 3/3] accel/tcg: move unaligned helpers out of core helpers Alex Bennée
2019-02-15 15:03 ` [Qemu-devel] [PATCH v3 0/3] softmmu demacro no-reply
2019-02-19 14:22 ` Alex Bennée
2019-02-19 18:25   ` Emilio G. Cota
2019-02-25 22:27     ` Mark Cave-Ayland
2019-02-26  9:24       ` Alex Bennée
2019-02-26 19:03         ` Mark Cave-Ayland
2019-02-26 19:57           ` Mark Cave-Ayland
