* [RFC PATCH v2 0/2] cputlb: implement load_helper_unaligned() for unaligned loads
@ 2021-06-09 14:10 Philippe Mathieu-Daudé
  2021-06-09 14:10 ` [PATCH v2 1/2] accel/tcg/cputlb: Extract load_helper_unaligned() from load_helper() Philippe Mathieu-Daudé
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-09 14:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Richard Henderson, Mark Cave-Ayland,
	Philippe Mathieu-Daudé,
	Paolo Bonzini, Alex Bennée

Reposting Mark's patch:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg814227.html
but split in 2 patches for easier review.

Mark Cave-Ayland (1):
  cputlb: implement load_helper_unaligned() for unaligned loads

Philippe Mathieu-Daudé (1):
  accel/tcg/cputlb: Extract load_helper_unaligned() from load_helper()

 accel/tcg/cputlb.c | 106 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 85 insertions(+), 21 deletions(-)

-- 
2.31.1




* [PATCH v2 1/2] accel/tcg/cputlb: Extract load_helper_unaligned() from load_helper()
  2021-06-09 14:10 [RFC PATCH v2 0/2] cputlb: implement load_helper_unaligned() for unaligned loads Philippe Mathieu-Daudé
@ 2021-06-09 14:10 ` Philippe Mathieu-Daudé
  2021-06-09 14:10 ` [RFC PATCH v2 2/2] cputlb: implement load_helper_unaligned() for unaligned loads Philippe Mathieu-Daudé
  2021-06-09 16:09 ` [RFC PATCH v2 0/2] " Mark Cave-Ayland
  2 siblings, 0 replies; 6+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-09 14:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Richard Henderson, Mark Cave-Ayland,
	Philippe Mathieu-Daudé,
	Paolo Bonzini, Alex Bennée

Replace the goto statement with a call to an inlined helper function for
easier review. No logical change intended.

Inspired-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 accel/tcg/cputlb.c | 54 ++++++++++++++++++++++++++++------------------
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index f24348e9793..2b5d569412c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1851,6 +1851,34 @@ load_memop(const void *haddr, MemOp op)
     }
 }
 
+static inline uint64_t QEMU_ALWAYS_INLINE
+load_helper_unaligned(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
+                      uintptr_t retaddr, MemOp op, bool code_read,
+                      FullLoadHelper *full_load)
+{
+    size_t size = memop_size(op);
+    target_ulong addr1, addr2;
+    uint64_t res;
+    uint64_t r1, r2;
+    unsigned shift;
+
+    addr1 = addr & ~((target_ulong)size - 1);
+    addr2 = addr1 + size;
+    r1 = full_load(env, addr1, oi, retaddr);
+    r2 = full_load(env, addr2, oi, retaddr);
+    shift = (addr & (size - 1)) * 8;
+
+    if (memop_big_endian(op)) {
+        /* Big-endian combine.  */
+        res = (r1 << shift) | (r2 >> ((size * 8) - shift));
+    } else {
+        /* Little-endian combine.  */
+        res = (r1 >> shift) | (r2 << ((size * 8) - shift));
+    }
+
+    return res & MAKE_64BIT_MASK(0, size * 8);
+}
+
 static inline uint64_t QEMU_ALWAYS_INLINE
 load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
             uintptr_t retaddr, MemOp op, bool code_read,
@@ -1866,7 +1894,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         code_read ? MMU_INST_FETCH : MMU_DATA_LOAD;
     unsigned a_bits = get_alignment_bits(get_memop(oi));
     void *haddr;
-    uint64_t res;
     size_t size = memop_size(op);
 
     /* Handle CPU specific unaligned behaviour */
@@ -1893,9 +1920,10 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         CPUIOTLBEntry *iotlbentry;
         bool need_swap;
 
-        /* For anything that is unaligned, recurse through full_load.  */
+        /* For anything that is unaligned, recurse through byte loads.  */
         if ((addr & (size - 1)) != 0) {
-            goto do_unaligned_access;
+            return load_helper_unaligned(env, addr, oi, retaddr, op,
+                                         code_read, full_load);
         }
 
         iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
@@ -1932,24 +1960,8 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
     if (size > 1
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                     >= TARGET_PAGE_SIZE)) {
-        target_ulong addr1, addr2;
-        uint64_t r1, r2;
-        unsigned shift;
-    do_unaligned_access:
-        addr1 = addr & ~((target_ulong)size - 1);
-        addr2 = addr1 + size;
-        r1 = full_load(env, addr1, oi, retaddr);
-        r2 = full_load(env, addr2, oi, retaddr);
-        shift = (addr & (size - 1)) * 8;
-
-        if (memop_big_endian(op)) {
-            /* Big-endian combine.  */
-            res = (r1 << shift) | (r2 >> ((size * 8) - shift));
-        } else {
-            /* Little-endian combine.  */
-            res = (r1 >> shift) | (r2 << ((size * 8) - shift));
-        }
-        return res & MAKE_64BIT_MASK(0, size * 8);
+        return load_helper_unaligned(env, addr, oi, retaddr, op,
+                                     code_read, full_load);
     }
 
     haddr = (void *)((uintptr_t)addr + entry->addend);
-- 
2.31.1




* [RFC PATCH v2 2/2] cputlb: implement load_helper_unaligned() for unaligned loads
  2021-06-09 14:10 [RFC PATCH v2 0/2] cputlb: implement load_helper_unaligned() for unaligned loads Philippe Mathieu-Daudé
  2021-06-09 14:10 ` [PATCH v2 1/2] accel/tcg/cputlb: Extract load_helper_unaligned() from load_helper() Philippe Mathieu-Daudé
@ 2021-06-09 14:10 ` Philippe Mathieu-Daudé
  2021-06-10 21:41   ` Richard Henderson
  2021-06-09 16:09 ` [RFC PATCH v2 0/2] " Mark Cave-Ayland
  2 siblings, 1 reply; 6+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-09 14:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Richard Henderson, Mark Cave-Ayland,
	Philippe Mathieu-Daudé,
	Paolo Bonzini, Alex Bennée

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

[RFC because this is currently only lightly tested and there have been some
discussions about whether this should be handled elsewhere in the memory API]

If an unaligned load is required then the load is split into 2 separate
accesses and combined within load_helper(). This does not work correctly
with MMIO accesses because the original access size is used for both
individual accesses, causing the little- and big-endian combines to return
the wrong result.

There is already a similar solution in place for store_helper(), where an
unaligned access is handled by a separate store_helper_unaligned() function
which, instead of using the original access size, uses a single-byte access
size to shift and combine the result correctly regardless of the original
access size or endianness.

Implement a similar load_helper_unaligned() function which uses the same
approach for unaligned loads, returning the correct result for the original
test case.
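
For illustration, a standalone sketch of the byte-wise combine (plain C,
with a flat byte array standing in for the per-byte MMU load helper):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Load 'size' bytes starting at 'addr' one byte at a time and
     * combine them, so only 1-byte accesses are ever issued regardless
     * of the original access size.  In cputlb the byte read would go
     * through the per-byte MMU load helper rather than a flat array.
     */
    static uint64_t load_bytewise(const uint8_t *mem, uint64_t addr,
                                  size_t size, bool big_endian)
    {
        uint64_t val = 0;

        for (size_t i = 0; i < size; i++) {
            uint64_t byte = mem[addr + i];           /* one 1-byte access */

            if (big_endian) {
                val |= byte << ((size - 1 - i) * 8); /* MSB first */
            } else {
                val |= byte << (i * 8);              /* LSB first */
            }
        }
        return val;
    }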

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/360
Message-Id: <20210609093528.9616-1-mark.cave-ayland@ilande.co.uk>
[PMD: Extract load_helper_unaligned() in earlier patch]
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 accel/tcg/cputlb.c | 84 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 68 insertions(+), 16 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 2b5d569412c..f8a790d8b4a 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1856,27 +1856,79 @@ load_helper_unaligned(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
                       uintptr_t retaddr, MemOp op, bool code_read,
                       FullLoadHelper *full_load)
 {
+    uintptr_t mmu_idx = get_mmuidx(oi);
     size_t size = memop_size(op);
-    target_ulong addr1, addr2;
-    uint64_t res;
-    uint64_t r1, r2;
-    unsigned shift;
+    uintptr_t index, index2;
+    CPUTLBEntry *entry, *entry2;
+    const size_t tlb_off = code_read ?
+        offsetof(CPUTLBEntry, addr_code) : offsetof(CPUTLBEntry, addr_read);
+    const MMUAccessType access_type =
+        code_read ? MMU_INST_FETCH : MMU_DATA_LOAD;
+    target_ulong page2, tlb_addr, tlb_addr2;
+    uint64_t val = 0;
+    size_t size2;
+    int i;
 
-    addr1 = addr & ~((target_ulong)size - 1);
-    addr2 = addr1 + size;
-    r1 = full_load(env, addr1, oi, retaddr);
-    r2 = full_load(env, addr2, oi, retaddr);
-    shift = (addr & (size - 1)) * 8;
+    /*
+     * Ensure the second page is in the TLB.  Note that the first page
+     * is already guaranteed to be filled, and that the second page
+     * cannot evict the first.
+     */
+    page2 = (addr + size) & TARGET_PAGE_MASK;
+    size2 = (addr + size) & ~TARGET_PAGE_MASK;
+    index2 = tlb_index(env, mmu_idx, page2);
+    entry2 = tlb_entry(env, mmu_idx, page2);
 
-    if (memop_big_endian(op)) {
-        /* Big-endian combine.  */
-        res = (r1 << shift) | (r2 >> ((size * 8) - shift));
-    } else {
-        /* Little-endian combine.  */
-        res = (r1 >> shift) | (r2 << ((size * 8) - shift));
+    tlb_addr2 = code_read ? entry2->addr_code : entry2->addr_read;
+    if (!tlb_hit_page(tlb_addr2, page2)) {
+        if (!victim_tlb_hit(env, mmu_idx, index2, tlb_off, page2)) {
+            tlb_fill(env_cpu(env), page2, size2, access_type,
+                     mmu_idx, retaddr);
+            index2 = tlb_index(env, mmu_idx, page2);
+            entry2 = tlb_entry(env, mmu_idx, page2);
+        }
+        tlb_addr2 = code_read ? entry2->addr_code : entry2->addr_read;
     }
 
-    return res & MAKE_64BIT_MASK(0, size * 8);
+    index = tlb_index(env, mmu_idx, addr);
+    entry = tlb_entry(env, mmu_idx, addr);
+    tlb_addr = code_read ? entry->addr_code : entry->addr_read;
+
+    /*
+     * Handle watchpoints
+     */
+    if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
+        cpu_check_watchpoint(env_cpu(env), addr, size - size2,
+                             env_tlb(env)->d[mmu_idx].iotlb[index].attrs,
+                             BP_MEM_READ, retaddr);
+    }
+    if (unlikely(tlb_addr2 & TLB_WATCHPOINT)) {
+        cpu_check_watchpoint(env_cpu(env), page2, size2,
+                             env_tlb(env)->d[mmu_idx].iotlb[index2].attrs,
+                             BP_MEM_READ, retaddr);
+    }
+
+    /*
+     * XXX: not efficient, but simple.
+     * This loop must go in the forward direction to avoid issues
+     * with self-modifying code in Windows 64-bit.
+     */
+    oi = make_memop_idx(MO_UB, mmu_idx);
+    if (memop_big_endian(op)) {
+        for (i = 0; i < size; ++i) {
+            /* Big-endian load.  */
+            uint8_t val8 = helper_ret_ldub_mmu(env, addr + i, oi, retaddr);
+            val |= val8 << (((size - 1) * 8) - (i * 8));
+        }
+    } else {
+        for (i = 0; i < size; ++i) {
+            /* Little-endian load.  */
+            uint8_t val8 = helper_ret_ldub_mmu(env, addr + i, oi, retaddr);
+            val |= val8 << (i * 8);
+        }
+    }
+
+    return val & MAKE_64BIT_MASK(0, size * 8);
 }
 
 static inline uint64_t QEMU_ALWAYS_INLINE
-- 
2.31.1




* Re: [RFC PATCH v2 0/2] cputlb: implement load_helper_unaligned() for unaligned loads
  2021-06-09 14:10 [RFC PATCH v2 0/2] cputlb: implement load_helper_unaligned() for unaligned loads Philippe Mathieu-Daudé
  2021-06-09 14:10 ` [PATCH v2 1/2] accel/tcg/cputlb: Extract load_helper_unaligned() from load_helper() Philippe Mathieu-Daudé
  2021-06-09 14:10 ` [RFC PATCH v2 2/2] cputlb: implement load_helper_unaligned() for unaligned loads Philippe Mathieu-Daudé
@ 2021-06-09 16:09 ` Mark Cave-Ayland
  2 siblings, 0 replies; 6+ messages in thread
From: Mark Cave-Ayland @ 2021-06-09 16:09 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Alex Bennée, Peter Maydell,
	incoming+qemu-project-qemu-11167699-3xhw7c0pviow7og92yv73e0tr-issue-360,
	Richard Henderson, Paolo Bonzini

On 09/06/2021 15:10, Philippe Mathieu-Daudé wrote:

(Added gitlab issue email)

> Reposting Mark's patch:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg814227.html
> but split in 2 patches for easier review.
> 
> Mark Cave-Ayland (1):
>    cputlb: implement load_helper_unaligned() for unaligned loads
> 
> Philippe Mathieu-Daudé (1):
>    accel/tcg/cputlb: Extract load_helper_unaligned() from load_helper()
> 
>   accel/tcg/cputlb.c | 106 ++++++++++++++++++++++++++++++++++++---------
>   1 file changed, 85 insertions(+), 21 deletions(-)

Thanks Phil. I'm replying to this to keep track of a few thoughts that came up in our 
discussion on IRC:

- Should these unaligned accesses be handled by the memory API?

- There is an overlap with Andrew Jeffery's unaligned access patchset for the
memory API at
http://patchwork.ozlabs.org/project/qemu-devel/patch/20170630030058.28943-1-andrew@aj.id.au/.
This would certainly benefit devices which currently handle unaligned accesses
themselves.

- Currently there aren't any qtests to cover the unaligned access cputlb path

- How would using the memory API implementation interact with MemoryRegionOps
.valid.unaligned and .impl.unaligned? (see the sketch after this list)

- The current cputlb store_helper_unaligned() and the load_helper_unaligned()
proposed by this patchset always use byte accesses, i.e. they do not honour the
target MemoryRegion min_access_size. Switching to the memory API could therefore
cause some existing cases to break, although -d guest_errors should now log these.

- Phil thinks that using the memory API could break ISA bus accesses


ATB,

Mark.



* Re: [RFC PATCH v2 2/2] cputlb: implement load_helper_unaligned() for unaligned loads
  2021-06-09 14:10 ` [RFC PATCH v2 2/2] cputlb: implement load_helper_unaligned() for unaligned loads Philippe Mathieu-Daudé
@ 2021-06-10 21:41   ` Richard Henderson
  2021-06-10 22:13     ` Mark Cave-Ayland
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Henderson @ 2021-06-10 21:41 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Alex Bennée, Peter Maydell, Mark Cave-Ayland, Paolo Bonzini

On 6/9/21 7:10 AM, Philippe Mathieu-Daudé wrote:
> +    oi = make_memop_idx(MO_UB, mmu_idx);
> +    if (memop_big_endian(op)) {
> +        for (i = 0; i < size; ++i) {
> +            /* Big-endian load.  */
> +            uint8_t val8 = helper_ret_ldub_mmu(env, addr + i, oi, retaddr);
> +            val |= val8 << (((size - 1) * 8) - (i * 8));
> +        }
> +    } else {
> +        for (i = 0; i < size; ++i) {
> +            /* Little-endian load.  */
> +            uint8_t val8 = helper_ret_ldub_mmu(env, addr + i, oi, retaddr);
> +            val |= val8 << (i * 8);
> +        }
> +    }

This doesn't quite work.  You can't just call helper_ret_ldub_mmu, as the other 
option is full_ldub_code.  So, at present you've broken unaligned code loads.

We also need noinline markup for clang, like we do for helper_ret_stb_mmu. 
I've no proof of that, but it certainly makes sense to record how we expect the 
inline loop to be resolved.

Finally, you have to use uint64_t for val8, otherwise the shift fails for size 
== 8.
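
A standalone illustration of the promotion problem (plain C, not the cputlb
code):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t val = 0;
        uint8_t val8 = 0xab;

        /*
         * val8 is promoted to a 32-bit int before the shift, so for the
         * top byte of an 8-byte load the shift count (56) exceeds the
         * width of the promoted type and the behaviour is undefined:
         *
         *     val |= val8 << 56;        // broken for size == 8
         *
         * Widening the operand first keeps the full 64-bit result:
         */
        val |= (uint64_t)val8 << 56;     /* 0xab00000000000000 */

        printf("0x%016" PRIx64 "\n", val);
        return 0;
    }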

I'll fix these up and see how things go.


r~



* Re: [RFC PATCH v2 2/2] cputlb: implement load_helper_unaligned() for unaligned loads
  2021-06-10 21:41   ` Richard Henderson
@ 2021-06-10 22:13     ` Mark Cave-Ayland
  0 siblings, 0 replies; 6+ messages in thread
From: Mark Cave-Ayland @ 2021-06-10 22:13 UTC (permalink / raw)
  To: Richard Henderson, Philippe Mathieu-Daudé, qemu-devel
  Cc: Peter Maydell, Alex Bennée, Paolo Bonzini

On 10/06/2021 22:41, Richard Henderson wrote:

> On 6/9/21 7:10 AM, Philippe Mathieu-Daudé wrote:
>> +    oi = make_memop_idx(MO_UB, mmu_idx);
>> +    if (memop_big_endian(op)) {
>> +        for (i = 0; i < size; ++i) {
>> +            /* Big-endian load.  */
>> +            uint8_t val8 = helper_ret_ldub_mmu(env, addr + i, oi, retaddr);
>> +            val |= val8 << (((size - 1) * 8) - (i * 8));
>> +        }
>> +    } else {
>> +        for (i = 0; i < size; ++i) {
>> +            /* Little-endian load.  */
>> +            uint8_t val8 = helper_ret_ldub_mmu(env, addr + i, oi, retaddr);
>> +            val |= val8 << (i * 8);
>> +        }
>> +    }
> 
> This doesn't quite work.  You can't just call helper_ret_ldub_mmu, as the other 
> option is full_ldub_code.  So, at present you've broken unaligned code loads.
> 
> We also need noinline markup for clang, like we do for helper_ret_stb_mmu. I've no 
> proof of that, but it certainly makes sense to record how we expect the inline loop 
> to be resolved.
> 
> Finally, you have to use uint64_t for val8, otherwise the shift fails for size == 8.
> 
> I'll fix these up and see how things go.

Ah that makes sense - I wasn't quite sure whether the previous full_load() would 
already be handled correctly by the code_read logic in load_helper_unaligned().

Sleeping on the problem, I do wonder if the role of load_helper() and
store_helper() should simply be to ensure that, when an access crosses a page,
both entries are in the softtlb before passing MMIO accesses down through the
memory API with Andrew Jeffery's proposed unaligned access patch (see
https://gitlab.com/qemu-project/qemu/-/issues/360#note_597130838 for a summary
of the IRC discussion).

However, as I suspect this may cause some breakage, I would still be fine with
load_helper_unaligned() in the meantime.


ATB,

Mark.


