qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/20] Move rom and notdirty handling to cputlb
@ 2019-09-22  3:54 Richard Henderson
  2019-09-22  3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
                   ` (22 more replies)
  0 siblings, 23 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

Ok!  Third time is the charm, because this time it works.

New to v3:

  * Covert io_mem_rom with a new TLB_ROM bit.

  * This in turn means that there are no longer any special RAM
    case along along the MMIO path -- they all have devices on
    the other end.

  * This in turn means that we can fold the bulk of 
    memory_region_section_get_iotlb into tlb_set_page_with_attrs,
    a couple of redundant tests vs the MemoryRegion.
    The result in patch 14 is, IMO, much more understandable.

  * Fold away uses of cpu->mem_io_pc in tb_invalidate_phys_page__locked,
    the cause of the problems for my previous two patch sets.

    BTW, I was correct with my guess in the v2 cover letter that the use
    of memory_notdirty_write_{prepare,complete} within atomic_mmu_lookup
    must have been broken, for not setting mem_io_pc.  :-P

  * Fix a missed use of cpu->mem_io_pc in tb_check_watchpoint,
    which meant that the previous TLB_WATCHPOINT cleanup was a
    titch broken.

The remaining two users of cpu->mem_io_pc are hw/misc/mips_itu.c and
target/i386/helper.c.  I haven't looked, but I assume that these are
legitimately on the MMIO path, and there probably isn't a decent way
to remove the uses.


r~


Richard Henderson (20):
  exec: Use TARGET_PAGE_BITS_MIN for TLB flags
  exec: Split out variable page size support to exec-vary.c
  exec: Use const alias for TARGET_PAGE_BITS_VARY
  exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
  exec: Promote TARGET_PAGE_MASK to target_long
  exec: Tidy TARGET_PAGE_ALIGN
  exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
  cputlb: Disable __always_inline__ without optimization
  cputlb: Replace switches in load/store_helper with callback
  cputlb: Introduce TLB_BSWAP
  exec: Adjust notdirty tracing
  cputlb: Move ROM handling from I/O path to TLB path
  cputlb: Move NOTDIRTY handling from I/O path to TLB path
  cputlb: Partially inline memory_region_section_get_iotlb
  cputlb: Merge and move memory_notdirty_write_{prepare,complete}
  cputlb: Handle TLB_NOTDIRTY in probe_access
  cputlb: Remove cpu->mem_io_vaddr
  cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
  cputlb: Pass retaddr to tb_invalidate_phys_page_fast
  cputlb: Pass retaddr to tb_check_watchpoint

 Makefile.target                |   2 +-
 accel/tcg/translate-all.h      |   8 +-
 include/exec/cpu-all.h         |  48 ++--
 include/exec/cpu-common.h      |   3 -
 include/exec/exec-all.h        |   6 +-
 include/exec/memory-internal.h |  65 ------
 include/hw/core/cpu.h          |   2 -
 include/qemu-common.h          |   6 +
 include/qemu/compiler.h        |  11 +
 accel/tcg/cputlb.c             | 388 +++++++++++++++++++--------------
 accel/tcg/translate-all.c      |  51 ++---
 exec-vary.c                    |  88 ++++++++
 exec.c                         | 192 +---------------
 hw/core/cpu.c                  |   1 -
 memory.c                       |  20 --
 trace-events                   |   4 +-
 16 files changed, 403 insertions(+), 492 deletions(-)
 create mode 100644 exec-vary.c

-- 
2.17.1



^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:24   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c Richard Henderson
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

These bits do not need to vary with the actual page size
used by the guest.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index d2d443c4f9..e0c8dc540c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -317,20 +317,24 @@ CPUArchState *cpu_copy(CPUArchState *env);
 
 #if !defined(CONFIG_USER_ONLY)
 
-/* Flags stored in the low bits of the TLB virtual address.  These are
- * defined so that fast path ram access is all zeros.
+/*
+ * Flags stored in the low bits of the TLB virtual address.
+ * These are defined so that fast path ram access is all zeros.
  * The flags all must be between TARGET_PAGE_BITS and
  * maximum address alignment bit.
+ *
+ * Use TARGET_PAGE_BITS_MIN so that these bits are constant
+ * when TARGET_PAGE_BITS_VARY is in effect.
  */
 /* Zero if TLB entry is valid.  */
-#define TLB_INVALID_MASK    (1 << (TARGET_PAGE_BITS - 1))
+#define TLB_INVALID_MASK    (1 << (TARGET_PAGE_BITS_MIN - 1))
 /* Set if TLB entry references a clean RAM page.  The iotlb entry will
    contain the page physical address.  */
-#define TLB_NOTDIRTY        (1 << (TARGET_PAGE_BITS - 2))
+#define TLB_NOTDIRTY        (1 << (TARGET_PAGE_BITS_MIN - 2))
 /* Set if TLB entry is an IO callback.  */
-#define TLB_MMIO            (1 << (TARGET_PAGE_BITS - 3))
+#define TLB_MMIO            (1 << (TARGET_PAGE_BITS_MIN - 3))
 /* Set if TLB entry contains a watchpoint.  */
-#define TLB_WATCHPOINT      (1 << (TARGET_PAGE_BITS - 4))
+#define TLB_WATCHPOINT      (1 << (TARGET_PAGE_BITS_MIN - 4))
 
 /* Use this mask to check interception with an alignment mask
  * in a TCG backend.
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
  2019-09-22  3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:26   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 03/20] exec: Use const alias for TARGET_PAGE_BITS_VARY Richard Henderson
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

The next patch will play a trick with "const" that will
confuse the compiler about the uses of target_page_bits
within exec.c.  Moving everything to a new file prevents
this confusion.

No functional change so far.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 Makefile.target       |  2 +-
 include/qemu-common.h |  6 +++++
 exec-vary.c           | 57 +++++++++++++++++++++++++++++++++++++++++++
 exec.c                | 34 --------------------------
 4 files changed, 64 insertions(+), 35 deletions(-)
 create mode 100644 exec-vary.c

diff --git a/Makefile.target b/Makefile.target
index 5e916230c4..ca3d14efe1 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -107,7 +107,7 @@ obj-y += trace/
 
 #########################################################
 # cpu emulator library
-obj-y += exec.o
+obj-y += exec.o exec-vary.o
 obj-y += accel/
 obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
 obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 8d84db90b0..082da59e85 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -74,6 +74,12 @@ void cpu_exec_step_atomic(CPUState *cpu);
  */
 bool set_preferred_target_page_bits(int bits);
 
+/**
+ * finalize_target_page_bits:
+ * Commit the final value set by set_preferred_target_page_bits.
+ */
+void finalize_target_page_bits(void);
+
 /**
  * Sends a (part of) iovec down a socket, yielding when the socket is full, or
  * Receives data into a (part of) iovec from a socket,
diff --git a/exec-vary.c b/exec-vary.c
new file mode 100644
index 0000000000..48c0ab306c
--- /dev/null
+++ b/exec-vary.c
@@ -0,0 +1,57 @@
+/*
+ * Variable page size handling
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "exec/exec-all.h"
+
+#ifdef TARGET_PAGE_BITS_VARY
+int target_page_bits;
+bool target_page_bits_decided;
+#endif
+
+bool set_preferred_target_page_bits(int bits)
+{
+    /*
+     * The target page size is the lowest common denominator for all
+     * the CPUs in the system, so we can only make it smaller, never
+     * larger. And we can't make it smaller once we've committed to
+     * a particular size.
+     */
+#ifdef TARGET_PAGE_BITS_VARY
+    assert(bits >= TARGET_PAGE_BITS_MIN);
+    if (target_page_bits == 0 || target_page_bits > bits) {
+        if (target_page_bits_decided) {
+            return false;
+        }
+        target_page_bits = bits;
+    }
+#endif
+    return true;
+}
+
+void finalize_target_page_bits(void)
+{
+#ifdef TARGET_PAGE_BITS_VARY
+    if (target_page_bits == 0) {
+        target_page_bits = TARGET_PAGE_BITS_MIN;
+    }
+    target_page_bits_decided = true;
+#endif
+}
diff --git a/exec.c b/exec.c
index 8b998974f8..33bd0e36c1 100644
--- a/exec.c
+++ b/exec.c
@@ -92,11 +92,6 @@ MemoryRegion io_mem_rom, io_mem_notdirty;
 static MemoryRegion io_mem_unassigned;
 #endif
 
-#ifdef TARGET_PAGE_BITS_VARY
-int target_page_bits;
-bool target_page_bits_decided;
-#endif
-
 CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
 
 /* current CPU in the current thread. It is only valid inside
@@ -110,37 +105,8 @@ int use_icount;
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
 
-bool set_preferred_target_page_bits(int bits)
-{
-    /* The target page size is the lowest common denominator for all
-     * the CPUs in the system, so we can only make it smaller, never
-     * larger. And we can't make it smaller once we've committed to
-     * a particular size.
-     */
-#ifdef TARGET_PAGE_BITS_VARY
-    assert(bits >= TARGET_PAGE_BITS_MIN);
-    if (target_page_bits == 0 || target_page_bits > bits) {
-        if (target_page_bits_decided) {
-            return false;
-        }
-        target_page_bits = bits;
-    }
-#endif
-    return true;
-}
-
 #if !defined(CONFIG_USER_ONLY)
 
-static void finalize_target_page_bits(void)
-{
-#ifdef TARGET_PAGE_BITS_VARY
-    if (target_page_bits == 0) {
-        target_page_bits = TARGET_PAGE_BITS_MIN;
-    }
-    target_page_bits_decided = true;
-#endif
-}
-
 typedef struct PhysPageEntry PhysPageEntry;
 
 struct PhysPageEntry {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 03/20] exec: Use const alias for TARGET_PAGE_BITS_VARY
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
  2019-09-22  3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
  2019-09-22  3:54 ` [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-22  3:54 ` [PATCH v3 04/20] exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG Richard Henderson
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

Using a variable that is declared "const" for this tells the
compiler that it may read the value once and assume that it
does not change across function calls.

For target_page_size, this means we have only one assert per
function, and one read of the variable.

This reduces the size of qemu-system-aarch64 by 8k.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h | 10 +++++----
 exec-vary.c            | 46 ++++++++++++++++++++++++++++++++++--------
 2 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index e0c8dc540c..a53b761b48 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -210,10 +210,12 @@ static inline void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t val
 /* page related stuff */
 
 #ifdef TARGET_PAGE_BITS_VARY
-extern bool target_page_bits_decided;
-extern int target_page_bits;
-#define TARGET_PAGE_BITS ({ assert(target_page_bits_decided); \
-                            target_page_bits; })
+typedef struct {
+    bool decided;
+    int bits;
+} TargetPageBits;
+extern const TargetPageBits target_page;
+#define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
 #else
 #define TARGET_PAGE_BITS_MIN TARGET_PAGE_BITS
 #endif
diff --git a/exec-vary.c b/exec-vary.c
index 48c0ab306c..67cdf57a9c 100644
--- a/exec-vary.c
+++ b/exec-vary.c
@@ -22,8 +22,38 @@
 #include "exec/exec-all.h"
 
 #ifdef TARGET_PAGE_BITS_VARY
-int target_page_bits;
-bool target_page_bits_decided;
+/*
+ * We want to declare the "target_page" variable as const, which tells
+ * the compiler that it can cache any value that it reads across calls.
+ * This avoids multiple assertions and multiple reads within any one user.
+ *
+ * This works because we initialize the target_page data very early, in a
+ * location far removed from the functions that require the final results.
+ *
+ * This also requires that we have a non-constant symbol by which we can
+ * perform the actual initialization, and which forces the data to be
+ * allocated within writable memory.  Thus "init_target_page", and we use
+ * that symbol exclusively in the two functions that initialize this value.
+ *
+ * The "target_page" symbol is created as an alias of "init_target_page".
+ */
+static TargetPageBits init_target_page;
+
+/*
+ * Note that this is *not* a redundant decl, this is the definition of
+ * the "target_page" symbol.  The syntax for this definition requires
+ * the use of the extern keyword.  This seems to be a GCC bug in
+ * either the syntax for the alias attribute or in -Wredundant-decls.
+ *
+ * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91765
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wredundant-decls"
+
+extern const TargetPageBits target_page
+    __attribute__((alias("init_target_page")));
+
+#pragma GCC diagnostic pop
 #endif
 
 bool set_preferred_target_page_bits(int bits)
@@ -36,11 +66,11 @@ bool set_preferred_target_page_bits(int bits)
      */
 #ifdef TARGET_PAGE_BITS_VARY
     assert(bits >= TARGET_PAGE_BITS_MIN);
-    if (target_page_bits == 0 || target_page_bits > bits) {
-        if (target_page_bits_decided) {
+    if (init_target_page.bits == 0 || init_target_page.bits > bits) {
+        if (init_target_page.decided) {
             return false;
         }
-        target_page_bits = bits;
+        init_target_page.bits = bits;
     }
 #endif
     return true;
@@ -49,9 +79,9 @@ bool set_preferred_target_page_bits(int bits)
 void finalize_target_page_bits(void)
 {
 #ifdef TARGET_PAGE_BITS_VARY
-    if (target_page_bits == 0) {
-        target_page_bits = TARGET_PAGE_BITS_MIN;
+    if (init_target_page.bits == 0) {
+        init_target_page.bits = TARGET_PAGE_BITS_MIN;
     }
-    target_page_bits_decided = true;
+    init_target_page.decided = true;
 #endif
 }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 04/20] exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (2 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 03/20] exec: Use const alias for TARGET_PAGE_BITS_VARY Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-22  3:54 ` [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long Richard Henderson
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

This reduces the size of a release build by about 10k.
Noticably, within the tlb miss helpers.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index a53b761b48..b11ee1f711 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -215,7 +215,11 @@ typedef struct {
     int bits;
 } TargetPageBits;
 extern const TargetPageBits target_page;
-#define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
+# ifdef CONFIG_DEBUG_TCG
+#  define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
+# else
+#  define TARGET_PAGE_BITS target_page.bits
+# endif
 #else
 #define TARGET_PAGE_BITS_MIN TARGET_PAGE_BITS
 #endif
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (3 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 04/20] exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:30   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN Richard Henderson
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

There are some uint64_t uses that expect TARGET_PAGE_MASK to
extend for a 32-bit, so this must continue to be a signed type.
Define based on TARGET_PAGE_BITS not TARGET_PAGE_SIZE; this
will make a following patch more clear.

This should not have a functional effect so far.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index b11ee1f711..34d36cebca 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -225,7 +225,7 @@ extern const TargetPageBits target_page;
 #endif
 
 #define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
-#define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1)
+#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
 #define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
 
 /* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (4 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:30   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY Richard Henderson
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

Use TARGET_PAGE_MASK twice instead of TARGET_PAGE_SIZE once.
This is functionally identical, but will help a following patch.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 34d36cebca..5246770271 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -226,7 +226,8 @@ extern const TargetPageBits target_page;
 
 #define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
 #define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
-#define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
+#define TARGET_PAGE_ALIGN(addr) \
+    (((addr) + ~TARGET_PAGE_MASK) & TARGET_PAGE_MASK)
 
 /* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
  * when intptr_t is 32-bit and we are aligning a long long.
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (5 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:31   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

This eliminates a set of runtime shifts.  It turns out that we
require TARGET_PAGE_MASK more often than TARGET_PAGE_SIZE, so
redefine TARGET_PAGE_SIZE based on TARGET_PAGE_MASK instead of
the other way around.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h | 8 ++++++--
 exec-vary.c            | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 5246770271..2db73c7a27 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -213,19 +213,23 @@ static inline void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t val
 typedef struct {
     bool decided;
     int bits;
+    target_long mask;
 } TargetPageBits;
 extern const TargetPageBits target_page;
 # ifdef CONFIG_DEBUG_TCG
 #  define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
+#  define TARGET_PAGE_MASK (assert(target_page.decided), target_page.mask)
 # else
 #  define TARGET_PAGE_BITS target_page.bits
+#  define TARGET_PAGE_MASK target_page.mask
 # endif
+# define TARGET_PAGE_SIZE  ((int)-TARGET_PAGE_MASK)
 #else
 #define TARGET_PAGE_BITS_MIN TARGET_PAGE_BITS
+#define TARGET_PAGE_SIZE   (1 << TARGET_PAGE_BITS)
+#define TARGET_PAGE_MASK   ((target_long)-1 << TARGET_PAGE_BITS)
 #endif
 
-#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
-#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
 #define TARGET_PAGE_ALIGN(addr) \
     (((addr) + ~TARGET_PAGE_MASK) & TARGET_PAGE_MASK)
 
diff --git a/exec-vary.c b/exec-vary.c
index 67cdf57a9c..26daf281f2 100644
--- a/exec-vary.c
+++ b/exec-vary.c
@@ -83,5 +83,6 @@ void finalize_target_page_bits(void)
         init_target_page.bits = TARGET_PAGE_BITS_MIN;
     }
     init_target_page.decided = true;
+    init_target_page.mask = (target_long)-1 << init_target_page.bits;
 #endif
 }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (6 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  9:18   ` Philippe Mathieu-Daudé
  2019-09-23  9:45   ` Paolo Bonzini
  2019-09-22  3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
                   ` (14 subsequent siblings)
  22 siblings, 2 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

This forced inlining can result in missing symbols,
which makes a debugging build harder to follow.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/compiler.h | 11 +++++++++++
 accel/tcg/cputlb.c      |  4 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 09fc44cca4..d6d400c523 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -170,6 +170,17 @@
 # define QEMU_NONSTRING
 #endif
 
+/*
+ * Forced inlining may be desired to encourage constant propagation
+ * of function parameters.  However, it can also make debugging harder,
+ * so disable it for a non-optimizing build.
+ */
+#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
+#define QEMU_ALWAYS_INLINE  __attribute__((always_inline))
+#else
+#define QEMU_ALWAYS_INLINE
+#endif
+
 /* Implement C11 _Generic via GCC builtins.  Example:
  *
  *    QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index abae79650c..2222b87764 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1281,7 +1281,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
                                 TCGMemOpIdx oi, uintptr_t retaddr);
 
-static inline uint64_t __attribute__((always_inline))
+static inline uint64_t QEMU_ALWAYS_INLINE
 load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
             uintptr_t retaddr, MemOp op, bool code_read,
             FullLoadHelper *full_load)
@@ -1530,7 +1530,7 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
  * Store Helpers
  */
 
-static inline void __attribute__((always_inline))
+static inline void QEMU_ALWAYS_INLINE
 store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
              TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
 {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (7 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:32   ` David Hildenbrand
                     ` (2 more replies)
  2019-09-22  3:54 ` [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP Richard Henderson
                   ` (13 subsequent siblings)
  22 siblings, 3 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

Add a function parameter to perform the actual load/store to ram.
With optimization, this results in identical code.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c | 159 +++++++++++++++++++++++----------------------
 1 file changed, 83 insertions(+), 76 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 2222b87764..b4a63d3928 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1280,11 +1280,38 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 
 typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
                                 TCGMemOpIdx oi, uintptr_t retaddr);
+typedef uint64_t LoadHelper(const void *);
+
+/* Wrap the unaligned load helpers to that they have a common signature.  */
+static inline uint64_t wrap_ldub(const void *haddr)
+{
+    return ldub_p(haddr);
+}
+
+static inline uint64_t wrap_lduw_be(const void *haddr)
+{
+    return lduw_be_p(haddr);
+}
+
+static inline uint64_t wrap_lduw_le(const void *haddr)
+{
+    return lduw_le_p(haddr);
+}
+
+static inline uint64_t wrap_ldul_be(const void *haddr)
+{
+    return (uint32_t)ldl_be_p(haddr);
+}
+
+static inline uint64_t wrap_ldul_le(const void *haddr)
+{
+    return (uint32_t)ldl_le_p(haddr);
+}
 
 static inline uint64_t QEMU_ALWAYS_INLINE
 load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
             uintptr_t retaddr, MemOp op, bool code_read,
-            FullLoadHelper *full_load)
+            FullLoadHelper *full_load, LoadHelper *direct)
 {
     uintptr_t mmu_idx = get_mmuidx(oi);
     uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1373,33 +1400,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 
  do_aligned_access:
     haddr = (void *)((uintptr_t)addr + entry->addend);
-    switch (op) {
-    case MO_UB:
-        res = ldub_p(haddr);
-        break;
-    case MO_BEUW:
-        res = lduw_be_p(haddr);
-        break;
-    case MO_LEUW:
-        res = lduw_le_p(haddr);
-        break;
-    case MO_BEUL:
-        res = (uint32_t)ldl_be_p(haddr);
-        break;
-    case MO_LEUL:
-        res = (uint32_t)ldl_le_p(haddr);
-        break;
-    case MO_BEQ:
-        res = ldq_be_p(haddr);
-        break;
-    case MO_LEQ:
-        res = ldq_le_p(haddr);
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    return res;
+    return direct(haddr);
 }
 
 /*
@@ -1415,7 +1416,8 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
                               TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    return load_helper(env, addr, oi, retaddr, MO_UB, false, full_ldub_mmu);
+    return load_helper(env, addr, oi, retaddr, MO_UB, false,
+                       full_ldub_mmu, wrap_ldub);
 }
 
 tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
@@ -1428,7 +1430,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
                                  TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
-                       full_le_lduw_mmu);
+                       full_le_lduw_mmu, wrap_lduw_le);
 }
 
 tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
@@ -1441,7 +1443,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
                                  TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
-                       full_be_lduw_mmu);
+                       full_be_lduw_mmu, wrap_lduw_be);
 }
 
 tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
@@ -1454,7 +1456,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
                                  TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
-                       full_le_ldul_mmu);
+                       full_le_ldul_mmu, wrap_ldul_le);
 }
 
 tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
@@ -1467,7 +1469,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
                                  TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
-                       full_be_ldul_mmu);
+                       full_be_ldul_mmu, wrap_ldul_be);
 }
 
 tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
@@ -1480,14 +1482,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
-                       helper_le_ldq_mmu);
+                       helper_le_ldq_mmu, ldq_le_p);
 }
 
 uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
-                       helper_be_ldq_mmu);
+                       helper_be_ldq_mmu, ldq_be_p);
 }
 
 /*
@@ -1530,9 +1532,38 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
  * Store Helpers
  */
 
+typedef void StoreHelper(void *, uint64_t);
+
+/* Wrap the unaligned store helpers to that they have a common signature.  */
+static inline void wrap_stb(void *haddr, uint64_t val)
+{
+    stb_p(haddr, val);
+}
+
+static inline void wrap_stw_be(void *haddr, uint64_t val)
+{
+    stw_be_p(haddr, val);
+}
+
+static inline void wrap_stw_le(void *haddr, uint64_t val)
+{
+    stw_le_p(haddr, val);
+}
+
+static inline void wrap_stl_be(void *haddr, uint64_t val)
+{
+    stl_be_p(haddr, val);
+}
+
+static inline void wrap_stl_le(void *haddr, uint64_t val)
+{
+    stl_le_p(haddr, val);
+}
+
 static inline void QEMU_ALWAYS_INLINE
 store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
-             TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
+             TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
+             StoreHelper *direct)
 {
     uintptr_t mmu_idx = get_mmuidx(oi);
     uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1657,74 +1688,49 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 
  do_aligned_access:
     haddr = (void *)((uintptr_t)addr + entry->addend);
-    switch (op) {
-    case MO_UB:
-        stb_p(haddr, val);
-        break;
-    case MO_BEUW:
-        stw_be_p(haddr, val);
-        break;
-    case MO_LEUW:
-        stw_le_p(haddr, val);
-        break;
-    case MO_BEUL:
-        stl_be_p(haddr, val);
-        break;
-    case MO_LEUL:
-        stl_le_p(haddr, val);
-        break;
-    case MO_BEQ:
-        stq_be_p(haddr, val);
-        break;
-    case MO_LEQ:
-        stq_le_p(haddr, val);
-        break;
-    default:
-        g_assert_not_reached();
-        break;
-    }
+    direct(haddr, val);
 }
 
 void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
                         TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_UB);
+    store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
 }
 
 void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_LEUW);
+    store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
 }
 
 void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_BEUW);
+    store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
 }
 
 void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_LEUL);
+    store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
 }
 
 void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_BEUL);
+    store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
 }
 
 void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_LEQ);
+    store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
 }
 
 void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_BEQ);
+    store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
 }
 
 /* First set of helpers allows passing in of OI and RETADDR.  This makes
@@ -1789,7 +1795,8 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
 static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
                                TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    return load_helper(env, addr, oi, retaddr, MO_8, true, full_ldub_cmmu);
+    return load_helper(env, addr, oi, retaddr, MO_8, true,
+                       full_ldub_cmmu, wrap_ldub);
 }
 
 uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
@@ -1802,7 +1809,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
                                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
-                       full_le_lduw_cmmu);
+                       full_le_lduw_cmmu, wrap_lduw_le);
 }
 
 uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
@@ -1815,7 +1822,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
                                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
-                       full_be_lduw_cmmu);
+                       full_be_lduw_cmmu, wrap_lduw_be);
 }
 
 uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
@@ -1828,7 +1835,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
                                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
-                       full_le_ldul_cmmu);
+                       full_le_ldul_cmmu, wrap_ldul_le);
 }
 
 uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
@@ -1841,7 +1848,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
                                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
-                       full_be_ldul_cmmu);
+                       full_be_ldul_cmmu, wrap_ldul_be);
 }
 
 uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
@@ -1854,12 +1861,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
                             TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
-                       helper_le_ldq_cmmu);
+                       helper_le_ldq_cmmu, ldq_le_p);
 }
 
 uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
                             TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
-                       helper_be_ldq_cmmu);
+                       helper_be_ldq_cmmu, ldq_be_p);
 }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (8 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:33   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 11/20] exec: Adjust notdirty tracing Richard Henderson
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

Handle bswap on ram directly in load/store_helper.  This fixes a
bug with the previous implementation in that one cannot use the
I/O path for RAM.

Fixes: a26fc6f5152b47f1
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h |   4 +-
 accel/tcg/cputlb.c     | 108 +++++++++++++++++++++--------------------
 2 files changed, 59 insertions(+), 53 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 2db73c7a27..1ebd1b59ab 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -346,12 +346,14 @@ CPUArchState *cpu_copy(CPUArchState *env);
 #define TLB_MMIO            (1 << (TARGET_PAGE_BITS_MIN - 3))
 /* Set if TLB entry contains a watchpoint.  */
 #define TLB_WATCHPOINT      (1 << (TARGET_PAGE_BITS_MIN - 4))
+/* Set if TLB entry requires byte swap.  */
+#define TLB_BSWAP           (1 << (TARGET_PAGE_BITS_MIN - 5))
 
 /* Use this mask to check interception with an alignment mask
  * in a TCG backend.
  */
 #define TLB_FLAGS_MASK \
-    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT)
+    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
 
 /**
  * tlb_hit_page: return true if page aligned @addr is a hit against the
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index b4a63d3928..cb603917a2 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -737,8 +737,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         address |= TLB_INVALID_MASK;
     }
     if (attrs.byte_swap) {
-        /* Force the access through the I/O slow path.  */
-        address |= TLB_MMIO;
+        address |= TLB_BSWAP;
     }
     if (!memory_region_is_ram(section->mr) &&
         !memory_region_is_romd(section->mr)) {
@@ -901,10 +900,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     bool locked = false;
     MemTxResult r;
 
-    if (iotlbentry->attrs.byte_swap) {
-        op ^= MO_BSWAP;
-    }
-
     section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
     mr = section->mr;
     mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
@@ -947,10 +942,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     bool locked = false;
     MemTxResult r;
 
-    if (iotlbentry->attrs.byte_swap) {
-        op ^= MO_BSWAP;
-    }
-
     section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
     mr = section->mr;
     mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
@@ -1133,8 +1124,8 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
                              wp_access, retaddr);
     }
 
-    if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO)) {
-        /* I/O access */
+    /* Reject I/O access, or other required slow-path.  */
+    if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
         return NULL;
     }
 
@@ -1311,7 +1302,8 @@ static inline uint64_t wrap_ldul_le(const void *haddr)
 static inline uint64_t QEMU_ALWAYS_INLINE
 load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
             uintptr_t retaddr, MemOp op, bool code_read,
-            FullLoadHelper *full_load, LoadHelper *direct)
+            FullLoadHelper *full_load, LoadHelper *direct,
+            LoadHelper *direct_swap)
 {
     uintptr_t mmu_idx = get_mmuidx(oi);
     uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1361,17 +1353,21 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
             /* On watchpoint hit, this will longjmp out.  */
             cpu_check_watchpoint(env_cpu(env), addr, size,
                                  iotlbentry->attrs, BP_MEM_READ, retaddr);
-
-            /* The backing page may or may not require I/O.  */
-            tlb_addr &= ~TLB_WATCHPOINT;
-            if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
-                goto do_aligned_access;
-            }
         }
 
         /* Handle I/O access.  */
-        return io_readx(env, iotlbentry, mmu_idx, addr,
-                        retaddr, access_type, op);
+        if (likely(tlb_addr & TLB_MMIO)) {
+            return io_readx(env, iotlbentry, mmu_idx, addr,
+                            retaddr, access_type,
+                            op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
+        }
+
+        haddr = (void *)((uintptr_t)addr + entry->addend);
+
+        if (unlikely(tlb_addr & TLB_BSWAP)) {
+            return direct_swap(haddr);
+        }
+        return direct(haddr);
     }
 
     /* Handle slow unaligned access (it spans two pages or IO).  */
@@ -1398,7 +1394,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         return res & MAKE_64BIT_MASK(0, size * 8);
     }
 
- do_aligned_access:
     haddr = (void *)((uintptr_t)addr + entry->addend);
     return direct(haddr);
 }
@@ -1417,7 +1412,7 @@ static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
                               TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_UB, false,
-                       full_ldub_mmu, wrap_ldub);
+                       full_ldub_mmu, wrap_ldub, wrap_ldub);
 }
 
 tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
@@ -1430,7 +1425,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
                                  TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
-                       full_le_lduw_mmu, wrap_lduw_le);
+                       full_le_lduw_mmu, wrap_lduw_le, wrap_lduw_be);
 }
 
 tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
@@ -1443,7 +1438,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
                                  TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
-                       full_be_lduw_mmu, wrap_lduw_be);
+                       full_be_lduw_mmu, wrap_lduw_be, wrap_lduw_le);
 }
 
 tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
@@ -1456,7 +1451,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
                                  TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
-                       full_le_ldul_mmu, wrap_ldul_le);
+                       full_le_ldul_mmu, wrap_ldul_le, wrap_ldul_be);
 }
 
 tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
@@ -1469,7 +1464,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
                                  TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
-                       full_be_ldul_mmu, wrap_ldul_be);
+                       full_be_ldul_mmu, wrap_ldul_be, wrap_ldul_le);
 }
 
 tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
@@ -1482,14 +1477,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
-                       helper_le_ldq_mmu, ldq_le_p);
+                       helper_le_ldq_mmu, ldq_le_p, ldq_be_p);
 }
 
 uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
                            TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
-                       helper_be_ldq_mmu, ldq_be_p);
+                       helper_be_ldq_mmu, ldq_be_p, ldq_le_p);
 }
 
 /*
@@ -1563,7 +1558,7 @@ static inline void wrap_stl_le(void *haddr, uint64_t val)
 static inline void QEMU_ALWAYS_INLINE
 store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
              TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
-             StoreHelper *direct)
+             StoreHelper *direct, StoreHelper *direct_swap)
 {
     uintptr_t mmu_idx = get_mmuidx(oi);
     uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1608,16 +1603,22 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
             /* On watchpoint hit, this will longjmp out.  */
             cpu_check_watchpoint(env_cpu(env), addr, size,
                                  iotlbentry->attrs, BP_MEM_WRITE, retaddr);
-
-            /* The backing page may or may not require I/O.  */
-            tlb_addr &= ~TLB_WATCHPOINT;
-            if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
-                goto do_aligned_access;
-            }
         }
 
         /* Handle I/O access.  */
-        io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr, op);
+        if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
+            io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
+                      op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
+            return;
+        }
+
+        haddr = (void *)((uintptr_t)addr + entry->addend);
+
+        if (unlikely(tlb_addr & TLB_BSWAP)) {
+            direct_swap(haddr, val);
+        } else {
+            direct(haddr, val);
+        }
         return;
     }
 
@@ -1686,7 +1687,6 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
         return;
     }
 
- do_aligned_access:
     haddr = (void *)((uintptr_t)addr + entry->addend);
     direct(haddr, val);
 }
@@ -1694,43 +1694,47 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
                         TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
+    store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb, wrap_stb);
 }
 
 void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
+    store_helper(env, addr, val, oi, retaddr, MO_LEUW,
+                 wrap_stw_le, wrap_stw_be);
 }
 
 void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
+    store_helper(env, addr, val, oi, retaddr, MO_BEUW,
+                 wrap_stw_be, wrap_stw_le);
 }
 
 void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
+    store_helper(env, addr, val, oi, retaddr, MO_LEUL,
+                 wrap_stl_le, wrap_stl_be);
 }
 
 void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
+    store_helper(env, addr, val, oi, retaddr, MO_BEUL,
+                 wrap_stl_be, wrap_stl_le);
 }
 
 void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
+    store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p, stq_be_p);
 }
 
 void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
                        TCGMemOpIdx oi, uintptr_t retaddr)
 {
-    store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
+    store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p, stq_le_p);
 }
 
 /* First set of helpers allows passing in of OI and RETADDR.  This makes
@@ -1796,7 +1800,7 @@ static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
                                TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_8, true,
-                       full_ldub_cmmu, wrap_ldub);
+                       full_ldub_cmmu, wrap_ldub, wrap_ldub);
 }
 
 uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
@@ -1809,7 +1813,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
                                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
-                       full_le_lduw_cmmu, wrap_lduw_le);
+                       full_le_lduw_cmmu, wrap_lduw_le, wrap_lduw_be);
 }
 
 uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
@@ -1822,7 +1826,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
                                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
-                       full_be_lduw_cmmu, wrap_lduw_be);
+                       full_be_lduw_cmmu, wrap_lduw_be, wrap_lduw_le);
 }
 
 uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
@@ -1835,7 +1839,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
                                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
-                       full_le_ldul_cmmu, wrap_ldul_le);
+                       full_le_ldul_cmmu, wrap_ldul_le, wrap_ldul_be);
 }
 
 uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
@@ -1848,7 +1852,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
                                   TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
-                       full_be_ldul_cmmu, wrap_ldul_be);
+                       full_be_ldul_cmmu, wrap_ldul_be, wrap_ldul_le);
 }
 
 uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
@@ -1861,12 +1865,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
                             TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
-                       helper_le_ldq_cmmu, ldq_le_p);
+                       helper_le_ldq_cmmu, ldq_le_p, ldq_be_p);
 }
 
 uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
                             TCGMemOpIdx oi, uintptr_t retaddr)
 {
     return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
-                       helper_be_ldq_cmmu, ldq_be_p);
+                       helper_be_ldq_cmmu, ldq_be_p, ldq_le_p);
 }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 11/20] exec: Adjust notdirty tracing
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (9 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  9:17   ` Philippe Mathieu-Daudé
  2019-09-22  3:54 ` [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path Richard Henderson
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

The memory_region_tb_read tracepoint is unreachable, since notdirty
is supposed to apply only to writes.  The memory_region_tb_write
tracepoint is mis-named, because notdirty is not only used for TB
invalidation.  It is also used for e.g. VGA RAM updates and migration.

Replace memory_region_tb_write with memory_notdirty_write_access,
and place it in memory_notdirty_write_prepare where it can catch
all of the instances.  Add memory_notdirty_set_dirty to log when
we no longer intercept writes to a page.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 exec.c       | 3 +++
 memory.c     | 4 ----
 trace-events | 4 ++--
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 33bd0e36c1..7ce0515635 100644
--- a/exec.c
+++ b/exec.c
@@ -2721,6 +2721,8 @@ void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
     ndi->size = size;
     ndi->pages = NULL;
 
+    trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
+
     assert(tcg_enabled());
     if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
         ndi->pages = page_collection_lock(ram_addr, ram_addr + size);
@@ -2745,6 +2747,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
     /* we remove the notdirty callback only if the code has been
        flushed */
     if (!cpu_physical_memory_is_clean(ndi->ram_addr)) {
+        trace_memory_notdirty_set_dirty(ndi->mem_vaddr);
         tlb_set_dirty(ndi->cpu, ndi->mem_vaddr);
     }
 }
diff --git a/memory.c b/memory.c
index b9dd6b94ca..57c44c97db 100644
--- a/memory.c
+++ b/memory.c
@@ -438,7 +438,6 @@ static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
         /* Accesses to code which has previously been translated into a TB show
          * up in the MMIO path, as accesses to the io_mem_notdirty
          * MemoryRegion. */
-        trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
     } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -465,7 +464,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
         /* Accesses to code which has previously been translated into a TB show
          * up in the MMIO path, as accesses to the io_mem_notdirty
          * MemoryRegion. */
-        trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
     } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -490,7 +488,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
         /* Accesses to code which has previously been translated into a TB show
          * up in the MMIO path, as accesses to the io_mem_notdirty
          * MemoryRegion. */
-        trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
     } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -515,7 +512,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
         /* Accesses to code which has previously been translated into a TB show
          * up in the MMIO path, as accesses to the io_mem_notdirty
          * MemoryRegion. */
-        trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
     } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
diff --git a/trace-events b/trace-events
index 823a4ae64e..20821ba545 100644
--- a/trace-events
+++ b/trace-events
@@ -52,14 +52,14 @@ dma_map_wait(void *dbs) "dbs=%p"
 find_ram_offset(uint64_t size, uint64_t offset) "size: 0x%" PRIx64 " @ 0x%" PRIx64
 find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_t next, uint64_t mingap) "trying size: 0x%" PRIx64 " @ 0x%" PRIx64 ", offset: 0x%" PRIx64" next: 0x%" PRIx64 " mingap: 0x%" PRIx64
 ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
+memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
+memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
 
 # memory.c
 memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
-memory_region_tb_read(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
-memory_region_tb_write(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 flatview_new(void *view, void *root) "%p (root %p)"
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (10 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 11/20] exec: Adjust notdirty tracing Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:39   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

It does not require going through the whole I/O path
in order to discard a write.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h    |  5 ++++-
 include/exec/cpu-common.h |  1 -
 accel/tcg/cputlb.c        | 35 +++++++++++++++++++--------------
 exec.c                    | 41 +--------------------------------------
 4 files changed, 25 insertions(+), 57 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 1ebd1b59ab..9f0b17802e 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -348,12 +348,15 @@ CPUArchState *cpu_copy(CPUArchState *env);
 #define TLB_WATCHPOINT      (1 << (TARGET_PAGE_BITS_MIN - 4))
 /* Set if TLB entry requires byte swap.  */
 #define TLB_BSWAP           (1 << (TARGET_PAGE_BITS_MIN - 5))
+/* Set if TLB entry writes ignored.  */
+#define TLB_ROM             (1 << (TARGET_PAGE_BITS_MIN - 6))
 
 /* Use this mask to check interception with an alignment mask
  * in a TCG backend.
  */
 #define TLB_FLAGS_MASK \
-    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
+    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \
+    | TLB_WATCHPOINT | TLB_BSWAP | TLB_ROM)
 
 /**
  * tlb_hit_page: return true if page aligned @addr is a hit against the
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index f7dbe75fbc..1c0e03ddc2 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -100,7 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
 
 void cpu_flush_icache_range(hwaddr start, hwaddr len);
 
-extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
 typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index cb603917a2..7ab523d7ec 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -577,7 +577,7 @@ static void tlb_reset_dirty_range_locked(CPUTLBEntry *tlb_entry,
 {
     uintptr_t addr = tlb_entry->addr_write;
 
-    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_ROM | TLB_NOTDIRTY)) == 0) {
         addr &= TARGET_PAGE_MASK;
         addr += tlb_entry->addend;
         if ((addr - start) < length) {
@@ -745,7 +745,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         address |= TLB_MMIO;
         addend = 0;
     } else {
-        /* TLB_MMIO for rom/romd handled below */
         addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
     }
 
@@ -822,16 +821,17 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 
     tn.addr_write = -1;
     if (prot & PAGE_WRITE) {
-        if ((memory_region_is_ram(section->mr) && section->readonly)
-            || memory_region_is_romd(section->mr)) {
-            /* Write access calls the I/O callback.  */
-            tn.addr_write = address | TLB_MMIO;
-        } else if (memory_region_is_ram(section->mr)
-                   && cpu_physical_memory_is_clean(
-                       memory_region_get_ram_addr(section->mr) + xlat)) {
-            tn.addr_write = address | TLB_NOTDIRTY;
-        } else {
-            tn.addr_write = address;
+        tn.addr_write = address;
+        if (memory_region_is_romd(section->mr)) {
+            /* Use the MMIO path so that the device can switch states. */
+            tn.addr_write |= TLB_MMIO;
+        } else if (memory_region_is_ram(section->mr)) {
+            if (section->readonly) {
+                tn.addr_write |= TLB_ROM;
+            } else if (cpu_physical_memory_is_clean(
+                        memory_region_get_ram_addr(section->mr) + xlat)) {
+                tn.addr_write |= TLB_NOTDIRTY;
+            }
         }
         if (prot & PAGE_WRITE_INV) {
             tn.addr_write |= TLB_INVALID_MASK;
@@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     mr = section->mr;
     mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
     cpu->mem_io_pc = retaddr;
-    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
         cpu_io_recompile(cpu, retaddr);
     }
 
@@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
     mr = section->mr;
     mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
-    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
         cpu_io_recompile(cpu, retaddr);
     }
     cpu->mem_io_vaddr = addr;
@@ -1125,7 +1125,7 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
     }
 
     /* Reject I/O access, or other required slow-path.  */
-    if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
+    if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP | TLB_ROM)) {
         return NULL;
     }
 
@@ -1612,6 +1612,11 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
             return;
         }
 
+        /* Ignore writes to ROM.  */
+        if (unlikely(tlb_addr & TLB_ROM)) {
+            return;
+        }
+
         haddr = (void *)((uintptr_t)addr + entry->addend);
 
         if (unlikely(tlb_addr & TLB_BSWAP)) {
diff --git a/exec.c b/exec.c
index 7ce0515635..e21e068535 100644
--- a/exec.c
+++ b/exec.c
@@ -88,7 +88,7 @@ static MemoryRegion *system_io;
 AddressSpace address_space_io;
 AddressSpace address_space_memory;
 
-MemoryRegion io_mem_rom, io_mem_notdirty;
+MemoryRegion io_mem_notdirty;
 static MemoryRegion io_mem_unassigned;
 #endif
 
@@ -158,7 +158,6 @@ typedef struct subpage_t {
 
 #define PHYS_SECTION_UNASSIGNED 0
 #define PHYS_SECTION_NOTDIRTY 1
-#define PHYS_SECTION_ROM 2
 
 static void io_mem_init(void);
 static void memory_map_init(void);
@@ -1441,8 +1440,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
         iotlb = memory_region_get_ram_addr(section->mr) + xlat;
         if (!section->readonly) {
             iotlb |= PHYS_SECTION_NOTDIRTY;
-        } else {
-            iotlb |= PHYS_SECTION_ROM;
         }
     } else {
         AddressSpaceDispatch *d;
@@ -2968,38 +2965,6 @@ static uint16_t dummy_section(PhysPageMap *map, FlatView *fv, MemoryRegion *mr)
     return phys_section_add(map, &section);
 }
 
-static void readonly_mem_write(void *opaque, hwaddr addr,
-                               uint64_t val, unsigned size)
-{
-    /* Ignore any write to ROM. */
-}
-
-static bool readonly_mem_accepts(void *opaque, hwaddr addr,
-                                 unsigned size, bool is_write,
-                                 MemTxAttrs attrs)
-{
-    return is_write;
-}
-
-/* This will only be used for writes, because reads are special cased
- * to directly access the underlying host ram.
- */
-static const MemoryRegionOps readonly_mem_ops = {
-    .write = readonly_mem_write,
-    .valid.accepts = readonly_mem_accepts,
-    .endianness = DEVICE_NATIVE_ENDIAN,
-    .valid = {
-        .min_access_size = 1,
-        .max_access_size = 8,
-        .unaligned = false,
-    },
-    .impl = {
-        .min_access_size = 1,
-        .max_access_size = 8,
-        .unaligned = false,
-    },
-};
-
 MemoryRegionSection *iotlb_to_section(CPUState *cpu,
                                       hwaddr index, MemTxAttrs attrs)
 {
@@ -3013,8 +2978,6 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu,
 
 static void io_mem_init(void)
 {
-    memory_region_init_io(&io_mem_rom, NULL, &readonly_mem_ops,
-                          NULL, NULL, UINT64_MAX);
     memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
                           NULL, UINT64_MAX);
 
@@ -3035,8 +2998,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
     assert(n == PHYS_SECTION_UNASSIGNED);
     n = dummy_section(&d->map, fv, &io_mem_notdirty);
     assert(n == PHYS_SECTION_NOTDIRTY);
-    n = dummy_section(&d->map, fv, &io_mem_rom);
-    assert(n == PHYS_SECTION_ROM);
 
     d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 13/20] cputlb: Move NOTDIRTY handling from I/O path to TLB path
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (11 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:41   ` David Hildenbrand
  2019-09-23  9:30   ` Philippe Mathieu-Daudé
  2019-09-22  3:54 ` [PATCH v3 14/20] cputlb: Partially inline memory_region_section_get_iotlb Richard Henderson
                   ` (9 subsequent siblings)
  22 siblings, 2 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

Pages that we want to track for NOTDIRTY are RAM.  We do not
really need to go through the I/O path to handle them.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-common.h |  2 --
 accel/tcg/cputlb.c        | 26 +++++++++++++++++---
 exec.c                    | 50 ---------------------------------------
 memory.c                  | 16 -------------
 4 files changed, 23 insertions(+), 71 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 1c0e03ddc2..81753bbb34 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -100,8 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
 
 void cpu_flush_icache_range(hwaddr start, hwaddr len);
 
-extern struct MemoryRegion io_mem_notdirty;
-
 typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
 
 int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 7ab523d7ec..b7bd738115 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     mr = section->mr;
     mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
     cpu->mem_io_pc = retaddr;
-    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
+    if (!cpu->can_do_io) {
         cpu_io_recompile(cpu, retaddr);
     }
 
@@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
     mr = section->mr;
     mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
-    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
+    if (!cpu->can_do_io) {
         cpu_io_recompile(cpu, retaddr);
     }
     cpu->mem_io_vaddr = addr;
@@ -1606,7 +1606,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
         }
 
         /* Handle I/O access.  */
-        if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
+        if (tlb_addr & TLB_MMIO) {
             io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
                       op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
             return;
@@ -1619,6 +1619,26 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 
         haddr = (void *)((uintptr_t)addr + entry->addend);
 
+        /* Handle clean RAM pages.  */
+        if (tlb_addr & TLB_NOTDIRTY) {
+            NotDirtyInfo ndi;
+
+            /* We require mem_io_pc in tb_invalidate_phys_page_range.  */
+            env_cpu(env)->mem_io_pc = retaddr;
+
+            memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
+                                          addr + iotlbentry->addr, size);
+
+            if (unlikely(tlb_addr & TLB_BSWAP)) {
+                direct_swap(haddr, val);
+            } else {
+                direct(haddr, val);
+            }
+
+            memory_notdirty_write_complete(&ndi);
+            return;
+        }
+
         if (unlikely(tlb_addr & TLB_BSWAP)) {
             direct_swap(haddr, val);
         } else {
diff --git a/exec.c b/exec.c
index e21e068535..abf58b68a0 100644
--- a/exec.c
+++ b/exec.c
@@ -88,7 +88,6 @@ static MemoryRegion *system_io;
 AddressSpace address_space_io;
 AddressSpace address_space_memory;
 
-MemoryRegion io_mem_notdirty;
 static MemoryRegion io_mem_unassigned;
 #endif
 
@@ -157,7 +156,6 @@ typedef struct subpage_t {
 } subpage_t;
 
 #define PHYS_SECTION_UNASSIGNED 0
-#define PHYS_SECTION_NOTDIRTY 1
 
 static void io_mem_init(void);
 static void memory_map_init(void);
@@ -1438,9 +1436,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
     if (memory_region_is_ram(section->mr)) {
         /* Normal RAM.  */
         iotlb = memory_region_get_ram_addr(section->mr) + xlat;
-        if (!section->readonly) {
-            iotlb |= PHYS_SECTION_NOTDIRTY;
-        }
     } else {
         AddressSpaceDispatch *d;
 
@@ -2749,42 +2744,6 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
     }
 }
 
-/* Called within RCU critical section.  */
-static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
-                               uint64_t val, unsigned size)
-{
-    NotDirtyInfo ndi;
-
-    memory_notdirty_write_prepare(&ndi, current_cpu, current_cpu->mem_io_vaddr,
-                         ram_addr, size);
-
-    stn_p(qemu_map_ram_ptr(NULL, ram_addr), size, val);
-    memory_notdirty_write_complete(&ndi);
-}
-
-static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
-                                 unsigned size, bool is_write,
-                                 MemTxAttrs attrs)
-{
-    return is_write;
-}
-
-static const MemoryRegionOps notdirty_mem_ops = {
-    .write = notdirty_mem_write,
-    .valid.accepts = notdirty_mem_accepts,
-    .endianness = DEVICE_NATIVE_ENDIAN,
-    .valid = {
-        .min_access_size = 1,
-        .max_access_size = 8,
-        .unaligned = false,
-    },
-    .impl = {
-        .min_access_size = 1,
-        .max_access_size = 8,
-        .unaligned = false,
-    },
-};
-
 /* Generate a debug exception if a watchpoint has been hit.  */
 void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
                           MemTxAttrs attrs, int flags, uintptr_t ra)
@@ -2980,13 +2939,6 @@ static void io_mem_init(void)
 {
     memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
                           NULL, UINT64_MAX);
-
-    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
-     * which can be called without the iothread mutex.
-     */
-    memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
-                          NULL, UINT64_MAX);
-    memory_region_clear_global_locking(&io_mem_notdirty);
 }
 
 AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
@@ -2996,8 +2948,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
 
     n = dummy_section(&d->map, fv, &io_mem_unassigned);
     assert(n == PHYS_SECTION_UNASSIGNED);
-    n = dummy_section(&d->map, fv, &io_mem_notdirty);
-    assert(n == PHYS_SECTION_NOTDIRTY);
 
     d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
 
diff --git a/memory.c b/memory.c
index 57c44c97db..a99b8c0767 100644
--- a/memory.c
+++ b/memory.c
@@ -434,10 +434,6 @@ static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
     tmp = mr->ops->read(mr->opaque, addr, size);
     if (mr->subpage) {
         trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
-    } else if (mr == &io_mem_notdirty) {
-        /* Accesses to code which has previously been translated into a TB show
-         * up in the MMIO path, as accesses to the io_mem_notdirty
-         * MemoryRegion. */
     } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -460,10 +456,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
     r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
     if (mr->subpage) {
         trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
-    } else if (mr == &io_mem_notdirty) {
-        /* Accesses to code which has previously been translated into a TB show
-         * up in the MMIO path, as accesses to the io_mem_notdirty
-         * MemoryRegion. */
     } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -484,10 +476,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
 
     if (mr->subpage) {
         trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
-    } else if (mr == &io_mem_notdirty) {
-        /* Accesses to code which has previously been translated into a TB show
-         * up in the MMIO path, as accesses to the io_mem_notdirty
-         * MemoryRegion. */
     } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -508,10 +496,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
 
     if (mr->subpage) {
         trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
-    } else if (mr == &io_mem_notdirty) {
-        /* Accesses to code which has previously been translated into a TB show
-         * up in the MMIO path, as accesses to the io_mem_notdirty
-         * MemoryRegion. */
     } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 14/20] cputlb: Partially inline memory_region_section_get_iotlb
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (12 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-22  3:54 ` [PATCH v3 15/20] cputlb: Merge and move memory_notdirty_write_{prepare, complete} Richard Henderson
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

There is only one caller, tlb_set_page_with_attrs.  We cannot
inline the entire function because the AddressSpaceDispatch
structure is private to exec.c, and cannot easily be moved to
include/exec/memory-internal.h.

Compute is_ram and is_romd once within tlb_set_page_with_attrs.
Fold the number of tests against these predicates.  Compute
cpu_physical_memory_is_clean outside of the tlb lock region.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/exec-all.h |  6 +---
 accel/tcg/cputlb.c      | 68 ++++++++++++++++++++++++++---------------
 exec.c                  | 22 ++-----------
 3 files changed, 47 insertions(+), 49 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 81b02eb2fe..49db07ba0b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -509,11 +509,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
                                   hwaddr *xlat, hwaddr *plen,
                                   MemTxAttrs attrs, int *prot);
 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
-                                       MemoryRegionSection *section,
-                                       target_ulong vaddr,
-                                       hwaddr paddr, hwaddr xlat,
-                                       int prot,
-                                       target_ulong *address);
+                                       MemoryRegionSection *section);
 #endif
 
 /* vl.c */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index b7bd738115..1a839c0f82 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -704,13 +704,14 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     MemoryRegionSection *section;
     unsigned int index;
     target_ulong address;
-    target_ulong code_address;
+    target_ulong write_address;
     uintptr_t addend;
     CPUTLBEntry *te, tn;
     hwaddr iotlb, xlat, sz, paddr_page;
     target_ulong vaddr_page;
     int asidx = cpu_asidx_from_attrs(cpu, attrs);
     int wp_flags;
+    bool is_ram, is_romd;
 
     assert_cpu_is_self(cpu);
 
@@ -739,18 +740,46 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     if (attrs.byte_swap) {
         address |= TLB_BSWAP;
     }
-    if (!memory_region_is_ram(section->mr) &&
-        !memory_region_is_romd(section->mr)) {
-        /* IO memory case */
-        address |= TLB_MMIO;
-        addend = 0;
-    } else {
+
+    is_ram = memory_region_is_ram(section->mr);
+    is_romd = memory_region_is_romd(section->mr);
+
+    if (is_ram || is_romd) {
+        /* RAM and ROMD both have associated host memory. */
         addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
+    } else {
+        /* I/O does not; force the host address to NULL. */
+        addend = 0;
+    }
+
+    write_address = address;
+    if (is_ram) {
+        iotlb = memory_region_get_ram_addr(section->mr) + xlat;
+        /*
+         * Computing is_clean is expensive; avoid all that unless
+         * the page is actually writable.
+         */
+        if (prot & PAGE_WRITE) {
+            if (section->readonly) {
+                write_address |= TLB_ROM;
+            } else if (cpu_physical_memory_is_clean(iotlb)) {
+                write_address |= TLB_NOTDIRTY;
+            }
+        }
+    } else {
+        /* I/O or ROMD */
+        iotlb = memory_region_section_get_iotlb(cpu, section) + xlat;
+        /*
+         * Writes to romd devices must go through MMIO to enable write.
+         * Reads to romd devices go through the ram_ptr found above,
+         * but of course reads to I/O must go through MMIO.
+         */
+        write_address |= TLB_MMIO;
+        if (!is_romd) {
+            address = write_address;
+        }
     }
 
-    code_address = address;
-    iotlb = memory_region_section_get_iotlb(cpu, section, vaddr_page,
-                                            paddr_page, xlat, prot, &address);
     wp_flags = cpu_watchpoint_address_matches(cpu, vaddr_page,
                                               TARGET_PAGE_SIZE);
 
@@ -790,8 +819,8 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     /*
      * At this point iotlb contains a physical section number in the lower
      * TARGET_PAGE_BITS, and either
-     *  + the ram_addr_t of the page base of the target RAM (if NOTDIRTY or ROM)
-     *  + the offset within section->mr of the page base (otherwise)
+     *  + the ram_addr_t of the page base of the target RAM (RAM)
+     *  + the offset within section->mr of the page base (I/O, ROMD)
      * We subtract the vaddr_page (which is page aligned and thus won't
      * disturb the low bits) to give an offset which can be added to the
      * (non-page-aligned) vaddr of the eventual memory access to get
@@ -814,25 +843,14 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     }
 
     if (prot & PAGE_EXEC) {
-        tn.addr_code = code_address;
+        tn.addr_code = address;
     } else {
         tn.addr_code = -1;
     }
 
     tn.addr_write = -1;
     if (prot & PAGE_WRITE) {
-        tn.addr_write = address;
-        if (memory_region_is_romd(section->mr)) {
-            /* Use the MMIO path so that the device can switch states. */
-            tn.addr_write |= TLB_MMIO;
-        } else if (memory_region_is_ram(section->mr)) {
-            if (section->readonly) {
-                tn.addr_write |= TLB_ROM;
-            } else if (cpu_physical_memory_is_clean(
-                        memory_region_get_ram_addr(section->mr) + xlat)) {
-                tn.addr_write |= TLB_NOTDIRTY;
-            }
-        }
+        tn.addr_write = write_address;
         if (prot & PAGE_WRITE_INV) {
             tn.addr_write |= TLB_INVALID_MASK;
         }
diff --git a/exec.c b/exec.c
index abf58b68a0..9c9cc811b3 100644
--- a/exec.c
+++ b/exec.c
@@ -1425,26 +1425,10 @@ bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
 
 /* Called from RCU critical section */
 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
-                                       MemoryRegionSection *section,
-                                       target_ulong vaddr,
-                                       hwaddr paddr, hwaddr xlat,
-                                       int prot,
-                                       target_ulong *address)
+                                       MemoryRegionSection *section)
 {
-    hwaddr iotlb;
-
-    if (memory_region_is_ram(section->mr)) {
-        /* Normal RAM.  */
-        iotlb = memory_region_get_ram_addr(section->mr) + xlat;
-    } else {
-        AddressSpaceDispatch *d;
-
-        d = flatview_to_dispatch(section->fv);
-        iotlb = section - d->map.sections;
-        iotlb += xlat;
-    }
-
-    return iotlb;
+    AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
+    return section - d->map.sections;
 }
 #endif /* defined(CONFIG_USER_ONLY) */
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 15/20] cputlb: Merge and move memory_notdirty_write_{prepare, complete}
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (13 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 14/20] cputlb: Partially inline memory_region_section_get_iotlb Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-22  3:54 ` [PATCH v3 16/20] cputlb: Handle TLB_NOTDIRTY in probe_access Richard Henderson
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

Since 9458a9a1df1a, all readers of the dirty bitmaps wait
for the rcu lock, which means that they wait until the end
of any executing TranslationBlock.

As a consequence, there is no need for the actual access
to happen in between the _prepare and _complete.  Therefore,
we can improve things by merging the two functions into
notdirty_write and dropping the NotDirtyInfo structure.

In addition, the only users of notdirty_write are in cputlb.c,
so move the merged function there.  Pass in the CPUIOTLBEntry
from which the ram_addr_t may be computed.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/memory-internal.h | 65 -----------------------------
 accel/tcg/cputlb.c             | 76 +++++++++++++++++++---------------
 exec.c                         | 44 --------------------
 3 files changed, 42 insertions(+), 143 deletions(-)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index ef4fb92371..9fcc2af25c 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -49,70 +49,5 @@ void address_space_dispatch_free(AddressSpaceDispatch *d);
 
 void mtree_print_dispatch(struct AddressSpaceDispatch *d,
                           MemoryRegion *root);
-
-struct page_collection;
-
-/* Opaque struct for passing info from memory_notdirty_write_prepare()
- * to memory_notdirty_write_complete(). Callers should treat all fields
- * as private, with the exception of @active.
- *
- * @active is a field which is not touched by either the prepare or
- * complete functions, but which the caller can use if it wishes to
- * track whether it has called prepare for this struct and so needs
- * to later call the complete function.
- */
-typedef struct {
-    CPUState *cpu;
-    struct page_collection *pages;
-    ram_addr_t ram_addr;
-    vaddr mem_vaddr;
-    unsigned size;
-    bool active;
-} NotDirtyInfo;
-
-/**
- * memory_notdirty_write_prepare: call before writing to non-dirty memory
- * @ndi: pointer to opaque NotDirtyInfo struct
- * @cpu: CPU doing the write
- * @mem_vaddr: virtual address of write
- * @ram_addr: the ram address of the write
- * @size: size of write in bytes
- *
- * Any code which writes to the host memory corresponding to
- * guest RAM which has been marked as NOTDIRTY must wrap those
- * writes in calls to memory_notdirty_write_prepare() and
- * memory_notdirty_write_complete():
- *
- *  NotDirtyInfo ndi;
- *  memory_notdirty_write_prepare(&ndi, ....);
- *  ... perform write here ...
- *  memory_notdirty_write_complete(&ndi);
- *
- * These calls will ensure that we flush any TCG translated code for
- * the memory being written, update the dirty bits and (if possible)
- * remove the slowpath callback for writing to the memory.
- *
- * This must only be called if we are using TCG; it will assert otherwise.
- *
- * We may take locks in the prepare call, so callers must ensure that
- * they don't exit (via longjump or otherwise) without calling complete.
- *
- * This call must only be made inside an RCU critical section.
- * (Note that while we're executing a TCG TB we're always in an
- * RCU critical section, which is likely to be the case for callers
- * of these functions.)
- */
-void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
-                                   CPUState *cpu,
-                                   vaddr mem_vaddr,
-                                   ram_addr_t ram_addr,
-                                   unsigned size);
-/**
- * memory_notdirty_write_complete: finish write to non-dirty memory
- * @ndi: pointer to the opaque NotDirtyInfo struct which was initialized
- * by memory_not_dirty_write_prepare().
- */
-void memory_notdirty_write_complete(NotDirtyInfo *ndi);
-
 #endif
 #endif
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 1a839c0f82..6f685cb93a 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -33,6 +33,7 @@
 #include "exec/helper-proto.h"
 #include "qemu/atomic.h"
 #include "qemu/atomic128.h"
+#include "translate-all.h"
 
 /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
 /* #define DEBUG_TLB */
@@ -1084,6 +1085,37 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
     return qemu_ram_addr_from_host_nofail(p);
 }
 
+static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
+                           CPUIOTLBEntry *iotlbentry, uintptr_t retaddr)
+{
+    ram_addr_t ram_addr = mem_vaddr + iotlbentry->addr;
+
+    trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
+
+    if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
+        struct page_collection *pages
+            = page_collection_lock(ram_addr, ram_addr + size);
+
+        /* We require mem_io_pc in tb_invalidate_phys_page_range.  */
+        cpu->mem_io_pc = retaddr;
+
+        tb_invalidate_phys_page_fast(pages, ram_addr, size);
+        page_collection_unlock(pages);
+    }
+
+    /*
+     * Set both VGA and migration bits for simplicity and to remove
+     * the notdirty callback faster.
+     */
+    cpu_physical_memory_set_dirty_range(ram_addr, size, DIRTY_CLIENTS_NOCODE);
+
+    /* We remove the notdirty callback only if the code has been flushed. */
+    if (!cpu_physical_memory_is_clean(ram_addr)) {
+        trace_memory_notdirty_set_dirty(mem_vaddr);
+        tlb_set_dirty(cpu, mem_vaddr);
+    }
+}
+
 /*
  * Probe for whether the specified guest access is permitted. If it is not
  * permitted then an exception will be taken in the same way as if this
@@ -1203,8 +1235,7 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
 /* Probe for a read-modify-write atomic operation.  Do not allow unaligned
  * operations, or io operations to proceed.  Return the host address.  */
 static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
-                               TCGMemOpIdx oi, uintptr_t retaddr,
-                               NotDirtyInfo *ndi)
+                               TCGMemOpIdx oi, uintptr_t retaddr)
 {
     size_t mmu_idx = get_mmuidx(oi);
     uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1264,12 +1295,9 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 
     hostaddr = (void *)((uintptr_t)addr + tlbe->addend);
 
-    ndi->active = false;
     if (unlikely(tlb_addr & TLB_NOTDIRTY)) {
-        ndi->active = true;
-        memory_notdirty_write_prepare(ndi, env_cpu(env), addr,
-                                      qemu_ram_addr_from_host_nofail(hostaddr),
-                                      1 << s_bits);
+        notdirty_write(env_cpu(env), addr, 1 << s_bits,
+                       &env_tlb(env)->d[mmu_idx].iotlb[index], retaddr);
     }
 
     return hostaddr;
@@ -1635,28 +1663,13 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
             return;
         }
 
-        haddr = (void *)((uintptr_t)addr + entry->addend);
-
         /* Handle clean RAM pages.  */
         if (tlb_addr & TLB_NOTDIRTY) {
-            NotDirtyInfo ndi;
-
-            /* We require mem_io_pc in tb_invalidate_phys_page_range.  */
-            env_cpu(env)->mem_io_pc = retaddr;
-
-            memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
-                                          addr + iotlbentry->addr, size);
-
-            if (unlikely(tlb_addr & TLB_BSWAP)) {
-                direct_swap(haddr, val);
-            } else {
-                direct(haddr, val);
-            }
-
-            memory_notdirty_write_complete(&ndi);
-            return;
+            notdirty_write(env_cpu(env), addr, size, iotlbentry, retaddr);
         }
 
+        haddr = (void *)((uintptr_t)addr + entry->addend);
+
         if (unlikely(tlb_addr & TLB_BSWAP)) {
             direct_swap(haddr, val);
         } else {
@@ -1786,14 +1799,9 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
 #define EXTRA_ARGS     , TCGMemOpIdx oi, uintptr_t retaddr
 #define ATOMIC_NAME(X) \
     HELPER(glue(glue(glue(atomic_ ## X, SUFFIX), END), _mmu))
-#define ATOMIC_MMU_DECLS NotDirtyInfo ndi
-#define ATOMIC_MMU_LOOKUP atomic_mmu_lookup(env, addr, oi, retaddr, &ndi)
-#define ATOMIC_MMU_CLEANUP                              \
-    do {                                                \
-        if (unlikely(ndi.active)) {                     \
-            memory_notdirty_write_complete(&ndi);       \
-        }                                               \
-    } while (0)
+#define ATOMIC_MMU_DECLS
+#define ATOMIC_MMU_LOOKUP atomic_mmu_lookup(env, addr, oi, retaddr)
+#define ATOMIC_MMU_CLEANUP
 
 #define DATA_SIZE 1
 #include "atomic_template.h"
@@ -1821,7 +1829,7 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
 #undef ATOMIC_MMU_LOOKUP
 #define EXTRA_ARGS         , TCGMemOpIdx oi
 #define ATOMIC_NAME(X)     HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
-#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, oi, GETPC(), &ndi)
+#define ATOMIC_MMU_LOOKUP  atomic_mmu_lookup(env, addr, oi, GETPC())
 
 #define DATA_SIZE 1
 #include "atomic_template.h"
diff --git a/exec.c b/exec.c
index 9c9cc811b3..090bcc05da 100644
--- a/exec.c
+++ b/exec.c
@@ -2684,50 +2684,6 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
     return block->offset + offset;
 }
 
-/* Called within RCU critical section. */
-void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
-                          CPUState *cpu,
-                          vaddr mem_vaddr,
-                          ram_addr_t ram_addr,
-                          unsigned size)
-{
-    ndi->cpu = cpu;
-    ndi->ram_addr = ram_addr;
-    ndi->mem_vaddr = mem_vaddr;
-    ndi->size = size;
-    ndi->pages = NULL;
-
-    trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
-
-    assert(tcg_enabled());
-    if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
-        ndi->pages = page_collection_lock(ram_addr, ram_addr + size);
-        tb_invalidate_phys_page_fast(ndi->pages, ram_addr, size);
-    }
-}
-
-/* Called within RCU critical section. */
-void memory_notdirty_write_complete(NotDirtyInfo *ndi)
-{
-    if (ndi->pages) {
-        assert(tcg_enabled());
-        page_collection_unlock(ndi->pages);
-        ndi->pages = NULL;
-    }
-
-    /* Set both VGA and migration bits for simplicity and to remove
-     * the notdirty callback faster.
-     */
-    cpu_physical_memory_set_dirty_range(ndi->ram_addr, ndi->size,
-                                        DIRTY_CLIENTS_NOCODE);
-    /* we remove the notdirty callback only if the code has been
-       flushed */
-    if (!cpu_physical_memory_is_clean(ndi->ram_addr)) {
-        trace_memory_notdirty_set_dirty(ndi->mem_vaddr);
-        tlb_set_dirty(ndi->cpu, ndi->mem_vaddr);
-    }
-}
-
 /* Generate a debug exception if a watchpoint has been hit.  */
 void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
                           MemTxAttrs attrs, int flags, uintptr_t ra)
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 16/20] cputlb: Handle TLB_NOTDIRTY in probe_access
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (14 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 15/20] cputlb: Merge and move memory_notdirty_write_{prepare, complete} Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-22  3:54 ` [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr Richard Henderson
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

We can use notdirty_write for the write and
return a valid host pointer for this case.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 6f685cb93a..6f4096bd0d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1167,16 +1167,24 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
         return NULL;
     }
 
-    /* Handle watchpoints.  */
-    if (tlb_addr & TLB_WATCHPOINT) {
-        cpu_check_watchpoint(env_cpu(env), addr, size,
-                             env_tlb(env)->d[mmu_idx].iotlb[index].attrs,
-                             wp_access, retaddr);
-    }
+    if (unlikely(tlb_addr & TLB_FLAGS_MASK)) {
+        CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
 
-    /* Reject I/O access, or other required slow-path.  */
-    if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP | TLB_ROM)) {
-        return NULL;
+        /* Reject I/O access, or other required slow-path.  */
+        if (tlb_addr & (TLB_MMIO | TLB_BSWAP | TLB_ROM)) {
+            return NULL;
+        }
+
+        /* Handle watchpoints.  */
+        if (tlb_addr & TLB_WATCHPOINT) {
+            cpu_check_watchpoint(env_cpu(env), addr, size,
+                                 iotlbentry->attrs, wp_access, retaddr);
+        }
+
+        /* Handle clean RAM pages.  */
+        if (tlb_addr & TLB_NOTDIRTY) {
+            notdirty_write(env_cpu(env), addr, size, iotlbentry, retaddr);
+        }
     }
 
     return (void *)((uintptr_t)addr + entry->addend);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (15 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 16/20] cputlb: Handle TLB_NOTDIRTY in probe_access Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:50   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access Richard Henderson
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

With the merge of notdirty handling into store_helper,
the last user of cpu->mem_io_vaddr was removed.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/hw/core/cpu.h | 2 --
 accel/tcg/cputlb.c    | 2 --
 hw/core/cpu.c         | 1 -
 3 files changed, 5 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index c7cda65c66..031f587e51 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -338,7 +338,6 @@ struct qemu_work_item;
  * @next_cpu: Next CPU sharing TB cache.
  * @opaque: User data.
  * @mem_io_pc: Host Program Counter at which the memory was accessed.
- * @mem_io_vaddr: Target virtual address at which the memory was accessed.
  * @kvm_fd: vCPU file descriptor for KVM.
  * @work_mutex: Lock to prevent multiple access to queued_work_*.
  * @queued_work_first: First asynchronous work pending.
@@ -413,7 +412,6 @@ struct CPUState {
      * we store some rarely used information in the CPU context.
      */
     uintptr_t mem_io_pc;
-    vaddr mem_io_vaddr;
     /*
      * This is only needed for the legacy cpu_unassigned_access() hook;
      * when all targets using it have been converted to use
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 6f4096bd0d..257c59c08c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -927,7 +927,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
         cpu_io_recompile(cpu, retaddr);
     }
 
-    cpu->mem_io_vaddr = addr;
     cpu->mem_io_access_type = access_type;
 
     if (mr->global_locking && !qemu_mutex_iothread_locked()) {
@@ -967,7 +966,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     if (!cpu->can_do_io) {
         cpu_io_recompile(cpu, retaddr);
     }
-    cpu->mem_io_vaddr = addr;
     cpu->mem_io_pc = retaddr;
 
     if (mr->global_locking && !qemu_mutex_iothread_locked()) {
diff --git a/hw/core/cpu.c b/hw/core/cpu.c
index 0035845511..73b1ee34d0 100644
--- a/hw/core/cpu.c
+++ b/hw/core/cpu.c
@@ -261,7 +261,6 @@ static void cpu_common_reset(CPUState *cpu)
     cpu->interrupt_request = 0;
     cpu->halted = 0;
     cpu->mem_io_pc = 0;
-    cpu->mem_io_vaddr = 0;
     cpu->icount_extra = 0;
     atomic_set(&cpu->icount_decr_ptr->u32, 0);
     cpu->can_do_io = 1;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (16 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:52   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast Richard Henderson
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

All callers pass false to this argument.  Remove it and pass the
constant on to tb_invalidate_phys_page_range__locked.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.h | 3 +--
 accel/tcg/translate-all.c | 6 ++----
 exec.c                    | 4 ++--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 64f5fd9a05..31f2117188 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -28,8 +28,7 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
 void page_collection_unlock(struct page_collection *set);
 void tb_invalidate_phys_page_fast(struct page_collection *pages,
                                   tb_page_addr_t start, int len);
-void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
-                                   int is_cpu_write_access);
+void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
 void tb_check_watchpoint(CPUState *cpu);
 
 #ifdef CONFIG_USER_ONLY
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5d1e08b169..de4b697163 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1983,8 +1983,7 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
  *
  * Called with mmap_lock held for user-mode emulation
  */
-void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
-                                   int is_cpu_write_access)
+void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
 {
     struct page_collection *pages;
     PageDesc *p;
@@ -1996,8 +1995,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
         return;
     }
     pages = page_collection_lock(start, end);
-    tb_invalidate_phys_page_range__locked(pages, p, start, end,
-                                          is_cpu_write_access);
+    tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
     page_collection_unlock(pages);
 }
 
diff --git a/exec.c b/exec.c
index 090bcc05da..fed25d029b 100644
--- a/exec.c
+++ b/exec.c
@@ -978,7 +978,7 @@ const char *parse_cpu_option(const char *cpu_option)
 void tb_invalidate_phys_addr(target_ulong addr)
 {
     mmap_lock();
-    tb_invalidate_phys_page_range(addr, addr + 1, 0);
+    tb_invalidate_phys_page_range(addr, addr + 1);
     mmap_unlock();
 }
 
@@ -1005,7 +1005,7 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs)
         return;
     }
     ram_addr = memory_region_get_ram_addr(mr) + addr;
-    tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
+    tb_invalidate_phys_page_range(ram_addr, ram_addr + 1);
     rcu_read_unlock();
 }
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (17 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:53   ` David Hildenbrand
  2019-09-22  3:54 ` [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint Richard Henderson
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

Rather than rely on cpu->mem_io_pc, pass retaddr down directly.

Within tb_invalidate_phys_page_range__locked, the is_cpu_write_access
parameter is non-zero exactly when retaddr would be non-zero, so that
is a simple replacement.

Recognize that current_tb_not_found is true only when mem_io_pc
(and now retaddr) are also non-zero, so remove a redundant test.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.h |  3 ++-
 accel/tcg/cputlb.c        |  6 +-----
 accel/tcg/translate-all.c | 39 +++++++++++++++++++--------------------
 3 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 31f2117188..135c1ea96a 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -27,7 +27,8 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
                                              tb_page_addr_t end);
 void page_collection_unlock(struct page_collection *set);
 void tb_invalidate_phys_page_fast(struct page_collection *pages,
-                                  tb_page_addr_t start, int len);
+                                  tb_page_addr_t start, int len,
+                                  uintptr_t retaddr);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
 void tb_check_watchpoint(CPUState *cpu);
 
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 257c59c08c..eff129447d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1093,11 +1093,7 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
     if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
         struct page_collection *pages
             = page_collection_lock(ram_addr, ram_addr + size);
-
-        /* We require mem_io_pc in tb_invalidate_phys_page_range.  */
-        cpu->mem_io_pc = retaddr;
-
-        tb_invalidate_phys_page_fast(pages, ram_addr, size);
+        tb_invalidate_phys_page_fast(pages, ram_addr, size, retaddr);
         page_collection_unlock(pages);
     }
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index de4b697163..db77fb221b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1889,7 +1889,7 @@ static void
 tb_invalidate_phys_page_range__locked(struct page_collection *pages,
                                       PageDesc *p, tb_page_addr_t start,
                                       tb_page_addr_t end,
-                                      int is_cpu_write_access)
+                                      uintptr_t retaddr)
 {
     TranslationBlock *tb;
     tb_page_addr_t tb_start, tb_end;
@@ -1897,9 +1897,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
 #ifdef TARGET_HAS_PRECISE_SMC
     CPUState *cpu = current_cpu;
     CPUArchState *env = NULL;
-    int current_tb_not_found = is_cpu_write_access;
+    bool current_tb_not_found = retaddr != 0;
+    bool current_tb_modified = false;
     TranslationBlock *current_tb = NULL;
-    int current_tb_modified = 0;
     target_ulong current_pc = 0;
     target_ulong current_cs_base = 0;
     uint32_t current_flags = 0;
@@ -1931,24 +1931,21 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
         if (!(tb_end <= start || tb_start >= end)) {
 #ifdef TARGET_HAS_PRECISE_SMC
             if (current_tb_not_found) {
-                current_tb_not_found = 0;
-                current_tb = NULL;
-                if (cpu->mem_io_pc) {
-                    /* now we have a real cpu fault */
-                    current_tb = tcg_tb_lookup(cpu->mem_io_pc);
-                }
+                current_tb_not_found = false;
+                /* now we have a real cpu fault */
+                current_tb = tcg_tb_lookup(retaddr);
             }
             if (current_tb == tb &&
                 (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
-                /* If we are modifying the current TB, we must stop
-                its execution. We could be more precise by checking
-                that the modification is after the current PC, but it
-                would require a specialized function to partially
-                restore the CPU state */
-
-                current_tb_modified = 1;
-                cpu_restore_state_from_tb(cpu, current_tb,
-                                          cpu->mem_io_pc, true);
+                /*
+                 * If we are modifying the current TB, we must stop
+                 * its execution. We could be more precise by checking
+                 * that the modification is after the current PC, but it
+                 * would require a specialized function to partially
+                 * restore the CPU state.
+                 */
+                current_tb_modified = true;
+                cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
                 cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
                                      &current_flags);
             }
@@ -2042,7 +2039,8 @@ void tb_invalidate_phys_range(target_ulong start, target_ulong end)
  * Call with all @pages in the range [@start, @start + len[ locked.
  */
 void tb_invalidate_phys_page_fast(struct page_collection *pages,
-                                  tb_page_addr_t start, int len)
+                                  tb_page_addr_t start, int len,
+                                  uintptr_t retaddr)
 {
     PageDesc *p;
 
@@ -2069,7 +2067,8 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
         }
     } else {
     do_invalidate:
-        tb_invalidate_phys_page_range__locked(pages, p, start, start + len, 1);
+        tb_invalidate_phys_page_range__locked(pages, p, start, start + len,
+                                              retaddr);
     }
 }
 #else
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (18 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast Richard Henderson
@ 2019-09-22  3:54 ` Richard Henderson
  2019-09-23  8:54   ` David Hildenbrand
  2019-09-22  4:02 ` [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  3:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

Fixes the previous TLB_WATCHPOINT patches because we are currently
failing to set cpu->mem_io_pc with the call to cpu_check_watchpoint.
Pass down the retaddr directly because it's readily available.

Fixes: 50b107c5d61
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.h | 2 +-
 accel/tcg/translate-all.c | 6 +++---
 exec.c                    | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 135c1ea96a..a557b4e2bb 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -30,7 +30,7 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
                                   tb_page_addr_t start, int len,
                                   uintptr_t retaddr);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
-void tb_check_watchpoint(CPUState *cpu);
+void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
 
 #ifdef CONFIG_USER_ONLY
 int page_unprotect(target_ulong address, uintptr_t pc);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index db77fb221b..66d4bc4341 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2142,16 +2142,16 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
 #endif
 
 /* user-mode: call with mmap_lock held */
-void tb_check_watchpoint(CPUState *cpu)
+void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
 {
     TranslationBlock *tb;
 
     assert_memory_lock();
 
-    tb = tcg_tb_lookup(cpu->mem_io_pc);
+    tb = tcg_tb_lookup(retaddr);
     if (tb) {
         /* We can use retranslation to find the PC.  */
-        cpu_restore_state_from_tb(cpu, tb, cpu->mem_io_pc, true);
+        cpu_restore_state_from_tb(cpu, tb, retaddr, true);
         tb_phys_invalidate(tb, -1);
     } else {
         /* The exception probably happened in a helper.  The CPU state should
diff --git a/exec.c b/exec.c
index fed25d029b..ceeef4cd4b 100644
--- a/exec.c
+++ b/exec.c
@@ -2724,7 +2724,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
                 cpu->watchpoint_hit = wp;
 
                 mmap_lock();
-                tb_check_watchpoint(cpu);
+                tb_check_watchpoint(cpu, ra);
                 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
                     cpu->exception_index = EXCP_DEBUG;
                     mmap_unlock();
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 00/20] Move rom and notdirty handling to cputlb
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (19 preceding siblings ...)
  2019-09-22  3:54 ` [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint Richard Henderson
@ 2019-09-22  4:02 ` Richard Henderson
  2019-09-22  6:46 ` no-reply
  2019-09-23  8:23 ` David Hildenbrand
  22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22  4:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

On 9/21/19 8:54 PM, Richard Henderson wrote:
> Richard Henderson (20):
>   exec: Use TARGET_PAGE_BITS_MIN for TLB flags
>   exec: Split out variable page size support to exec-vary.c
>   exec: Use const alias for TARGET_PAGE_BITS_VARY
>   exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
>   exec: Promote TARGET_PAGE_MASK to target_long
>   exec: Tidy TARGET_PAGE_ALIGN
>   exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY

I didn't mean to include the TARGET_PAGE_BITS_VARY patches.
Ignore the first 7, or review them at your pleasure.  ;-)


r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 00/20] Move rom and notdirty handling to cputlb
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (20 preceding siblings ...)
  2019-09-22  4:02 ` [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
@ 2019-09-22  6:46 ` no-reply
  2019-09-23  8:23 ` David Hildenbrand
  22 siblings, 0 replies; 53+ messages in thread
From: no-reply @ 2019-09-22  6:46 UTC (permalink / raw)
  To: richard.henderson; +Cc: pbonzini, alex.bennee, qemu-devel, stefanha, david

Patchew URL: https://patchew.org/QEMU/20190922035458.14879-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20190922035458.14879-1-richard.henderson@linaro.org
Subject: [PATCH v3 00/20] Move rom and notdirty handling to cputlb
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20190922035458.14879-1-richard.henderson@linaro.org -> patchew/20190922035458.14879-1-richard.henderson@linaro.org
Switched to a new branch 'test'
34dd130 cputlb: Pass retaddr to tb_check_watchpoint
7d95821 cputlb: Pass retaddr to tb_invalidate_phys_page_fast
0f82f7a cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
143b897 cputlb: Remove cpu->mem_io_vaddr
1ac8201 cputlb: Handle TLB_NOTDIRTY in probe_access
d9ff1a3 cputlb: Merge and move memory_notdirty_write_{prepare, complete}
32abbae cputlb: Partially inline memory_region_section_get_iotlb
3976ef9 cputlb: Move NOTDIRTY handling from I/O path to TLB path
4f3e8aa cputlb: Move ROM handling from I/O path to TLB path
3c42e98 exec: Adjust notdirty tracing
e3bb276 cputlb: Introduce TLB_BSWAP
e48326a cputlb: Replace switches in load/store_helper with callback
287b700 cputlb: Disable __always_inline__ without optimization
8c801ae exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
2f76655 exec: Tidy TARGET_PAGE_ALIGN
aa7fdf2 exec: Promote TARGET_PAGE_MASK to target_long
20c6e62 exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
4db4c87 exec: Use const alias for TARGET_PAGE_BITS_VARY
0d3457e exec: Split out variable page size support to exec-vary.c
b4b2c05 exec: Use TARGET_PAGE_BITS_MIN for TLB flags

=== OUTPUT BEGIN ===
1/20 Checking commit b4b2c05eaa12 (exec: Use TARGET_PAGE_BITS_MIN for TLB flags)
2/20 Checking commit 0d3457ee2850 (exec: Split out variable page size support to exec-vary.c)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#33: 
new file mode 100644

total: 0 errors, 1 warnings, 125 lines checked

Patch 2/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/20 Checking commit 4db4c8782d8c (exec: Use const alias for TARGET_PAGE_BITS_VARY)
ERROR: externs should be avoided in .c files
#58: FILE: exec-vary.c:53:
+extern const TargetPageBits target_page

total: 1 errors, 0 warnings, 82 lines checked

Patch 3/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/20 Checking commit 20c6e620cee9 (exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG)
5/20 Checking commit aa7fdf250f2b (exec: Promote TARGET_PAGE_MASK to target_long)
6/20 Checking commit 2f766556467d (exec: Tidy TARGET_PAGE_ALIGN)
7/20 Checking commit 8c801ae246fe (exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY)
8/20 Checking commit 287b700d5006 (cputlb: Disable __always_inline__ without optimization)
WARNING: architecture specific defines should be avoided
#50: FILE: include/qemu/compiler.h:178:
+#if defined(__OPTIMIZE__) && __has_attribute(always_inline)

total: 0 errors, 1 warnings, 33 lines checked

Patch 8/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
9/20 Checking commit e48326ae3127 (cputlb: Replace switches in load/store_helper with callback)
10/20 Checking commit e3bb276cde3b (cputlb: Introduce TLB_BSWAP)
11/20 Checking commit 3c42e98e5a4a (exec: Adjust notdirty tracing)
12/20 Checking commit 4f3e8aa354bc (cputlb: Move ROM handling from I/O path to TLB path)
13/20 Checking commit 3976ef9549a0 (cputlb: Move NOTDIRTY handling from I/O path to TLB path)
14/20 Checking commit 32abbae47942 (cputlb: Partially inline memory_region_section_get_iotlb)
15/20 Checking commit d9ff1a3e3b82 (cputlb: Merge and move memory_notdirty_write_{prepare, complete})
16/20 Checking commit 1ac820151e2f (cputlb: Handle TLB_NOTDIRTY in probe_access)
17/20 Checking commit 143b897be8e4 (cputlb: Remove cpu->mem_io_vaddr)
18/20 Checking commit 0f82f7a7365d (cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access)
19/20 Checking commit 7d95821f8ce9 (cputlb: Pass retaddr to tb_invalidate_phys_page_fast)
20/20 Checking commit 34dd130e1b50 (cputlb: Pass retaddr to tb_check_watchpoint)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190922035458.14879-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 00/20] Move rom and notdirty handling to cputlb
  2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
                   ` (21 preceding siblings ...)
  2019-09-22  6:46 ` no-reply
@ 2019-09-23  8:23 ` David Hildenbrand
  22 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:23 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> Ok!  Third time is the charm, because this time it works.

Yeay :) I don't wanna know how hard it was to debug that...

> 
> New to v3:
> 
>   * Covert io_mem_rom with a new TLB_ROM bit.
> 
>   * This in turn means that there are no longer any special RAM
>     case along along the MMIO path -- they all have devices on
>     the other end.

That sounds like a really nice cleanup.

> 
>   * This in turn means that we can fold the bulk of 
>     memory_region_section_get_iotlb into tlb_set_page_with_attrs,
>     a couple of redundant tests vs the MemoryRegion.
>     The result in patch 14 is, IMO, much more understandable.
> 
>   * Fold away uses of cpu->mem_io_pc in tb_invalidate_phys_page__locked,
>     the cause of the problems for my previous two patch sets.
> 
>     BTW, I was correct with my guess in the v2 cover letter that the use
>     of memory_notdirty_write_{prepare,complete} within atomic_mmu_lookup
>     must have been broken, for not setting mem_io_pc.  :-P
> 
>   * Fix a missed use of cpu->mem_io_pc in tb_check_watchpoint,
>     which meant that the previous TLB_WATCHPOINT cleanup was a
>     titch broken.

So there was a PC already getting stored.

> 
> The remaining two users of cpu->mem_io_pc are hw/misc/mips_itu.c and
> target/i386/helper.c.  I haven't looked, but I assume that these are
> legitimately on the MMIO path, and there probably isn't a decent way
> to remove the uses.
> 
> 
> r~
> 
> 
> Richard Henderson (20):
>   exec: Use TARGET_PAGE_BITS_MIN for TLB flags
>   exec: Split out variable page size support to exec-vary.c
>   exec: Use const alias for TARGET_PAGE_BITS_VARY
>   exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
>   exec: Promote TARGET_PAGE_MASK to target_long
>   exec: Tidy TARGET_PAGE_ALIGN
>   exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
>   cputlb: Disable __always_inline__ without optimization
>   cputlb: Replace switches in load/store_helper with callback
>   cputlb: Introduce TLB_BSWAP
>   exec: Adjust notdirty tracing
>   cputlb: Move ROM handling from I/O path to TLB path
>   cputlb: Move NOTDIRTY handling from I/O path to TLB path
>   cputlb: Partially inline memory_region_section_get_iotlb
>   cputlb: Merge and move memory_notdirty_write_{prepare,complete}
>   cputlb: Handle TLB_NOTDIRTY in probe_access
>   cputlb: Remove cpu->mem_io_vaddr
>   cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
>   cputlb: Pass retaddr to tb_invalidate_phys_page_fast
>   cputlb: Pass retaddr to tb_check_watchpoint
> 
>  Makefile.target                |   2 +-
>  accel/tcg/translate-all.h      |   8 +-
>  include/exec/cpu-all.h         |  48 ++--
>  include/exec/cpu-common.h      |   3 -
>  include/exec/exec-all.h        |   6 +-
>  include/exec/memory-internal.h |  65 ------
>  include/hw/core/cpu.h          |   2 -
>  include/qemu-common.h          |   6 +
>  include/qemu/compiler.h        |  11 +
>  accel/tcg/cputlb.c             | 388 +++++++++++++++++++--------------
>  accel/tcg/translate-all.c      |  51 ++---
>  exec-vary.c                    |  88 ++++++++
>  exec.c                         | 192 +---------------
>  hw/core/cpu.c                  |   1 -
>  memory.c                       |  20 --
>  trace-events                   |   4 +-
>  16 files changed, 403 insertions(+), 492 deletions(-)
>  create mode 100644 exec-vary.c
> 


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags
  2019-09-22  3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
@ 2019-09-23  8:24   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:24 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> These bits do not need to vary with the actual page size
> used by the guest.
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/cpu-all.h | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index d2d443c4f9..e0c8dc540c 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -317,20 +317,24 @@ CPUArchState *cpu_copy(CPUArchState *env);
>  
>  #if !defined(CONFIG_USER_ONLY)
>  
> -/* Flags stored in the low bits of the TLB virtual address.  These are
> - * defined so that fast path ram access is all zeros.
> +/*
> + * Flags stored in the low bits of the TLB virtual address.
> + * These are defined so that fast path ram access is all zeros.
>   * The flags all must be between TARGET_PAGE_BITS and
>   * maximum address alignment bit.
> + *
> + * Use TARGET_PAGE_BITS_MIN so that these bits are constant
> + * when TARGET_PAGE_BITS_VARY is in effect.
>   */
>  /* Zero if TLB entry is valid.  */
> -#define TLB_INVALID_MASK    (1 << (TARGET_PAGE_BITS - 1))
> +#define TLB_INVALID_MASK    (1 << (TARGET_PAGE_BITS_MIN - 1))
>  /* Set if TLB entry references a clean RAM page.  The iotlb entry will
>     contain the page physical address.  */
> -#define TLB_NOTDIRTY        (1 << (TARGET_PAGE_BITS - 2))
> +#define TLB_NOTDIRTY        (1 << (TARGET_PAGE_BITS_MIN - 2))
>  /* Set if TLB entry is an IO callback.  */
> -#define TLB_MMIO            (1 << (TARGET_PAGE_BITS - 3))
> +#define TLB_MMIO            (1 << (TARGET_PAGE_BITS_MIN - 3))
>  /* Set if TLB entry contains a watchpoint.  */
> -#define TLB_WATCHPOINT      (1 << (TARGET_PAGE_BITS - 4))
> +#define TLB_WATCHPOINT      (1 << (TARGET_PAGE_BITS_MIN - 4))
>  
>  /* Use this mask to check interception with an alignment mask
>   * in a TCG backend.
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c
  2019-09-22  3:54 ` [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c Richard Henderson
@ 2019-09-23  8:26   ` David Hildenbrand
  2019-09-23 16:27     ` Richard Henderson
  0 siblings, 1 reply; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> The next patch will play a trick with "const" that will
> confuse the compiler about the uses of target_page_bits
> within exec.c.  Moving everything to a new file prevents
> this confusion.
> 
> No functional change so far.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  Makefile.target       |  2 +-
>  include/qemu-common.h |  6 +++++
>  exec-vary.c           | 57 +++++++++++++++++++++++++++++++++++++++++++
>  exec.c                | 34 --------------------------
>  4 files changed, 64 insertions(+), 35 deletions(-)
>  create mode 100644 exec-vary.c
> 
> diff --git a/Makefile.target b/Makefile.target
> index 5e916230c4..ca3d14efe1 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -107,7 +107,7 @@ obj-y += trace/
>  
>  #########################################################
>  # cpu emulator library
> -obj-y += exec.o
> +obj-y += exec.o exec-vary.o
>  obj-y += accel/
>  obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
>  obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 8d84db90b0..082da59e85 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -74,6 +74,12 @@ void cpu_exec_step_atomic(CPUState *cpu);
>   */
>  bool set_preferred_target_page_bits(int bits);
>  
> +/**
> + * finalize_target_page_bits:
> + * Commit the final value set by set_preferred_target_page_bits.
> + */
> +void finalize_target_page_bits(void);
> +
>  /**
>   * Sends a (part of) iovec down a socket, yielding when the socket is full, or
>   * Receives data into a (part of) iovec from a socket,
> diff --git a/exec-vary.c b/exec-vary.c
> new file mode 100644
> index 0000000000..48c0ab306c
> --- /dev/null
> +++ b/exec-vary.c
> @@ -0,0 +1,57 @@
> +/*
> + * Variable page size handling
> + *
> + *  Copyright (c) 2003 Fabrice Bellard
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "exec/exec-all.h"
> +
> +#ifdef TARGET_PAGE_BITS_VARY
> +int target_page_bits;
> +bool target_page_bits_decided;
> +#endif
> +
> +bool set_preferred_target_page_bits(int bits)
> +{
> +    /*
> +     * The target page size is the lowest common denominator for all
> +     * the CPUs in the system, so we can only make it smaller, never
> +     * larger. And we can't make it smaller once we've committed to
> +     * a particular size.
> +     */
> +#ifdef TARGET_PAGE_BITS_VARY
> +    assert(bits >= TARGET_PAGE_BITS_MIN);
> +    if (target_page_bits == 0 || target_page_bits > bits) {
> +        if (target_page_bits_decided) {
> +            return false;
> +        }
> +        target_page_bits = bits;
> +    }
> +#endif
> +    return true;
> +}
> +
> +void finalize_target_page_bits(void)
> +{
> +#ifdef TARGET_PAGE_BITS_VARY
> +    if (target_page_bits == 0) {
> +        target_page_bits = TARGET_PAGE_BITS_MIN;
> +    }
> +    target_page_bits_decided = true;
> +#endif
> +}

I wonder if it would be nicer to handle this in the header file instead,
providing dummy functions there.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long
  2019-09-22  3:54 ` [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long Richard Henderson
@ 2019-09-23  8:30   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:30 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> There are some uint64_t uses that expect TARGET_PAGE_MASK to
> extend for a 32-bit, so this must continue to be a signed type.
> Define based on TARGET_PAGE_BITS not TARGET_PAGE_SIZE; this
> will make a following patch more clear.
> 
> This should not have a functional effect so far.
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/cpu-all.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index b11ee1f711..34d36cebca 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -225,7 +225,7 @@ extern const TargetPageBits target_page;
>  #endif
>  
>  #define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
> -#define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1)
> +#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
>  #define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
>  
>  /* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN
  2019-09-22  3:54 ` [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN Richard Henderson
@ 2019-09-23  8:30   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:30 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> Use TARGET_PAGE_MASK twice instead of TARGET_PAGE_SIZE once.
> This is functionally identical, but will help a following patch.
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/cpu-all.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 34d36cebca..5246770271 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -226,7 +226,8 @@ extern const TargetPageBits target_page;
>  
>  #define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
>  #define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
> -#define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
> +#define TARGET_PAGE_ALIGN(addr) \
> +    (((addr) + ~TARGET_PAGE_MASK) & TARGET_PAGE_MASK)
>  
>  /* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
>   * when intptr_t is 32-bit and we are aligning a long long.
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
  2019-09-22  3:54 ` [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY Richard Henderson
@ 2019-09-23  8:31   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> This eliminates a set of runtime shifts.  It turns out that we
> require TARGET_PAGE_MASK more often than TARGET_PAGE_SIZE, so
> redefine TARGET_PAGE_SIZE based on TARGET_PAGE_MASK instead of
> the other way around.
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/cpu-all.h | 8 ++++++--
>  exec-vary.c            | 1 +
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 5246770271..2db73c7a27 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -213,19 +213,23 @@ static inline void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t val
>  typedef struct {
>      bool decided;
>      int bits;
> +    target_long mask;
>  } TargetPageBits;
>  extern const TargetPageBits target_page;
>  # ifdef CONFIG_DEBUG_TCG
>  #  define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
> +#  define TARGET_PAGE_MASK (assert(target_page.decided), target_page.mask)
>  # else
>  #  define TARGET_PAGE_BITS target_page.bits
> +#  define TARGET_PAGE_MASK target_page.mask
>  # endif
> +# define TARGET_PAGE_SIZE  ((int)-TARGET_PAGE_MASK)
>  #else
>  #define TARGET_PAGE_BITS_MIN TARGET_PAGE_BITS
> +#define TARGET_PAGE_SIZE   (1 << TARGET_PAGE_BITS)
> +#define TARGET_PAGE_MASK   ((target_long)-1 << TARGET_PAGE_BITS)
>  #endif
>  
> -#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
> -#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
>  #define TARGET_PAGE_ALIGN(addr) \
>      (((addr) + ~TARGET_PAGE_MASK) & TARGET_PAGE_MASK)
>  
> diff --git a/exec-vary.c b/exec-vary.c
> index 67cdf57a9c..26daf281f2 100644
> --- a/exec-vary.c
> +++ b/exec-vary.c
> @@ -83,5 +83,6 @@ void finalize_target_page_bits(void)
>          init_target_page.bits = TARGET_PAGE_BITS_MIN;
>      }
>      init_target_page.decided = true;
> +    init_target_page.mask = (target_long)-1 << init_target_page.bits;
>  #endif
>  }
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
  2019-09-22  3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
@ 2019-09-23  8:32   ` David Hildenbrand
  2019-09-23  9:27   ` Philippe Mathieu-Daudé
  2019-09-23  9:51   ` Paolo Bonzini
  2 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:32 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> Add a function parameter to perform the actual load/store to ram.
> With optimization, this results in identical code.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/cputlb.c | 159 +++++++++++++++++++++++----------------------
>  1 file changed, 83 insertions(+), 76 deletions(-)
> 
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 2222b87764..b4a63d3928 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1280,11 +1280,38 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>  
>  typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
>                                  TCGMemOpIdx oi, uintptr_t retaddr);
> +typedef uint64_t LoadHelper(const void *);
> +
> +/* Wrap the unaligned load helpers to that they have a common signature.  */
> +static inline uint64_t wrap_ldub(const void *haddr)
> +{
> +    return ldub_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_be(const void *haddr)
> +{
> +    return lduw_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_le(const void *haddr)
> +{
> +    return lduw_le_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_be(const void *haddr)
> +{
> +    return (uint32_t)ldl_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_le(const void *haddr)
> +{
> +    return (uint32_t)ldl_le_p(haddr);
> +}
>  
>  static inline uint64_t QEMU_ALWAYS_INLINE
>  load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>              uintptr_t retaddr, MemOp op, bool code_read,
> -            FullLoadHelper *full_load)
> +            FullLoadHelper *full_load, LoadHelper *direct)
>  {
>      uintptr_t mmu_idx = get_mmuidx(oi);
>      uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1373,33 +1400,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>  
>   do_aligned_access:
>      haddr = (void *)((uintptr_t)addr + entry->addend);
> -    switch (op) {
> -    case MO_UB:
> -        res = ldub_p(haddr);
> -        break;
> -    case MO_BEUW:
> -        res = lduw_be_p(haddr);
> -        break;
> -    case MO_LEUW:
> -        res = lduw_le_p(haddr);
> -        break;
> -    case MO_BEUL:
> -        res = (uint32_t)ldl_be_p(haddr);
> -        break;
> -    case MO_LEUL:
> -        res = (uint32_t)ldl_le_p(haddr);
> -        break;
> -    case MO_BEQ:
> -        res = ldq_be_p(haddr);
> -        break;
> -    case MO_LEQ:
> -        res = ldq_le_p(haddr);
> -        break;
> -    default:
> -        g_assert_not_reached();
> -    }
> -
> -    return res;
> +    return direct(haddr);
>  }
>  
>  /*
> @@ -1415,7 +1416,8 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>  static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
>                                TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    return load_helper(env, addr, oi, retaddr, MO_UB, false, full_ldub_mmu);
> +    return load_helper(env, addr, oi, retaddr, MO_UB, false,
> +                       full_ldub_mmu, wrap_ldub);
>  }
>  
>  tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
> @@ -1428,7 +1430,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
> -                       full_le_lduw_mmu);
> +                       full_le_lduw_mmu, wrap_lduw_le);
>  }
>  
>  tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1441,7 +1443,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
> -                       full_be_lduw_mmu);
> +                       full_be_lduw_mmu, wrap_lduw_be);
>  }
>  
>  tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1454,7 +1456,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
> -                       full_le_ldul_mmu);
> +                       full_le_ldul_mmu, wrap_ldul_le);
>  }
>  
>  tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1467,7 +1469,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
> -                       full_be_ldul_mmu);
> +                       full_be_ldul_mmu, wrap_ldul_be);
>  }
>  
>  tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1480,14 +1482,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
> -                       helper_le_ldq_mmu);
> +                       helper_le_ldq_mmu, ldq_le_p);
>  }
>  
>  uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
> -                       helper_be_ldq_mmu);
> +                       helper_be_ldq_mmu, ldq_be_p);
>  }
>  
>  /*
> @@ -1530,9 +1532,38 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
>   * Store Helpers
>   */
>  
> +typedef void StoreHelper(void *, uint64_t);
> +
> +/* Wrap the unaligned store helpers to that they have a common signature.  */
> +static inline void wrap_stb(void *haddr, uint64_t val)
> +{
> +    stb_p(haddr, val);
> +}
> +
> +static inline void wrap_stw_be(void *haddr, uint64_t val)
> +{
> +    stw_be_p(haddr, val);
> +}
> +
> +static inline void wrap_stw_le(void *haddr, uint64_t val)
> +{
> +    stw_le_p(haddr, val);
> +}
> +
> +static inline void wrap_stl_be(void *haddr, uint64_t val)
> +{
> +    stl_be_p(haddr, val);
> +}
> +
> +static inline void wrap_stl_le(void *haddr, uint64_t val)
> +{
> +    stl_le_p(haddr, val);
> +}
> +
>  static inline void QEMU_ALWAYS_INLINE
>  store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> -             TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
> +             TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
> +             StoreHelper *direct)
>  {
>      uintptr_t mmu_idx = get_mmuidx(oi);
>      uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1657,74 +1688,49 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>  
>   do_aligned_access:
>      haddr = (void *)((uintptr_t)addr + entry->addend);
> -    switch (op) {
> -    case MO_UB:
> -        stb_p(haddr, val);
> -        break;
> -    case MO_BEUW:
> -        stw_be_p(haddr, val);
> -        break;
> -    case MO_LEUW:
> -        stw_le_p(haddr, val);
> -        break;
> -    case MO_BEUL:
> -        stl_be_p(haddr, val);
> -        break;
> -    case MO_LEUL:
> -        stl_le_p(haddr, val);
> -        break;
> -    case MO_BEQ:
> -        stq_be_p(haddr, val);
> -        break;
> -    case MO_LEQ:
> -        stq_le_p(haddr, val);
> -        break;
> -    default:
> -        g_assert_not_reached();
> -        break;
> -    }
> +    direct(haddr, val);
>  }
>  
>  void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
>                          TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_UB);
> +    store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
>  }
>  
>  void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_LEUW);
> +    store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
>  }
>  
>  void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_BEUW);
> +    store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
>  }
>  
>  void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_LEUL);
> +    store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
>  }
>  
>  void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_BEUL);
> +    store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
>  }
>  
>  void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_LEQ);
> +    store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
>  }
>  
>  void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_BEQ);
> +    store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
>  }
>  
>  /* First set of helpers allows passing in of OI and RETADDR.  This makes
> @@ -1789,7 +1795,8 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
>  static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
>                                 TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    return load_helper(env, addr, oi, retaddr, MO_8, true, full_ldub_cmmu);
> +    return load_helper(env, addr, oi, retaddr, MO_8, true,
> +                       full_ldub_cmmu, wrap_ldub);
>  }
>  
>  uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1802,7 +1809,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
> -                       full_le_lduw_cmmu);
> +                       full_le_lduw_cmmu, wrap_lduw_le);
>  }
>  
>  uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1815,7 +1822,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
> -                       full_be_lduw_cmmu);
> +                       full_be_lduw_cmmu, wrap_lduw_be);
>  }
>  
>  uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1828,7 +1835,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
> -                       full_le_ldul_cmmu);
> +                       full_le_ldul_cmmu, wrap_ldul_le);
>  }
>  
>  uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1841,7 +1848,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
> -                       full_be_ldul_cmmu);
> +                       full_be_ldul_cmmu, wrap_ldul_be);
>  }
>  
>  uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1854,12 +1861,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
>                              TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
> -                       helper_le_ldq_cmmu);
> +                       helper_le_ldq_cmmu, ldq_le_p);
>  }
>  
>  uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
>                              TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
> -                       helper_be_ldq_cmmu);
> +                       helper_be_ldq_cmmu, ldq_be_p);
>  }
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP
  2019-09-22  3:54 ` [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP Richard Henderson
@ 2019-09-23  8:33   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> Handle bswap on ram directly in load/store_helper.  This fixes a
> bug with the previous implementation in that one cannot use the
> I/O path for RAM.
> 
> Fixes: a26fc6f5152b47f1
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/cpu-all.h |   4 +-
>  accel/tcg/cputlb.c     | 108 +++++++++++++++++++++--------------------
>  2 files changed, 59 insertions(+), 53 deletions(-)
> 
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 2db73c7a27..1ebd1b59ab 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -346,12 +346,14 @@ CPUArchState *cpu_copy(CPUArchState *env);
>  #define TLB_MMIO            (1 << (TARGET_PAGE_BITS_MIN - 3))
>  /* Set if TLB entry contains a watchpoint.  */
>  #define TLB_WATCHPOINT      (1 << (TARGET_PAGE_BITS_MIN - 4))
> +/* Set if TLB entry requires byte swap.  */
> +#define TLB_BSWAP           (1 << (TARGET_PAGE_BITS_MIN - 5))
>  
>  /* Use this mask to check interception with an alignment mask
>   * in a TCG backend.
>   */
>  #define TLB_FLAGS_MASK \
> -    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT)
> +    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
>  
>  /**
>   * tlb_hit_page: return true if page aligned @addr is a hit against the
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index b4a63d3928..cb603917a2 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -737,8 +737,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>          address |= TLB_INVALID_MASK;
>      }
>      if (attrs.byte_swap) {
> -        /* Force the access through the I/O slow path.  */
> -        address |= TLB_MMIO;
> +        address |= TLB_BSWAP;
>      }
>      if (!memory_region_is_ram(section->mr) &&
>          !memory_region_is_romd(section->mr)) {
> @@ -901,10 +900,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      bool locked = false;
>      MemTxResult r;
>  
> -    if (iotlbentry->attrs.byte_swap) {
> -        op ^= MO_BSWAP;
> -    }
> -
>      section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
>      mr = section->mr;
>      mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> @@ -947,10 +942,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      bool locked = false;
>      MemTxResult r;
>  
> -    if (iotlbentry->attrs.byte_swap) {
> -        op ^= MO_BSWAP;
> -    }
> -
>      section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
>      mr = section->mr;
>      mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> @@ -1133,8 +1124,8 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
>                               wp_access, retaddr);
>      }
>  
> -    if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO)) {
> -        /* I/O access */
> +    /* Reject I/O access, or other required slow-path.  */
> +    if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
>          return NULL;
>      }
>  
> @@ -1311,7 +1302,8 @@ static inline uint64_t wrap_ldul_le(const void *haddr)
>  static inline uint64_t QEMU_ALWAYS_INLINE
>  load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>              uintptr_t retaddr, MemOp op, bool code_read,
> -            FullLoadHelper *full_load, LoadHelper *direct)
> +            FullLoadHelper *full_load, LoadHelper *direct,
> +            LoadHelper *direct_swap)
>  {
>      uintptr_t mmu_idx = get_mmuidx(oi);
>      uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1361,17 +1353,21 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>              /* On watchpoint hit, this will longjmp out.  */
>              cpu_check_watchpoint(env_cpu(env), addr, size,
>                                   iotlbentry->attrs, BP_MEM_READ, retaddr);
> -
> -            /* The backing page may or may not require I/O.  */
> -            tlb_addr &= ~TLB_WATCHPOINT;
> -            if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
> -                goto do_aligned_access;
> -            }
>          }
>  
>          /* Handle I/O access.  */
> -        return io_readx(env, iotlbentry, mmu_idx, addr,
> -                        retaddr, access_type, op);
> +        if (likely(tlb_addr & TLB_MMIO)) {
> +            return io_readx(env, iotlbentry, mmu_idx, addr,
> +                            retaddr, access_type,
> +                            op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
> +        }
> +
> +        haddr = (void *)((uintptr_t)addr + entry->addend);
> +
> +        if (unlikely(tlb_addr & TLB_BSWAP)) {
> +            return direct_swap(haddr);
> +        }
> +        return direct(haddr);
>      }
>  
>      /* Handle slow unaligned access (it spans two pages or IO).  */
> @@ -1398,7 +1394,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>          return res & MAKE_64BIT_MASK(0, size * 8);
>      }
>  
> - do_aligned_access:
>      haddr = (void *)((uintptr_t)addr + entry->addend);
>      return direct(haddr);
>  }
> @@ -1417,7 +1412,7 @@ static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
>                                TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_UB, false,
> -                       full_ldub_mmu, wrap_ldub);
> +                       full_ldub_mmu, wrap_ldub, wrap_ldub);
>  }
>  
>  tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
> @@ -1430,7 +1425,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
> -                       full_le_lduw_mmu, wrap_lduw_le);
> +                       full_le_lduw_mmu, wrap_lduw_le, wrap_lduw_be);
>  }
>  
>  tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1443,7 +1438,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
> -                       full_be_lduw_mmu, wrap_lduw_be);
> +                       full_be_lduw_mmu, wrap_lduw_be, wrap_lduw_le);
>  }
>  
>  tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1456,7 +1451,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
> -                       full_le_ldul_mmu, wrap_ldul_le);
> +                       full_le_ldul_mmu, wrap_ldul_le, wrap_ldul_be);
>  }
>  
>  tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1469,7 +1464,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
> -                       full_be_ldul_mmu, wrap_ldul_be);
> +                       full_be_ldul_mmu, wrap_ldul_be, wrap_ldul_le);
>  }
>  
>  tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1482,14 +1477,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
> -                       helper_le_ldq_mmu, ldq_le_p);
> +                       helper_le_ldq_mmu, ldq_le_p, ldq_be_p);
>  }
>  
>  uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
> -                       helper_be_ldq_mmu, ldq_be_p);
> +                       helper_be_ldq_mmu, ldq_be_p, ldq_le_p);
>  }
>  
>  /*
> @@ -1563,7 +1558,7 @@ static inline void wrap_stl_le(void *haddr, uint64_t val)
>  static inline void QEMU_ALWAYS_INLINE
>  store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>               TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
> -             StoreHelper *direct)
> +             StoreHelper *direct, StoreHelper *direct_swap)
>  {
>      uintptr_t mmu_idx = get_mmuidx(oi);
>      uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1608,16 +1603,22 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>              /* On watchpoint hit, this will longjmp out.  */
>              cpu_check_watchpoint(env_cpu(env), addr, size,
>                                   iotlbentry->attrs, BP_MEM_WRITE, retaddr);
> -
> -            /* The backing page may or may not require I/O.  */
> -            tlb_addr &= ~TLB_WATCHPOINT;
> -            if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
> -                goto do_aligned_access;
> -            }
>          }
>  
>          /* Handle I/O access.  */
> -        io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr, op);
> +        if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
> +            io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
> +                      op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
> +            return;
> +        }
> +
> +        haddr = (void *)((uintptr_t)addr + entry->addend);
> +
> +        if (unlikely(tlb_addr & TLB_BSWAP)) {
> +            direct_swap(haddr, val);
> +        } else {
> +            direct(haddr, val);
> +        }
>          return;
>      }
>  
> @@ -1686,7 +1687,6 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>          return;
>      }
>  
> - do_aligned_access:
>      haddr = (void *)((uintptr_t)addr + entry->addend);
>      direct(haddr, val);
>  }
> @@ -1694,43 +1694,47 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>  void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
>                          TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
> +    store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb, wrap_stb);
>  }
>  
>  void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
> +    store_helper(env, addr, val, oi, retaddr, MO_LEUW,
> +                 wrap_stw_le, wrap_stw_be);
>  }
>  
>  void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
> +    store_helper(env, addr, val, oi, retaddr, MO_BEUW,
> +                 wrap_stw_be, wrap_stw_le);
>  }
>  
>  void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
> +    store_helper(env, addr, val, oi, retaddr, MO_LEUL,
> +                 wrap_stl_le, wrap_stl_be);
>  }
>  
>  void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
> +    store_helper(env, addr, val, oi, retaddr, MO_BEUL,
> +                 wrap_stl_be, wrap_stl_le);
>  }
>  
>  void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
> +    store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p, stq_be_p);
>  }
>  
>  void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
> +    store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p, stq_le_p);
>  }
>  
>  /* First set of helpers allows passing in of OI and RETADDR.  This makes
> @@ -1796,7 +1800,7 @@ static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
>                                 TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_8, true,
> -                       full_ldub_cmmu, wrap_ldub);
> +                       full_ldub_cmmu, wrap_ldub, wrap_ldub);
>  }
>  
>  uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1809,7 +1813,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
> -                       full_le_lduw_cmmu, wrap_lduw_le);
> +                       full_le_lduw_cmmu, wrap_lduw_le, wrap_lduw_be);
>  }
>  
>  uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1822,7 +1826,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
> -                       full_be_lduw_cmmu, wrap_lduw_be);
> +                       full_be_lduw_cmmu, wrap_lduw_be, wrap_lduw_le);
>  }
>  
>  uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1835,7 +1839,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
> -                       full_le_ldul_cmmu, wrap_ldul_le);
> +                       full_le_ldul_cmmu, wrap_ldul_le, wrap_ldul_be);
>  }
>  
>  uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1848,7 +1852,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
> -                       full_be_ldul_cmmu, wrap_ldul_be);
> +                       full_be_ldul_cmmu, wrap_ldul_be, wrap_ldul_le);
>  }
>  
>  uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1861,12 +1865,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
>                              TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
> -                       helper_le_ldq_cmmu, ldq_le_p);
> +                       helper_le_ldq_cmmu, ldq_le_p, ldq_be_p);
>  }
>  
>  uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
>                              TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
> -                       helper_be_ldq_cmmu, ldq_be_p);
> +                       helper_be_ldq_cmmu, ldq_be_p, ldq_le_p);
>  }
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path
  2019-09-22  3:54 ` [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path Richard Henderson
@ 2019-09-23  8:39   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:39 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> It does not require going through the whole I/O path
> in order to discard a write.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/cpu-all.h    |  5 ++++-
>  include/exec/cpu-common.h |  1 -
>  accel/tcg/cputlb.c        | 35 +++++++++++++++++++--------------
>  exec.c                    | 41 +--------------------------------------
>  4 files changed, 25 insertions(+), 57 deletions(-)
> 
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 1ebd1b59ab..9f0b17802e 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -348,12 +348,15 @@ CPUArchState *cpu_copy(CPUArchState *env);
>  #define TLB_WATCHPOINT      (1 << (TARGET_PAGE_BITS_MIN - 4))
>  /* Set if TLB entry requires byte swap.  */
>  #define TLB_BSWAP           (1 << (TARGET_PAGE_BITS_MIN - 5))
> +/* Set if TLB entry writes ignored.  */
> +#define TLB_ROM             (1 << (TARGET_PAGE_BITS_MIN - 6))

I was wondering if TLB_DISCARD_WRITE/TLB_IGNORE_WRITE/TLB_READONLY would
make it clearer what's actually happening here.

E.g., it isn't used for memory_region_is_romd(section->mr) but only for
memory_region_is_ram(section->mr) && section->readonly.

But anyhow, changes look fine to me

Reviewed-by: David Hildenbrand <david@redhat.com>

>  
>  /* Use this mask to check interception with an alignment mask
>   * in a TCG backend.
>   */
>  #define TLB_FLAGS_MASK \
> -    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
> +    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \
> +    | TLB_WATCHPOINT | TLB_BSWAP | TLB_ROM)
>  
>  /**
>   * tlb_hit_page: return true if page aligned @addr is a hit against the
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index f7dbe75fbc..1c0e03ddc2 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -100,7 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
>  
>  void cpu_flush_icache_range(hwaddr start, hwaddr len);
>  
> -extern struct MemoryRegion io_mem_rom;
>  extern struct MemoryRegion io_mem_notdirty;
>  
>  typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index cb603917a2..7ab523d7ec 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -577,7 +577,7 @@ static void tlb_reset_dirty_range_locked(CPUTLBEntry *tlb_entry,
>  {
>      uintptr_t addr = tlb_entry->addr_write;
>  
> -    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
> +    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_ROM | TLB_NOTDIRTY)) == 0) {
>          addr &= TARGET_PAGE_MASK;
>          addr += tlb_entry->addend;
>          if ((addr - start) < length) {
> @@ -745,7 +745,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>          address |= TLB_MMIO;
>          addend = 0;
>      } else {
> -        /* TLB_MMIO for rom/romd handled below */
>          addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
>      }
>  
> @@ -822,16 +821,17 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>  
>      tn.addr_write = -1;
>      if (prot & PAGE_WRITE) {
> -        if ((memory_region_is_ram(section->mr) && section->readonly)
> -            || memory_region_is_romd(section->mr)) {
> -            /* Write access calls the I/O callback.  */
> -            tn.addr_write = address | TLB_MMIO;
> -        } else if (memory_region_is_ram(section->mr)
> -                   && cpu_physical_memory_is_clean(
> -                       memory_region_get_ram_addr(section->mr) + xlat)) {
> -            tn.addr_write = address | TLB_NOTDIRTY;
> -        } else {
> -            tn.addr_write = address;
> +        tn.addr_write = address;
> +        if (memory_region_is_romd(section->mr)) {
> +            /* Use the MMIO path so that the device can switch states. */
> +            tn.addr_write |= TLB_MMIO;
> +        } else if (memory_region_is_ram(section->mr)) {
> +            if (section->readonly) {
> +                tn.addr_write |= TLB_ROM;
> +            } else if (cpu_physical_memory_is_clean(
> +                        memory_region_get_ram_addr(section->mr) + xlat)) {
> +                tn.addr_write |= TLB_NOTDIRTY;
> +            }
>          }
>          if (prot & PAGE_WRITE_INV) {
>              tn.addr_write |= TLB_INVALID_MASK;
> @@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      mr = section->mr;
>      mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
>      cpu->mem_io_pc = retaddr;
> -    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
> +    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
>          cpu_io_recompile(cpu, retaddr);
>      }
>  
> @@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
>      mr = section->mr;
>      mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> -    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
> +    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
>          cpu_io_recompile(cpu, retaddr);
>      }
>      cpu->mem_io_vaddr = addr;
> @@ -1125,7 +1125,7 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
>      }
>  
>      /* Reject I/O access, or other required slow-path.  */
> -    if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
> +    if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP | TLB_ROM)) {
>          return NULL;
>      }
>  
> @@ -1612,6 +1612,11 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>              return;
>          }
>  
> +        /* Ignore writes to ROM.  */
> +        if (unlikely(tlb_addr & TLB_ROM)) {
> +            return;
> +        }
> +
>          haddr = (void *)((uintptr_t)addr + entry->addend);
>  
>          if (unlikely(tlb_addr & TLB_BSWAP)) {
> diff --git a/exec.c b/exec.c
> index 7ce0515635..e21e068535 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -88,7 +88,7 @@ static MemoryRegion *system_io;
>  AddressSpace address_space_io;
>  AddressSpace address_space_memory;
>  
> -MemoryRegion io_mem_rom, io_mem_notdirty;
> +MemoryRegion io_mem_notdirty;
>  static MemoryRegion io_mem_unassigned;
>  #endif
>  
> @@ -158,7 +158,6 @@ typedef struct subpage_t {
>  
>  #define PHYS_SECTION_UNASSIGNED 0
>  #define PHYS_SECTION_NOTDIRTY 1
> -#define PHYS_SECTION_ROM 2
>  
>  static void io_mem_init(void);
>  static void memory_map_init(void);
> @@ -1441,8 +1440,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>          iotlb = memory_region_get_ram_addr(section->mr) + xlat;
>          if (!section->readonly) {
>              iotlb |= PHYS_SECTION_NOTDIRTY;
> -        } else {
> -            iotlb |= PHYS_SECTION_ROM;
>          }
>      } else {
>          AddressSpaceDispatch *d;
> @@ -2968,38 +2965,6 @@ static uint16_t dummy_section(PhysPageMap *map, FlatView *fv, MemoryRegion *mr)
>      return phys_section_add(map, &section);
>  }
>  
> -static void readonly_mem_write(void *opaque, hwaddr addr,
> -                               uint64_t val, unsigned size)
> -{
> -    /* Ignore any write to ROM. */
> -}
> -
> -static bool readonly_mem_accepts(void *opaque, hwaddr addr,
> -                                 unsigned size, bool is_write,
> -                                 MemTxAttrs attrs)
> -{
> -    return is_write;
> -}
> -
> -/* This will only be used for writes, because reads are special cased
> - * to directly access the underlying host ram.
> - */
> -static const MemoryRegionOps readonly_mem_ops = {
> -    .write = readonly_mem_write,
> -    .valid.accepts = readonly_mem_accepts,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> -    .valid = {
> -        .min_access_size = 1,
> -        .max_access_size = 8,
> -        .unaligned = false,
> -    },
> -    .impl = {
> -        .min_access_size = 1,
> -        .max_access_size = 8,
> -        .unaligned = false,
> -    },
> -};
> -
>  MemoryRegionSection *iotlb_to_section(CPUState *cpu,
>                                        hwaddr index, MemTxAttrs attrs)
>  {
> @@ -3013,8 +2978,6 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu,
>  
>  static void io_mem_init(void)
>  {
> -    memory_region_init_io(&io_mem_rom, NULL, &readonly_mem_ops,
> -                          NULL, NULL, UINT64_MAX);
>      memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>                            NULL, UINT64_MAX);
>  
> @@ -3035,8 +2998,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
>      assert(n == PHYS_SECTION_UNASSIGNED);
>      n = dummy_section(&d->map, fv, &io_mem_notdirty);
>      assert(n == PHYS_SECTION_NOTDIRTY);
> -    n = dummy_section(&d->map, fv, &io_mem_rom);
> -    assert(n == PHYS_SECTION_ROM);
>  
>      d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
>  
> 


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 13/20] cputlb: Move NOTDIRTY handling from I/O path to TLB path
  2019-09-22  3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
@ 2019-09-23  8:41   ` David Hildenbrand
  2019-09-23  9:30   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:41 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> Pages that we want to track for NOTDIRTY are RAM.  We do not
> really need to go through the I/O path to handle them.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/cpu-common.h |  2 --
>  accel/tcg/cputlb.c        | 26 +++++++++++++++++---
>  exec.c                    | 50 ---------------------------------------
>  memory.c                  | 16 -------------
>  4 files changed, 23 insertions(+), 71 deletions(-)
> 
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 1c0e03ddc2..81753bbb34 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -100,8 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
>  
>  void cpu_flush_icache_range(hwaddr start, hwaddr len);
>  
> -extern struct MemoryRegion io_mem_notdirty;
> -
>  typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
>  
>  int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 7ab523d7ec..b7bd738115 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      mr = section->mr;
>      mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
>      cpu->mem_io_pc = retaddr;
> -    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> +    if (!cpu->can_do_io) {
>          cpu_io_recompile(cpu, retaddr);
>      }
>  
> @@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
>      mr = section->mr;
>      mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> -    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> +    if (!cpu->can_do_io) {
>          cpu_io_recompile(cpu, retaddr);
>      }
>      cpu->mem_io_vaddr = addr;
> @@ -1606,7 +1606,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>          }
>  
>          /* Handle I/O access.  */
> -        if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
> +        if (tlb_addr & TLB_MMIO) {
>              io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
>                        op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
>              return;
> @@ -1619,6 +1619,26 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>  
>          haddr = (void *)((uintptr_t)addr + entry->addend);
>  
> +        /* Handle clean RAM pages.  */
> +        if (tlb_addr & TLB_NOTDIRTY) {
> +            NotDirtyInfo ndi;
> +
> +            /* We require mem_io_pc in tb_invalidate_phys_page_range.  */
> +            env_cpu(env)->mem_io_pc = retaddr;
> +
> +            memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
> +                                          addr + iotlbentry->addr, size);
> +
> +            if (unlikely(tlb_addr & TLB_BSWAP)) {
> +                direct_swap(haddr, val);
> +            } else {
> +                direct(haddr, val);
> +            }
> +
> +            memory_notdirty_write_complete(&ndi);
> +            return;
> +        }
> +
>          if (unlikely(tlb_addr & TLB_BSWAP)) {
>              direct_swap(haddr, val);
>          } else {
> diff --git a/exec.c b/exec.c
> index e21e068535..abf58b68a0 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -88,7 +88,6 @@ static MemoryRegion *system_io;
>  AddressSpace address_space_io;
>  AddressSpace address_space_memory;
>  
> -MemoryRegion io_mem_notdirty;
>  static MemoryRegion io_mem_unassigned;
>  #endif
>  
> @@ -157,7 +156,6 @@ typedef struct subpage_t {
>  } subpage_t;
>  
>  #define PHYS_SECTION_UNASSIGNED 0
> -#define PHYS_SECTION_NOTDIRTY 1
>  
>  static void io_mem_init(void);
>  static void memory_map_init(void);
> @@ -1438,9 +1436,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>      if (memory_region_is_ram(section->mr)) {
>          /* Normal RAM.  */
>          iotlb = memory_region_get_ram_addr(section->mr) + xlat;
> -        if (!section->readonly) {
> -            iotlb |= PHYS_SECTION_NOTDIRTY;
> -        }
>      } else {
>          AddressSpaceDispatch *d;
>  
> @@ -2749,42 +2744,6 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
>      }
>  }
>  
> -/* Called within RCU critical section.  */
> -static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
> -                               uint64_t val, unsigned size)
> -{
> -    NotDirtyInfo ndi;
> -
> -    memory_notdirty_write_prepare(&ndi, current_cpu, current_cpu->mem_io_vaddr,
> -                         ram_addr, size);
> -
> -    stn_p(qemu_map_ram_ptr(NULL, ram_addr), size, val);
> -    memory_notdirty_write_complete(&ndi);
> -}
> -
> -static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
> -                                 unsigned size, bool is_write,
> -                                 MemTxAttrs attrs)
> -{
> -    return is_write;
> -}
> -
> -static const MemoryRegionOps notdirty_mem_ops = {
> -    .write = notdirty_mem_write,
> -    .valid.accepts = notdirty_mem_accepts,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> -    .valid = {
> -        .min_access_size = 1,
> -        .max_access_size = 8,
> -        .unaligned = false,
> -    },
> -    .impl = {
> -        .min_access_size = 1,
> -        .max_access_size = 8,
> -        .unaligned = false,
> -    },
> -};
> -
>  /* Generate a debug exception if a watchpoint has been hit.  */
>  void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
>                            MemTxAttrs attrs, int flags, uintptr_t ra)
> @@ -2980,13 +2939,6 @@ static void io_mem_init(void)
>  {
>      memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>                            NULL, UINT64_MAX);
> -
> -    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
> -     * which can be called without the iothread mutex.
> -     */
> -    memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
> -                          NULL, UINT64_MAX);
> -    memory_region_clear_global_locking(&io_mem_notdirty);
>  }
>  
>  AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
> @@ -2996,8 +2948,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
>  
>      n = dummy_section(&d->map, fv, &io_mem_unassigned);
>      assert(n == PHYS_SECTION_UNASSIGNED);
> -    n = dummy_section(&d->map, fv, &io_mem_notdirty);
> -    assert(n == PHYS_SECTION_NOTDIRTY);
>  
>      d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
>  
> diff --git a/memory.c b/memory.c
> index 57c44c97db..a99b8c0767 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -434,10 +434,6 @@ static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
>      tmp = mr->ops->read(mr->opaque, addr, size);
>      if (mr->subpage) {
>          trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
> -    } else if (mr == &io_mem_notdirty) {
> -        /* Accesses to code which has previously been translated into a TB show
> -         * up in the MMIO path, as accesses to the io_mem_notdirty
> -         * MemoryRegion. */
>      } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -460,10 +456,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
>      r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
>      if (mr->subpage) {
>          trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
> -    } else if (mr == &io_mem_notdirty) {
> -        /* Accesses to code which has previously been translated into a TB show
> -         * up in the MMIO path, as accesses to the io_mem_notdirty
> -         * MemoryRegion. */
>      } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -484,10 +476,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
>  
>      if (mr->subpage) {
>          trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
> -    } else if (mr == &io_mem_notdirty) {
> -        /* Accesses to code which has previously been translated into a TB show
> -         * up in the MMIO path, as accesses to the io_mem_notdirty
> -         * MemoryRegion. */
>      } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -508,10 +496,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
>  
>      if (mr->subpage) {
>          trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
> -    } else if (mr == &io_mem_notdirty) {
> -        /* Accesses to code which has previously been translated into a TB show
> -         * up in the MMIO path, as accesses to the io_mem_notdirty
> -         * MemoryRegion. */
>      } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> 

Complicated stuff but I think this is fine

Acked-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr
  2019-09-22  3:54 ` [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr Richard Henderson
@ 2019-09-23  8:50   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:50 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> With the merge of notdirty handling into store_helper,
> the last user of cpu->mem_io_vaddr was removed.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/hw/core/cpu.h | 2 --
>  accel/tcg/cputlb.c    | 2 --
>  hw/core/cpu.c         | 1 -
>  3 files changed, 5 deletions(-)
> 
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index c7cda65c66..031f587e51 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -338,7 +338,6 @@ struct qemu_work_item;
>   * @next_cpu: Next CPU sharing TB cache.
>   * @opaque: User data.
>   * @mem_io_pc: Host Program Counter at which the memory was accessed.
> - * @mem_io_vaddr: Target virtual address at which the memory was accessed.
>   * @kvm_fd: vCPU file descriptor for KVM.
>   * @work_mutex: Lock to prevent multiple access to queued_work_*.
>   * @queued_work_first: First asynchronous work pending.
> @@ -413,7 +412,6 @@ struct CPUState {
>       * we store some rarely used information in the CPU context.
>       */
>      uintptr_t mem_io_pc;
> -    vaddr mem_io_vaddr;
>      /*
>       * This is only needed for the legacy cpu_unassigned_access() hook;
>       * when all targets using it have been converted to use
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 6f4096bd0d..257c59c08c 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -927,7 +927,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>          cpu_io_recompile(cpu, retaddr);
>      }
>  
> -    cpu->mem_io_vaddr = addr;
>      cpu->mem_io_access_type = access_type;
>  
>      if (mr->global_locking && !qemu_mutex_iothread_locked()) {
> @@ -967,7 +966,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      if (!cpu->can_do_io) {
>          cpu_io_recompile(cpu, retaddr);
>      }
> -    cpu->mem_io_vaddr = addr;
>      cpu->mem_io_pc = retaddr;
>  
>      if (mr->global_locking && !qemu_mutex_iothread_locked()) {
> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
> index 0035845511..73b1ee34d0 100644
> --- a/hw/core/cpu.c
> +++ b/hw/core/cpu.c
> @@ -261,7 +261,6 @@ static void cpu_common_reset(CPUState *cpu)
>      cpu->interrupt_request = 0;
>      cpu->halted = 0;
>      cpu->mem_io_pc = 0;
> -    cpu->mem_io_vaddr = 0;
>      cpu->icount_extra = 0;
>      atomic_set(&cpu->icount_decr_ptr->u32, 0);
>      cpu->can_do_io = 1;
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
  2019-09-22  3:54 ` [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access Richard Henderson
@ 2019-09-23  8:52   ` David Hildenbrand
  2019-09-23 16:05     ` Richard Henderson
  0 siblings, 1 reply; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:52 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> All callers pass false to this argument.  Remove it and pass the
> constant on to tb_invalidate_phys_page_range__locked.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/translate-all.h | 3 +--
>  accel/tcg/translate-all.c | 6 ++----
>  exec.c                    | 4 ++--
>  3 files changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
> index 64f5fd9a05..31f2117188 100644
> --- a/accel/tcg/translate-all.h
> +++ b/accel/tcg/translate-all.h
> @@ -28,8 +28,7 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
>  void page_collection_unlock(struct page_collection *set);
>  void tb_invalidate_phys_page_fast(struct page_collection *pages,
>                                    tb_page_addr_t start, int len);
> -void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
> -                                   int is_cpu_write_access);
> +void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
>  void tb_check_watchpoint(CPUState *cpu);
>  
>  #ifdef CONFIG_USER_ONLY
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 5d1e08b169..de4b697163 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1983,8 +1983,7 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>   *
>   * Called with mmap_lock held for user-mode emulation
>   */
> -void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
> -                                   int is_cpu_write_access)
> +void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
>  {
>      struct page_collection *pages;
>      PageDesc *p;
> @@ -1996,8 +1995,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>          return;
>      }
>      pages = page_collection_lock(start, end);
> -    tb_invalidate_phys_page_range__locked(pages, p, start, end,
> -                                          is_cpu_write_access);
> +    tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);

I'd prefer "false" to highlight the semantics, but as it's and int ...

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast
  2019-09-22  3:54 ` [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast Richard Henderson
@ 2019-09-23  8:53   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:53 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> Rather than rely on cpu->mem_io_pc, pass retaddr down directly.
> 
> Within tb_invalidate_phys_page_range__locked, the is_cpu_write_access
> parameter is non-zero exactly when retaddr would be non-zero, so that
> is a simple replacement.
> 
> Recognize that current_tb_not_found is true only when mem_io_pc
> (and now retaddr) are also non-zero, so remove a redundant test.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/translate-all.h |  3 ++-
>  accel/tcg/cputlb.c        |  6 +-----
>  accel/tcg/translate-all.c | 39 +++++++++++++++++++--------------------
>  3 files changed, 22 insertions(+), 26 deletions(-)
> 
> diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
> index 31f2117188..135c1ea96a 100644
> --- a/accel/tcg/translate-all.h
> +++ b/accel/tcg/translate-all.h
> @@ -27,7 +27,8 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
>                                               tb_page_addr_t end);
>  void page_collection_unlock(struct page_collection *set);
>  void tb_invalidate_phys_page_fast(struct page_collection *pages,
> -                                  tb_page_addr_t start, int len);
> +                                  tb_page_addr_t start, int len,
> +                                  uintptr_t retaddr);
>  void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
>  void tb_check_watchpoint(CPUState *cpu);
>  
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 257c59c08c..eff129447d 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1093,11 +1093,7 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
>      if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
>          struct page_collection *pages
>              = page_collection_lock(ram_addr, ram_addr + size);
> -
> -        /* We require mem_io_pc in tb_invalidate_phys_page_range.  */
> -        cpu->mem_io_pc = retaddr;
> -
> -        tb_invalidate_phys_page_fast(pages, ram_addr, size);
> +        tb_invalidate_phys_page_fast(pages, ram_addr, size, retaddr);
>          page_collection_unlock(pages);
>      }
>  
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index de4b697163..db77fb221b 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1889,7 +1889,7 @@ static void
>  tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>                                        PageDesc *p, tb_page_addr_t start,
>                                        tb_page_addr_t end,
> -                                      int is_cpu_write_access)
> +                                      uintptr_t retaddr)
>  {
>      TranslationBlock *tb;
>      tb_page_addr_t tb_start, tb_end;
> @@ -1897,9 +1897,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>  #ifdef TARGET_HAS_PRECISE_SMC
>      CPUState *cpu = current_cpu;
>      CPUArchState *env = NULL;
> -    int current_tb_not_found = is_cpu_write_access;
> +    bool current_tb_not_found = retaddr != 0;
> +    bool current_tb_modified = false;
>      TranslationBlock *current_tb = NULL;
> -    int current_tb_modified = 0;
>      target_ulong current_pc = 0;
>      target_ulong current_cs_base = 0;
>      uint32_t current_flags = 0;
> @@ -1931,24 +1931,21 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>          if (!(tb_end <= start || tb_start >= end)) {
>  #ifdef TARGET_HAS_PRECISE_SMC
>              if (current_tb_not_found) {
> -                current_tb_not_found = 0;
> -                current_tb = NULL;
> -                if (cpu->mem_io_pc) {
> -                    /* now we have a real cpu fault */
> -                    current_tb = tcg_tb_lookup(cpu->mem_io_pc);
> -                }
> +                current_tb_not_found = false;
> +                /* now we have a real cpu fault */
> +                current_tb = tcg_tb_lookup(retaddr);
>              }
>              if (current_tb == tb &&
>                  (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
> -                /* If we are modifying the current TB, we must stop
> -                its execution. We could be more precise by checking
> -                that the modification is after the current PC, but it
> -                would require a specialized function to partially
> -                restore the CPU state */
> -
> -                current_tb_modified = 1;
> -                cpu_restore_state_from_tb(cpu, current_tb,
> -                                          cpu->mem_io_pc, true);
> +                /*
> +                 * If we are modifying the current TB, we must stop
> +                 * its execution. We could be more precise by checking
> +                 * that the modification is after the current PC, but it
> +                 * would require a specialized function to partially
> +                 * restore the CPU state.
> +                 */
> +                current_tb_modified = true;
> +                cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
>                  cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
>                                       &current_flags);
>              }
> @@ -2042,7 +2039,8 @@ void tb_invalidate_phys_range(target_ulong start, target_ulong end)
>   * Call with all @pages in the range [@start, @start + len[ locked.
>   */
>  void tb_invalidate_phys_page_fast(struct page_collection *pages,
> -                                  tb_page_addr_t start, int len)
> +                                  tb_page_addr_t start, int len,
> +                                  uintptr_t retaddr)
>  {
>      PageDesc *p;
>  
> @@ -2069,7 +2067,8 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
>          }
>      } else {
>      do_invalidate:
> -        tb_invalidate_phys_page_range__locked(pages, p, start, start + len, 1);
> +        tb_invalidate_phys_page_range__locked(pages, p, start, start + len,
> +                                              retaddr);
>      }
>  }
>  #else
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint
  2019-09-22  3:54 ` [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint Richard Henderson
@ 2019-09-23  8:54   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  8:54 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 22.09.19 05:54, Richard Henderson wrote:
> Fixes the previous TLB_WATCHPOINT patches because we are currently
> failing to set cpu->mem_io_pc with the call to cpu_check_watchpoint.
> Pass down the retaddr directly because it's readily available.
> 
> Fixes: 50b107c5d61
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/translate-all.h | 2 +-
>  accel/tcg/translate-all.c | 6 +++---
>  exec.c                    | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
> index 135c1ea96a..a557b4e2bb 100644
> --- a/accel/tcg/translate-all.h
> +++ b/accel/tcg/translate-all.h
> @@ -30,7 +30,7 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
>                                    tb_page_addr_t start, int len,
>                                    uintptr_t retaddr);
>  void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
> -void tb_check_watchpoint(CPUState *cpu);
> +void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
>  
>  #ifdef CONFIG_USER_ONLY
>  int page_unprotect(target_ulong address, uintptr_t pc);
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index db77fb221b..66d4bc4341 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -2142,16 +2142,16 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
>  #endif
>  
>  /* user-mode: call with mmap_lock held */
> -void tb_check_watchpoint(CPUState *cpu)
> +void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
>  {
>      TranslationBlock *tb;
>  
>      assert_memory_lock();
>  
> -    tb = tcg_tb_lookup(cpu->mem_io_pc);
> +    tb = tcg_tb_lookup(retaddr);
>      if (tb) {
>          /* We can use retranslation to find the PC.  */
> -        cpu_restore_state_from_tb(cpu, tb, cpu->mem_io_pc, true);
> +        cpu_restore_state_from_tb(cpu, tb, retaddr, true);
>          tb_phys_invalidate(tb, -1);
>      } else {
>          /* The exception probably happened in a helper.  The CPU state should
> diff --git a/exec.c b/exec.c
> index fed25d029b..ceeef4cd4b 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2724,7 +2724,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
>                  cpu->watchpoint_hit = wp;
>  
>                  mmap_lock();
> -                tb_check_watchpoint(cpu);
> +                tb_check_watchpoint(cpu, ra);
>                  if (wp->flags & BP_STOP_BEFORE_ACCESS) {
>                      cpu->exception_index = EXCP_DEBUG;
>                      mmap_unlock();
> 

Finally, a lot of this stuff gets cleaned up :)

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 11/20] exec: Adjust notdirty tracing
  2019-09-22  3:54 ` [PATCH v3 11/20] exec: Adjust notdirty tracing Richard Henderson
@ 2019-09-23  9:17   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-23  9:17 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

On 9/22/19 5:54 AM, Richard Henderson wrote:
> The memory_region_tb_read tracepoint is unreachable, since notdirty
> is supposed to apply only to writes.  The memory_region_tb_write
> tracepoint is mis-named, because notdirty is not only used for TB
> invalidation.  It is also used for e.g. VGA RAM updates and migration.
> 
> Replace memory_region_tb_write with memory_notdirty_write_access,
> and place it in memory_notdirty_write_prepare where it can catch
> all of the instances.  Add memory_notdirty_set_dirty to log when
> we no longer intercept writes to a page.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  exec.c       | 3 +++
>  memory.c     | 4 ----
>  trace-events | 4 ++--
>  3 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 33bd0e36c1..7ce0515635 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2721,6 +2721,8 @@ void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
>      ndi->size = size;
>      ndi->pages = NULL;
>  
> +    trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
> +
>      assert(tcg_enabled());
>      if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
>          ndi->pages = page_collection_lock(ram_addr, ram_addr + size);
> @@ -2745,6 +2747,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
>      /* we remove the notdirty callback only if the code has been
>         flushed */
>      if (!cpu_physical_memory_is_clean(ndi->ram_addr)) {
> +        trace_memory_notdirty_set_dirty(ndi->mem_vaddr);
>          tlb_set_dirty(ndi->cpu, ndi->mem_vaddr);
>      }
>  }
> diff --git a/memory.c b/memory.c
> index b9dd6b94ca..57c44c97db 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -438,7 +438,6 @@ static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
>          /* Accesses to code which has previously been translated into a TB show
>           * up in the MMIO path, as accesses to the io_mem_notdirty
>           * MemoryRegion. */
> -        trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
>      } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -465,7 +464,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
>          /* Accesses to code which has previously been translated into a TB show
>           * up in the MMIO path, as accesses to the io_mem_notdirty
>           * MemoryRegion. */
> -        trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
>      } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -490,7 +488,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
>          /* Accesses to code which has previously been translated into a TB show
>           * up in the MMIO path, as accesses to the io_mem_notdirty
>           * MemoryRegion. */
> -        trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
>      } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -515,7 +512,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
>          /* Accesses to code which has previously been translated into a TB show
>           * up in the MMIO path, as accesses to the io_mem_notdirty
>           * MemoryRegion. */
> -        trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
>      } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> diff --git a/trace-events b/trace-events
> index 823a4ae64e..20821ba545 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -52,14 +52,14 @@ dma_map_wait(void *dbs) "dbs=%p"
>  find_ram_offset(uint64_t size, uint64_t offset) "size: 0x%" PRIx64 " @ 0x%" PRIx64
>  find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_t next, uint64_t mingap) "trying size: 0x%" PRIx64 " @ 0x%" PRIx64 ", offset: 0x%" PRIx64" next: 0x%" PRIx64 " mingap: 0x%" PRIx64
>  ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
> +memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
> +memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
>  
>  # memory.c
>  memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> -memory_region_tb_read(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> -memory_region_tb_write(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  flatview_new(void *view, void *root) "%p (root %p)"
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
  2019-09-22  3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
@ 2019-09-23  9:18   ` Philippe Mathieu-Daudé
  2019-09-23  9:45   ` Paolo Bonzini
  1 sibling, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-23  9:18 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

On 9/22/19 5:54 AM, Richard Henderson wrote:
> This forced inlining can result in missing symbols,
> which makes a debugging build harder to follow.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/qemu/compiler.h | 11 +++++++++++
>  accel/tcg/cputlb.c      |  4 ++--
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
> index 09fc44cca4..d6d400c523 100644
> --- a/include/qemu/compiler.h
> +++ b/include/qemu/compiler.h
> @@ -170,6 +170,17 @@
>  # define QEMU_NONSTRING
>  #endif
>  
> +/*
> + * Forced inlining may be desired to encourage constant propagation
> + * of function parameters.  However, it can also make debugging harder,
> + * so disable it for a non-optimizing build.
> + */
> +#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
> +#define QEMU_ALWAYS_INLINE  __attribute__((always_inline))
> +#else
> +#define QEMU_ALWAYS_INLINE
> +#endif
> +
>  /* Implement C11 _Generic via GCC builtins.  Example:
>   *
>   *    QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index abae79650c..2222b87764 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1281,7 +1281,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>  typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
>                                  TCGMemOpIdx oi, uintptr_t retaddr);
>  
> -static inline uint64_t __attribute__((always_inline))
> +static inline uint64_t QEMU_ALWAYS_INLINE
>  load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>              uintptr_t retaddr, MemOp op, bool code_read,
>              FullLoadHelper *full_load)
> @@ -1530,7 +1530,7 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
>   * Store Helpers
>   */
>  
> -static inline void __attribute__((always_inline))
> +static inline void QEMU_ALWAYS_INLINE
>  store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>               TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
>  {
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
  2019-09-22  3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
  2019-09-23  8:32   ` David Hildenbrand
@ 2019-09-23  9:27   ` Philippe Mathieu-Daudé
  2019-09-23  9:51   ` Paolo Bonzini
  2 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-23  9:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

On 9/22/19 5:54 AM, Richard Henderson wrote:
> Add a function parameter to perform the actual load/store to ram.
> With optimization, this results in identical code.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/cputlb.c | 159 +++++++++++++++++++++++----------------------
>  1 file changed, 83 insertions(+), 76 deletions(-)
> 
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 2222b87764..b4a63d3928 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1280,11 +1280,38 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>  
>  typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
>                                  TCGMemOpIdx oi, uintptr_t retaddr);
> +typedef uint64_t LoadHelper(const void *);
> +
> +/* Wrap the unaligned load helpers to that they have a common signature.  */
> +static inline uint64_t wrap_ldub(const void *haddr)
> +{
> +    return ldub_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_be(const void *haddr)
> +{
> +    return lduw_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_le(const void *haddr)
> +{
> +    return lduw_le_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_be(const void *haddr)
> +{
> +    return (uint32_t)ldl_be_p(haddr);

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> +}
> +
> +static inline uint64_t wrap_ldul_le(const void *haddr)
> +{
> +    return (uint32_t)ldl_le_p(haddr);
> +}
>  
>  static inline uint64_t QEMU_ALWAYS_INLINE
>  load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>              uintptr_t retaddr, MemOp op, bool code_read,
> -            FullLoadHelper *full_load)
> +            FullLoadHelper *full_load, LoadHelper *direct)
>  {
>      uintptr_t mmu_idx = get_mmuidx(oi);
>      uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1373,33 +1400,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>  
>   do_aligned_access:
>      haddr = (void *)((uintptr_t)addr + entry->addend);
> -    switch (op) {
> -    case MO_UB:
> -        res = ldub_p(haddr);
> -        break;
> -    case MO_BEUW:
> -        res = lduw_be_p(haddr);
> -        break;
> -    case MO_LEUW:
> -        res = lduw_le_p(haddr);
> -        break;
> -    case MO_BEUL:
> -        res = (uint32_t)ldl_be_p(haddr);
> -        break;
> -    case MO_LEUL:
> -        res = (uint32_t)ldl_le_p(haddr);
> -        break;
> -    case MO_BEQ:
> -        res = ldq_be_p(haddr);
> -        break;
> -    case MO_LEQ:
> -        res = ldq_le_p(haddr);
> -        break;
> -    default:
> -        g_assert_not_reached();
> -    }
> -
> -    return res;
> +    return direct(haddr);
>  }
>  
>  /*
> @@ -1415,7 +1416,8 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>  static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
>                                TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    return load_helper(env, addr, oi, retaddr, MO_UB, false, full_ldub_mmu);
> +    return load_helper(env, addr, oi, retaddr, MO_UB, false,
> +                       full_ldub_mmu, wrap_ldub);
>  }
>  
>  tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
> @@ -1428,7 +1430,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
> -                       full_le_lduw_mmu);
> +                       full_le_lduw_mmu, wrap_lduw_le);
>  }
>  
>  tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1441,7 +1443,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
> -                       full_be_lduw_mmu);
> +                       full_be_lduw_mmu, wrap_lduw_be);
>  }
>  
>  tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1454,7 +1456,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
> -                       full_le_ldul_mmu);
> +                       full_le_ldul_mmu, wrap_ldul_le);
>  }
>  
>  tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1467,7 +1469,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
>                                   TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
> -                       full_be_ldul_mmu);
> +                       full_be_ldul_mmu, wrap_ldul_be);
>  }
>  
>  tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1480,14 +1482,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
> -                       helper_le_ldq_mmu);
> +                       helper_le_ldq_mmu, ldq_le_p);
>  }
>  
>  uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
>                             TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
> -                       helper_be_ldq_mmu);
> +                       helper_be_ldq_mmu, ldq_be_p);
>  }
>  
>  /*
> @@ -1530,9 +1532,38 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
>   * Store Helpers
>   */
>  
> +typedef void StoreHelper(void *, uint64_t);
> +
> +/* Wrap the unaligned store helpers to that they have a common signature.  */
> +static inline void wrap_stb(void *haddr, uint64_t val)
> +{
> +    stb_p(haddr, val);
> +}
> +
> +static inline void wrap_stw_be(void *haddr, uint64_t val)
> +{
> +    stw_be_p(haddr, val);
> +}
> +
> +static inline void wrap_stw_le(void *haddr, uint64_t val)
> +{
> +    stw_le_p(haddr, val);
> +}
> +
> +static inline void wrap_stl_be(void *haddr, uint64_t val)
> +{
> +    stl_be_p(haddr, val);
> +}
> +
> +static inline void wrap_stl_le(void *haddr, uint64_t val)
> +{
> +    stl_le_p(haddr, val);
> +}
> +
>  static inline void QEMU_ALWAYS_INLINE
>  store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> -             TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
> +             TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
> +             StoreHelper *direct)
>  {
>      uintptr_t mmu_idx = get_mmuidx(oi);
>      uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1657,74 +1688,49 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>  
>   do_aligned_access:
>      haddr = (void *)((uintptr_t)addr + entry->addend);
> -    switch (op) {
> -    case MO_UB:
> -        stb_p(haddr, val);
> -        break;
> -    case MO_BEUW:
> -        stw_be_p(haddr, val);
> -        break;
> -    case MO_LEUW:
> -        stw_le_p(haddr, val);
> -        break;
> -    case MO_BEUL:
> -        stl_be_p(haddr, val);
> -        break;
> -    case MO_LEUL:
> -        stl_le_p(haddr, val);
> -        break;
> -    case MO_BEQ:
> -        stq_be_p(haddr, val);
> -        break;
> -    case MO_LEQ:
> -        stq_le_p(haddr, val);
> -        break;
> -    default:
> -        g_assert_not_reached();
> -        break;
> -    }
> +    direct(haddr, val);
>  }
>  
>  void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
>                          TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_UB);
> +    store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
>  }
>  
>  void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_LEUW);
> +    store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
>  }
>  
>  void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_BEUW);
> +    store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
>  }
>  
>  void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_LEUL);
> +    store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
>  }
>  
>  void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_BEUL);
> +    store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
>  }
>  
>  void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_LEQ);
> +    store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
>  }
>  
>  void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
>                         TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    store_helper(env, addr, val, oi, retaddr, MO_BEQ);
> +    store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
>  }
>  
>  /* First set of helpers allows passing in of OI and RETADDR.  This makes
> @@ -1789,7 +1795,8 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
>  static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
>                                 TCGMemOpIdx oi, uintptr_t retaddr)
>  {
> -    return load_helper(env, addr, oi, retaddr, MO_8, true, full_ldub_cmmu);
> +    return load_helper(env, addr, oi, retaddr, MO_8, true,
> +                       full_ldub_cmmu, wrap_ldub);
>  }
>  
>  uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1802,7 +1809,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
> -                       full_le_lduw_cmmu);
> +                       full_le_lduw_cmmu, wrap_lduw_le);
>  }
>  
>  uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1815,7 +1822,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
> -                       full_be_lduw_cmmu);
> +                       full_be_lduw_cmmu, wrap_lduw_be);
>  }
>  
>  uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1828,7 +1835,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
> -                       full_le_ldul_cmmu);
> +                       full_le_ldul_cmmu, wrap_ldul_le);
>  }
>  
>  uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1841,7 +1848,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
>                                    TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
> -                       full_be_ldul_cmmu);
> +                       full_be_ldul_cmmu, wrap_ldul_be);
>  }
>  
>  uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1854,12 +1861,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
>                              TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
> -                       helper_le_ldq_cmmu);
> +                       helper_le_ldq_cmmu, ldq_le_p);
>  }
>  
>  uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
>                              TCGMemOpIdx oi, uintptr_t retaddr)
>  {
>      return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
> -                       helper_be_ldq_cmmu);
> +                       helper_be_ldq_cmmu, ldq_be_p);
>  }
> 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 13/20] cputlb: Move NOTDIRTY handling from I/O path to TLB path
  2019-09-22  3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
  2019-09-23  8:41   ` David Hildenbrand
@ 2019-09-23  9:30   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-23  9:30 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david

On 9/22/19 5:54 AM, Richard Henderson wrote:
> Pages that we want to track for NOTDIRTY are RAM.  We do not
> really need to go through the I/O path to handle them.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/cpu-common.h |  2 --
>  accel/tcg/cputlb.c        | 26 +++++++++++++++++---
>  exec.c                    | 50 ---------------------------------------
>  memory.c                  | 16 -------------
>  4 files changed, 23 insertions(+), 71 deletions(-)
> 
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 1c0e03ddc2..81753bbb34 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -100,8 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
>  
>  void cpu_flush_icache_range(hwaddr start, hwaddr len);
>  
> -extern struct MemoryRegion io_mem_notdirty;
> -
>  typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
>  
>  int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 7ab523d7ec..b7bd738115 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      mr = section->mr;
>      mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
>      cpu->mem_io_pc = retaddr;
> -    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> +    if (!cpu->can_do_io) {
>          cpu_io_recompile(cpu, retaddr);
>      }
>  
> @@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>      section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
>      mr = section->mr;
>      mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> -    if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> +    if (!cpu->can_do_io) {
>          cpu_io_recompile(cpu, retaddr);
>      }
>      cpu->mem_io_vaddr = addr;
> @@ -1606,7 +1606,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>          }
>  
>          /* Handle I/O access.  */
> -        if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
> +        if (tlb_addr & TLB_MMIO) {
>              io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
>                        op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
>              return;
> @@ -1619,6 +1619,26 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>  
>          haddr = (void *)((uintptr_t)addr + entry->addend);
>  
> +        /* Handle clean RAM pages.  */
> +        if (tlb_addr & TLB_NOTDIRTY) {
> +            NotDirtyInfo ndi;
> +
> +            /* We require mem_io_pc in tb_invalidate_phys_page_range.  */
> +            env_cpu(env)->mem_io_pc = retaddr;
> +
> +            memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
> +                                          addr + iotlbentry->addr, size);
> +
> +            if (unlikely(tlb_addr & TLB_BSWAP)) {
> +                direct_swap(haddr, val);
> +            } else {
> +                direct(haddr, val);
> +            }
> +
> +            memory_notdirty_write_complete(&ndi);
> +            return;
> +        }
> +
>          if (unlikely(tlb_addr & TLB_BSWAP)) {
>              direct_swap(haddr, val);
>          } else {
> diff --git a/exec.c b/exec.c
> index e21e068535..abf58b68a0 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -88,7 +88,6 @@ static MemoryRegion *system_io;
>  AddressSpace address_space_io;
>  AddressSpace address_space_memory;
>  
> -MemoryRegion io_mem_notdirty;
>  static MemoryRegion io_mem_unassigned;
>  #endif
>  
> @@ -157,7 +156,6 @@ typedef struct subpage_t {
>  } subpage_t;
>  
>  #define PHYS_SECTION_UNASSIGNED 0
> -#define PHYS_SECTION_NOTDIRTY 1
>  
>  static void io_mem_init(void);
>  static void memory_map_init(void);
> @@ -1438,9 +1436,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>      if (memory_region_is_ram(section->mr)) {
>          /* Normal RAM.  */
>          iotlb = memory_region_get_ram_addr(section->mr) + xlat;
> -        if (!section->readonly) {
> -            iotlb |= PHYS_SECTION_NOTDIRTY;
> -        }
>      } else {
>          AddressSpaceDispatch *d;
>  
> @@ -2749,42 +2744,6 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
>      }
>  }
>  
> -/* Called within RCU critical section.  */
> -static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
> -                               uint64_t val, unsigned size)
> -{
> -    NotDirtyInfo ndi;
> -
> -    memory_notdirty_write_prepare(&ndi, current_cpu, current_cpu->mem_io_vaddr,
> -                         ram_addr, size);
> -
> -    stn_p(qemu_map_ram_ptr(NULL, ram_addr), size, val);
> -    memory_notdirty_write_complete(&ndi);
> -}
> -
> -static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
> -                                 unsigned size, bool is_write,
> -                                 MemTxAttrs attrs)
> -{
> -    return is_write;
> -}
> -
> -static const MemoryRegionOps notdirty_mem_ops = {
> -    .write = notdirty_mem_write,
> -    .valid.accepts = notdirty_mem_accepts,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> -    .valid = {
> -        .min_access_size = 1,
> -        .max_access_size = 8,
> -        .unaligned = false,
> -    },
> -    .impl = {
> -        .min_access_size = 1,
> -        .max_access_size = 8,
> -        .unaligned = false,
> -    },
> -};
> -
>  /* Generate a debug exception if a watchpoint has been hit.  */
>  void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
>                            MemTxAttrs attrs, int flags, uintptr_t ra)
> @@ -2980,13 +2939,6 @@ static void io_mem_init(void)
>  {
>      memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>                            NULL, UINT64_MAX);
> -
> -    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
> -     * which can be called without the iothread mutex.
> -     */
> -    memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
> -                          NULL, UINT64_MAX);
> -    memory_region_clear_global_locking(&io_mem_notdirty);
>  }
>  
>  AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
> @@ -2996,8 +2948,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
>  
>      n = dummy_section(&d->map, fv, &io_mem_unassigned);
>      assert(n == PHYS_SECTION_UNASSIGNED);
> -    n = dummy_section(&d->map, fv, &io_mem_notdirty);
> -    assert(n == PHYS_SECTION_NOTDIRTY);
>  
>      d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
>  
> diff --git a/memory.c b/memory.c
> index 57c44c97db..a99b8c0767 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -434,10 +434,6 @@ static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
>      tmp = mr->ops->read(mr->opaque, addr, size);
>      if (mr->subpage) {
>          trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
> -    } else if (mr == &io_mem_notdirty) {
> -        /* Accesses to code which has previously been translated into a TB show
> -         * up in the MMIO path, as accesses to the io_mem_notdirty
> -         * MemoryRegion. */
>      } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -460,10 +456,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
>      r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
>      if (mr->subpage) {
>          trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
> -    } else if (mr == &io_mem_notdirty) {
> -        /* Accesses to code which has previously been translated into a TB show
> -         * up in the MMIO path, as accesses to the io_mem_notdirty
> -         * MemoryRegion. */
>      } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -484,10 +476,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
>  
>      if (mr->subpage) {
>          trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
> -    } else if (mr == &io_mem_notdirty) {
> -        /* Accesses to code which has previously been translated into a TB show
> -         * up in the MMIO path, as accesses to the io_mem_notdirty
> -         * MemoryRegion. */
>      } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -508,10 +496,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
>  
>      if (mr->subpage) {
>          trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
> -    } else if (mr == &io_mem_notdirty) {
> -        /* Accesses to code which has previously been translated into a TB show
> -         * up in the MMIO path, as accesses to the io_mem_notdirty
> -         * MemoryRegion. */
>      } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
>          hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
>          trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> 

Very nice!
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
  2019-09-22  3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
  2019-09-23  9:18   ` Philippe Mathieu-Daudé
@ 2019-09-23  9:45   ` Paolo Bonzini
  2019-09-23 16:00     ` Richard Henderson
  1 sibling, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23  9:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha, david

On 22/09/19 05:54, Richard Henderson wrote:
> +/*
> + * Forced inlining may be desired to encourage constant propagation
> + * of function parameters.  However, it can also make debugging harder,
> + * so disable it for a non-optimizing build.
> + */
> +#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
> +#define QEMU_ALWAYS_INLINE  __attribute__((always_inline))

GCC doesn't have __has_attribute, does it?  I think you can just assume
that it exists and #ifdef __OPTIMIZE__.

Paolo

> +#else
> +#define QEMU_ALWAYS_INLINE



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
  2019-09-22  3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
  2019-09-23  8:32   ` David Hildenbrand
  2019-09-23  9:27   ` Philippe Mathieu-Daudé
@ 2019-09-23  9:51   ` Paolo Bonzini
  2019-09-23  9:54     ` David Hildenbrand
  2 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23  9:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha, david

On 22/09/19 05:54, Richard Henderson wrote:
> +/* Wrap the unaligned load helpers to that they have a common signature.  */
> +static inline uint64_t wrap_ldub(const void *haddr)
> +{
> +    return ldub_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_be(const void *haddr)
> +{
> +    return lduw_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_le(const void *haddr)
> +{
> +    return lduw_le_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_be(const void *haddr)
> +{
> +    return (uint32_t)ldl_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_le(const void *haddr)
> +{
> +    return (uint32_t)ldl_le_p(haddr);
> +}

Any reason to have these five functions (plus five for stores) instead
of a pair

static uint64_t ld_memop(const void *haddr, MemOp op)
{
}

static uint64_t st_memop(void *haddr, MemOp op, uint64_t val)
{
}

that includes the switches?  Everything should be inlined just the same
if you do

        if (unlikely(tlb_addr & TLB_BSWAP)) {
            st_memop(haddr, op ^ MO_BSWAP, val);
        } else {
            st_memop(haddr, op, val);
        }

and the like.

Paolo

>  static inline uint64_t QEMU_ALWAYS_INLINE
>  load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>              uintptr_t retaddr, MemOp op, bool code_read,
> -            FullLoadHelper *full_load)
> +            FullLoadHelper *full_load, LoadHelper *direct)
>  {
>      uintptr_t mmu_idx = get_mmuidx(oi);
>      uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1373,33 +1400,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>  
>   do_aligned_access:
>      haddr = (void *)((uintptr_t)addr + entry->addend);
> -    switch (op) {
> -    case MO_UB:
> -        res = ldub_p(haddr);
> -        break;
> -    case MO_BEUW:
> -        res = lduw_be_p(haddr);
> -        break;
> -    case MO_LEUW:
> -        res = lduw_le_p(haddr);
> -        break;
> -    case MO_BEUL:
> -        res = (uint32_t)ldl_be_p(haddr);
> -        break;
> -    case MO_LEUL:
> -        res = (uint32_t)ldl_le_p(haddr);
> -        break;
> -    case MO_BEQ:
> -        res = ldq_be_p(haddr);
> -        break;
> -    case MO_LEQ:
> -        res = ldq_le_p(haddr);
> -        break;
> -    default:
> -        g_assert_not_reached();
> -    }
> -
> -    return res;
> +    return direct(haddr);



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
  2019-09-23  9:51   ` Paolo Bonzini
@ 2019-09-23  9:54     ` David Hildenbrand
  2019-09-23 10:02       ` Paolo Bonzini
  0 siblings, 1 reply; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23  9:54 UTC (permalink / raw)
  To: Paolo Bonzini, Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha

On 23.09.19 11:51, Paolo Bonzini wrote:
> On 22/09/19 05:54, Richard Henderson wrote:
>> +/* Wrap the unaligned load helpers to that they have a common signature.  */
>> +static inline uint64_t wrap_ldub(const void *haddr)
>> +{
>> +    return ldub_p(haddr);
>> +}
>> +
>> +static inline uint64_t wrap_lduw_be(const void *haddr)
>> +{
>> +    return lduw_be_p(haddr);
>> +}
>> +
>> +static inline uint64_t wrap_lduw_le(const void *haddr)
>> +{
>> +    return lduw_le_p(haddr);
>> +}
>> +
>> +static inline uint64_t wrap_ldul_be(const void *haddr)
>> +{
>> +    return (uint32_t)ldl_be_p(haddr);
>> +}
>> +
>> +static inline uint64_t wrap_ldul_le(const void *haddr)
>> +{
>> +    return (uint32_t)ldl_le_p(haddr);
>> +}
> 
> Any reason to have these five functions (plus five for stores) instead
> of a pair
> 
> static uint64_t ld_memop(const void *haddr, MemOp op)
> {
> }
> 
> static uint64_t st_memop(void *haddr, MemOp op, uint64_t val)
> {
> }
> 
> that includes the switches?  Everything should be inlined just the same
> if you do
> 
>         if (unlikely(tlb_addr & TLB_BSWAP)) {
>             st_memop(haddr, op ^ MO_BSWAP, val);
>         } else {
>             st_memop(haddr, op, val);
>         }

I asked the same question on v2 and Richard explained that - for
whatever reason -  the compiler will not properly propagate the constant
in the "op ^ MO_BSWAP" case.

> 
> and the like.
> 
> Paolo


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
  2019-09-23  9:54     ` David Hildenbrand
@ 2019-09-23 10:02       ` Paolo Bonzini
  2019-09-23 15:52         ` Richard Henderson
  0 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23 10:02 UTC (permalink / raw)
  To: David Hildenbrand, Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha

On 23/09/19 11:54, David Hildenbrand wrote:
> On 23.09.19 11:51, Paolo Bonzini wrote:
>> On 22/09/19 05:54, Richard Henderson wrote:
>>> +/* Wrap the unaligned load helpers to that they have a common signature.  */
>>> +static inline uint64_t wrap_ldub(const void *haddr)
>>> +{
>>> +    return ldub_p(haddr);
>>> +}
>>> +
>>> +static inline uint64_t wrap_lduw_be(const void *haddr)
>>> +{
>>> +    return lduw_be_p(haddr);
>>> +}
>>> +
>>> +static inline uint64_t wrap_lduw_le(const void *haddr)
>>> +{
>>> +    return lduw_le_p(haddr);
>>> +}
>>> +
>>> +static inline uint64_t wrap_ldul_be(const void *haddr)
>>> +{
>>> +    return (uint32_t)ldl_be_p(haddr);
>>> +}
>>> +
>>> +static inline uint64_t wrap_ldul_le(const void *haddr)
>>> +{
>>> +    return (uint32_t)ldl_le_p(haddr);
>>> +}
>>
>> Any reason to have these five functions (plus five for stores) instead
>> of a pair
>>
>> static uint64_t ld_memop(const void *haddr, MemOp op)
>> {
>> }
>>
>> static uint64_t st_memop(void *haddr, MemOp op, uint64_t val)
>> {
>> }
>>
>> that includes the switches?  Everything should be inlined just the same
>> if you do
>>
>>         if (unlikely(tlb_addr & TLB_BSWAP)) {
>>             st_memop(haddr, op ^ MO_BSWAP, val);
>>         } else {
>>             st_memop(haddr, op, val);
>>         }
> 
> I asked the same question on v2 and Richard explained that - for
> whatever reason -  the compiler will not properly propagate the constant
> in the "op ^ MO_BSWAP" case.

Even if ld_memop and st_memop are __always_inline__?

Paolo


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
  2019-09-23 10:02       ` Paolo Bonzini
@ 2019-09-23 15:52         ` Richard Henderson
  2019-09-23 18:18           ` Richard Henderson
  0 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 15:52 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, qemu-devel; +Cc: alex.bennee, stefanha

On 9/23/19 3:02 AM, Paolo Bonzini wrote:
> On 23/09/19 11:54, David Hildenbrand wrote:
>> On 23.09.19 11:51, Paolo Bonzini wrote:
>>> that includes the switches?  Everything should be inlined just the same
>>> if you do
>>>
>>>         if (unlikely(tlb_addr & TLB_BSWAP)) {
>>>             st_memop(haddr, op ^ MO_BSWAP, val);
>>>         } else {
>>>             st_memop(haddr, op, val);
>>>         }
>>
>> I asked the same question on v2 and Richard explained that - for
>> whatever reason -  the compiler will not properly propagate the constant
>> in the "op ^ MO_BSWAP" case.
> 
> Even if ld_memop and st_memop are __always_inline__?

I'm not sure I tried __always_inline__.  I can, if you like.


r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
  2019-09-23  9:45   ` Paolo Bonzini
@ 2019-09-23 16:00     ` Richard Henderson
  2019-09-23 16:49       ` Paolo Bonzini
  0 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 16:00 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: alex.bennee, stefanha, david

On 9/23/19 2:45 AM, Paolo Bonzini wrote:
> On 22/09/19 05:54, Richard Henderson wrote:
>> +/*
>> + * Forced inlining may be desired to encourage constant propagation
>> + * of function parameters.  However, it can also make debugging harder,
>> + * so disable it for a non-optimizing build.
>> + */
>> +#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
>> +#define QEMU_ALWAYS_INLINE  __attribute__((always_inline))
> 
> GCC doesn't have __has_attribute, does it?

It does, since at least gcc 5.  And now I realize that's only a reorganization
of the support and not when it was introduced.


r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
  2019-09-23  8:52   ` David Hildenbrand
@ 2019-09-23 16:05     ` Richard Henderson
  2019-09-23 16:50       ` Paolo Bonzini
  0 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 16:05 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 9/23/19 1:52 AM, David Hildenbrand wrote:
>> -    tb_invalidate_phys_page_range__locked(pages, p, start, end,
>> -                                          is_cpu_write_access);
>> +    tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
> 
> I'd prefer "false" to highlight the semantics, but as it's and int ...
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

I did think of that, but then the next patch changes this last argument to
"uintptr_t retaddr", so then it's not really a boolean argument anymore, and
"0" is exactly what we want.


r~



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c
  2019-09-23  8:26   ` David Hildenbrand
@ 2019-09-23 16:27     ` Richard Henderson
  0 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 16:27 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha

On 9/23/19 1:26 AM, David Hildenbrand wrote:
>> +void finalize_target_page_bits(void)
>> +{
>> +#ifdef TARGET_PAGE_BITS_VARY
>> +    if (target_page_bits == 0) {
>> +        target_page_bits = TARGET_PAGE_BITS_MIN;
>> +    }
>> +    target_page_bits_decided = true;
>> +#endif
>> +}
> I wonder if it would be nicer to handle this in the header file instead,
> providing dummy functions there.

No can do, because set_preferred_target_page_bits is used by vl.c, which is
compiled once and so does not have access to cpu.h definitions.


r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
  2019-09-23 16:00     ` Richard Henderson
@ 2019-09-23 16:49       ` Paolo Bonzini
  2019-09-23 18:09         ` Richard Henderson
  0 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23 16:49 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha, david

On 23/09/19 18:00, Richard Henderson wrote:
> On 9/23/19 2:45 AM, Paolo Bonzini wrote:
>> On 22/09/19 05:54, Richard Henderson wrote:
>>> +/*
>>> + * Forced inlining may be desired to encourage constant propagation
>>> + * of function parameters.  However, it can also make debugging harder,
>>> + * so disable it for a non-optimizing build.
>>> + */
>>> +#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
>>> +#define QEMU_ALWAYS_INLINE  __attribute__((always_inline))
>>
>> GCC doesn't have __has_attribute, does it?
> 
> It does, since at least gcc 5.  And now I realize that's only a reorganization
> of the support and not when it was introduced.

Hmm, still we support 4.8 and always_inline is much older than that.  So
I'm not sure it's useful to test it.

Paolo



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
  2019-09-23 16:05     ` Richard Henderson
@ 2019-09-23 16:50       ` Paolo Bonzini
  0 siblings, 0 replies; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23 16:50 UTC (permalink / raw)
  To: Richard Henderson, David Hildenbrand, qemu-devel; +Cc: alex.bennee, stefanha

On 23/09/19 18:05, Richard Henderson wrote:
> On 9/23/19 1:52 AM, David Hildenbrand wrote:
>>> -    tb_invalidate_phys_page_range__locked(pages, p, start, end,
>>> -                                          is_cpu_write_access);
>>> +    tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
>>
>> I'd prefer "false" to highlight the semantics, but as it's and int ...
>>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> I did think of that, but then the next patch changes this last argument to
> "uintptr_t retaddr", so then it's not really a boolean argument anymore, and
> "0" is exactly what we want.

Yeah, 0 is fine here.

Paolo



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
  2019-09-23 16:49       ` Paolo Bonzini
@ 2019-09-23 18:09         ` Richard Henderson
  0 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 18:09 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: alex.bennee, stefanha, david

On 9/23/19 9:49 AM, Paolo Bonzini wrote:
> Hmm, still we support 4.8 and always_inline is much older than that.  So
> I'm not sure it's useful to test it.

Fair enough.


r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
  2019-09-23 15:52         ` Richard Henderson
@ 2019-09-23 18:18           ` Richard Henderson
  0 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 18:18 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, qemu-devel; +Cc: alex.bennee, stefanha

On 9/23/19 8:52 AM, Richard Henderson wrote:
> On 9/23/19 3:02 AM, Paolo Bonzini wrote:
>> On 23/09/19 11:54, David Hildenbrand wrote:
>>> On 23.09.19 11:51, Paolo Bonzini wrote:
>>>> that includes the switches?  Everything should be inlined just the same
>>>> if you do
>>>>
>>>>         if (unlikely(tlb_addr & TLB_BSWAP)) {
>>>>             st_memop(haddr, op ^ MO_BSWAP, val);
>>>>         } else {
>>>>             st_memop(haddr, op, val);
>>>>         }
>>>
>>> I asked the same question on v2 and Richard explained that - for
>>> whatever reason -  the compiler will not properly propagate the constant
>>> in the "op ^ MO_BSWAP" case.
>>
>> Even if ld_memop and st_memop are __always_inline__?
> 
> I'm not sure I tried __always_inline__.  I can, if you like.

FWIW, that works.

Since two of you have now asked about this, I'll change the patch.


r~


^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2019-09-23 18:23 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-22  3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
2019-09-22  3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
2019-09-23  8:24   ` David Hildenbrand
2019-09-22  3:54 ` [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c Richard Henderson
2019-09-23  8:26   ` David Hildenbrand
2019-09-23 16:27     ` Richard Henderson
2019-09-22  3:54 ` [PATCH v3 03/20] exec: Use const alias for TARGET_PAGE_BITS_VARY Richard Henderson
2019-09-22  3:54 ` [PATCH v3 04/20] exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG Richard Henderson
2019-09-22  3:54 ` [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long Richard Henderson
2019-09-23  8:30   ` David Hildenbrand
2019-09-22  3:54 ` [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN Richard Henderson
2019-09-23  8:30   ` David Hildenbrand
2019-09-22  3:54 ` [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY Richard Henderson
2019-09-23  8:31   ` David Hildenbrand
2019-09-22  3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
2019-09-23  9:18   ` Philippe Mathieu-Daudé
2019-09-23  9:45   ` Paolo Bonzini
2019-09-23 16:00     ` Richard Henderson
2019-09-23 16:49       ` Paolo Bonzini
2019-09-23 18:09         ` Richard Henderson
2019-09-22  3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
2019-09-23  8:32   ` David Hildenbrand
2019-09-23  9:27   ` Philippe Mathieu-Daudé
2019-09-23  9:51   ` Paolo Bonzini
2019-09-23  9:54     ` David Hildenbrand
2019-09-23 10:02       ` Paolo Bonzini
2019-09-23 15:52         ` Richard Henderson
2019-09-23 18:18           ` Richard Henderson
2019-09-22  3:54 ` [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP Richard Henderson
2019-09-23  8:33   ` David Hildenbrand
2019-09-22  3:54 ` [PATCH v3 11/20] exec: Adjust notdirty tracing Richard Henderson
2019-09-23  9:17   ` Philippe Mathieu-Daudé
2019-09-22  3:54 ` [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path Richard Henderson
2019-09-23  8:39   ` David Hildenbrand
2019-09-22  3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
2019-09-23  8:41   ` David Hildenbrand
2019-09-23  9:30   ` Philippe Mathieu-Daudé
2019-09-22  3:54 ` [PATCH v3 14/20] cputlb: Partially inline memory_region_section_get_iotlb Richard Henderson
2019-09-22  3:54 ` [PATCH v3 15/20] cputlb: Merge and move memory_notdirty_write_{prepare, complete} Richard Henderson
2019-09-22  3:54 ` [PATCH v3 16/20] cputlb: Handle TLB_NOTDIRTY in probe_access Richard Henderson
2019-09-22  3:54 ` [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr Richard Henderson
2019-09-23  8:50   ` David Hildenbrand
2019-09-22  3:54 ` [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access Richard Henderson
2019-09-23  8:52   ` David Hildenbrand
2019-09-23 16:05     ` Richard Henderson
2019-09-23 16:50       ` Paolo Bonzini
2019-09-22  3:54 ` [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast Richard Henderson
2019-09-23  8:53   ` David Hildenbrand
2019-09-22  3:54 ` [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint Richard Henderson
2019-09-23  8:54   ` David Hildenbrand
2019-09-22  4:02 ` [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
2019-09-22  6:46 ` no-reply
2019-09-23  8:23 ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).