* [PATCH v3 00/20] Move rom and notdirty handling to cputlb
@ 2019-09-22 3:54 Richard Henderson
2019-09-22 3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
` (22 more replies)
0 siblings, 23 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
Ok! Third time is the charm, because this time it works.
New to v3:
* Replace io_mem_rom with a new TLB_ROM bit.
* This in turn means that there are no longer any special RAM
cases along the MMIO path -- they all have devices on
the other end.
* This in turn means that we can fold the bulk of
memory_region_section_get_iotlb into tlb_set_page_with_attrs,
eliminating a couple of redundant tests vs the MemoryRegion.
The result in patch 14 is, IMO, much more understandable.
* Fold away uses of cpu->mem_io_pc in tb_invalidate_phys_page__locked,
the cause of the problems for my previous two patch sets.
BTW, I was correct with my guess in the v2 cover letter that the use
of memory_notdirty_write_{prepare,complete} within atomic_mmu_lookup
must have been broken, for not setting mem_io_pc. :-P
* Fix a missed use of cpu->mem_io_pc in tb_check_watchpoint,
which meant that the previous TLB_WATCHPOINT cleanup was a
titch broken.
The remaining two users of cpu->mem_io_pc are hw/misc/mips_itu.c and
target/i386/helper.c. I haven't looked, but I assume that these are
legitimately on the MMIO path, and there probably isn't a decent way
to remove the uses.
r~
Richard Henderson (20):
exec: Use TARGET_PAGE_BITS_MIN for TLB flags
exec: Split out variable page size support to exec-vary.c
exec: Use const alias for TARGET_PAGE_BITS_VARY
exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
exec: Promote TARGET_PAGE_MASK to target_long
exec: Tidy TARGET_PAGE_ALIGN
exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
cputlb: Disable __always_inline__ without optimization
cputlb: Replace switches in load/store_helper with callback
cputlb: Introduce TLB_BSWAP
exec: Adjust notdirty tracing
cputlb: Move ROM handling from I/O path to TLB path
cputlb: Move NOTDIRTY handling from I/O path to TLB path
cputlb: Partially inline memory_region_section_get_iotlb
cputlb: Merge and move memory_notdirty_write_{prepare,complete}
cputlb: Handle TLB_NOTDIRTY in probe_access
cputlb: Remove cpu->mem_io_vaddr
cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
cputlb: Pass retaddr to tb_invalidate_phys_page_fast
cputlb: Pass retaddr to tb_check_watchpoint
Makefile.target | 2 +-
accel/tcg/translate-all.h | 8 +-
include/exec/cpu-all.h | 48 ++--
include/exec/cpu-common.h | 3 -
include/exec/exec-all.h | 6 +-
include/exec/memory-internal.h | 65 ------
include/hw/core/cpu.h | 2 -
include/qemu-common.h | 6 +
include/qemu/compiler.h | 11 +
accel/tcg/cputlb.c | 388 +++++++++++++++++++--------------
accel/tcg/translate-all.c | 51 ++---
exec-vary.c | 88 ++++++++
exec.c | 192 +---------------
hw/core/cpu.c | 1 -
memory.c | 20 --
trace-events | 4 +-
16 files changed, 403 insertions(+), 492 deletions(-)
create mode 100644 exec-vary.c
--
2.17.1
* [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:24 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c Richard Henderson
` (21 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
These bits do not need to vary with the actual page size
used by the guest.
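To illustrate (a sketch only, with a made-up function name, not code
from this patch): because every flag sits below TARGET_PAGE_BITS_MIN,
the fast-path hit test can stay a comparison against compile-time
constant masks even when the runtime page size varies.

/* Sketch: any flag bit set in tlb_addr makes this comparison fail,
 * forcing the slow path, independent of the runtime page size. */
static inline bool sketch_tlb_hit(target_ulong tlb_addr, target_ulong addr)
{
    return tlb_addr == (addr & TARGET_PAGE_MASK);
}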
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/cpu-all.h | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index d2d443c4f9..e0c8dc540c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -317,20 +317,24 @@ CPUArchState *cpu_copy(CPUArchState *env);
#if !defined(CONFIG_USER_ONLY)
-/* Flags stored in the low bits of the TLB virtual address. These are
- * defined so that fast path ram access is all zeros.
+/*
+ * Flags stored in the low bits of the TLB virtual address.
+ * These are defined so that fast path ram access is all zeros.
* The flags all must be between TARGET_PAGE_BITS and
* maximum address alignment bit.
+ *
+ * Use TARGET_PAGE_BITS_MIN so that these bits are constant
+ * when TARGET_PAGE_BITS_VARY is in effect.
*/
/* Zero if TLB entry is valid. */
-#define TLB_INVALID_MASK (1 << (TARGET_PAGE_BITS - 1))
+#define TLB_INVALID_MASK (1 << (TARGET_PAGE_BITS_MIN - 1))
/* Set if TLB entry references a clean RAM page. The iotlb entry will
contain the page physical address. */
-#define TLB_NOTDIRTY (1 << (TARGET_PAGE_BITS - 2))
+#define TLB_NOTDIRTY (1 << (TARGET_PAGE_BITS_MIN - 2))
/* Set if TLB entry is an IO callback. */
-#define TLB_MMIO (1 << (TARGET_PAGE_BITS - 3))
+#define TLB_MMIO (1 << (TARGET_PAGE_BITS_MIN - 3))
/* Set if TLB entry contains a watchpoint. */
-#define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS - 4))
+#define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS_MIN - 4))
/* Use this mask to check interception with an alignment mask
* in a TCG backend.
--
2.17.1
* [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
2019-09-22 3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:26 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 03/20] exec: Use const alias for TARGET_PAGE_BITS_VARY Richard Henderson
` (20 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
The next patch will play a trick with "const" that will
confuse the compiler about the uses of target_page_bits
within exec.c. Moving everything to a new file prevents
this confusion.
No functional change so far.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
Makefile.target | 2 +-
include/qemu-common.h | 6 +++++
exec-vary.c | 57 +++++++++++++++++++++++++++++++++++++++++++
exec.c | 34 --------------------------
4 files changed, 64 insertions(+), 35 deletions(-)
create mode 100644 exec-vary.c
diff --git a/Makefile.target b/Makefile.target
index 5e916230c4..ca3d14efe1 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -107,7 +107,7 @@ obj-y += trace/
#########################################################
# cpu emulator library
-obj-y += exec.o
+obj-y += exec.o exec-vary.o
obj-y += accel/
obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 8d84db90b0..082da59e85 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -74,6 +74,12 @@ void cpu_exec_step_atomic(CPUState *cpu);
*/
bool set_preferred_target_page_bits(int bits);
+/**
+ * finalize_target_page_bits:
+ * Commit the final value set by set_preferred_target_page_bits.
+ */
+void finalize_target_page_bits(void);
+
/**
* Sends a (part of) iovec down a socket, yielding when the socket is full, or
* Receives data into a (part of) iovec from a socket,
diff --git a/exec-vary.c b/exec-vary.c
new file mode 100644
index 0000000000..48c0ab306c
--- /dev/null
+++ b/exec-vary.c
@@ -0,0 +1,57 @@
+/*
+ * Variable page size handling
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "exec/exec-all.h"
+
+#ifdef TARGET_PAGE_BITS_VARY
+int target_page_bits;
+bool target_page_bits_decided;
+#endif
+
+bool set_preferred_target_page_bits(int bits)
+{
+ /*
+ * The target page size is the lowest common denominator for all
+ * the CPUs in the system, so we can only make it smaller, never
+ * larger. And we can't make it smaller once we've committed to
+ * a particular size.
+ */
+#ifdef TARGET_PAGE_BITS_VARY
+ assert(bits >= TARGET_PAGE_BITS_MIN);
+ if (target_page_bits == 0 || target_page_bits > bits) {
+ if (target_page_bits_decided) {
+ return false;
+ }
+ target_page_bits = bits;
+ }
+#endif
+ return true;
+}
+
+void finalize_target_page_bits(void)
+{
+#ifdef TARGET_PAGE_BITS_VARY
+ if (target_page_bits == 0) {
+ target_page_bits = TARGET_PAGE_BITS_MIN;
+ }
+ target_page_bits_decided = true;
+#endif
+}
diff --git a/exec.c b/exec.c
index 8b998974f8..33bd0e36c1 100644
--- a/exec.c
+++ b/exec.c
@@ -92,11 +92,6 @@ MemoryRegion io_mem_rom, io_mem_notdirty;
static MemoryRegion io_mem_unassigned;
#endif
-#ifdef TARGET_PAGE_BITS_VARY
-int target_page_bits;
-bool target_page_bits_decided;
-#endif
-
CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
/* current CPU in the current thread. It is only valid inside
@@ -110,37 +105,8 @@ int use_icount;
uintptr_t qemu_host_page_size;
intptr_t qemu_host_page_mask;
-bool set_preferred_target_page_bits(int bits)
-{
- /* The target page size is the lowest common denominator for all
- * the CPUs in the system, so we can only make it smaller, never
- * larger. And we can't make it smaller once we've committed to
- * a particular size.
- */
-#ifdef TARGET_PAGE_BITS_VARY
- assert(bits >= TARGET_PAGE_BITS_MIN);
- if (target_page_bits == 0 || target_page_bits > bits) {
- if (target_page_bits_decided) {
- return false;
- }
- target_page_bits = bits;
- }
-#endif
- return true;
-}
-
#if !defined(CONFIG_USER_ONLY)
-static void finalize_target_page_bits(void)
-{
-#ifdef TARGET_PAGE_BITS_VARY
- if (target_page_bits == 0) {
- target_page_bits = TARGET_PAGE_BITS_MIN;
- }
- target_page_bits_decided = true;
-#endif
-}
-
typedef struct PhysPageEntry PhysPageEntry;
struct PhysPageEntry {
--
2.17.1
* [PATCH v3 03/20] exec: Use const alias for TARGET_PAGE_BITS_VARY
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
2019-09-22 3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
2019-09-22 3:54 ` [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-22 3:54 ` [PATCH v3 04/20] exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG Richard Henderson
` (19 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
Using a variable that is declared "const" for this tells the
compiler that it may read the value once and assume that it
does not change across function calls.
For target_page_size, this means we have only one assert per
function, and one read of the variable.
This reduces the size of qemu-system-aarch64 by 8k.
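As a self-contained sketch of the const-alias trick (made-up names;
the patch itself uses init_target_page/target_page):

/* The writable definition; written only during early startup. */
static int init_value;

/* A read-only view of the same storage. */
extern const int value __attribute__((alias("init_value")));

void set_value_once(int v)
{
    init_value = v;           /* write through the non-const name */
}

int sum_twice(void)
{
    return value + value;     /* 'const' lets the compiler load once */
}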
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/cpu-all.h | 10 +++++----
exec-vary.c | 46 ++++++++++++++++++++++++++++++++++--------
2 files changed, 44 insertions(+), 12 deletions(-)
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index e0c8dc540c..a53b761b48 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -210,10 +210,12 @@ static inline void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t val
/* page related stuff */
#ifdef TARGET_PAGE_BITS_VARY
-extern bool target_page_bits_decided;
-extern int target_page_bits;
-#define TARGET_PAGE_BITS ({ assert(target_page_bits_decided); \
- target_page_bits; })
+typedef struct {
+ bool decided;
+ int bits;
+} TargetPageBits;
+extern const TargetPageBits target_page;
+#define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
#else
#define TARGET_PAGE_BITS_MIN TARGET_PAGE_BITS
#endif
diff --git a/exec-vary.c b/exec-vary.c
index 48c0ab306c..67cdf57a9c 100644
--- a/exec-vary.c
+++ b/exec-vary.c
@@ -22,8 +22,38 @@
#include "exec/exec-all.h"
#ifdef TARGET_PAGE_BITS_VARY
-int target_page_bits;
-bool target_page_bits_decided;
+/*
+ * We want to declare the "target_page" variable as const, which tells
+ * the compiler that it can cache any value that it reads across calls.
+ * This avoids multiple assertions and multiple reads within any one user.
+ *
+ * This works because we initialize the target_page data very early, in a
+ * location far removed from the functions that require the final results.
+ *
+ * This also requires that we have a non-constant symbol by which we can
+ * perform the actual initialization, and which forces the data to be
+ * allocated within writable memory. Thus "init_target_page", and we use
+ * that symbol exclusively in the two functions that initialize this value.
+ *
+ * The "target_page" symbol is created as an alias of "init_target_page".
+ */
+static TargetPageBits init_target_page;
+
+/*
+ * Note that this is *not* a redundant decl, this is the definition of
+ * the "target_page" symbol. The syntax for this definition requires
+ * the use of the extern keyword. This seems to be a GCC bug in
+ * either the syntax for the alias attribute or in -Wredundant-decls.
+ *
+ * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91765
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wredundant-decls"
+
+extern const TargetPageBits target_page
+ __attribute__((alias("init_target_page")));
+
+#pragma GCC diagnostic pop
#endif
bool set_preferred_target_page_bits(int bits)
@@ -36,11 +66,11 @@ bool set_preferred_target_page_bits(int bits)
*/
#ifdef TARGET_PAGE_BITS_VARY
assert(bits >= TARGET_PAGE_BITS_MIN);
- if (target_page_bits == 0 || target_page_bits > bits) {
- if (target_page_bits_decided) {
+ if (init_target_page.bits == 0 || init_target_page.bits > bits) {
+ if (init_target_page.decided) {
return false;
}
- target_page_bits = bits;
+ init_target_page.bits = bits;
}
#endif
return true;
@@ -49,9 +79,9 @@ bool set_preferred_target_page_bits(int bits)
void finalize_target_page_bits(void)
{
#ifdef TARGET_PAGE_BITS_VARY
- if (target_page_bits == 0) {
- target_page_bits = TARGET_PAGE_BITS_MIN;
+ if (init_target_page.bits == 0) {
+ init_target_page.bits = TARGET_PAGE_BITS_MIN;
}
- target_page_bits_decided = true;
+ init_target_page.decided = true;
#endif
}
--
2.17.1
* [PATCH v3 04/20] exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (2 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 03/20] exec: Use const alias for TARGET_PAGE_BITS_VARY Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-22 3:54 ` [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long Richard Henderson
` (18 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
This reduces the size of a release build by about 10k,
noticeably within the TLB miss helpers.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/cpu-all.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index a53b761b48..b11ee1f711 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -215,7 +215,11 @@ typedef struct {
int bits;
} TargetPageBits;
extern const TargetPageBits target_page;
-#define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
+# ifdef CONFIG_DEBUG_TCG
+# define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
+# else
+# define TARGET_PAGE_BITS target_page.bits
+# endif
#else
#define TARGET_PAGE_BITS_MIN TARGET_PAGE_BITS
#endif
--
2.17.1
* [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (3 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 04/20] exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:30 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN Richard Henderson
` (17 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
There are some uint64_t uses that expect TARGET_PAGE_MASK to
sign-extend for a 32-bit target, so this must continue to be a
signed type.
Define it based on TARGET_PAGE_BITS, not TARGET_PAGE_SIZE; this
will make a following patch clearer.
This should not have a functional effect so far.
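A worked example of the signedness requirement (sketch, assuming
12-bit pages and a 32-bit target_long):

static void mask_extension_example(void)
{
    int32_t  smask = (int32_t)-1 << 12;     /* 0xfffff000, signed   */
    uint32_t umask = (uint32_t)-1 << 12;    /* 0xfffff000, unsigned */
    uint64_t addr  = 0xffffffff80001234ULL;

    /* The signed mask sign-extends when widened: high bits preserved. */
    assert((addr & smask) == 0xffffffff80001000ULL);
    /* An unsigned mask zero-extends: high bits silently cleared. */
    assert((addr & umask) == 0x0000000080001000ULL);
}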
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/cpu-all.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index b11ee1f711..34d36cebca 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -225,7 +225,7 @@ extern const TargetPageBits target_page;
#endif
#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
-#define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1)
+#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
#define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
/* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
--
2.17.1
* [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (4 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:30 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY Richard Henderson
` (16 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
Use TARGET_PAGE_MASK twice instead of TARGET_PAGE_SIZE once.
This is functionally identical, but will help a following patch.
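The identity behind the change, as a quick sketch: for a power-of-two
page size, ~TARGET_PAGE_MASK == TARGET_PAGE_SIZE - 1, so both forms
round up identically.

static void check_align_identity(void)
{
    target_ulong addr = 0x12345;    /* any value */

    assert(~TARGET_PAGE_MASK == TARGET_PAGE_SIZE - 1);
    assert(((addr + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
           == ((addr + ~TARGET_PAGE_MASK) & TARGET_PAGE_MASK));
}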
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/cpu-all.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 34d36cebca..5246770271 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -226,7 +226,8 @@ extern const TargetPageBits target_page;
#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
-#define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
+#define TARGET_PAGE_ALIGN(addr) \
+ (((addr) + ~TARGET_PAGE_MASK) & TARGET_PAGE_MASK)
/* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
* when intptr_t is 32-bit and we are aligning a long long.
--
2.17.1
* [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (5 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:31 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
` (15 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
This eliminates a set of runtime shifts. It turns out that we
require TARGET_PAGE_MASK more often than TARGET_PAGE_SIZE, so
redefine TARGET_PAGE_SIZE based on TARGET_PAGE_MASK instead of
the other way around.
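Concretely (sketch, two's complement, 4 KiB pages): negating the
cached mask recovers the size without a runtime shift.

static void mask_to_size_example(void)
{
    target_long mask = (target_long)-1 << 12;   /* 0x...fffff000 */
    int size = (int)-mask;                      /* no shift at runtime */

    assert(size == 4096);
}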
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/cpu-all.h | 8 ++++++--
exec-vary.c | 1 +
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 5246770271..2db73c7a27 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -213,19 +213,23 @@ static inline void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t val
typedef struct {
bool decided;
int bits;
+ target_long mask;
} TargetPageBits;
extern const TargetPageBits target_page;
# ifdef CONFIG_DEBUG_TCG
# define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
+# define TARGET_PAGE_MASK (assert(target_page.decided), target_page.mask)
# else
# define TARGET_PAGE_BITS target_page.bits
+# define TARGET_PAGE_MASK target_page.mask
# endif
+# define TARGET_PAGE_SIZE ((int)-TARGET_PAGE_MASK)
#else
#define TARGET_PAGE_BITS_MIN TARGET_PAGE_BITS
+#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
+#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
#endif
-#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
-#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
#define TARGET_PAGE_ALIGN(addr) \
(((addr) + ~TARGET_PAGE_MASK) & TARGET_PAGE_MASK)
diff --git a/exec-vary.c b/exec-vary.c
index 67cdf57a9c..26daf281f2 100644
--- a/exec-vary.c
+++ b/exec-vary.c
@@ -83,5 +83,6 @@ void finalize_target_page_bits(void)
init_target_page.bits = TARGET_PAGE_BITS_MIN;
}
init_target_page.decided = true;
+ init_target_page.mask = (target_long)-1 << init_target_page.bits;
#endif
}
--
2.17.1
* [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (6 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 9:18 ` Philippe Mathieu-Daudé
2019-09-23 9:45 ` Paolo Bonzini
2019-09-22 3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
` (14 subsequent siblings)
22 siblings, 2 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
This forced inlining can result in missing symbols,
which makes a debugging build harder to follow.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/compiler.h | 11 +++++++++++
accel/tcg/cputlb.c | 4 ++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 09fc44cca4..d6d400c523 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -170,6 +170,17 @@
# define QEMU_NONSTRING
#endif
+/*
+ * Forced inlining may be desired to encourage constant propagation
+ * of function parameters. However, it can also make debugging harder,
+ * so disable it for a non-optimizing build.
+ */
+#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
+#define QEMU_ALWAYS_INLINE __attribute__((always_inline))
+#else
+#define QEMU_ALWAYS_INLINE
+#endif
+
/* Implement C11 _Generic via GCC builtins. Example:
*
* QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index abae79650c..2222b87764 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1281,7 +1281,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr);
-static inline uint64_t __attribute__((always_inline))
+static inline uint64_t QEMU_ALWAYS_INLINE
load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
uintptr_t retaddr, MemOp op, bool code_read,
FullLoadHelper *full_load)
@@ -1530,7 +1530,7 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
* Store Helpers
*/
-static inline void __attribute__((always_inline))
+static inline void QEMU_ALWAYS_INLINE
store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
{
--
2.17.1
* [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (7 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:32 ` David Hildenbrand
` (2 more replies)
2019-09-22 3:54 ` [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP Richard Henderson
` (13 subsequent siblings)
22 siblings, 3 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
Add a function parameter to perform the actual load/store to ram.
With optimization, this results in identical code.
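The indirect call is free because every call site passes a literal
function, so after forced inlining the compiler can devirtualize it.
A sketch with made-up names:

typedef uint64_t LoadFn(const void *);

static inline uint64_t QEMU_ALWAYS_INLINE
load_core(const void *haddr, LoadFn *direct)
{
    return direct(haddr);               /* folds to a direct call */
}

static uint64_t ld8_sketch(const void *haddr)
{
    /* With optimization, identical to a bare ldub_p(haddr). */
    return load_core(haddr, wrap_ldub);
}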
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/cputlb.c | 159 +++++++++++++++++++++++----------------------
1 file changed, 83 insertions(+), 76 deletions(-)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 2222b87764..b4a63d3928 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1280,11 +1280,38 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr);
+typedef uint64_t LoadHelper(const void *);
+
+/* Wrap the unaligned load helpers so that they have a common signature. */
+static inline uint64_t wrap_ldub(const void *haddr)
+{
+ return ldub_p(haddr);
+}
+
+static inline uint64_t wrap_lduw_be(const void *haddr)
+{
+ return lduw_be_p(haddr);
+}
+
+static inline uint64_t wrap_lduw_le(const void *haddr)
+{
+ return lduw_le_p(haddr);
+}
+
+static inline uint64_t wrap_ldul_be(const void *haddr)
+{
+ return (uint32_t)ldl_be_p(haddr);
+}
+
+static inline uint64_t wrap_ldul_le(const void *haddr)
+{
+ return (uint32_t)ldl_le_p(haddr);
+}
static inline uint64_t QEMU_ALWAYS_INLINE
load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
uintptr_t retaddr, MemOp op, bool code_read,
- FullLoadHelper *full_load)
+ FullLoadHelper *full_load, LoadHelper *direct)
{
uintptr_t mmu_idx = get_mmuidx(oi);
uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1373,33 +1400,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
do_aligned_access:
haddr = (void *)((uintptr_t)addr + entry->addend);
- switch (op) {
- case MO_UB:
- res = ldub_p(haddr);
- break;
- case MO_BEUW:
- res = lduw_be_p(haddr);
- break;
- case MO_LEUW:
- res = lduw_le_p(haddr);
- break;
- case MO_BEUL:
- res = (uint32_t)ldl_be_p(haddr);
- break;
- case MO_LEUL:
- res = (uint32_t)ldl_le_p(haddr);
- break;
- case MO_BEQ:
- res = ldq_be_p(haddr);
- break;
- case MO_LEQ:
- res = ldq_le_p(haddr);
- break;
- default:
- g_assert_not_reached();
- }
-
- return res;
+ return direct(haddr);
}
/*
@@ -1415,7 +1416,8 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- return load_helper(env, addr, oi, retaddr, MO_UB, false, full_ldub_mmu);
+ return load_helper(env, addr, oi, retaddr, MO_UB, false,
+ full_ldub_mmu, wrap_ldub);
}
tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
@@ -1428,7 +1430,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
- full_le_lduw_mmu);
+ full_le_lduw_mmu, wrap_lduw_le);
}
tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
@@ -1441,7 +1443,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
- full_be_lduw_mmu);
+ full_be_lduw_mmu, wrap_lduw_be);
}
tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
@@ -1454,7 +1456,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
- full_le_ldul_mmu);
+ full_le_ldul_mmu, wrap_ldul_le);
}
tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
@@ -1467,7 +1469,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
- full_be_ldul_mmu);
+ full_be_ldul_mmu, wrap_ldul_be);
}
tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
@@ -1480,14 +1482,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
- helper_le_ldq_mmu);
+ helper_le_ldq_mmu, ldq_le_p);
}
uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
- helper_be_ldq_mmu);
+ helper_be_ldq_mmu, ldq_be_p);
}
/*
@@ -1530,9 +1532,38 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
* Store Helpers
*/
+typedef void StoreHelper(void *, uint64_t);
+
+/* Wrap the unaligned store helpers so that they have a common signature. */
+static inline void wrap_stb(void *haddr, uint64_t val)
+{
+ stb_p(haddr, val);
+}
+
+static inline void wrap_stw_be(void *haddr, uint64_t val)
+{
+ stw_be_p(haddr, val);
+}
+
+static inline void wrap_stw_le(void *haddr, uint64_t val)
+{
+ stw_le_p(haddr, val);
+}
+
+static inline void wrap_stl_be(void *haddr, uint64_t val)
+{
+ stl_be_p(haddr, val);
+}
+
+static inline void wrap_stl_le(void *haddr, uint64_t val)
+{
+ stl_le_p(haddr, val);
+}
+
static inline void QEMU_ALWAYS_INLINE
store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
- TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
+ TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
+ StoreHelper *direct)
{
uintptr_t mmu_idx = get_mmuidx(oi);
uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1657,74 +1688,49 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
do_aligned_access:
haddr = (void *)((uintptr_t)addr + entry->addend);
- switch (op) {
- case MO_UB:
- stb_p(haddr, val);
- break;
- case MO_BEUW:
- stw_be_p(haddr, val);
- break;
- case MO_LEUW:
- stw_le_p(haddr, val);
- break;
- case MO_BEUL:
- stl_be_p(haddr, val);
- break;
- case MO_LEUL:
- stl_le_p(haddr, val);
- break;
- case MO_BEQ:
- stq_be_p(haddr, val);
- break;
- case MO_LEQ:
- stq_le_p(haddr, val);
- break;
- default:
- g_assert_not_reached();
- break;
- }
+ direct(haddr, val);
}
void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_UB);
+ store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
}
void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_LEUW);
+ store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
}
void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_BEUW);
+ store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
}
void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_LEUL);
+ store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
}
void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_BEUL);
+ store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
}
void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_LEQ);
+ store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
}
void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_BEQ);
+ store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
}
/* First set of helpers allows passing in of OI and RETADDR. This makes
@@ -1789,7 +1795,8 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- return load_helper(env, addr, oi, retaddr, MO_8, true, full_ldub_cmmu);
+ return load_helper(env, addr, oi, retaddr, MO_8, true,
+ full_ldub_cmmu, wrap_ldub);
}
uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
@@ -1802,7 +1809,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
- full_le_lduw_cmmu);
+ full_le_lduw_cmmu, wrap_lduw_le);
}
uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
@@ -1815,7 +1822,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
- full_be_lduw_cmmu);
+ full_be_lduw_cmmu, wrap_lduw_be);
}
uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
@@ -1828,7 +1835,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
- full_le_ldul_cmmu);
+ full_le_ldul_cmmu, wrap_ldul_le);
}
uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
@@ -1841,7 +1848,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
- full_be_ldul_cmmu);
+ full_be_ldul_cmmu, wrap_ldul_be);
}
uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
@@ -1854,12 +1861,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
- helper_le_ldq_cmmu);
+ helper_le_ldq_cmmu, ldq_le_p);
}
uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
- helper_be_ldq_cmmu);
+ helper_be_ldq_cmmu, ldq_be_p);
}
--
2.17.1
* [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (8 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:33 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 11/20] exec: Adjust notdirty tracing Richard Henderson
` (12 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
Handle bswap on RAM directly in load/store_helper. This fixes a
bug in the previous implementation: the I/O path cannot be used
for RAM.
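Since MO_BE and MO_LE differ in exactly the MO_BSWAP bit, one XOR
flips the endianness of an access; a sketch of the adjustment this
patch makes before MMIO dispatch:

static MemOp adjust_memop(MemOp op, target_ulong tlb_addr)
{
    /* e.g. adjust_memop(MO_LEUW, ... | TLB_BSWAP) == MO_BEUW */
    return op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0);
}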
Fixes: a26fc6f5152b47f1
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/cpu-all.h | 4 +-
accel/tcg/cputlb.c | 108 +++++++++++++++++++++--------------------
2 files changed, 59 insertions(+), 53 deletions(-)
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 2db73c7a27..1ebd1b59ab 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -346,12 +346,14 @@ CPUArchState *cpu_copy(CPUArchState *env);
#define TLB_MMIO (1 << (TARGET_PAGE_BITS_MIN - 3))
/* Set if TLB entry contains a watchpoint. */
#define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS_MIN - 4))
+/* Set if TLB entry requires byte swap. */
+#define TLB_BSWAP (1 << (TARGET_PAGE_BITS_MIN - 5))
/* Use this mask to check interception with an alignment mask
* in a TCG backend.
*/
#define TLB_FLAGS_MASK \
- (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT)
+ (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
/**
* tlb_hit_page: return true if page aligned @addr is a hit against the
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index b4a63d3928..cb603917a2 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -737,8 +737,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
address |= TLB_INVALID_MASK;
}
if (attrs.byte_swap) {
- /* Force the access through the I/O slow path. */
- address |= TLB_MMIO;
+ address |= TLB_BSWAP;
}
if (!memory_region_is_ram(section->mr) &&
!memory_region_is_romd(section->mr)) {
@@ -901,10 +900,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
bool locked = false;
MemTxResult r;
- if (iotlbentry->attrs.byte_swap) {
- op ^= MO_BSWAP;
- }
-
section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
mr = section->mr;
mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
@@ -947,10 +942,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
bool locked = false;
MemTxResult r;
- if (iotlbentry->attrs.byte_swap) {
- op ^= MO_BSWAP;
- }
-
section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
mr = section->mr;
mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
@@ -1133,8 +1124,8 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
wp_access, retaddr);
}
- if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO)) {
- /* I/O access */
+ /* Reject I/O access, or other required slow-path. */
+ if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
return NULL;
}
@@ -1311,7 +1302,8 @@ static inline uint64_t wrap_ldul_le(const void *haddr)
static inline uint64_t QEMU_ALWAYS_INLINE
load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
uintptr_t retaddr, MemOp op, bool code_read,
- FullLoadHelper *full_load, LoadHelper *direct)
+ FullLoadHelper *full_load, LoadHelper *direct,
+ LoadHelper *direct_swap)
{
uintptr_t mmu_idx = get_mmuidx(oi);
uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1361,17 +1353,21 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
/* On watchpoint hit, this will longjmp out. */
cpu_check_watchpoint(env_cpu(env), addr, size,
iotlbentry->attrs, BP_MEM_READ, retaddr);
-
- /* The backing page may or may not require I/O. */
- tlb_addr &= ~TLB_WATCHPOINT;
- if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
- goto do_aligned_access;
- }
}
/* Handle I/O access. */
- return io_readx(env, iotlbentry, mmu_idx, addr,
- retaddr, access_type, op);
+ if (likely(tlb_addr & TLB_MMIO)) {
+ return io_readx(env, iotlbentry, mmu_idx, addr,
+ retaddr, access_type,
+ op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
+ }
+
+ haddr = (void *)((uintptr_t)addr + entry->addend);
+
+ if (unlikely(tlb_addr & TLB_BSWAP)) {
+ return direct_swap(haddr);
+ }
+ return direct(haddr);
}
/* Handle slow unaligned access (it spans two pages or IO). */
@@ -1398,7 +1394,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
return res & MAKE_64BIT_MASK(0, size * 8);
}
- do_aligned_access:
haddr = (void *)((uintptr_t)addr + entry->addend);
return direct(haddr);
}
@@ -1417,7 +1412,7 @@ static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_UB, false,
- full_ldub_mmu, wrap_ldub);
+ full_ldub_mmu, wrap_ldub, wrap_ldub);
}
tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
@@ -1430,7 +1425,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
- full_le_lduw_mmu, wrap_lduw_le);
+ full_le_lduw_mmu, wrap_lduw_le, wrap_lduw_be);
}
tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
@@ -1443,7 +1438,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
- full_be_lduw_mmu, wrap_lduw_be);
+ full_be_lduw_mmu, wrap_lduw_be, wrap_lduw_le);
}
tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
@@ -1456,7 +1451,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
- full_le_ldul_mmu, wrap_ldul_le);
+ full_le_ldul_mmu, wrap_ldul_le, wrap_ldul_be);
}
tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
@@ -1469,7 +1464,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
- full_be_ldul_mmu, wrap_ldul_be);
+ full_be_ldul_mmu, wrap_ldul_be, wrap_ldul_le);
}
tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
@@ -1482,14 +1477,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
- helper_le_ldq_mmu, ldq_le_p);
+ helper_le_ldq_mmu, ldq_le_p, ldq_be_p);
}
uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
- helper_be_ldq_mmu, ldq_be_p);
+ helper_be_ldq_mmu, ldq_be_p, ldq_le_p);
}
/*
@@ -1563,7 +1558,7 @@ static inline void wrap_stl_le(void *haddr, uint64_t val)
static inline void QEMU_ALWAYS_INLINE
store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
- StoreHelper *direct)
+ StoreHelper *direct, StoreHelper *direct_swap)
{
uintptr_t mmu_idx = get_mmuidx(oi);
uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1608,16 +1603,22 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
/* On watchpoint hit, this will longjmp out. */
cpu_check_watchpoint(env_cpu(env), addr, size,
iotlbentry->attrs, BP_MEM_WRITE, retaddr);
-
- /* The backing page may or may not require I/O. */
- tlb_addr &= ~TLB_WATCHPOINT;
- if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
- goto do_aligned_access;
- }
}
/* Handle I/O access. */
- io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr, op);
+ if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
+ io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
+ op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
+ return;
+ }
+
+ haddr = (void *)((uintptr_t)addr + entry->addend);
+
+ if (unlikely(tlb_addr & TLB_BSWAP)) {
+ direct_swap(haddr, val);
+ } else {
+ direct(haddr, val);
+ }
return;
}
@@ -1686,7 +1687,6 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
return;
}
- do_aligned_access:
haddr = (void *)((uintptr_t)addr + entry->addend);
direct(haddr, val);
}
@@ -1694,43 +1694,47 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
+ store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb, wrap_stb);
}
void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
+ store_helper(env, addr, val, oi, retaddr, MO_LEUW,
+ wrap_stw_le, wrap_stw_be);
}
void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
+ store_helper(env, addr, val, oi, retaddr, MO_BEUW,
+ wrap_stw_be, wrap_stw_le);
}
void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
+ store_helper(env, addr, val, oi, retaddr, MO_LEUL,
+ wrap_stl_le, wrap_stl_be);
}
void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
+ store_helper(env, addr, val, oi, retaddr, MO_BEUL,
+ wrap_stl_be, wrap_stl_le);
}
void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
+ store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p, stq_be_p);
}
void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
TCGMemOpIdx oi, uintptr_t retaddr)
{
- store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
+ store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p, stq_le_p);
}
/* First set of helpers allows passing in of OI and RETADDR. This makes
@@ -1796,7 +1800,7 @@ static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_8, true,
- full_ldub_cmmu, wrap_ldub);
+ full_ldub_cmmu, wrap_ldub, wrap_ldub);
}
uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
@@ -1809,7 +1813,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
- full_le_lduw_cmmu, wrap_lduw_le);
+ full_le_lduw_cmmu, wrap_lduw_le, wrap_lduw_be);
}
uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
@@ -1822,7 +1826,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
- full_be_lduw_cmmu, wrap_lduw_be);
+ full_be_lduw_cmmu, wrap_lduw_be, wrap_lduw_le);
}
uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
@@ -1835,7 +1839,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
- full_le_ldul_cmmu, wrap_ldul_le);
+ full_le_ldul_cmmu, wrap_ldul_le, wrap_ldul_be);
}
uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
@@ -1848,7 +1852,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
- full_be_ldul_cmmu, wrap_ldul_be);
+ full_be_ldul_cmmu, wrap_ldul_be, wrap_ldul_le);
}
uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
@@ -1861,12 +1865,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
- helper_le_ldq_cmmu, ldq_le_p);
+ helper_le_ldq_cmmu, ldq_le_p, ldq_be_p);
}
uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
TCGMemOpIdx oi, uintptr_t retaddr)
{
return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
- helper_be_ldq_cmmu, ldq_be_p);
+ helper_be_ldq_cmmu, ldq_be_p, ldq_le_p);
}
--
2.17.1
* [PATCH v3 11/20] exec: Adjust notdirty tracing
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (9 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 9:17 ` Philippe Mathieu-Daudé
2019-09-22 3:54 ` [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path Richard Henderson
` (11 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
The memory_region_tb_read tracepoint is unreachable, since notdirty
is supposed to apply only to writes. The memory_region_tb_write
tracepoint is mis-named, because notdirty is not only used for TB
invalidation. It is also used for e.g. VGA RAM updates and migration.
Replace memory_region_tb_write with memory_notdirty_write_access,
and place it in memory_notdirty_write_prepare where it can catch
all of the instances. Add memory_notdirty_set_dirty to log when
we no longer intercept writes to a page.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
exec.c | 3 +++
memory.c | 4 ----
trace-events | 4 ++--
3 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/exec.c b/exec.c
index 33bd0e36c1..7ce0515635 100644
--- a/exec.c
+++ b/exec.c
@@ -2721,6 +2721,8 @@ void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
ndi->size = size;
ndi->pages = NULL;
+ trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
+
assert(tcg_enabled());
if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
ndi->pages = page_collection_lock(ram_addr, ram_addr + size);
@@ -2745,6 +2747,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
/* we remove the notdirty callback only if the code has been
flushed */
if (!cpu_physical_memory_is_clean(ndi->ram_addr)) {
+ trace_memory_notdirty_set_dirty(ndi->mem_vaddr);
tlb_set_dirty(ndi->cpu, ndi->mem_vaddr);
}
}
diff --git a/memory.c b/memory.c
index b9dd6b94ca..57c44c97db 100644
--- a/memory.c
+++ b/memory.c
@@ -438,7 +438,6 @@ static MemTxResult memory_region_read_accessor(MemoryRegion *mr,
/* Accesses to code which has previously been translated into a TB show
* up in the MMIO path, as accesses to the io_mem_notdirty
* MemoryRegion. */
- trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
} else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -465,7 +464,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
/* Accesses to code which has previously been translated into a TB show
* up in the MMIO path, as accesses to the io_mem_notdirty
* MemoryRegion. */
- trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
} else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -490,7 +488,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
/* Accesses to code which has previously been translated into a TB show
* up in the MMIO path, as accesses to the io_mem_notdirty
* MemoryRegion. */
- trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
} else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -515,7 +512,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
/* Accesses to code which has previously been translated into a TB show
* up in the MMIO path, as accesses to the io_mem_notdirty
* MemoryRegion. */
- trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
} else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
diff --git a/trace-events b/trace-events
index 823a4ae64e..20821ba545 100644
--- a/trace-events
+++ b/trace-events
@@ -52,14 +52,14 @@ dma_map_wait(void *dbs) "dbs=%p"
find_ram_offset(uint64_t size, uint64_t offset) "size: 0x%" PRIx64 " @ 0x%" PRIx64
find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_t next, uint64_t mingap) "trying size: 0x%" PRIx64 " @ 0x%" PRIx64 ", offset: 0x%" PRIx64" next: 0x%" PRIx64 " mingap: 0x%" PRIx64
ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
+memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
+memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
# memory.c
memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
-memory_region_tb_read(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
-memory_region_tb_write(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
flatview_new(void *view, void *root) "%p (root %p)"
--
2.17.1
* [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (10 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 11/20] exec: Adjust notdirty tracing Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:39 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
` (10 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
It does not require going through the whole I/O path
in order to discard a write.
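Conceptually the write can now be dropped in the store slow path
itself; a hedged sketch of the shape this takes:

/* Inside store_helper's slow path (sketch): a TLB_ROM entry never
 * reaches a MemoryRegion callback; the write is simply discarded. */
if (unlikely(tlb_addr & TLB_ROM)) {
    return;
}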
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/cpu-all.h | 5 ++++-
include/exec/cpu-common.h | 1 -
accel/tcg/cputlb.c | 35 +++++++++++++++++++--------------
exec.c | 41 +--------------------------------------
4 files changed, 25 insertions(+), 57 deletions(-)
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 1ebd1b59ab..9f0b17802e 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -348,12 +348,15 @@ CPUArchState *cpu_copy(CPUArchState *env);
#define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS_MIN - 4))
/* Set if TLB entry requires byte swap. */
#define TLB_BSWAP (1 << (TARGET_PAGE_BITS_MIN - 5))
+/* Set if TLB entry writes are ignored. */
+#define TLB_ROM (1 << (TARGET_PAGE_BITS_MIN - 6))
/* Use this mask to check interception with an alignment mask
* in a TCG backend.
*/
#define TLB_FLAGS_MASK \
- (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
+ (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \
+ | TLB_WATCHPOINT | TLB_BSWAP | TLB_ROM)
/**
* tlb_hit_page: return true if page aligned @addr is a hit against the
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index f7dbe75fbc..1c0e03ddc2 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -100,7 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
void cpu_flush_icache_range(hwaddr start, hwaddr len);
-extern struct MemoryRegion io_mem_rom;
extern struct MemoryRegion io_mem_notdirty;
typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index cb603917a2..7ab523d7ec 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -577,7 +577,7 @@ static void tlb_reset_dirty_range_locked(CPUTLBEntry *tlb_entry,
{
uintptr_t addr = tlb_entry->addr_write;
- if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+ if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_ROM | TLB_NOTDIRTY)) == 0) {
addr &= TARGET_PAGE_MASK;
addr += tlb_entry->addend;
if ((addr - start) < length) {
@@ -745,7 +745,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
address |= TLB_MMIO;
addend = 0;
} else {
- /* TLB_MMIO for rom/romd handled below */
addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
}
@@ -822,16 +821,17 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
tn.addr_write = -1;
if (prot & PAGE_WRITE) {
- if ((memory_region_is_ram(section->mr) && section->readonly)
- || memory_region_is_romd(section->mr)) {
- /* Write access calls the I/O callback. */
- tn.addr_write = address | TLB_MMIO;
- } else if (memory_region_is_ram(section->mr)
- && cpu_physical_memory_is_clean(
- memory_region_get_ram_addr(section->mr) + xlat)) {
- tn.addr_write = address | TLB_NOTDIRTY;
- } else {
- tn.addr_write = address;
+ tn.addr_write = address;
+ if (memory_region_is_romd(section->mr)) {
+ /* Use the MMIO path so that the device can switch states. */
+ tn.addr_write |= TLB_MMIO;
+ } else if (memory_region_is_ram(section->mr)) {
+ if (section->readonly) {
+ tn.addr_write |= TLB_ROM;
+ } else if (cpu_physical_memory_is_clean(
+ memory_region_get_ram_addr(section->mr) + xlat)) {
+ tn.addr_write |= TLB_NOTDIRTY;
+ }
}
if (prot & PAGE_WRITE_INV) {
tn.addr_write |= TLB_INVALID_MASK;
@@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
mr = section->mr;
mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
cpu->mem_io_pc = retaddr;
- if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+ if (mr != &io_mem_notdirty && !cpu->can_do_io) {
cpu_io_recompile(cpu, retaddr);
}
@@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
mr = section->mr;
mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
- if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+ if (mr != &io_mem_notdirty && !cpu->can_do_io) {
cpu_io_recompile(cpu, retaddr);
}
cpu->mem_io_vaddr = addr;
@@ -1125,7 +1125,7 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
}
/* Reject I/O access, or other required slow-path. */
- if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
+ if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP | TLB_ROM)) {
return NULL;
}
@@ -1612,6 +1612,11 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
return;
}
+ /* Ignore writes to ROM. */
+ if (unlikely(tlb_addr & TLB_ROM)) {
+ return;
+ }
+
haddr = (void *)((uintptr_t)addr + entry->addend);
if (unlikely(tlb_addr & TLB_BSWAP)) {
diff --git a/exec.c b/exec.c
index 7ce0515635..e21e068535 100644
--- a/exec.c
+++ b/exec.c
@@ -88,7 +88,7 @@ static MemoryRegion *system_io;
AddressSpace address_space_io;
AddressSpace address_space_memory;
-MemoryRegion io_mem_rom, io_mem_notdirty;
+MemoryRegion io_mem_notdirty;
static MemoryRegion io_mem_unassigned;
#endif
@@ -158,7 +158,6 @@ typedef struct subpage_t {
#define PHYS_SECTION_UNASSIGNED 0
#define PHYS_SECTION_NOTDIRTY 1
-#define PHYS_SECTION_ROM 2
static void io_mem_init(void);
static void memory_map_init(void);
@@ -1441,8 +1440,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
iotlb = memory_region_get_ram_addr(section->mr) + xlat;
if (!section->readonly) {
iotlb |= PHYS_SECTION_NOTDIRTY;
- } else {
- iotlb |= PHYS_SECTION_ROM;
}
} else {
AddressSpaceDispatch *d;
@@ -2968,38 +2965,6 @@ static uint16_t dummy_section(PhysPageMap *map, FlatView *fv, MemoryRegion *mr)
return phys_section_add(map, &section);
}
-static void readonly_mem_write(void *opaque, hwaddr addr,
- uint64_t val, unsigned size)
-{
- /* Ignore any write to ROM. */
-}
-
-static bool readonly_mem_accepts(void *opaque, hwaddr addr,
- unsigned size, bool is_write,
- MemTxAttrs attrs)
-{
- return is_write;
-}
-
-/* This will only be used for writes, because reads are special cased
- * to directly access the underlying host ram.
- */
-static const MemoryRegionOps readonly_mem_ops = {
- .write = readonly_mem_write,
- .valid.accepts = readonly_mem_accepts,
- .endianness = DEVICE_NATIVE_ENDIAN,
- .valid = {
- .min_access_size = 1,
- .max_access_size = 8,
- .unaligned = false,
- },
- .impl = {
- .min_access_size = 1,
- .max_access_size = 8,
- .unaligned = false,
- },
-};
-
MemoryRegionSection *iotlb_to_section(CPUState *cpu,
hwaddr index, MemTxAttrs attrs)
{
@@ -3013,8 +2978,6 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu,
static void io_mem_init(void)
{
- memory_region_init_io(&io_mem_rom, NULL, &readonly_mem_ops,
- NULL, NULL, UINT64_MAX);
memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
NULL, UINT64_MAX);
@@ -3035,8 +2998,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
assert(n == PHYS_SECTION_UNASSIGNED);
n = dummy_section(&d->map, fv, &io_mem_notdirty);
assert(n == PHYS_SECTION_NOTDIRTY);
- n = dummy_section(&d->map, fv, &io_mem_rom);
- assert(n == PHYS_SECTION_ROM);
d->phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
--
2.17.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v3 13/20] cputlb: Move NOTDIRTY handling from I/O path to TLB path
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (11 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:41 ` David Hildenbrand
2019-09-23 9:30 ` Philippe Mathieu-Daudé
2019-09-22 3:54 ` [PATCH v3 14/20] cputlb: Partially inline memory_region_section_get_iotlb Richard Henderson
` (9 subsequent siblings)
22 siblings, 2 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
Pages that we want to track for NOTDIRTY are RAM. We do not
really need to go through the I/O path to handle them.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/cpu-common.h | 2 --
accel/tcg/cputlb.c | 26 +++++++++++++++++---
exec.c | 50 ---------------------------------------
memory.c | 16 -------------
4 files changed, 23 insertions(+), 71 deletions(-)
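A minimal sketch of the store ordering after this patch (a standalone
model; the function pointers stand in for io_writex and the notdirty
prepare/complete machinery, which are assumptions here):

    /* Standalone sketch, not QEMU code; flag values illustrative. */
    #include <stdint.h>

    #define TLB_MMIO     (1u << 0)
    #define TLB_NOTDIRTY (1u << 2)

    static void store_sketch(uint32_t tlb_addr, uint8_t *haddr, uint8_t val,
                             void (*io_write)(uint8_t val),
                             void (*nd_prepare)(void),
                             void (*nd_complete)(void))
    {
        if (tlb_addr & TLB_MMIO) {
            io_write(val);      /* only real devices remain on this path */
            return;
        }
        if (tlb_addr & TLB_NOTDIRTY) {
            nd_prepare();       /* invalidate TBs for this page */
            *haddr = val;       /* direct host store */
            nd_complete();      /* set dirty bits, drop the callback */
            return;
        }
        *haddr = val;           /* RAM fast path */
    }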
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 1c0e03ddc2..81753bbb34 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -100,8 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
void cpu_flush_icache_range(hwaddr start, hwaddr len);
-extern struct MemoryRegion io_mem_notdirty;
-
typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 7ab523d7ec..b7bd738115 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
mr = section->mr;
mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
cpu->mem_io_pc = retaddr;
- if (mr != &io_mem_notdirty && !cpu->can_do_io) {
+ if (!cpu->can_do_io) {
cpu_io_recompile(cpu, retaddr);
}
@@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
mr = section->mr;
mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
- if (mr != &io_mem_notdirty && !cpu->can_do_io) {
+ if (!cpu->can_do_io) {
cpu_io_recompile(cpu, retaddr);
}
cpu->mem_io_vaddr = addr;
@@ -1606,7 +1606,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
}
/* Handle I/O access. */
- if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
+ if (tlb_addr & TLB_MMIO) {
io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
return;
@@ -1619,6 +1619,26 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
haddr = (void *)((uintptr_t)addr + entry->addend);
+ /* Handle clean RAM pages. */
+ if (tlb_addr & TLB_NOTDIRTY) {
+ NotDirtyInfo ndi;
+
+ /* We require mem_io_pc in tb_invalidate_phys_page_range. */
+ env_cpu(env)->mem_io_pc = retaddr;
+
+ memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
+ addr + iotlbentry->addr, size);
+
+ if (unlikely(tlb_addr & TLB_BSWAP)) {
+ direct_swap(haddr, val);
+ } else {
+ direct(haddr, val);
+ }
+
+ memory_notdirty_write_complete(&ndi);
+ return;
+ }
+
if (unlikely(tlb_addr & TLB_BSWAP)) {
direct_swap(haddr, val);
} else {
diff --git a/exec.c b/exec.c
index e21e068535..abf58b68a0 100644
--- a/exec.c
+++ b/exec.c
@@ -88,7 +88,6 @@ static MemoryRegion *system_io;
AddressSpace address_space_io;
AddressSpace address_space_memory;
-MemoryRegion io_mem_notdirty;
static MemoryRegion io_mem_unassigned;
#endif
@@ -157,7 +156,6 @@ typedef struct subpage_t {
} subpage_t;
#define PHYS_SECTION_UNASSIGNED 0
-#define PHYS_SECTION_NOTDIRTY 1
static void io_mem_init(void);
static void memory_map_init(void);
@@ -1438,9 +1436,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
if (memory_region_is_ram(section->mr)) {
/* Normal RAM. */
iotlb = memory_region_get_ram_addr(section->mr) + xlat;
- if (!section->readonly) {
- iotlb |= PHYS_SECTION_NOTDIRTY;
- }
} else {
AddressSpaceDispatch *d;
@@ -2749,42 +2744,6 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
}
}
-/* Called within RCU critical section. */
-static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
- uint64_t val, unsigned size)
-{
- NotDirtyInfo ndi;
-
- memory_notdirty_write_prepare(&ndi, current_cpu, current_cpu->mem_io_vaddr,
- ram_addr, size);
-
- stn_p(qemu_map_ram_ptr(NULL, ram_addr), size, val);
- memory_notdirty_write_complete(&ndi);
-}
-
-static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
- unsigned size, bool is_write,
- MemTxAttrs attrs)
-{
- return is_write;
-}
-
-static const MemoryRegionOps notdirty_mem_ops = {
- .write = notdirty_mem_write,
- .valid.accepts = notdirty_mem_accepts,
- .endianness = DEVICE_NATIVE_ENDIAN,
- .valid = {
- .min_access_size = 1,
- .max_access_size = 8,
- .unaligned = false,
- },
- .impl = {
- .min_access_size = 1,
- .max_access_size = 8,
- .unaligned = false,
- },
-};
-
/* Generate a debug exception if a watchpoint has been hit. */
void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
MemTxAttrs attrs, int flags, uintptr_t ra)
@@ -2980,13 +2939,6 @@ static void io_mem_init(void)
{
memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
NULL, UINT64_MAX);
-
- /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
- * which can be called without the iothread mutex.
- */
- memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
- NULL, UINT64_MAX);
- memory_region_clear_global_locking(&io_mem_notdirty);
}
AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
@@ -2996,8 +2948,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
n = dummy_section(&d->map, fv, &io_mem_unassigned);
assert(n == PHYS_SECTION_UNASSIGNED);
- n = dummy_section(&d->map, fv, &io_mem_notdirty);
- assert(n == PHYS_SECTION_NOTDIRTY);
d->phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
diff --git a/memory.c b/memory.c
index 57c44c97db..a99b8c0767 100644
--- a/memory.c
+++ b/memory.c
@@ -434,10 +434,6 @@ static MemTxResult memory_region_read_accessor(MemoryRegion *mr,
tmp = mr->ops->read(mr->opaque, addr, size);
if (mr->subpage) {
trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
- } else if (mr == &io_mem_notdirty) {
- /* Accesses to code which has previously been translated into a TB show
- * up in the MMIO path, as accesses to the io_mem_notdirty
- * MemoryRegion. */
} else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -460,10 +456,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
if (mr->subpage) {
trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
- } else if (mr == &io_mem_notdirty) {
- /* Accesses to code which has previously been translated into a TB show
- * up in the MMIO path, as accesses to the io_mem_notdirty
- * MemoryRegion. */
} else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -484,10 +476,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
if (mr->subpage) {
trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
- } else if (mr == &io_mem_notdirty) {
- /* Accesses to code which has previously been translated into a TB show
- * up in the MMIO path, as accesses to the io_mem_notdirty
- * MemoryRegion. */
} else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -508,10 +496,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
if (mr->subpage) {
trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
- } else if (mr == &io_mem_notdirty) {
- /* Accesses to code which has previously been translated into a TB show
- * up in the MMIO path, as accesses to the io_mem_notdirty
- * MemoryRegion. */
} else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
--
2.17.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v3 14/20] cputlb: Partially inline memory_region_section_get_iotlb
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (12 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-22 3:54 ` [PATCH v3 15/20] cputlb: Merge and move memory_notdirty_write_{prepare, complete} Richard Henderson
` (8 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
There is only one caller, tlb_set_page_with_attrs. We cannot
inline the entire function because the AddressSpaceDispatch
structure is private to exec.c, and cannot easily be moved to
include/exec/memory-internal.h.
Compute is_ram and is_romd once within tlb_set_page_with_attrs.
This folds away several tests against those predicates. Compute
cpu_physical_memory_is_clean outside of the tlb lock region.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/exec-all.h | 6 +---
accel/tcg/cputlb.c | 68 ++++++++++++++++++++++++++---------------
exec.c | 22 ++-----------
3 files changed, 47 insertions(+), 49 deletions(-)
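In sketch form (simplified types standing in for the real declarations),
the iotlb value now splits cleanly by region kind:

    /* Standalone sketch, not QEMU code; hwaddr simplified. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t hwaddr;

    static hwaddr iotlb_sketch(bool is_ram, hwaddr ram_addr,
                               hwaddr section_index, hwaddr xlat)
    {
        /* RAM carries a ram_addr_t directly; only I/O and ROMD still
         * need the section index that the trimmed-down
         * memory_region_section_get_iotlb provides. */
        return (is_ram ? ram_addr : section_index) + xlat;
    }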
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 81b02eb2fe..49db07ba0b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -509,11 +509,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
hwaddr *xlat, hwaddr *plen,
MemTxAttrs attrs, int *prot);
hwaddr memory_region_section_get_iotlb(CPUState *cpu,
- MemoryRegionSection *section,
- target_ulong vaddr,
- hwaddr paddr, hwaddr xlat,
- int prot,
- target_ulong *address);
+ MemoryRegionSection *section);
#endif
/* vl.c */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index b7bd738115..1a839c0f82 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -704,13 +704,14 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
MemoryRegionSection *section;
unsigned int index;
target_ulong address;
- target_ulong code_address;
+ target_ulong write_address;
uintptr_t addend;
CPUTLBEntry *te, tn;
hwaddr iotlb, xlat, sz, paddr_page;
target_ulong vaddr_page;
int asidx = cpu_asidx_from_attrs(cpu, attrs);
int wp_flags;
+ bool is_ram, is_romd;
assert_cpu_is_self(cpu);
@@ -739,18 +740,46 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
if (attrs.byte_swap) {
address |= TLB_BSWAP;
}
- if (!memory_region_is_ram(section->mr) &&
- !memory_region_is_romd(section->mr)) {
- /* IO memory case */
- address |= TLB_MMIO;
- addend = 0;
- } else {
+
+ is_ram = memory_region_is_ram(section->mr);
+ is_romd = memory_region_is_romd(section->mr);
+
+ if (is_ram || is_romd) {
+ /* RAM and ROMD both have associated host memory. */
addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
+ } else {
+ /* I/O does not; force the host address to NULL. */
+ addend = 0;
+ }
+
+ write_address = address;
+ if (is_ram) {
+ iotlb = memory_region_get_ram_addr(section->mr) + xlat;
+ /*
+ * Computing is_clean is expensive; avoid all that unless
+ * the page is actually writable.
+ */
+ if (prot & PAGE_WRITE) {
+ if (section->readonly) {
+ write_address |= TLB_ROM;
+ } else if (cpu_physical_memory_is_clean(iotlb)) {
+ write_address |= TLB_NOTDIRTY;
+ }
+ }
+ } else {
+ /* I/O or ROMD */
+ iotlb = memory_region_section_get_iotlb(cpu, section) + xlat;
+ /*
+ * Writes to romd devices must go through MMIO to enable write.
+ * Reads to romd devices go through the ram_ptr found above,
+ * but of course reads to I/O must go through MMIO.
+ */
+ write_address |= TLB_MMIO;
+ if (!is_romd) {
+ address = write_address;
+ }
}
- code_address = address;
- iotlb = memory_region_section_get_iotlb(cpu, section, vaddr_page,
- paddr_page, xlat, prot, &address);
wp_flags = cpu_watchpoint_address_matches(cpu, vaddr_page,
TARGET_PAGE_SIZE);
@@ -790,8 +819,8 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
/*
* At this point iotlb contains a physical section number in the lower
* TARGET_PAGE_BITS, and either
- * + the ram_addr_t of the page base of the target RAM (if NOTDIRTY or ROM)
- * + the offset within section->mr of the page base (otherwise)
+ * + the ram_addr_t of the page base of the target RAM (RAM)
+ * + the offset within section->mr of the page base (I/O, ROMD)
* We subtract the vaddr_page (which is page aligned and thus won't
* disturb the low bits) to give an offset which can be added to the
* (non-page-aligned) vaddr of the eventual memory access to get
@@ -814,25 +843,14 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
}
if (prot & PAGE_EXEC) {
- tn.addr_code = code_address;
+ tn.addr_code = address;
} else {
tn.addr_code = -1;
}
tn.addr_write = -1;
if (prot & PAGE_WRITE) {
- tn.addr_write = address;
- if (memory_region_is_romd(section->mr)) {
- /* Use the MMIO path so that the device can switch states. */
- tn.addr_write |= TLB_MMIO;
- } else if (memory_region_is_ram(section->mr)) {
- if (section->readonly) {
- tn.addr_write |= TLB_ROM;
- } else if (cpu_physical_memory_is_clean(
- memory_region_get_ram_addr(section->mr) + xlat)) {
- tn.addr_write |= TLB_NOTDIRTY;
- }
- }
+ tn.addr_write = write_address;
if (prot & PAGE_WRITE_INV) {
tn.addr_write |= TLB_INVALID_MASK;
}
diff --git a/exec.c b/exec.c
index abf58b68a0..9c9cc811b3 100644
--- a/exec.c
+++ b/exec.c
@@ -1425,26 +1425,10 @@ bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
/* Called from RCU critical section */
hwaddr memory_region_section_get_iotlb(CPUState *cpu,
- MemoryRegionSection *section,
- target_ulong vaddr,
- hwaddr paddr, hwaddr xlat,
- int prot,
- target_ulong *address)
+ MemoryRegionSection *section)
{
- hwaddr iotlb;
-
- if (memory_region_is_ram(section->mr)) {
- /* Normal RAM. */
- iotlb = memory_region_get_ram_addr(section->mr) + xlat;
- } else {
- AddressSpaceDispatch *d;
-
- d = flatview_to_dispatch(section->fv);
- iotlb = section - d->map.sections;
- iotlb += xlat;
- }
-
- return iotlb;
+ AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
+ return section - d->map.sections;
}
#endif /* defined(CONFIG_USER_ONLY) */
--
2.17.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v3 15/20] cputlb: Merge and move memory_notdirty_write_{prepare, complete}
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (13 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 14/20] cputlb: Partially inline memory_region_section_get_iotlb Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-22 3:54 ` [PATCH v3 16/20] cputlb: Handle TLB_NOTDIRTY in probe_access Richard Henderson
` (7 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
Since 9458a9a1df1a, all readers of the dirty bitmaps wait
for the rcu lock, which means that they wait until the end
of any executing TranslationBlock.
As a consequence, there is no need for the actual access
to happen in between the _prepare and _complete. Therefore,
we can improve things by merging the two functions into
notdirty_write and dropping the NotDirtyInfo structure.
In addition, the only users of notdirty_write are in cputlb.c,
so move the merged function there. Pass in the CPUIOTLBEntry
from which the ram_addr_t may be computed.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/exec/memory-internal.h | 65 -----------------------------
accel/tcg/cputlb.c | 76 +++++++++++++++++++---------------
exec.c | 44 --------------------
3 files changed, 42 insertions(+), 143 deletions(-)
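The ram_addr_t recovery is the one subtle step; a sketch, with a
simplified struct standing in for CPUIOTLBEntry:

    /* Standalone sketch, not QEMU code. */
    #include <stdint.h>

    typedef struct {
        uint64_t addr;  /* holds iotlb - vaddr_page for a RAM page */
    } IOTLBEntrySketch;

    static uint64_t ram_addr_sketch(uint64_t mem_vaddr,
                                    const IOTLBEntrySketch *e)
    {
        /* Adding the access vaddr back yields the ram_addr_t,
         * exactly as notdirty_write does in the patch below. */
        return mem_vaddr + e->addr;
    }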
diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index ef4fb92371..9fcc2af25c 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -49,70 +49,5 @@ void address_space_dispatch_free(AddressSpaceDispatch *d);
void mtree_print_dispatch(struct AddressSpaceDispatch *d,
MemoryRegion *root);
-
-struct page_collection;
-
-/* Opaque struct for passing info from memory_notdirty_write_prepare()
- * to memory_notdirty_write_complete(). Callers should treat all fields
- * as private, with the exception of @active.
- *
- * @active is a field which is not touched by either the prepare or
- * complete functions, but which the caller can use if it wishes to
- * track whether it has called prepare for this struct and so needs
- * to later call the complete function.
- */
-typedef struct {
- CPUState *cpu;
- struct page_collection *pages;
- ram_addr_t ram_addr;
- vaddr mem_vaddr;
- unsigned size;
- bool active;
-} NotDirtyInfo;
-
-/**
- * memory_notdirty_write_prepare: call before writing to non-dirty memory
- * @ndi: pointer to opaque NotDirtyInfo struct
- * @cpu: CPU doing the write
- * @mem_vaddr: virtual address of write
- * @ram_addr: the ram address of the write
- * @size: size of write in bytes
- *
- * Any code which writes to the host memory corresponding to
- * guest RAM which has been marked as NOTDIRTY must wrap those
- * writes in calls to memory_notdirty_write_prepare() and
- * memory_notdirty_write_complete():
- *
- * NotDirtyInfo ndi;
- * memory_notdirty_write_prepare(&ndi, ....);
- * ... perform write here ...
- * memory_notdirty_write_complete(&ndi);
- *
- * These calls will ensure that we flush any TCG translated code for
- * the memory being written, update the dirty bits and (if possible)
- * remove the slowpath callback for writing to the memory.
- *
- * This must only be called if we are using TCG; it will assert otherwise.
- *
- * We may take locks in the prepare call, so callers must ensure that
- * they don't exit (via longjump or otherwise) without calling complete.
- *
- * This call must only be made inside an RCU critical section.
- * (Note that while we're executing a TCG TB we're always in an
- * RCU critical section, which is likely to be the case for callers
- * of these functions.)
- */
-void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
- CPUState *cpu,
- vaddr mem_vaddr,
- ram_addr_t ram_addr,
- unsigned size);
-/**
- * memory_notdirty_write_complete: finish write to non-dirty memory
- * @ndi: pointer to the opaque NotDirtyInfo struct which was initialized
- * by memory_not_dirty_write_prepare().
- */
-void memory_notdirty_write_complete(NotDirtyInfo *ndi);
-
#endif
#endif
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 1a839c0f82..6f685cb93a 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -33,6 +33,7 @@
#include "exec/helper-proto.h"
#include "qemu/atomic.h"
#include "qemu/atomic128.h"
+#include "translate-all.h"
/* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
/* #define DEBUG_TLB */
@@ -1084,6 +1085,37 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
return qemu_ram_addr_from_host_nofail(p);
}
+static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
+ CPUIOTLBEntry *iotlbentry, uintptr_t retaddr)
+{
+ ram_addr_t ram_addr = mem_vaddr + iotlbentry->addr;
+
+ trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
+
+ if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
+ struct page_collection *pages
+ = page_collection_lock(ram_addr, ram_addr + size);
+
+ /* We require mem_io_pc in tb_invalidate_phys_page_range. */
+ cpu->mem_io_pc = retaddr;
+
+ tb_invalidate_phys_page_fast(pages, ram_addr, size);
+ page_collection_unlock(pages);
+ }
+
+ /*
+ * Set both VGA and migration bits for simplicity and to remove
+ * the notdirty callback faster.
+ */
+ cpu_physical_memory_set_dirty_range(ram_addr, size, DIRTY_CLIENTS_NOCODE);
+
+ /* We remove the notdirty callback only if the code has been flushed. */
+ if (!cpu_physical_memory_is_clean(ram_addr)) {
+ trace_memory_notdirty_set_dirty(mem_vaddr);
+ tlb_set_dirty(cpu, mem_vaddr);
+ }
+}
+
/*
* Probe for whether the specified guest access is permitted. If it is not
* permitted then an exception will be taken in the same way as if this
@@ -1203,8 +1235,7 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
/* Probe for a read-modify-write atomic operation. Do not allow unaligned
* operations, or io operations to proceed. Return the host address. */
static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
- TCGMemOpIdx oi, uintptr_t retaddr,
- NotDirtyInfo *ndi)
+ TCGMemOpIdx oi, uintptr_t retaddr)
{
size_t mmu_idx = get_mmuidx(oi);
uintptr_t index = tlb_index(env, mmu_idx, addr);
@@ -1264,12 +1295,9 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
hostaddr = (void *)((uintptr_t)addr + tlbe->addend);
- ndi->active = false;
if (unlikely(tlb_addr & TLB_NOTDIRTY)) {
- ndi->active = true;
- memory_notdirty_write_prepare(ndi, env_cpu(env), addr,
- qemu_ram_addr_from_host_nofail(hostaddr),
- 1 << s_bits);
+ notdirty_write(env_cpu(env), addr, 1 << s_bits,
+ &env_tlb(env)->d[mmu_idx].iotlb[index], retaddr);
}
return hostaddr;
@@ -1635,28 +1663,13 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
return;
}
- haddr = (void *)((uintptr_t)addr + entry->addend);
-
/* Handle clean RAM pages. */
if (tlb_addr & TLB_NOTDIRTY) {
- NotDirtyInfo ndi;
-
- /* We require mem_io_pc in tb_invalidate_phys_page_range. */
- env_cpu(env)->mem_io_pc = retaddr;
-
- memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
- addr + iotlbentry->addr, size);
-
- if (unlikely(tlb_addr & TLB_BSWAP)) {
- direct_swap(haddr, val);
- } else {
- direct(haddr, val);
- }
-
- memory_notdirty_write_complete(&ndi);
- return;
+ notdirty_write(env_cpu(env), addr, size, iotlbentry, retaddr);
}
+ haddr = (void *)((uintptr_t)addr + entry->addend);
+
if (unlikely(tlb_addr & TLB_BSWAP)) {
direct_swap(haddr, val);
} else {
@@ -1786,14 +1799,9 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
#define EXTRA_ARGS , TCGMemOpIdx oi, uintptr_t retaddr
#define ATOMIC_NAME(X) \
HELPER(glue(glue(glue(atomic_ ## X, SUFFIX), END), _mmu))
-#define ATOMIC_MMU_DECLS NotDirtyInfo ndi
-#define ATOMIC_MMU_LOOKUP atomic_mmu_lookup(env, addr, oi, retaddr, &ndi)
-#define ATOMIC_MMU_CLEANUP \
- do { \
- if (unlikely(ndi.active)) { \
- memory_notdirty_write_complete(&ndi); \
- } \
- } while (0)
+#define ATOMIC_MMU_DECLS
+#define ATOMIC_MMU_LOOKUP atomic_mmu_lookup(env, addr, oi, retaddr)
+#define ATOMIC_MMU_CLEANUP
#define DATA_SIZE 1
#include "atomic_template.h"
@@ -1821,7 +1829,7 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
#undef ATOMIC_MMU_LOOKUP
#define EXTRA_ARGS , TCGMemOpIdx oi
#define ATOMIC_NAME(X) HELPER(glue(glue(atomic_ ## X, SUFFIX), END))
-#define ATOMIC_MMU_LOOKUP atomic_mmu_lookup(env, addr, oi, GETPC(), &ndi)
+#define ATOMIC_MMU_LOOKUP atomic_mmu_lookup(env, addr, oi, GETPC())
#define DATA_SIZE 1
#include "atomic_template.h"
diff --git a/exec.c b/exec.c
index 9c9cc811b3..090bcc05da 100644
--- a/exec.c
+++ b/exec.c
@@ -2684,50 +2684,6 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
return block->offset + offset;
}
-/* Called within RCU critical section. */
-void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
- CPUState *cpu,
- vaddr mem_vaddr,
- ram_addr_t ram_addr,
- unsigned size)
-{
- ndi->cpu = cpu;
- ndi->ram_addr = ram_addr;
- ndi->mem_vaddr = mem_vaddr;
- ndi->size = size;
- ndi->pages = NULL;
-
- trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
-
- assert(tcg_enabled());
- if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
- ndi->pages = page_collection_lock(ram_addr, ram_addr + size);
- tb_invalidate_phys_page_fast(ndi->pages, ram_addr, size);
- }
-}
-
-/* Called within RCU critical section. */
-void memory_notdirty_write_complete(NotDirtyInfo *ndi)
-{
- if (ndi->pages) {
- assert(tcg_enabled());
- page_collection_unlock(ndi->pages);
- ndi->pages = NULL;
- }
-
- /* Set both VGA and migration bits for simplicity and to remove
- * the notdirty callback faster.
- */
- cpu_physical_memory_set_dirty_range(ndi->ram_addr, ndi->size,
- DIRTY_CLIENTS_NOCODE);
- /* we remove the notdirty callback only if the code has been
- flushed */
- if (!cpu_physical_memory_is_clean(ndi->ram_addr)) {
- trace_memory_notdirty_set_dirty(ndi->mem_vaddr);
- tlb_set_dirty(ndi->cpu, ndi->mem_vaddr);
- }
-}
-
/* Generate a debug exception if a watchpoint has been hit. */
void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
MemTxAttrs attrs, int flags, uintptr_t ra)
--
2.17.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v3 16/20] cputlb: Handle TLB_NOTDIRTY in probe_access
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (14 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 15/20] cputlb: Merge and move memory_notdirty_write_{prepare, complete} Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-22 3:54 ` [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr Richard Henderson
` (6 subsequent siblings)
22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
We can use notdirty_write for the write and
return a valid host pointer for this case.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/cputlb.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
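Sketch of the resulting probe flow for a write access (a standalone
model with illustrative flag values; the callbacks stand in for
cpu_check_watchpoint and notdirty_write):

    /* Standalone sketch, not QEMU code. */
    #include <stddef.h>
    #include <stdint.h>

    #define TLB_MMIO       (1u << 0)
    #define TLB_ROM        (1u << 1)
    #define TLB_NOTDIRTY   (1u << 2)
    #define TLB_BSWAP      (1u << 3)
    #define TLB_WATCHPOINT (1u << 4)

    static void *probe_sketch(uint32_t tlb_addr, void *haddr,
                              void (*check_watchpoint)(void),
                              void (*notdirty_write)(void))
    {
        if (tlb_addr & (TLB_MMIO | TLB_BSWAP | TLB_ROM)) {
            return NULL;        /* caller must take the slow path */
        }
        if (tlb_addr & TLB_WATCHPOINT) {
            check_watchpoint(); /* may raise a debug exception */
        }
        if (tlb_addr & TLB_NOTDIRTY) {
            notdirty_write();   /* resolve dirty tracking up front */
        }
        return haddr;           /* now safe to write through directly */
    }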
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 6f685cb93a..6f4096bd0d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1167,16 +1167,24 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
return NULL;
}
- /* Handle watchpoints. */
- if (tlb_addr & TLB_WATCHPOINT) {
- cpu_check_watchpoint(env_cpu(env), addr, size,
- env_tlb(env)->d[mmu_idx].iotlb[index].attrs,
- wp_access, retaddr);
- }
+ if (unlikely(tlb_addr & TLB_FLAGS_MASK)) {
+ CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
- /* Reject I/O access, or other required slow-path. */
- if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP | TLB_ROM)) {
- return NULL;
+ /* Reject I/O access, or other required slow-path. */
+ if (tlb_addr & (TLB_MMIO | TLB_BSWAP | TLB_ROM)) {
+ return NULL;
+ }
+
+ /* Handle watchpoints. */
+ if (tlb_addr & TLB_WATCHPOINT) {
+ cpu_check_watchpoint(env_cpu(env), addr, size,
+ iotlbentry->attrs, wp_access, retaddr);
+ }
+
+ /* Handle clean RAM pages. */
+ if (tlb_addr & TLB_NOTDIRTY) {
+ notdirty_write(env_cpu(env), addr, size, iotlbentry, retaddr);
+ }
}
return (void *)((uintptr_t)addr + entry->addend);
--
2.17.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (15 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 16/20] cputlb: Handle TLB_NOTDIRTY in probe_access Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:50 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access Richard Henderson
` (5 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
With the merge of notdirty handling into store_helper,
the last user of cpu->mem_io_vaddr was removed.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/hw/core/cpu.h | 2 --
accel/tcg/cputlb.c | 2 --
hw/core/cpu.c | 1 -
3 files changed, 5 deletions(-)
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index c7cda65c66..031f587e51 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -338,7 +338,6 @@ struct qemu_work_item;
* @next_cpu: Next CPU sharing TB cache.
* @opaque: User data.
* @mem_io_pc: Host Program Counter at which the memory was accessed.
- * @mem_io_vaddr: Target virtual address at which the memory was accessed.
* @kvm_fd: vCPU file descriptor for KVM.
* @work_mutex: Lock to prevent multiple access to queued_work_*.
* @queued_work_first: First asynchronous work pending.
@@ -413,7 +412,6 @@ struct CPUState {
* we store some rarely used information in the CPU context.
*/
uintptr_t mem_io_pc;
- vaddr mem_io_vaddr;
/*
* This is only needed for the legacy cpu_unassigned_access() hook;
* when all targets using it have been converted to use
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 6f4096bd0d..257c59c08c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -927,7 +927,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
cpu_io_recompile(cpu, retaddr);
}
- cpu->mem_io_vaddr = addr;
cpu->mem_io_access_type = access_type;
if (mr->global_locking && !qemu_mutex_iothread_locked()) {
@@ -967,7 +966,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
if (!cpu->can_do_io) {
cpu_io_recompile(cpu, retaddr);
}
- cpu->mem_io_vaddr = addr;
cpu->mem_io_pc = retaddr;
if (mr->global_locking && !qemu_mutex_iothread_locked()) {
diff --git a/hw/core/cpu.c b/hw/core/cpu.c
index 0035845511..73b1ee34d0 100644
--- a/hw/core/cpu.c
+++ b/hw/core/cpu.c
@@ -261,7 +261,6 @@ static void cpu_common_reset(CPUState *cpu)
cpu->interrupt_request = 0;
cpu->halted = 0;
cpu->mem_io_pc = 0;
- cpu->mem_io_vaddr = 0;
cpu->icount_extra = 0;
atomic_set(&cpu->icount_decr_ptr->u32, 0);
cpu->can_do_io = 1;
--
2.17.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (16 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:52 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast Richard Henderson
` (4 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
All callers pass false to this argument. Remove it and pass the
constant on to tb_invalidate_phys_page_range__locked.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/translate-all.h | 3 +--
accel/tcg/translate-all.c | 6 ++----
exec.c | 4 ++--
3 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 64f5fd9a05..31f2117188 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -28,8 +28,7 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
void page_collection_unlock(struct page_collection *set);
void tb_invalidate_phys_page_fast(struct page_collection *pages,
tb_page_addr_t start, int len);
-void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
- int is_cpu_write_access);
+void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
void tb_check_watchpoint(CPUState *cpu);
#ifdef CONFIG_USER_ONLY
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5d1e08b169..de4b697163 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1983,8 +1983,7 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
*
* Called with mmap_lock held for user-mode emulation
*/
-void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
- int is_cpu_write_access)
+void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
{
struct page_collection *pages;
PageDesc *p;
@@ -1996,8 +1995,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
return;
}
pages = page_collection_lock(start, end);
- tb_invalidate_phys_page_range__locked(pages, p, start, end,
- is_cpu_write_access);
+ tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
page_collection_unlock(pages);
}
diff --git a/exec.c b/exec.c
index 090bcc05da..fed25d029b 100644
--- a/exec.c
+++ b/exec.c
@@ -978,7 +978,7 @@ const char *parse_cpu_option(const char *cpu_option)
void tb_invalidate_phys_addr(target_ulong addr)
{
mmap_lock();
- tb_invalidate_phys_page_range(addr, addr + 1, 0);
+ tb_invalidate_phys_page_range(addr, addr + 1);
mmap_unlock();
}
@@ -1005,7 +1005,7 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs)
return;
}
ram_addr = memory_region_get_ram_addr(mr) + addr;
- tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
+ tb_invalidate_phys_page_range(ram_addr, ram_addr + 1);
rcu_read_unlock();
}
--
2.17.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (17 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:53 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint Richard Henderson
` (3 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
Rather than rely on cpu->mem_io_pc, pass retaddr down directly.
Within tb_invalidate_phys_page_range__locked, the is_cpu_write_access
parameter is non-zero exactly when retaddr would be non-zero, so that
is a simple replacement.
Recognize that current_tb_not_found is true only when mem_io_pc
(and now retaddr) are also non-zero, so remove a redundant test.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/translate-all.h | 3 ++-
accel/tcg/cputlb.c | 6 +-----
accel/tcg/translate-all.c | 39 +++++++++++++++++++--------------------
3 files changed, 22 insertions(+), 26 deletions(-)
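The equivalence the patch relies on, in sketch form (standalone; the
lookup callback stands in for tcg_tb_lookup):

    /* Standalone sketch, not QEMU code. */
    #include <stdint.h>

    typedef struct TranslationBlock TranslationBlock;

    /* retaddr is non-zero exactly when the invalidation comes from a
     * CPU write, so the old test of cpu->mem_io_pc folds away. */
    static TranslationBlock *
    current_tb_sketch(uintptr_t retaddr,
                      TranslationBlock *(*lookup)(uintptr_t pc))
    {
        return retaddr ? lookup(retaddr) : NULL;
    }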
diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 31f2117188..135c1ea96a 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -27,7 +27,8 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
tb_page_addr_t end);
void page_collection_unlock(struct page_collection *set);
void tb_invalidate_phys_page_fast(struct page_collection *pages,
- tb_page_addr_t start, int len);
+ tb_page_addr_t start, int len,
+ uintptr_t retaddr);
void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
void tb_check_watchpoint(CPUState *cpu);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 257c59c08c..eff129447d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1093,11 +1093,7 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
struct page_collection *pages
= page_collection_lock(ram_addr, ram_addr + size);
-
- /* We require mem_io_pc in tb_invalidate_phys_page_range. */
- cpu->mem_io_pc = retaddr;
-
- tb_invalidate_phys_page_fast(pages, ram_addr, size);
+ tb_invalidate_phys_page_fast(pages, ram_addr, size, retaddr);
page_collection_unlock(pages);
}
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index de4b697163..db77fb221b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1889,7 +1889,7 @@ static void
tb_invalidate_phys_page_range__locked(struct page_collection *pages,
PageDesc *p, tb_page_addr_t start,
tb_page_addr_t end,
- int is_cpu_write_access)
+ uintptr_t retaddr)
{
TranslationBlock *tb;
tb_page_addr_t tb_start, tb_end;
@@ -1897,9 +1897,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
#ifdef TARGET_HAS_PRECISE_SMC
CPUState *cpu = current_cpu;
CPUArchState *env = NULL;
- int current_tb_not_found = is_cpu_write_access;
+ bool current_tb_not_found = retaddr != 0;
+ bool current_tb_modified = false;
TranslationBlock *current_tb = NULL;
- int current_tb_modified = 0;
target_ulong current_pc = 0;
target_ulong current_cs_base = 0;
uint32_t current_flags = 0;
@@ -1931,24 +1931,21 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
if (!(tb_end <= start || tb_start >= end)) {
#ifdef TARGET_HAS_PRECISE_SMC
if (current_tb_not_found) {
- current_tb_not_found = 0;
- current_tb = NULL;
- if (cpu->mem_io_pc) {
- /* now we have a real cpu fault */
- current_tb = tcg_tb_lookup(cpu->mem_io_pc);
- }
+ current_tb_not_found = false;
+ /* now we have a real cpu fault */
+ current_tb = tcg_tb_lookup(retaddr);
}
if (current_tb == tb &&
(tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
- /* If we are modifying the current TB, we must stop
- its execution. We could be more precise by checking
- that the modification is after the current PC, but it
- would require a specialized function to partially
- restore the CPU state */
-
- current_tb_modified = 1;
- cpu_restore_state_from_tb(cpu, current_tb,
- cpu->mem_io_pc, true);
+ /*
+ * If we are modifying the current TB, we must stop
+ * its execution. We could be more precise by checking
+ * that the modification is after the current PC, but it
+ * would require a specialized function to partially
+ * restore the CPU state.
+ */
+ current_tb_modified = true;
+ cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
&current_flags);
}
@@ -2042,7 +2039,8 @@ void tb_invalidate_phys_range(target_ulong start, target_ulong end)
* Call with all @pages in the range [@start, @start + len[ locked.
*/
void tb_invalidate_phys_page_fast(struct page_collection *pages,
- tb_page_addr_t start, int len)
+ tb_page_addr_t start, int len,
+ uintptr_t retaddr)
{
PageDesc *p;
@@ -2069,7 +2067,8 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
}
} else {
do_invalidate:
- tb_invalidate_phys_page_range__locked(pages, p, start, start + len, 1);
+ tb_invalidate_phys_page_range__locked(pages, p, start, start + len,
+ retaddr);
}
}
#else
--
2.17.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (18 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast Richard Henderson
@ 2019-09-22 3:54 ` Richard Henderson
2019-09-23 8:54 ` David Hildenbrand
2019-09-22 4:02 ` [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (2 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 3:54 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
Fixes the previous TLB_WATCHPOINT patches: we are currently
failing to set cpu->mem_io_pc before calling cpu_check_watchpoint.
Pass down the retaddr directly because it's readily available.
Fixes: 50b107c5d61
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/translate-all.h | 2 +-
accel/tcg/translate-all.c | 6 +++---
exec.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 135c1ea96a..a557b4e2bb 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -30,7 +30,7 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
tb_page_addr_t start, int len,
uintptr_t retaddr);
void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
-void tb_check_watchpoint(CPUState *cpu);
+void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
#ifdef CONFIG_USER_ONLY
int page_unprotect(target_ulong address, uintptr_t pc);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index db77fb221b..66d4bc4341 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2142,16 +2142,16 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
#endif
/* user-mode: call with mmap_lock held */
-void tb_check_watchpoint(CPUState *cpu)
+void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
{
TranslationBlock *tb;
assert_memory_lock();
- tb = tcg_tb_lookup(cpu->mem_io_pc);
+ tb = tcg_tb_lookup(retaddr);
if (tb) {
/* We can use retranslation to find the PC. */
- cpu_restore_state_from_tb(cpu, tb, cpu->mem_io_pc, true);
+ cpu_restore_state_from_tb(cpu, tb, retaddr, true);
tb_phys_invalidate(tb, -1);
} else {
/* The exception probably happened in a helper. The CPU state should
diff --git a/exec.c b/exec.c
index fed25d029b..ceeef4cd4b 100644
--- a/exec.c
+++ b/exec.c
@@ -2724,7 +2724,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
cpu->watchpoint_hit = wp;
mmap_lock();
- tb_check_watchpoint(cpu);
+ tb_check_watchpoint(cpu, ra);
if (wp->flags & BP_STOP_BEFORE_ACCESS) {
cpu->exception_index = EXCP_DEBUG;
mmap_unlock();
--
2.17.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v3 00/20] Move rom and notdirty handling to cputlb
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (19 preceding siblings ...)
2019-09-22 3:54 ` [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint Richard Henderson
@ 2019-09-22 4:02 ` Richard Henderson
2019-09-22 6:46 ` no-reply
2019-09-23 8:23 ` David Hildenbrand
22 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-22 4:02 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
On 9/21/19 8:54 PM, Richard Henderson wrote:
> Richard Henderson (20):
> exec: Use TARGET_PAGE_BITS_MIN for TLB flags
> exec: Split out variable page size support to exec-vary.c
> exec: Use const alias for TARGET_PAGE_BITS_VARY
> exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
> exec: Promote TARGET_PAGE_MASK to target_long
> exec: Tidy TARGET_PAGE_ALIGN
> exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
I didn't mean to include the TARGET_PAGE_BITS_VARY patches.
Ignore the first 7, or review them at your pleasure. ;-)
r~
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 00/20] Move rom and notdirty handling to cputlb
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (20 preceding siblings ...)
2019-09-22 4:02 ` [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
@ 2019-09-22 6:46 ` no-reply
2019-09-23 8:23 ` David Hildenbrand
22 siblings, 0 replies; 53+ messages in thread
From: no-reply @ 2019-09-22 6:46 UTC (permalink / raw)
To: richard.henderson; +Cc: pbonzini, alex.bennee, qemu-devel, stefanha, david
Patchew URL: https://patchew.org/QEMU/20190922035458.14879-1-richard.henderson@linaro.org/
Hi,
This series seems to have some coding style problems. See output below for
more information:
Message-id: 20190922035458.14879-1-richard.henderson@linaro.org
Subject: [PATCH v3 00/20] Move rom and notdirty handling to cputlb
Type: series
=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===
From https://github.com/patchew-project/qemu
* [new tag] patchew/20190922035458.14879-1-richard.henderson@linaro.org -> patchew/20190922035458.14879-1-richard.henderson@linaro.org
Switched to a new branch 'test'
34dd130 cputlb: Pass retaddr to tb_check_watchpoint
7d95821 cputlb: Pass retaddr to tb_invalidate_phys_page_fast
0f82f7a cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
143b897 cputlb: Remove cpu->mem_io_vaddr
1ac8201 cputlb: Handle TLB_NOTDIRTY in probe_access
d9ff1a3 cputlb: Merge and move memory_notdirty_write_{prepare, complete}
32abbae cputlb: Partially inline memory_region_section_get_iotlb
3976ef9 cputlb: Move NOTDIRTY handling from I/O path to TLB path
4f3e8aa cputlb: Move ROM handling from I/O path to TLB path
3c42e98 exec: Adjust notdirty tracing
e3bb276 cputlb: Introduce TLB_BSWAP
e48326a cputlb: Replace switches in load/store_helper with callback
287b700 cputlb: Disable __always_inline__ without optimization
8c801ae exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
2f76655 exec: Tidy TARGET_PAGE_ALIGN
aa7fdf2 exec: Promote TARGET_PAGE_MASK to target_long
20c6e62 exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
4db4c87 exec: Use const alias for TARGET_PAGE_BITS_VARY
0d3457e exec: Split out variable page size support to exec-vary.c
b4b2c05 exec: Use TARGET_PAGE_BITS_MIN for TLB flags
=== OUTPUT BEGIN ===
1/20 Checking commit b4b2c05eaa12 (exec: Use TARGET_PAGE_BITS_MIN for TLB flags)
2/20 Checking commit 0d3457ee2850 (exec: Split out variable page size support to exec-vary.c)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#33:
new file mode 100644
total: 0 errors, 1 warnings, 125 lines checked
Patch 2/20 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/20 Checking commit 4db4c8782d8c (exec: Use const alias for TARGET_PAGE_BITS_VARY)
ERROR: externs should be avoided in .c files
#58: FILE: exec-vary.c:53:
+extern const TargetPageBits target_page
total: 1 errors, 0 warnings, 82 lines checked
Patch 3/20 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/20 Checking commit 20c6e620cee9 (exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG)
5/20 Checking commit aa7fdf250f2b (exec: Promote TARGET_PAGE_MASK to target_long)
6/20 Checking commit 2f766556467d (exec: Tidy TARGET_PAGE_ALIGN)
7/20 Checking commit 8c801ae246fe (exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY)
8/20 Checking commit 287b700d5006 (cputlb: Disable __always_inline__ without optimization)
WARNING: architecture specific defines should be avoided
#50: FILE: include/qemu/compiler.h:178:
+#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
total: 0 errors, 1 warnings, 33 lines checked
Patch 8/20 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
9/20 Checking commit e48326ae3127 (cputlb: Replace switches in load/store_helper with callback)
10/20 Checking commit e3bb276cde3b (cputlb: Introduce TLB_BSWAP)
11/20 Checking commit 3c42e98e5a4a (exec: Adjust notdirty tracing)
12/20 Checking commit 4f3e8aa354bc (cputlb: Move ROM handling from I/O path to TLB path)
13/20 Checking commit 3976ef9549a0 (cputlb: Move NOTDIRTY handling from I/O path to TLB path)
14/20 Checking commit 32abbae47942 (cputlb: Partially inline memory_region_section_get_iotlb)
15/20 Checking commit d9ff1a3e3b82 (cputlb: Merge and move memory_notdirty_write_{prepare, complete})
16/20 Checking commit 1ac820151e2f (cputlb: Handle TLB_NOTDIRTY in probe_access)
17/20 Checking commit 143b897be8e4 (cputlb: Remove cpu->mem_io_vaddr)
18/20 Checking commit 0f82f7a7365d (cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access)
19/20 Checking commit 7d95821f8ce9 (cputlb: Pass retaddr to tb_invalidate_phys_page_fast)
20/20 Checking commit 34dd130e1b50 (cputlb: Pass retaddr to tb_check_watchpoint)
=== OUTPUT END ===
Test command exited with code: 1
The full log is available at
http://patchew.org/logs/20190922035458.14879-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 00/20] Move rom and notdirty handling to cputlb
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
` (21 preceding siblings ...)
2019-09-22 6:46 ` no-reply
@ 2019-09-23 8:23 ` David Hildenbrand
22 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:23 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> Ok! Third time is the charm, because this time it works.
Yeay :) I don't wanna know how hard it was to debug that...
>
> New to v3:
>
> * Covert io_mem_rom with a new TLB_ROM bit.
>
> * This in turn means that there are no longer any special RAM
> case along along the MMIO path -- they all have devices on
> the other end.
That sounds like a really nice cleanup.
>
> * This in turn means that we can fold the bulk of
> memory_region_section_get_iotlb into tlb_set_page_with_attrs,
> a couple of redundant tests vs the MemoryRegion.
> The result in patch 14 is, IMO, much more understandable.
>
> * Fold away uses of cpu->mem_io_pc in tb_invalidate_phys_page__locked,
> the cause of the problems for my previous two patch sets.
>
> BTW, I was correct with my guess in the v2 cover letter that the use
> of memory_notdirty_write_{prepare,complete} within atomic_mmu_lookup
> must have been broken, for not setting mem_io_pc. :-P
>
> * Fix a missed use of cpu->mem_io_pc in tb_check_watchpoint,
> which meant that the previous TLB_WATCHPOINT cleanup was a
> titch broken.
So there was a PC already getting stored.
>
> The remaining two users of cpu->mem_io_pc are hw/misc/mips_itu.c and
> target/i386/helper.c. I haven't looked, but I assume that these are
> legitimately on the MMIO path, and there probably isn't a decent way
> to remove the uses.
>
>
> r~
>
>
> Richard Henderson (20):
> exec: Use TARGET_PAGE_BITS_MIN for TLB flags
> exec: Split out variable page size support to exec-vary.c
> exec: Use const alias for TARGET_PAGE_BITS_VARY
> exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG
> exec: Promote TARGET_PAGE_MASK to target_long
> exec: Tidy TARGET_PAGE_ALIGN
> exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
> cputlb: Disable __always_inline__ without optimization
> cputlb: Replace switches in load/store_helper with callback
> cputlb: Introduce TLB_BSWAP
> exec: Adjust notdirty tracing
> cputlb: Move ROM handling from I/O path to TLB path
> cputlb: Move NOTDIRTY handling from I/O path to TLB path
> cputlb: Partially inline memory_region_section_get_iotlb
> cputlb: Merge and move memory_notdirty_write_{prepare,complete}
> cputlb: Handle TLB_NOTDIRTY in probe_access
> cputlb: Remove cpu->mem_io_vaddr
> cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
> cputlb: Pass retaddr to tb_invalidate_phys_page_fast
> cputlb: Pass retaddr to tb_check_watchpoint
>
> Makefile.target | 2 +-
> accel/tcg/translate-all.h | 8 +-
> include/exec/cpu-all.h | 48 ++--
> include/exec/cpu-common.h | 3 -
> include/exec/exec-all.h | 6 +-
> include/exec/memory-internal.h | 65 ------
> include/hw/core/cpu.h | 2 -
> include/qemu-common.h | 6 +
> include/qemu/compiler.h | 11 +
> accel/tcg/cputlb.c | 388 +++++++++++++++++++--------------
> accel/tcg/translate-all.c | 51 ++---
> exec-vary.c | 88 ++++++++
> exec.c | 192 +---------------
> hw/core/cpu.c | 1 -
> memory.c | 20 --
> trace-events | 4 +-
> 16 files changed, 403 insertions(+), 492 deletions(-)
> create mode 100644 exec-vary.c
>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags
2019-09-22 3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
@ 2019-09-23 8:24 ` David Hildenbrand
0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:24 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> These bits do not need to vary with the actual page size
> used by the guest.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/exec/cpu-all.h | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index d2d443c4f9..e0c8dc540c 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -317,20 +317,24 @@ CPUArchState *cpu_copy(CPUArchState *env);
>
> #if !defined(CONFIG_USER_ONLY)
>
> -/* Flags stored in the low bits of the TLB virtual address. These are
> - * defined so that fast path ram access is all zeros.
> +/*
> + * Flags stored in the low bits of the TLB virtual address.
> + * These are defined so that fast path ram access is all zeros.
> * The flags all must be between TARGET_PAGE_BITS and
> * maximum address alignment bit.
> + *
> + * Use TARGET_PAGE_BITS_MIN so that these bits are constant
> + * when TARGET_PAGE_BITS_VARY is in effect.
> */
> /* Zero if TLB entry is valid. */
> -#define TLB_INVALID_MASK (1 << (TARGET_PAGE_BITS - 1))
> +#define TLB_INVALID_MASK (1 << (TARGET_PAGE_BITS_MIN - 1))
> /* Set if TLB entry references a clean RAM page. The iotlb entry will
> contain the page physical address. */
> -#define TLB_NOTDIRTY (1 << (TARGET_PAGE_BITS - 2))
> +#define TLB_NOTDIRTY (1 << (TARGET_PAGE_BITS_MIN - 2))
> /* Set if TLB entry is an IO callback. */
> -#define TLB_MMIO (1 << (TARGET_PAGE_BITS - 3))
> +#define TLB_MMIO (1 << (TARGET_PAGE_BITS_MIN - 3))
> /* Set if TLB entry contains a watchpoint. */
> -#define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS - 4))
> +#define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS_MIN - 4))
>
> /* Use this mask to check interception with an alignment mask
> * in a TCG backend.
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
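The constraint described in the flags comment above is easy to check mechanically: every flag has to fit below the smallest page size the build can ever choose, so the flags stay clear of the page-aligned address bits no matter which size is finally selected. A standalone sketch (the TARGET_PAGE_BITS_MIN value here is assumed for illustration, not taken from QEMU):

#include <assert.h>

#define TARGET_PAGE_BITS_MIN 10  /* assumed value for illustration */

#define TLB_INVALID_MASK (1 << (TARGET_PAGE_BITS_MIN - 1))
#define TLB_NOTDIRTY     (1 << (TARGET_PAGE_BITS_MIN - 2))
#define TLB_MMIO         (1 << (TARGET_PAGE_BITS_MIN - 3))
#define TLB_WATCHPOINT   (1 << (TARGET_PAGE_BITS_MIN - 4))
#define TLB_FLAGS_MASK \
    (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT)

int main(void)
{
    /* Every flag sits below the smallest possible page size, so the
     * flags can never collide with the page-aligned address stored in
     * the entry, whatever page size the guest finally selects. */
    assert((TLB_FLAGS_MASK & ~((1 << TARGET_PAGE_BITS_MIN) - 1)) == 0);
    return 0;
}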
* Re: [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c
2019-09-22 3:54 ` [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c Richard Henderson
@ 2019-09-23 8:26 ` David Hildenbrand
2019-09-23 16:27 ` Richard Henderson
0 siblings, 1 reply; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:26 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> The next patch will play a trick with "const" that will
> confuse the compiler about the uses of target_page_bits
> within exec.c. Moving everything to a new file prevents
> this confusion.
>
> No functional change so far.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> Makefile.target | 2 +-
> include/qemu-common.h | 6 +++++
> exec-vary.c | 57 +++++++++++++++++++++++++++++++++++++++++++
> exec.c | 34 --------------------------
> 4 files changed, 64 insertions(+), 35 deletions(-)
> create mode 100644 exec-vary.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 5e916230c4..ca3d14efe1 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -107,7 +107,7 @@ obj-y += trace/
>
> #########################################################
> # cpu emulator library
> -obj-y += exec.o
> +obj-y += exec.o exec-vary.o
> obj-y += accel/
> obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
> obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 8d84db90b0..082da59e85 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -74,6 +74,12 @@ void cpu_exec_step_atomic(CPUState *cpu);
> */
> bool set_preferred_target_page_bits(int bits);
>
> +/**
> + * finalize_target_page_bits:
> + * Commit the final value set by set_preferred_target_page_bits.
> + */
> +void finalize_target_page_bits(void);
> +
> /**
> * Sends a (part of) iovec down a socket, yielding when the socket is full, or
> * Receives data into a (part of) iovec from a socket,
> diff --git a/exec-vary.c b/exec-vary.c
> new file mode 100644
> index 0000000000..48c0ab306c
> --- /dev/null
> +++ b/exec-vary.c
> @@ -0,0 +1,57 @@
> +/*
> + * Variable page size handling
> + *
> + * Copyright (c) 2003 Fabrice Bellard
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "exec/exec-all.h"
> +
> +#ifdef TARGET_PAGE_BITS_VARY
> +int target_page_bits;
> +bool target_page_bits_decided;
> +#endif
> +
> +bool set_preferred_target_page_bits(int bits)
> +{
> + /*
> + * The target page size is the lowest common denominator for all
> + * the CPUs in the system, so we can only make it smaller, never
> + * larger. And we can't make it smaller once we've committed to
> + * a particular size.
> + */
> +#ifdef TARGET_PAGE_BITS_VARY
> + assert(bits >= TARGET_PAGE_BITS_MIN);
> + if (target_page_bits == 0 || target_page_bits > bits) {
> + if (target_page_bits_decided) {
> + return false;
> + }
> + target_page_bits = bits;
> + }
> +#endif
> + return true;
> +}
> +
> +void finalize_target_page_bits(void)
> +{
> +#ifdef TARGET_PAGE_BITS_VARY
> + if (target_page_bits == 0) {
> + target_page_bits = TARGET_PAGE_BITS_MIN;
> + }
> + target_page_bits_decided = true;
> +#endif
> +}
I wonder if it would be nicer to handle this in the header file instead,
providing dummy functions there.
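Roughly this shape, untested, keyed off the same TARGET_PAGE_BITS_VARY guard:

#include <stdbool.h>   /* QEMU would get this via osdep.h */

#ifdef TARGET_PAGE_BITS_VARY
bool set_preferred_target_page_bits(int bits);
void finalize_target_page_bits(void);
#else
static inline bool set_preferred_target_page_bits(int bits)
{
    return true;   /* fixed page size: nothing to decide */
}
static inline void finalize_target_page_bits(void)
{
}
#endif

That would keep exec-vary.c to the genuinely variable case.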
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long
2019-09-22 3:54 ` [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long Richard Henderson
@ 2019-09-23 8:30 ` David Hildenbrand
0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:30 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> There are some uint64_t uses that expect TARGET_PAGE_MASK to
> sign-extend for a 32-bit guest, so this must continue to be a signed type.
> Define based on TARGET_PAGE_BITS not TARGET_PAGE_SIZE; this
> will make a following patch more clear.
>
> This should not have a functional effect so far.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/exec/cpu-all.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index b11ee1f711..34d36cebca 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -225,7 +225,7 @@ extern const TargetPageBits target_page;
> #endif
>
> #define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
> -#define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1)
> +#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
> #define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
>
> /* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
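The sign-extension requirement from the commit message is easy to demonstrate in isolation. A throwaway sketch, assuming a 32-bit target_long and 4 KiB pages:

#include <inttypes.h>
#include <stdio.h>

typedef int32_t target_long;          /* assumed: 32-bit guest */
#define TARGET_PAGE_BITS 12

int main(void)
{
    uint64_t addr = 0x123456789abcULL;
    target_long smask = (target_long)-1 << TARGET_PAGE_BITS;
    uint32_t    umask = ~((1u << TARGET_PAGE_BITS) - 1);

    /* The signed mask sign-extends to 64 bits and keeps the high part. */
    printf("%016" PRIx64 "\n", addr & smask);  /* 0000123456789000 */
    /* An unsigned 32-bit mask zero-extends and truncates the address. */
    printf("%016" PRIx64 "\n", addr & umask);  /* 0000000056789000 */
    return 0;
}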
* Re: [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN
2019-09-22 3:54 ` [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN Richard Henderson
@ 2019-09-23 8:30 ` David Hildenbrand
0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:30 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> Use TARGET_PAGE_MASK twice instead of TARGET_PAGE_SIZE once.
> This is functionally identical, but will help a following patch.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/exec/cpu-all.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 34d36cebca..5246770271 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -226,7 +226,8 @@ extern const TargetPageBits target_page;
>
> #define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
> #define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
> -#define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
> +#define TARGET_PAGE_ALIGN(addr) \
> + (((addr) + ~TARGET_PAGE_MASK) & TARGET_PAGE_MASK)
>
> /* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
> * when intptr_t is 32-bit and we are aligning a long long.
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
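The "functionally identical" claim follows from two's complement: TARGET_PAGE_MASK is -TARGET_PAGE_SIZE, so ~TARGET_PAGE_MASK equals TARGET_PAGE_SIZE - 1 and both rounding forms agree. A throwaway check (page bits assumed):

#include <assert.h>

int main(void)
{
    const long mask = (long)-1 << 12;   /* TARGET_PAGE_MASK */
    const long size = 1L << 12;         /* TARGET_PAGE_SIZE */

    for (long addr = 0; addr < 3 * size; addr += 0x123) {
        /* ~mask == size - 1, so both alignment forms round up alike. */
        assert(((addr + ~mask) & mask) == ((addr + size - 1) & mask));
    }
    return 0;
}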
* Re: [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY
2019-09-22 3:54 ` [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY Richard Henderson
@ 2019-09-23 8:31 ` David Hildenbrand
0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:31 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> This eliminates a set of runtime shifts. It turns out that we
> require TARGET_PAGE_MASK more often than TARGET_PAGE_SIZE, so
> redefine TARGET_PAGE_SIZE based on TARGET_PAGE_MASK instead of
> the other way around.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/exec/cpu-all.h | 8 ++++++--
> exec-vary.c | 1 +
> 2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 5246770271..2db73c7a27 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -213,19 +213,23 @@ static inline void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t val
> typedef struct {
> bool decided;
> int bits;
> + target_long mask;
> } TargetPageBits;
> extern const TargetPageBits target_page;
> # ifdef CONFIG_DEBUG_TCG
> # define TARGET_PAGE_BITS (assert(target_page.decided), target_page.bits)
> +# define TARGET_PAGE_MASK (assert(target_page.decided), target_page.mask)
> # else
> # define TARGET_PAGE_BITS target_page.bits
> +# define TARGET_PAGE_MASK target_page.mask
> # endif
> +# define TARGET_PAGE_SIZE ((int)-TARGET_PAGE_MASK)
> #else
> #define TARGET_PAGE_BITS_MIN TARGET_PAGE_BITS
> +#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
> +#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
> #endif
>
> -#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
> -#define TARGET_PAGE_MASK ((target_long)-1 << TARGET_PAGE_BITS)
> #define TARGET_PAGE_ALIGN(addr) \
> (((addr) + ~TARGET_PAGE_MASK) & TARGET_PAGE_MASK)
>
> diff --git a/exec-vary.c b/exec-vary.c
> index 67cdf57a9c..26daf281f2 100644
> --- a/exec-vary.c
> +++ b/exec-vary.c
> @@ -83,5 +83,6 @@ void finalize_target_page_bits(void)
> init_target_page.bits = TARGET_PAGE_BITS_MIN;
> }
> init_target_page.decided = true;
> + init_target_page.mask = (target_long)-1 << init_target_page.bits;
> #endif
> }
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
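What makes caching only the mask sufficient: the page size is recoverable from the cached mask by negation, with no variable-count shift per access. A quick standalone check of the (int)-TARGET_PAGE_MASK identity:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    for (int bits = 10; bits <= 16; bits++) {
        int64_t mask = (int64_t)-1 << bits;   /* what finalize caches */

        /* The size falls out of the cached mask by negation alone,
         * so no per-access "1 << target_page.bits" shift is needed. */
        assert((int)-mask == (1 << bits));
    }
    return 0;
}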
* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
2019-09-22 3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
@ 2019-09-23 8:32 ` David Hildenbrand
2019-09-23 9:27 ` Philippe Mathieu-Daudé
2019-09-23 9:51 ` Paolo Bonzini
2 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:32 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> Add a function parameter to perform the actual load/store to ram.
> With optimization, this results in identical code.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> accel/tcg/cputlb.c | 159 +++++++++++++++++++++++----------------------
> 1 file changed, 83 insertions(+), 76 deletions(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 2222b87764..b4a63d3928 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1280,11 +1280,38 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>
> typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr);
> +typedef uint64_t LoadHelper(const void *);
> +
> +/* Wrap the unaligned load helpers so that they have a common signature. */
> +static inline uint64_t wrap_ldub(const void *haddr)
> +{
> + return ldub_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_be(const void *haddr)
> +{
> + return lduw_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_le(const void *haddr)
> +{
> + return lduw_le_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_be(const void *haddr)
> +{
> + return (uint32_t)ldl_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_le(const void *haddr)
> +{
> + return (uint32_t)ldl_le_p(haddr);
> +}
>
> static inline uint64_t QEMU_ALWAYS_INLINE
> load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> uintptr_t retaddr, MemOp op, bool code_read,
> - FullLoadHelper *full_load)
> + FullLoadHelper *full_load, LoadHelper *direct)
> {
> uintptr_t mmu_idx = get_mmuidx(oi);
> uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1373,33 +1400,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>
> do_aligned_access:
> haddr = (void *)((uintptr_t)addr + entry->addend);
> - switch (op) {
> - case MO_UB:
> - res = ldub_p(haddr);
> - break;
> - case MO_BEUW:
> - res = lduw_be_p(haddr);
> - break;
> - case MO_LEUW:
> - res = lduw_le_p(haddr);
> - break;
> - case MO_BEUL:
> - res = (uint32_t)ldl_be_p(haddr);
> - break;
> - case MO_LEUL:
> - res = (uint32_t)ldl_le_p(haddr);
> - break;
> - case MO_BEQ:
> - res = ldq_be_p(haddr);
> - break;
> - case MO_LEQ:
> - res = ldq_le_p(haddr);
> - break;
> - default:
> - g_assert_not_reached();
> - }
> -
> - return res;
> + return direct(haddr);
> }
>
> /*
> @@ -1415,7 +1416,8 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - return load_helper(env, addr, oi, retaddr, MO_UB, false, full_ldub_mmu);
> + return load_helper(env, addr, oi, retaddr, MO_UB, false,
> + full_ldub_mmu, wrap_ldub);
> }
>
> tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
> @@ -1428,7 +1430,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
> - full_le_lduw_mmu);
> + full_le_lduw_mmu, wrap_lduw_le);
> }
>
> tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1441,7 +1443,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
> - full_be_lduw_mmu);
> + full_be_lduw_mmu, wrap_lduw_be);
> }
>
> tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1454,7 +1456,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
> - full_le_ldul_mmu);
> + full_le_ldul_mmu, wrap_ldul_le);
> }
>
> tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1467,7 +1469,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
> - full_be_ldul_mmu);
> + full_be_ldul_mmu, wrap_ldul_be);
> }
>
> tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1480,14 +1482,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
> - helper_le_ldq_mmu);
> + helper_le_ldq_mmu, ldq_le_p);
> }
>
> uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
> - helper_be_ldq_mmu);
> + helper_be_ldq_mmu, ldq_be_p);
> }
>
> /*
> @@ -1530,9 +1532,38 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
> * Store Helpers
> */
>
> +typedef void StoreHelper(void *, uint64_t);
> +
> +/* Wrap the unaligned store helpers so that they have a common signature. */
> +static inline void wrap_stb(void *haddr, uint64_t val)
> +{
> + stb_p(haddr, val);
> +}
> +
> +static inline void wrap_stw_be(void *haddr, uint64_t val)
> +{
> + stw_be_p(haddr, val);
> +}
> +
> +static inline void wrap_stw_le(void *haddr, uint64_t val)
> +{
> + stw_le_p(haddr, val);
> +}
> +
> +static inline void wrap_stl_be(void *haddr, uint64_t val)
> +{
> + stl_be_p(haddr, val);
> +}
> +
> +static inline void wrap_stl_le(void *haddr, uint64_t val)
> +{
> + stl_le_p(haddr, val);
> +}
> +
> static inline void QEMU_ALWAYS_INLINE
> store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> - TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
> + TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
> + StoreHelper *direct)
> {
> uintptr_t mmu_idx = get_mmuidx(oi);
> uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1657,74 +1688,49 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>
> do_aligned_access:
> haddr = (void *)((uintptr_t)addr + entry->addend);
> - switch (op) {
> - case MO_UB:
> - stb_p(haddr, val);
> - break;
> - case MO_BEUW:
> - stw_be_p(haddr, val);
> - break;
> - case MO_LEUW:
> - stw_le_p(haddr, val);
> - break;
> - case MO_BEUL:
> - stl_be_p(haddr, val);
> - break;
> - case MO_LEUL:
> - stl_le_p(haddr, val);
> - break;
> - case MO_BEQ:
> - stq_be_p(haddr, val);
> - break;
> - case MO_LEQ:
> - stq_le_p(haddr, val);
> - break;
> - default:
> - g_assert_not_reached();
> - break;
> - }
> + direct(haddr, val);
> }
>
> void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_UB);
> + store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
> }
>
> void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_LEUW);
> + store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
> }
>
> void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_BEUW);
> + store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
> }
>
> void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_LEUL);
> + store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
> }
>
> void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_BEUL);
> + store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
> }
>
> void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_LEQ);
> + store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
> }
>
> void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_BEQ);
> + store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
> }
>
> /* First set of helpers allows passing in of OI and RETADDR. This makes
> @@ -1789,7 +1795,8 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - return load_helper(env, addr, oi, retaddr, MO_8, true, full_ldub_cmmu);
> + return load_helper(env, addr, oi, retaddr, MO_8, true,
> + full_ldub_cmmu, wrap_ldub);
> }
>
> uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1802,7 +1809,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
> - full_le_lduw_cmmu);
> + full_le_lduw_cmmu, wrap_lduw_le);
> }
>
> uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1815,7 +1822,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
> - full_be_lduw_cmmu);
> + full_be_lduw_cmmu, wrap_lduw_be);
> }
>
> uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1828,7 +1835,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
> - full_le_ldul_cmmu);
> + full_le_ldul_cmmu, wrap_ldul_le);
> }
>
> uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1841,7 +1848,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
> - full_be_ldul_cmmu);
> + full_be_ldul_cmmu, wrap_ldul_be);
> }
>
> uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1854,12 +1861,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
> - helper_le_ldq_cmmu);
> + helper_le_ldq_cmmu, ldq_le_p);
> }
>
> uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
> - helper_be_ldq_cmmu);
> + helper_be_ldq_cmmu, ldq_be_p);
> }
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
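The "identical code" claim in the commit message relies on load_helper being always-inlined while every call site passes a compile-time-constant callback, so the compiler can devirtualize the indirect call. A stripped-down illustration of the pattern (GCC/clang attribute syntax; names simplified, not QEMU's actual helpers):

#include <stdint.h>

typedef uint64_t LoadHelper(const void *);

static uint64_t wrap_ldub(const void *p)
{
    return *(const uint8_t *)p;
}

/* Force-inlined with a callback that is a literal function name at
 * every call site, so the indirect call folds into a plain load. */
static inline __attribute__((always_inline))
uint64_t load_helper(const void *haddr, LoadHelper *direct)
{
    return direct(haddr);
}

uint64_t full_ldub(const void *haddr)
{
    /* With optimization this compiles to the same code as *haddr. */
    return load_helper(haddr, wrap_ldub);
}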
* Re: [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP
2019-09-22 3:54 ` [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP Richard Henderson
@ 2019-09-23 8:33 ` David Hildenbrand
0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:33 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> Handle bswap on ram directly in load/store_helper. This fixes a
> bug in the previous implementation, since the I/O path cannot be
> used for RAM.
>
> Fixes: a26fc6f5152b47f1
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/exec/cpu-all.h | 4 +-
> accel/tcg/cputlb.c | 108 +++++++++++++++++++++--------------------
> 2 files changed, 59 insertions(+), 53 deletions(-)
>
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 2db73c7a27..1ebd1b59ab 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -346,12 +346,14 @@ CPUArchState *cpu_copy(CPUArchState *env);
> #define TLB_MMIO (1 << (TARGET_PAGE_BITS_MIN - 3))
> /* Set if TLB entry contains a watchpoint. */
> #define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS_MIN - 4))
> +/* Set if TLB entry requires byte swap. */
> +#define TLB_BSWAP (1 << (TARGET_PAGE_BITS_MIN - 5))
>
> /* Use this mask to check interception with an alignment mask
> * in a TCG backend.
> */
> #define TLB_FLAGS_MASK \
> - (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT)
> + (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
>
> /**
> * tlb_hit_page: return true if page aligned @addr is a hit against the
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index b4a63d3928..cb603917a2 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -737,8 +737,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> address |= TLB_INVALID_MASK;
> }
> if (attrs.byte_swap) {
> - /* Force the access through the I/O slow path. */
> - address |= TLB_MMIO;
> + address |= TLB_BSWAP;
> }
> if (!memory_region_is_ram(section->mr) &&
> !memory_region_is_romd(section->mr)) {
> @@ -901,10 +900,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> bool locked = false;
> MemTxResult r;
>
> - if (iotlbentry->attrs.byte_swap) {
> - op ^= MO_BSWAP;
> - }
> -
> section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
> mr = section->mr;
> mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> @@ -947,10 +942,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> bool locked = false;
> MemTxResult r;
>
> - if (iotlbentry->attrs.byte_swap) {
> - op ^= MO_BSWAP;
> - }
> -
> section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
> mr = section->mr;
> mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> @@ -1133,8 +1124,8 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
> wp_access, retaddr);
> }
>
> - if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO)) {
> - /* I/O access */
> + /* Reject I/O access, or other required slow-path. */
> + if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
> return NULL;
> }
>
> @@ -1311,7 +1302,8 @@ static inline uint64_t wrap_ldul_le(const void *haddr)
> static inline uint64_t QEMU_ALWAYS_INLINE
> load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> uintptr_t retaddr, MemOp op, bool code_read,
> - FullLoadHelper *full_load, LoadHelper *direct)
> + FullLoadHelper *full_load, LoadHelper *direct,
> + LoadHelper *direct_swap)
> {
> uintptr_t mmu_idx = get_mmuidx(oi);
> uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1361,17 +1353,21 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> /* On watchpoint hit, this will longjmp out. */
> cpu_check_watchpoint(env_cpu(env), addr, size,
> iotlbentry->attrs, BP_MEM_READ, retaddr);
> -
> - /* The backing page may or may not require I/O. */
> - tlb_addr &= ~TLB_WATCHPOINT;
> - if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
> - goto do_aligned_access;
> - }
> }
>
> /* Handle I/O access. */
> - return io_readx(env, iotlbentry, mmu_idx, addr,
> - retaddr, access_type, op);
> + if (likely(tlb_addr & TLB_MMIO)) {
> + return io_readx(env, iotlbentry, mmu_idx, addr,
> + retaddr, access_type,
> + op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
> + }
> +
> + haddr = (void *)((uintptr_t)addr + entry->addend);
> +
> + if (unlikely(tlb_addr & TLB_BSWAP)) {
> + return direct_swap(haddr);
> + }
> + return direct(haddr);
> }
>
> /* Handle slow unaligned access (it spans two pages or IO). */
> @@ -1398,7 +1394,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> return res & MAKE_64BIT_MASK(0, size * 8);
> }
>
> - do_aligned_access:
> haddr = (void *)((uintptr_t)addr + entry->addend);
> return direct(haddr);
> }
> @@ -1417,7 +1412,7 @@ static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_UB, false,
> - full_ldub_mmu, wrap_ldub);
> + full_ldub_mmu, wrap_ldub, wrap_ldub);
> }
>
> tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
> @@ -1430,7 +1425,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
> - full_le_lduw_mmu, wrap_lduw_le);
> + full_le_lduw_mmu, wrap_lduw_le, wrap_lduw_be);
> }
>
> tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1443,7 +1438,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
> - full_be_lduw_mmu, wrap_lduw_be);
> + full_be_lduw_mmu, wrap_lduw_be, wrap_lduw_le);
> }
>
> tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1456,7 +1451,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
> - full_le_ldul_mmu, wrap_ldul_le);
> + full_le_ldul_mmu, wrap_ldul_le, wrap_ldul_be);
> }
>
> tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1469,7 +1464,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
> - full_be_ldul_mmu, wrap_ldul_be);
> + full_be_ldul_mmu, wrap_ldul_be, wrap_ldul_le);
> }
>
> tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1482,14 +1477,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
> - helper_le_ldq_mmu, ldq_le_p);
> + helper_le_ldq_mmu, ldq_le_p, ldq_be_p);
> }
>
> uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
> - helper_be_ldq_mmu, ldq_be_p);
> + helper_be_ldq_mmu, ldq_be_p, ldq_le_p);
> }
>
> /*
> @@ -1563,7 +1558,7 @@ static inline void wrap_stl_le(void *haddr, uint64_t val)
> static inline void QEMU_ALWAYS_INLINE
> store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
> - StoreHelper *direct)
> + StoreHelper *direct, StoreHelper *direct_swap)
> {
> uintptr_t mmu_idx = get_mmuidx(oi);
> uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1608,16 +1603,22 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> /* On watchpoint hit, this will longjmp out. */
> cpu_check_watchpoint(env_cpu(env), addr, size,
> iotlbentry->attrs, BP_MEM_WRITE, retaddr);
> -
> - /* The backing page may or may not require I/O. */
> - tlb_addr &= ~TLB_WATCHPOINT;
> - if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
> - goto do_aligned_access;
> - }
> }
>
> /* Handle I/O access. */
> - io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr, op);
> + if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
> + io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
> + op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
> + return;
> + }
> +
> + haddr = (void *)((uintptr_t)addr + entry->addend);
> +
> + if (unlikely(tlb_addr & TLB_BSWAP)) {
> + direct_swap(haddr, val);
> + } else {
> + direct(haddr, val);
> + }
> return;
> }
>
> @@ -1686,7 +1687,6 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> return;
> }
>
> - do_aligned_access:
> haddr = (void *)((uintptr_t)addr + entry->addend);
> direct(haddr, val);
> }
> @@ -1694,43 +1694,47 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
> + store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb, wrap_stb);
> }
>
> void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
> + store_helper(env, addr, val, oi, retaddr, MO_LEUW,
> + wrap_stw_le, wrap_stw_be);
> }
>
> void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
> + store_helper(env, addr, val, oi, retaddr, MO_BEUW,
> + wrap_stw_be, wrap_stw_le);
> }
>
> void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
> + store_helper(env, addr, val, oi, retaddr, MO_LEUL,
> + wrap_stl_le, wrap_stl_be);
> }
>
> void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
> + store_helper(env, addr, val, oi, retaddr, MO_BEUL,
> + wrap_stl_be, wrap_stl_le);
> }
>
> void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
> + store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p, stq_be_p);
> }
>
> void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
> + store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p, stq_le_p);
> }
>
> /* First set of helpers allows passing in of OI and RETADDR. This makes
> @@ -1796,7 +1800,7 @@ static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_8, true,
> - full_ldub_cmmu, wrap_ldub);
> + full_ldub_cmmu, wrap_ldub, wrap_ldub);
> }
>
> uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1809,7 +1813,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
> - full_le_lduw_cmmu, wrap_lduw_le);
> + full_le_lduw_cmmu, wrap_lduw_le, wrap_lduw_be);
> }
>
> uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1822,7 +1826,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
> - full_be_lduw_cmmu, wrap_lduw_be);
> + full_be_lduw_cmmu, wrap_lduw_be, wrap_lduw_le);
> }
>
> uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1835,7 +1839,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
> - full_le_ldul_cmmu, wrap_ldul_le);
> + full_le_ldul_cmmu, wrap_ldul_le, wrap_ldul_be);
> }
>
> uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1848,7 +1852,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
> - full_be_ldul_cmmu, wrap_ldul_be);
> + full_be_ldul_cmmu, wrap_ldul_be, wrap_ldul_le);
> }
>
> uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1861,12 +1865,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
> - helper_le_ldq_cmmu, ldq_le_p);
> + helper_le_ldq_cmmu, ldq_le_p, ldq_be_p);
> }
>
> uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
> - helper_be_ldq_cmmu, ldq_be_p);
> + helper_be_ldq_cmmu, ldq_be_p, ldq_le_p);
> }
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
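The observation behind TLB_BSWAP: a byte-swapped access of a given size is exactly the opposite-endian access of that size. That is why the MMIO path can simply XOR MO_BSWAP into the MemOp, while the RAM path calls the opposite-endian wrapper. A toy version of the RAM-side selection (flag value assumed):

#include <stdint.h>

#define TLB_BSWAP (1u << 5)   /* assumed bit position */

static uint16_t lduw_le(const void *p)
{
    const uint8_t *b = p;
    return b[0] | (b[1] << 8);
}

static uint16_t lduw_be(const void *p)
{
    const uint8_t *b = p;
    return (b[0] << 8) | b[1];
}

/* A cross-endian page just uses the opposite-endian accessor, which
 * is why load_helper takes both a direct and a direct_swap callback. */
static uint16_t load_uw_le(uint32_t tlb_addr, const void *haddr)
{
    return (tlb_addr & TLB_BSWAP) ? lduw_be(haddr) : lduw_le(haddr);
}

int main(void)
{
    uint8_t mem[2] = { 0x34, 0x12 };
    /* Little-endian page reads 0x1234; byte-swapped page reads 0x3412. */
    return (load_uw_le(0, mem) == 0x1234 &&
            load_uw_le(TLB_BSWAP, mem) == 0x3412) ? 0 : 1;
}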
* Re: [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path
2019-09-22 3:54 ` [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path Richard Henderson
@ 2019-09-23 8:39 ` David Hildenbrand
0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:39 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> It does not require going through the whole I/O path
> in order to discard a write.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/exec/cpu-all.h | 5 ++++-
> include/exec/cpu-common.h | 1 -
> accel/tcg/cputlb.c | 35 +++++++++++++++++++--------------
> exec.c | 41 +--------------------------------------
> 4 files changed, 25 insertions(+), 57 deletions(-)
>
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 1ebd1b59ab..9f0b17802e 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -348,12 +348,15 @@ CPUArchState *cpu_copy(CPUArchState *env);
> #define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS_MIN - 4))
> /* Set if TLB entry requires byte swap. */
> #define TLB_BSWAP (1 << (TARGET_PAGE_BITS_MIN - 5))
> +/* Set if TLB entry writes ignored. */
> +#define TLB_ROM (1 << (TARGET_PAGE_BITS_MIN - 6))
I was wondering if TLB_DISCARD_WRITE/TLB_IGNORE_WRITE/TLB_READONLY would
make it clearer what's actually happening here.
E.g., it isn't used for memory_region_is_romd(section->mr) but only for
memory_region_is_ram(section->mr) && section->readonly.
But anyhow, changes look fine to me
Reviewed-by: David Hildenbrand <david@redhat.com>
>
> /* Use this mask to check interception with an alignment mask
> * in a TCG backend.
> */
> #define TLB_FLAGS_MASK \
> - (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
> + (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \
> + | TLB_WATCHPOINT | TLB_BSWAP | TLB_ROM)
>
> /**
> * tlb_hit_page: return true if page aligned @addr is a hit against the
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index f7dbe75fbc..1c0e03ddc2 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -100,7 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
>
> void cpu_flush_icache_range(hwaddr start, hwaddr len);
>
> -extern struct MemoryRegion io_mem_rom;
> extern struct MemoryRegion io_mem_notdirty;
>
> typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index cb603917a2..7ab523d7ec 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -577,7 +577,7 @@ static void tlb_reset_dirty_range_locked(CPUTLBEntry *tlb_entry,
> {
> uintptr_t addr = tlb_entry->addr_write;
>
> - if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
> + if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_ROM | TLB_NOTDIRTY)) == 0) {
> addr &= TARGET_PAGE_MASK;
> addr += tlb_entry->addend;
> if ((addr - start) < length) {
> @@ -745,7 +745,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> address |= TLB_MMIO;
> addend = 0;
> } else {
> - /* TLB_MMIO for rom/romd handled below */
> addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
> }
>
> @@ -822,16 +821,17 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>
> tn.addr_write = -1;
> if (prot & PAGE_WRITE) {
> - if ((memory_region_is_ram(section->mr) && section->readonly)
> - || memory_region_is_romd(section->mr)) {
> - /* Write access calls the I/O callback. */
> - tn.addr_write = address | TLB_MMIO;
> - } else if (memory_region_is_ram(section->mr)
> - && cpu_physical_memory_is_clean(
> - memory_region_get_ram_addr(section->mr) + xlat)) {
> - tn.addr_write = address | TLB_NOTDIRTY;
> - } else {
> - tn.addr_write = address;
> + tn.addr_write = address;
> + if (memory_region_is_romd(section->mr)) {
> + /* Use the MMIO path so that the device can switch states. */
> + tn.addr_write |= TLB_MMIO;
> + } else if (memory_region_is_ram(section->mr)) {
> + if (section->readonly) {
> + tn.addr_write |= TLB_ROM;
> + } else if (cpu_physical_memory_is_clean(
> + memory_region_get_ram_addr(section->mr) + xlat)) {
> + tn.addr_write |= TLB_NOTDIRTY;
> + }
> }
> if (prot & PAGE_WRITE_INV) {
> tn.addr_write |= TLB_INVALID_MASK;
> @@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> mr = section->mr;
> mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> cpu->mem_io_pc = retaddr;
> - if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
> + if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> cpu_io_recompile(cpu, retaddr);
> }
>
> @@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
> mr = section->mr;
> mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> - if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
> + if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> cpu_io_recompile(cpu, retaddr);
> }
> cpu->mem_io_vaddr = addr;
> @@ -1125,7 +1125,7 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
> }
>
> /* Reject I/O access, or other required slow-path. */
> - if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
> + if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP | TLB_ROM)) {
> return NULL;
> }
>
> @@ -1612,6 +1612,11 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> return;
> }
>
> + /* Ignore writes to ROM. */
> + if (unlikely(tlb_addr & TLB_ROM)) {
> + return;
> + }
> +
> haddr = (void *)((uintptr_t)addr + entry->addend);
>
> if (unlikely(tlb_addr & TLB_BSWAP)) {
> diff --git a/exec.c b/exec.c
> index 7ce0515635..e21e068535 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -88,7 +88,7 @@ static MemoryRegion *system_io;
> AddressSpace address_space_io;
> AddressSpace address_space_memory;
>
> -MemoryRegion io_mem_rom, io_mem_notdirty;
> +MemoryRegion io_mem_notdirty;
> static MemoryRegion io_mem_unassigned;
> #endif
>
> @@ -158,7 +158,6 @@ typedef struct subpage_t {
>
> #define PHYS_SECTION_UNASSIGNED 0
> #define PHYS_SECTION_NOTDIRTY 1
> -#define PHYS_SECTION_ROM 2
>
> static void io_mem_init(void);
> static void memory_map_init(void);
> @@ -1441,8 +1440,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
> iotlb = memory_region_get_ram_addr(section->mr) + xlat;
> if (!section->readonly) {
> iotlb |= PHYS_SECTION_NOTDIRTY;
> - } else {
> - iotlb |= PHYS_SECTION_ROM;
> }
> } else {
> AddressSpaceDispatch *d;
> @@ -2968,38 +2965,6 @@ static uint16_t dummy_section(PhysPageMap *map, FlatView *fv, MemoryRegion *mr)
> - return phys_section_add(map, &section);
> }
>
> -static void readonly_mem_write(void *opaque, hwaddr addr,
> - uint64_t val, unsigned size)
> -{
> - /* Ignore any write to ROM. */
> -}
> -
> -static bool readonly_mem_accepts(void *opaque, hwaddr addr,
> - unsigned size, bool is_write,
> - MemTxAttrs attrs)
> -{
> - return is_write;
> -}
> -
> -/* This will only be used for writes, because reads are special cased
> - * to directly access the underlying host ram.
> - */
> -static const MemoryRegionOps readonly_mem_ops = {
> - .write = readonly_mem_write,
> - .valid.accepts = readonly_mem_accepts,
> - .endianness = DEVICE_NATIVE_ENDIAN,
> - .valid = {
> - .min_access_size = 1,
> - .max_access_size = 8,
> - .unaligned = false,
> - },
> - .impl = {
> - .min_access_size = 1,
> - .max_access_size = 8,
> - .unaligned = false,
> - },
> -};
> -
> MemoryRegionSection *iotlb_to_section(CPUState *cpu,
> hwaddr index, MemTxAttrs attrs)
> {
> @@ -3013,8 +2978,6 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu,
>
> static void io_mem_init(void)
> {
> - memory_region_init_io(&io_mem_rom, NULL, &readonly_mem_ops,
> - NULL, NULL, UINT64_MAX);
> memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
> NULL, UINT64_MAX);
>
> @@ -3035,8 +2998,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
> assert(n == PHYS_SECTION_UNASSIGNED);
> n = dummy_section(&d->map, fv, &io_mem_notdirty);
> assert(n == PHYS_SECTION_NOTDIRTY);
> - n = dummy_section(&d->map, fv, &io_mem_rom);
> - assert(n == PHYS_SECTION_ROM);
>
> d->phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
>
>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
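As the review note above points out, TLB_ROM is set only for RAM sections marked read-only; ROMD regions still take TLB_MMIO so the device can observe writes and switch states. A standalone restatement of the new write classification (flag values assumed; the bool parameters stand in for the MemoryRegion and dirty-bitmap queries):

#include <stdbool.h>
#include <stdint.h>

#define TLB_MMIO     (1u << 3)   /* assumed bit positions */
#define TLB_ROM      (1u << 4)
#define TLB_NOTDIRTY (1u << 5)

uint32_t classify_write(uint32_t address, bool romd, bool ram,
                        bool readonly, bool clean)
{
    if (romd) {
        return address | TLB_MMIO;     /* device may switch states */
    }
    if (ram && readonly) {
        return address | TLB_ROM;      /* writes silently discarded */
    }
    if (ram && clean) {
        return address | TLB_NOTDIRTY; /* first write must set dirty bits */
    }
    return address;                    /* plain dirty RAM: fast path */
}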
* Re: [PATCH v3 13/20] cputlb: Move NOTDIRTY handling from I/O path to TLB path
2019-09-22 3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
@ 2019-09-23 8:41 ` David Hildenbrand
2019-09-23 9:30 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:41 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> Pages that we want to track for NOTDIRTY are RAM. We do not
> really need to go through the I/O path to handle them.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/exec/cpu-common.h | 2 --
> accel/tcg/cputlb.c | 26 +++++++++++++++++---
> exec.c | 50 ---------------------------------------
> memory.c | 16 -------------
> 4 files changed, 23 insertions(+), 71 deletions(-)
>
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 1c0e03ddc2..81753bbb34 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -100,8 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
>
> void cpu_flush_icache_range(hwaddr start, hwaddr len);
>
> -extern struct MemoryRegion io_mem_notdirty;
> -
> typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
>
> int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 7ab523d7ec..b7bd738115 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> mr = section->mr;
> mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> cpu->mem_io_pc = retaddr;
> - if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> + if (!cpu->can_do_io) {
> cpu_io_recompile(cpu, retaddr);
> }
>
> @@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
> mr = section->mr;
> mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> - if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> + if (!cpu->can_do_io) {
> cpu_io_recompile(cpu, retaddr);
> }
> cpu->mem_io_vaddr = addr;
> @@ -1606,7 +1606,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> }
>
> /* Handle I/O access. */
> - if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
> + if (tlb_addr & TLB_MMIO) {
> io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
> op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
> return;
> @@ -1619,6 +1619,26 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>
> haddr = (void *)((uintptr_t)addr + entry->addend);
>
> + /* Handle clean RAM pages. */
> + if (tlb_addr & TLB_NOTDIRTY) {
> + NotDirtyInfo ndi;
> +
> + /* We require mem_io_pc in tb_invalidate_phys_page_range. */
> + env_cpu(env)->mem_io_pc = retaddr;
> +
> + memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
> + addr + iotlbentry->addr, size);
> +
> + if (unlikely(tlb_addr & TLB_BSWAP)) {
> + direct_swap(haddr, val);
> + } else {
> + direct(haddr, val);
> + }
> +
> + memory_notdirty_write_complete(&ndi);
> + return;
> + }
> +
> if (unlikely(tlb_addr & TLB_BSWAP)) {
> direct_swap(haddr, val);
> } else {
> diff --git a/exec.c b/exec.c
> index e21e068535..abf58b68a0 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -88,7 +88,6 @@ static MemoryRegion *system_io;
> AddressSpace address_space_io;
> AddressSpace address_space_memory;
>
> -MemoryRegion io_mem_notdirty;
> static MemoryRegion io_mem_unassigned;
> #endif
>
> @@ -157,7 +156,6 @@ typedef struct subpage_t {
> } subpage_t;
>
> #define PHYS_SECTION_UNASSIGNED 0
> -#define PHYS_SECTION_NOTDIRTY 1
>
> static void io_mem_init(void);
> static void memory_map_init(void);
> @@ -1438,9 +1436,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
> if (memory_region_is_ram(section->mr)) {
> /* Normal RAM. */
> iotlb = memory_region_get_ram_addr(section->mr) + xlat;
> - if (!section->readonly) {
> - iotlb |= PHYS_SECTION_NOTDIRTY;
> - }
> } else {
> AddressSpaceDispatch *d;
>
> @@ -2749,42 +2744,6 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
> }
> }
>
> -/* Called within RCU critical section. */
> -static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
> - uint64_t val, unsigned size)
> -{
> - NotDirtyInfo ndi;
> -
> - memory_notdirty_write_prepare(&ndi, current_cpu, current_cpu->mem_io_vaddr,
> - ram_addr, size);
> -
> - stn_p(qemu_map_ram_ptr(NULL, ram_addr), size, val);
> - memory_notdirty_write_complete(&ndi);
> -}
> -
> -static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
> - unsigned size, bool is_write,
> - MemTxAttrs attrs)
> -{
> - return is_write;
> -}
> -
> -static const MemoryRegionOps notdirty_mem_ops = {
> - .write = notdirty_mem_write,
> - .valid.accepts = notdirty_mem_accepts,
> - .endianness = DEVICE_NATIVE_ENDIAN,
> - .valid = {
> - .min_access_size = 1,
> - .max_access_size = 8,
> - .unaligned = false,
> - },
> - .impl = {
> - .min_access_size = 1,
> - .max_access_size = 8,
> - .unaligned = false,
> - },
> -};
> -
> /* Generate a debug exception if a watchpoint has been hit. */
> void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
> MemTxAttrs attrs, int flags, uintptr_t ra)
> @@ -2980,13 +2939,6 @@ static void io_mem_init(void)
> {
> memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
> NULL, UINT64_MAX);
> -
> - /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
> - * which can be called without the iothread mutex.
> - */
> - memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
> - NULL, UINT64_MAX);
> - memory_region_clear_global_locking(&io_mem_notdirty);
> }
>
> AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
> @@ -2996,8 +2948,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
>
> n = dummy_section(&d->map, fv, &io_mem_unassigned);
> assert(n == PHYS_SECTION_UNASSIGNED);
> - n = dummy_section(&d->map, fv, &io_mem_notdirty);
> - assert(n == PHYS_SECTION_NOTDIRTY);
>
> d->phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
>
> diff --git a/memory.c b/memory.c
> index 57c44c97db..a99b8c0767 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -434,10 +434,6 @@ static MemTxResult memory_region_read_accessor(MemoryRegion *mr,
> tmp = mr->ops->read(mr->opaque, addr, size);
> if (mr->subpage) {
> trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
> - } else if (mr == &io_mem_notdirty) {
> - /* Accesses to code which has previously been translated into a TB show
> - * up in the MMIO path, as accesses to the io_mem_notdirty
> - * MemoryRegion. */
> } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -460,10 +456,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
> r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
> if (mr->subpage) {
> trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
> - } else if (mr == &io_mem_notdirty) {
> - /* Accesses to code which has previously been translated into a TB show
> - * up in the MMIO path, as accesses to the io_mem_notdirty
> - * MemoryRegion. */
> } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -484,10 +476,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
>
> if (mr->subpage) {
> trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
> - } else if (mr == &io_mem_notdirty) {
> - /* Accesses to code which has previously been translated into a TB show
> - * up in the MMIO path, as accesses to the io_mem_notdirty
> - * MemoryRegion. */
> } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -508,10 +496,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
>
> if (mr->subpage) {
> trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
> - } else if (mr == &io_mem_notdirty) {
> - /* Accesses to code which has previously been translated into a TB show
> - * up in the MMIO path, as accesses to the io_mem_notdirty
> - * MemoryRegion. */
> } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
>
Complicated stuff but I think this is fine
Acked-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr
2019-09-22 3:54 ` [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr Richard Henderson
@ 2019-09-23 8:50 ` David Hildenbrand
0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:50 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> With the merge of notdirty handling into store_helper,
> the last user of cpu->mem_io_vaddr was removed.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/hw/core/cpu.h | 2 --
> accel/tcg/cputlb.c | 2 --
> hw/core/cpu.c | 1 -
> 3 files changed, 5 deletions(-)
>
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index c7cda65c66..031f587e51 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -338,7 +338,6 @@ struct qemu_work_item;
> * @next_cpu: Next CPU sharing TB cache.
> * @opaque: User data.
> * @mem_io_pc: Host Program Counter at which the memory was accessed.
> - * @mem_io_vaddr: Target virtual address at which the memory was accessed.
> * @kvm_fd: vCPU file descriptor for KVM.
> * @work_mutex: Lock to prevent multiple access to queued_work_*.
> * @queued_work_first: First asynchronous work pending.
> @@ -413,7 +412,6 @@ struct CPUState {
> * we store some rarely used information in the CPU context.
> */
> uintptr_t mem_io_pc;
> - vaddr mem_io_vaddr;
> /*
> * This is only needed for the legacy cpu_unassigned_access() hook;
> * when all targets using it have been converted to use
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 6f4096bd0d..257c59c08c 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -927,7 +927,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> cpu_io_recompile(cpu, retaddr);
> }
>
> - cpu->mem_io_vaddr = addr;
> cpu->mem_io_access_type = access_type;
>
> if (mr->global_locking && !qemu_mutex_iothread_locked()) {
> @@ -967,7 +966,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> if (!cpu->can_do_io) {
> cpu_io_recompile(cpu, retaddr);
> }
> - cpu->mem_io_vaddr = addr;
> cpu->mem_io_pc = retaddr;
>
> if (mr->global_locking && !qemu_mutex_iothread_locked()) {
> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
> index 0035845511..73b1ee34d0 100644
> --- a/hw/core/cpu.c
> +++ b/hw/core/cpu.c
> @@ -261,7 +261,6 @@ static void cpu_common_reset(CPUState *cpu)
> cpu->interrupt_request = 0;
> cpu->halted = 0;
> cpu->mem_io_pc = 0;
> - cpu->mem_io_vaddr = 0;
> cpu->icount_extra = 0;
> atomic_set(&cpu->icount_decr_ptr->u32, 0);
> cpu->can_do_io = 1;
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
2019-09-22 3:54 ` [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access Richard Henderson
@ 2019-09-23 8:52 ` David Hildenbrand
2019-09-23 16:05 ` Richard Henderson
0 siblings, 1 reply; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:52 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> All callers pass false to this argument. Remove it and pass the
> constant on to tb_invalidate_phys_page_range__locked.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> accel/tcg/translate-all.h | 3 +--
> accel/tcg/translate-all.c | 6 ++----
> exec.c | 4 ++--
> 3 files changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
> index 64f5fd9a05..31f2117188 100644
> --- a/accel/tcg/translate-all.h
> +++ b/accel/tcg/translate-all.h
> @@ -28,8 +28,7 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
> void page_collection_unlock(struct page_collection *set);
> void tb_invalidate_phys_page_fast(struct page_collection *pages,
> tb_page_addr_t start, int len);
> -void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
> - int is_cpu_write_access);
> +void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
> void tb_check_watchpoint(CPUState *cpu);
>
> #ifdef CONFIG_USER_ONLY
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 5d1e08b169..de4b697163 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1983,8 +1983,7 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> *
> * Called with mmap_lock held for user-mode emulation
> */
> -void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
> - int is_cpu_write_access)
> +void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
> {
> struct page_collection *pages;
> PageDesc *p;
> @@ -1996,8 +1995,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
> return;
> }
> pages = page_collection_lock(start, end);
> - tb_invalidate_phys_page_range__locked(pages, p, start, end,
> - is_cpu_write_access);
> + tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
I'd prefer "false" to highlight the semantics, but as it's an int ...
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast
2019-09-22 3:54 ` [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast Richard Henderson
@ 2019-09-23 8:53 ` David Hildenbrand
0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:53 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> Rather than rely on cpu->mem_io_pc, pass retaddr down directly.
>
> Within tb_invalidate_phys_page_range__locked, the is_cpu_write_access
> parameter is non-zero exactly when retaddr would be non-zero, so that
> is a simple replacement.
>
> Recognize that current_tb_not_found is true only when mem_io_pc
> (and now retaddr) are also non-zero, so remove a redundant test.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> accel/tcg/translate-all.h | 3 ++-
> accel/tcg/cputlb.c | 6 +-----
> accel/tcg/translate-all.c | 39 +++++++++++++++++++--------------------
> 3 files changed, 22 insertions(+), 26 deletions(-)
>
> diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
> index 31f2117188..135c1ea96a 100644
> --- a/accel/tcg/translate-all.h
> +++ b/accel/tcg/translate-all.h
> @@ -27,7 +27,8 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
> tb_page_addr_t end);
> void page_collection_unlock(struct page_collection *set);
> void tb_invalidate_phys_page_fast(struct page_collection *pages,
> - tb_page_addr_t start, int len);
> + tb_page_addr_t start, int len,
> + uintptr_t retaddr);
> void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
> void tb_check_watchpoint(CPUState *cpu);
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 257c59c08c..eff129447d 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1093,11 +1093,7 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
> if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
> struct page_collection *pages
> = page_collection_lock(ram_addr, ram_addr + size);
> -
> - /* We require mem_io_pc in tb_invalidate_phys_page_range. */
> - cpu->mem_io_pc = retaddr;
> -
> - tb_invalidate_phys_page_fast(pages, ram_addr, size);
> + tb_invalidate_phys_page_fast(pages, ram_addr, size, retaddr);
> page_collection_unlock(pages);
> }
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index de4b697163..db77fb221b 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1889,7 +1889,7 @@ static void
> tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> PageDesc *p, tb_page_addr_t start,
> tb_page_addr_t end,
> - int is_cpu_write_access)
> + uintptr_t retaddr)
> {
> TranslationBlock *tb;
> tb_page_addr_t tb_start, tb_end;
> @@ -1897,9 +1897,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> #ifdef TARGET_HAS_PRECISE_SMC
> CPUState *cpu = current_cpu;
> CPUArchState *env = NULL;
> - int current_tb_not_found = is_cpu_write_access;
> + bool current_tb_not_found = retaddr != 0;
> + bool current_tb_modified = false;
> TranslationBlock *current_tb = NULL;
> - int current_tb_modified = 0;
> target_ulong current_pc = 0;
> target_ulong current_cs_base = 0;
> uint32_t current_flags = 0;
> @@ -1931,24 +1931,21 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
> if (!(tb_end <= start || tb_start >= end)) {
> #ifdef TARGET_HAS_PRECISE_SMC
> if (current_tb_not_found) {
> - current_tb_not_found = 0;
> - current_tb = NULL;
> - if (cpu->mem_io_pc) {
> - /* now we have a real cpu fault */
> - current_tb = tcg_tb_lookup(cpu->mem_io_pc);
> - }
> + current_tb_not_found = false;
> + /* now we have a real cpu fault */
> + current_tb = tcg_tb_lookup(retaddr);
> }
> if (current_tb == tb &&
> (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
> - /* If we are modifying the current TB, we must stop
> - its execution. We could be more precise by checking
> - that the modification is after the current PC, but it
> - would require a specialized function to partially
> - restore the CPU state */
> -
> - current_tb_modified = 1;
> - cpu_restore_state_from_tb(cpu, current_tb,
> - cpu->mem_io_pc, true);
> + /*
> + * If we are modifying the current TB, we must stop
> + * its execution. We could be more precise by checking
> + * that the modification is after the current PC, but it
> + * would require a specialized function to partially
> + * restore the CPU state.
> + */
> + current_tb_modified = true;
> + cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
> cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
> &current_flags);
> }
> @@ -2042,7 +2039,8 @@ void tb_invalidate_phys_range(target_ulong start, target_ulong end)
> * Call with all @pages in the range [@start, @start + len[ locked.
> */
> void tb_invalidate_phys_page_fast(struct page_collection *pages,
> - tb_page_addr_t start, int len)
> + tb_page_addr_t start, int len,
> + uintptr_t retaddr)
> {
> PageDesc *p;
>
> @@ -2069,7 +2067,8 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
> }
> } else {
> do_invalidate:
> - tb_invalidate_phys_page_range__locked(pages, p, start, start + len, 1);
> + tb_invalidate_phys_page_range__locked(pages, p, start, start + len,
> + retaddr);
> }
> }
> #else
>
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint
2019-09-22 3:54 ` [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint Richard Henderson
@ 2019-09-23 8:54 ` David Hildenbrand
0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 8:54 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 22.09.19 05:54, Richard Henderson wrote:
> Fixes the previous TLB_WATCHPOINT patches because we are currently
> failing to set cpu->mem_io_pc with the call to cpu_check_watchpoint.
> Pass down the retaddr directly because it's readily available.
>
> Fixes: 50b107c5d61
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> accel/tcg/translate-all.h | 2 +-
> accel/tcg/translate-all.c | 6 +++---
> exec.c | 2 +-
> 3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
> index 135c1ea96a..a557b4e2bb 100644
> --- a/accel/tcg/translate-all.h
> +++ b/accel/tcg/translate-all.h
> @@ -30,7 +30,7 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
> tb_page_addr_t start, int len,
> uintptr_t retaddr);
> void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
> -void tb_check_watchpoint(CPUState *cpu);
> +void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
>
> #ifdef CONFIG_USER_ONLY
> int page_unprotect(target_ulong address, uintptr_t pc);
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index db77fb221b..66d4bc4341 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -2142,16 +2142,16 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
> #endif
>
> /* user-mode: call with mmap_lock held */
> -void tb_check_watchpoint(CPUState *cpu)
> +void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
> {
> TranslationBlock *tb;
>
> assert_memory_lock();
>
> - tb = tcg_tb_lookup(cpu->mem_io_pc);
> + tb = tcg_tb_lookup(retaddr);
> if (tb) {
> /* We can use retranslation to find the PC. */
> - cpu_restore_state_from_tb(cpu, tb, cpu->mem_io_pc, true);
> + cpu_restore_state_from_tb(cpu, tb, retaddr, true);
> tb_phys_invalidate(tb, -1);
> } else {
> /* The exception probably happened in a helper. The CPU state should
> diff --git a/exec.c b/exec.c
> index fed25d029b..ceeef4cd4b 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2724,7 +2724,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
> cpu->watchpoint_hit = wp;
>
> mmap_lock();
> - tb_check_watchpoint(cpu);
> + tb_check_watchpoint(cpu, ra);
> if (wp->flags & BP_STOP_BEFORE_ACCESS) {
> cpu->exception_index = EXCP_DEBUG;
> mmap_unlock();
>
Finally, a lot of this stuff gets cleaned up :)
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 11/20] exec: Adjust notdirty tracing
2019-09-22 3:54 ` [PATCH v3 11/20] exec: Adjust notdirty tracing Richard Henderson
@ 2019-09-23 9:17 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-23 9:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
On 9/22/19 5:54 AM, Richard Henderson wrote:
> The memory_region_tb_read tracepoint is unreachable, since notdirty
> is supposed to apply only to writes. The memory_region_tb_write
> tracepoint is mis-named, because notdirty is not only used for TB
> invalidation. It is also used for e.g. VGA RAM updates and migration.
>
> Replace memory_region_tb_write with memory_notdirty_write_access,
> and place it in memory_notdirty_write_prepare where it can catch
> all of the instances. Add memory_notdirty_set_dirty to log when
> we no longer intercept writes to a page.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> exec.c | 3 +++
> memory.c | 4 ----
> trace-events | 4 ++--
> 3 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 33bd0e36c1..7ce0515635 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2721,6 +2721,8 @@ void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
> ndi->size = size;
> ndi->pages = NULL;
>
> + trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
> +
> assert(tcg_enabled());
> if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
> ndi->pages = page_collection_lock(ram_addr, ram_addr + size);
> @@ -2745,6 +2747,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
> /* we remove the notdirty callback only if the code has been
> flushed */
> if (!cpu_physical_memory_is_clean(ndi->ram_addr)) {
> + trace_memory_notdirty_set_dirty(ndi->mem_vaddr);
> tlb_set_dirty(ndi->cpu, ndi->mem_vaddr);
> }
> }
> diff --git a/memory.c b/memory.c
> index b9dd6b94ca..57c44c97db 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -438,7 +438,6 @@ static MemTxResult memory_region_read_accessor(MemoryRegion *mr,
> /* Accesses to code which has previously been translated into a TB show
> * up in the MMIO path, as accesses to the io_mem_notdirty
> * MemoryRegion. */
> - trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
> } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -465,7 +464,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
> /* Accesses to code which has previously been translated into a TB show
> * up in the MMIO path, as accesses to the io_mem_notdirty
> * MemoryRegion. */
> - trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
> } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -490,7 +488,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
> /* Accesses to code which has previously been translated into a TB show
> * up in the MMIO path, as accesses to the io_mem_notdirty
> * MemoryRegion. */
> - trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
> } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -515,7 +512,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
> /* Accesses to code which has previously been translated into a TB show
> * up in the MMIO path, as accesses to the io_mem_notdirty
> * MemoryRegion. */
> - trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
> } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> diff --git a/trace-events b/trace-events
> index 823a4ae64e..20821ba545 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -52,14 +52,14 @@ dma_map_wait(void *dbs) "dbs=%p"
> find_ram_offset(uint64_t size, uint64_t offset) "size: 0x%" PRIx64 " @ 0x%" PRIx64
> find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_t next, uint64_t mingap) "trying size: 0x%" PRIx64 " @ 0x%" PRIx64 ", offset: 0x%" PRIx64" next: 0x%" PRIx64 " mingap: 0x%" PRIx64
> ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
> +memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
> +memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
>
> # memory.c
> memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> -memory_region_tb_read(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> -memory_region_tb_write(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> flatview_new(void *view, void *root) "%p (root %p)"
>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
2019-09-22 3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
@ 2019-09-23 9:18 ` Philippe Mathieu-Daudé
2019-09-23 9:45 ` Paolo Bonzini
1 sibling, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-23 9:18 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
On 9/22/19 5:54 AM, Richard Henderson wrote:
> This forced inlining can result in missing symbols,
> which makes a debugging build harder to follow.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/compiler.h | 11 +++++++++++
> accel/tcg/cputlb.c | 4 ++--
> 2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
> index 09fc44cca4..d6d400c523 100644
> --- a/include/qemu/compiler.h
> +++ b/include/qemu/compiler.h
> @@ -170,6 +170,17 @@
> # define QEMU_NONSTRING
> #endif
>
> +/*
> + * Forced inlining may be desired to encourage constant propagation
> + * of function parameters. However, it can also make debugging harder,
> + * so disable it for a non-optimizing build.
> + */
> +#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
> +#define QEMU_ALWAYS_INLINE __attribute__((always_inline))
> +#else
> +#define QEMU_ALWAYS_INLINE
> +#endif
> +
> /* Implement C11 _Generic via GCC builtins. Example:
> *
> * QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index abae79650c..2222b87764 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1281,7 +1281,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
> typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr);
>
> -static inline uint64_t __attribute__((always_inline))
> +static inline uint64_t QEMU_ALWAYS_INLINE
> load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> uintptr_t retaddr, MemOp op, bool code_read,
> FullLoadHelper *full_load)
> @@ -1530,7 +1530,7 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
> * Store Helpers
> */
>
> -static inline void __attribute__((always_inline))
> +static inline void QEMU_ALWAYS_INLINE
> store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
> {
>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
2019-09-22 3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
2019-09-23 8:32 ` David Hildenbrand
@ 2019-09-23 9:27 ` Philippe Mathieu-Daudé
2019-09-23 9:51 ` Paolo Bonzini
2 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-23 9:27 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
On 9/22/19 5:54 AM, Richard Henderson wrote:
> Add a function parameter to perform the actual load/store to ram.
> With optimization, this results in identical code.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> accel/tcg/cputlb.c | 159 +++++++++++++++++++++++----------------------
> 1 file changed, 83 insertions(+), 76 deletions(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 2222b87764..b4a63d3928 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1280,11 +1280,38 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
>
> typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr);
> +typedef uint64_t LoadHelper(const void *);
> +
> +/* Wrap the unaligned load helpers so that they have a common signature. */
> +static inline uint64_t wrap_ldub(const void *haddr)
> +{
> + return ldub_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_be(const void *haddr)
> +{
> + return lduw_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_le(const void *haddr)
> +{
> + return lduw_le_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_be(const void *haddr)
> +{
> + return (uint32_t)ldl_be_p(haddr);
> +}
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> +
> +static inline uint64_t wrap_ldul_le(const void *haddr)
> +{
> + return (uint32_t)ldl_le_p(haddr);
> +}
>
> static inline uint64_t QEMU_ALWAYS_INLINE
> load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> uintptr_t retaddr, MemOp op, bool code_read,
> - FullLoadHelper *full_load)
> + FullLoadHelper *full_load, LoadHelper *direct)
> {
> uintptr_t mmu_idx = get_mmuidx(oi);
> uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1373,33 +1400,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>
> do_aligned_access:
> haddr = (void *)((uintptr_t)addr + entry->addend);
> - switch (op) {
> - case MO_UB:
> - res = ldub_p(haddr);
> - break;
> - case MO_BEUW:
> - res = lduw_be_p(haddr);
> - break;
> - case MO_LEUW:
> - res = lduw_le_p(haddr);
> - break;
> - case MO_BEUL:
> - res = (uint32_t)ldl_be_p(haddr);
> - break;
> - case MO_LEUL:
> - res = (uint32_t)ldl_le_p(haddr);
> - break;
> - case MO_BEQ:
> - res = ldq_be_p(haddr);
> - break;
> - case MO_LEQ:
> - res = ldq_le_p(haddr);
> - break;
> - default:
> - g_assert_not_reached();
> - }
> -
> - return res;
> + return direct(haddr);
> }
>
> /*
> @@ -1415,7 +1416,8 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> static uint64_t full_ldub_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - return load_helper(env, addr, oi, retaddr, MO_UB, false, full_ldub_mmu);
> + return load_helper(env, addr, oi, retaddr, MO_UB, false,
> + full_ldub_mmu, wrap_ldub);
> }
>
> tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
> @@ -1428,7 +1430,7 @@ static uint64_t full_le_lduw_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUW, false,
> - full_le_lduw_mmu);
> + full_le_lduw_mmu, wrap_lduw_le);
> }
>
> tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1441,7 +1443,7 @@ static uint64_t full_be_lduw_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUW, false,
> - full_be_lduw_mmu);
> + full_be_lduw_mmu, wrap_lduw_be);
> }
>
> tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
> @@ -1454,7 +1456,7 @@ static uint64_t full_le_ldul_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUL, false,
> - full_le_ldul_mmu);
> + full_le_ldul_mmu, wrap_ldul_le);
> }
>
> tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1467,7 +1469,7 @@ static uint64_t full_be_ldul_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUL, false,
> - full_be_ldul_mmu);
> + full_be_ldul_mmu, wrap_ldul_be);
> }
>
> tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
> @@ -1480,14 +1482,14 @@ uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEQ, false,
> - helper_le_ldq_mmu);
> + helper_le_ldq_mmu, ldq_le_p);
> }
>
> uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEQ, false,
> - helper_be_ldq_mmu);
> + helper_be_ldq_mmu, ldq_be_p);
> }
>
> /*
> @@ -1530,9 +1532,38 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
> * Store Helpers
> */
>
> +typedef void StoreHelper(void *, uint64_t);
> +
> +/* Wrap the unaligned store helpers so that they have a common signature. */
> +static inline void wrap_stb(void *haddr, uint64_t val)
> +{
> + stb_p(haddr, val);
> +}
> +
> +static inline void wrap_stw_be(void *haddr, uint64_t val)
> +{
> + stw_be_p(haddr, val);
> +}
> +
> +static inline void wrap_stw_le(void *haddr, uint64_t val)
> +{
> + stw_le_p(haddr, val);
> +}
> +
> +static inline void wrap_stl_be(void *haddr, uint64_t val)
> +{
> + stl_be_p(haddr, val);
> +}
> +
> +static inline void wrap_stl_le(void *haddr, uint64_t val)
> +{
> + stl_le_p(haddr, val);
> +}
> +
> static inline void QEMU_ALWAYS_INLINE
> store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> - TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
> + TCGMemOpIdx oi, uintptr_t retaddr, MemOp op,
> + StoreHelper *direct)
> {
> uintptr_t mmu_idx = get_mmuidx(oi);
> uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1657,74 +1688,49 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>
> do_aligned_access:
> haddr = (void *)((uintptr_t)addr + entry->addend);
> - switch (op) {
> - case MO_UB:
> - stb_p(haddr, val);
> - break;
> - case MO_BEUW:
> - stw_be_p(haddr, val);
> - break;
> - case MO_LEUW:
> - stw_le_p(haddr, val);
> - break;
> - case MO_BEUL:
> - stl_be_p(haddr, val);
> - break;
> - case MO_LEUL:
> - stl_le_p(haddr, val);
> - break;
> - case MO_BEQ:
> - stq_be_p(haddr, val);
> - break;
> - case MO_LEQ:
> - stq_le_p(haddr, val);
> - break;
> - default:
> - g_assert_not_reached();
> - break;
> - }
> + direct(haddr, val);
> }
>
> void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_UB);
> + store_helper(env, addr, val, oi, retaddr, MO_UB, wrap_stb);
> }
>
> void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_LEUW);
> + store_helper(env, addr, val, oi, retaddr, MO_LEUW, wrap_stw_le);
> }
>
> void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_BEUW);
> + store_helper(env, addr, val, oi, retaddr, MO_BEUW, wrap_stw_be);
> }
>
> void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_LEUL);
> + store_helper(env, addr, val, oi, retaddr, MO_LEUL, wrap_stl_le);
> }
>
> void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_BEUL);
> + store_helper(env, addr, val, oi, retaddr, MO_BEUL, wrap_stl_be);
> }
>
> void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_LEQ);
> + store_helper(env, addr, val, oi, retaddr, MO_LEQ, stq_le_p);
> }
>
> void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - store_helper(env, addr, val, oi, retaddr, MO_BEQ);
> + store_helper(env, addr, val, oi, retaddr, MO_BEQ, stq_be_p);
> }
>
> /* First set of helpers allows passing in of OI and RETADDR. This makes
> @@ -1789,7 +1795,8 @@ void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> static uint64_t full_ldub_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> - return load_helper(env, addr, oi, retaddr, MO_8, true, full_ldub_cmmu);
> + return load_helper(env, addr, oi, retaddr, MO_8, true,
> + full_ldub_cmmu, wrap_ldub);
> }
>
> uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1802,7 +1809,7 @@ static uint64_t full_le_lduw_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUW, true,
> - full_le_lduw_cmmu);
> + full_le_lduw_cmmu, wrap_lduw_le);
> }
>
> uint16_t helper_le_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1815,7 +1822,7 @@ static uint64_t full_be_lduw_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUW, true,
> - full_be_lduw_cmmu);
> + full_be_lduw_cmmu, wrap_lduw_be);
> }
>
> uint16_t helper_be_ldw_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1828,7 +1835,7 @@ static uint64_t full_le_ldul_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEUL, true,
> - full_le_ldul_cmmu);
> + full_le_ldul_cmmu, wrap_ldul_le);
> }
>
> uint32_t helper_le_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1841,7 +1848,7 @@ static uint64_t full_be_ldul_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEUL, true,
> - full_be_ldul_cmmu);
> + full_be_ldul_cmmu, wrap_ldul_be);
> }
>
> uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
> @@ -1854,12 +1861,12 @@ uint64_t helper_le_ldq_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_LEQ, true,
> - helper_le_ldq_cmmu);
> + helper_le_ldq_cmmu, ldq_le_p);
> }
>
> uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
> TCGMemOpIdx oi, uintptr_t retaddr)
> {
> return load_helper(env, addr, oi, retaddr, MO_BEQ, true,
> - helper_be_ldq_cmmu);
> + helper_be_ldq_cmmu, ldq_be_p);
> }
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 13/20] cputlb: Move NOTDIRTY handling from I/O path to TLB path
2019-09-22 3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
2019-09-23 8:41 ` David Hildenbrand
@ 2019-09-23 9:30 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-23 9:30 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha, david
On 9/22/19 5:54 AM, Richard Henderson wrote:
> Pages that we want to track for NOTDIRTY are RAM. We do not
> really need to go through the I/O path to handle them.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/exec/cpu-common.h | 2 --
> accel/tcg/cputlb.c | 26 +++++++++++++++++---
> exec.c | 50 ---------------------------------------
> memory.c | 16 -------------
> 4 files changed, 23 insertions(+), 71 deletions(-)
>
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 1c0e03ddc2..81753bbb34 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -100,8 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
>
> void cpu_flush_icache_range(hwaddr start, hwaddr len);
>
> -extern struct MemoryRegion io_mem_notdirty;
> -
> typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
>
> int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 7ab523d7ec..b7bd738115 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> mr = section->mr;
> mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> cpu->mem_io_pc = retaddr;
> - if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> + if (!cpu->can_do_io) {
> cpu_io_recompile(cpu, retaddr);
> }
>
> @@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
> mr = section->mr;
> mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> - if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> + if (!cpu->can_do_io) {
> cpu_io_recompile(cpu, retaddr);
> }
> cpu->mem_io_vaddr = addr;
> @@ -1606,7 +1606,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
> }
>
> /* Handle I/O access. */
> - if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
> + if (tlb_addr & TLB_MMIO) {
> io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
> op ^ (tlb_addr & TLB_BSWAP ? MO_BSWAP : 0));
> return;
> @@ -1619,6 +1619,26 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>
> haddr = (void *)((uintptr_t)addr + entry->addend);
>
> + /* Handle clean RAM pages. */
> + if (tlb_addr & TLB_NOTDIRTY) {
> + NotDirtyInfo ndi;
> +
> + /* We require mem_io_pc in tb_invalidate_phys_page_range. */
> + env_cpu(env)->mem_io_pc = retaddr;
> +
> + memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
> + addr + iotlbentry->addr, size);
> +
> + if (unlikely(tlb_addr & TLB_BSWAP)) {
> + direct_swap(haddr, val);
> + } else {
> + direct(haddr, val);
> + }
> +
> + memory_notdirty_write_complete(&ndi);
> + return;
> + }
> +
> if (unlikely(tlb_addr & TLB_BSWAP)) {
> direct_swap(haddr, val);
> } else {
> diff --git a/exec.c b/exec.c
> index e21e068535..abf58b68a0 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -88,7 +88,6 @@ static MemoryRegion *system_io;
> AddressSpace address_space_io;
> AddressSpace address_space_memory;
>
> -MemoryRegion io_mem_notdirty;
> static MemoryRegion io_mem_unassigned;
> #endif
>
> @@ -157,7 +156,6 @@ typedef struct subpage_t {
> } subpage_t;
>
> #define PHYS_SECTION_UNASSIGNED 0
> -#define PHYS_SECTION_NOTDIRTY 1
>
> static void io_mem_init(void);
> static void memory_map_init(void);
> @@ -1438,9 +1436,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
> if (memory_region_is_ram(section->mr)) {
> /* Normal RAM. */
> iotlb = memory_region_get_ram_addr(section->mr) + xlat;
> - if (!section->readonly) {
> - iotlb |= PHYS_SECTION_NOTDIRTY;
> - }
> } else {
> AddressSpaceDispatch *d;
>
> @@ -2749,42 +2744,6 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
> }
> }
>
> -/* Called within RCU critical section. */
> -static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
> - uint64_t val, unsigned size)
> -{
> - NotDirtyInfo ndi;
> -
> - memory_notdirty_write_prepare(&ndi, current_cpu, current_cpu->mem_io_vaddr,
> - ram_addr, size);
> -
> - stn_p(qemu_map_ram_ptr(NULL, ram_addr), size, val);
> - memory_notdirty_write_complete(&ndi);
> -}
> -
> -static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
> - unsigned size, bool is_write,
> - MemTxAttrs attrs)
> -{
> - return is_write;
> -}
> -
> -static const MemoryRegionOps notdirty_mem_ops = {
> - .write = notdirty_mem_write,
> - .valid.accepts = notdirty_mem_accepts,
> - .endianness = DEVICE_NATIVE_ENDIAN,
> - .valid = {
> - .min_access_size = 1,
> - .max_access_size = 8,
> - .unaligned = false,
> - },
> - .impl = {
> - .min_access_size = 1,
> - .max_access_size = 8,
> - .unaligned = false,
> - },
> -};
> -
> /* Generate a debug exception if a watchpoint has been hit. */
> void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
> MemTxAttrs attrs, int flags, uintptr_t ra)
> @@ -2980,13 +2939,6 @@ static void io_mem_init(void)
> {
> memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
> NULL, UINT64_MAX);
> -
> - /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
> - * which can be called without the iothread mutex.
> - */
> - memory_region_init_io(&io_mem_notdirty, NULL, ¬dirty_mem_ops, NULL,
> - NULL, UINT64_MAX);
> - memory_region_clear_global_locking(&io_mem_notdirty);
> }
>
> AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
> @@ -2996,8 +2948,6 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
>
> n = dummy_section(&d->map, fv, &io_mem_unassigned);
> assert(n == PHYS_SECTION_UNASSIGNED);
> - n = dummy_section(&d->map, fv, &io_mem_notdirty);
> - assert(n == PHYS_SECTION_NOTDIRTY);
>
> d->phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
>
> diff --git a/memory.c b/memory.c
> index 57c44c97db..a99b8c0767 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -434,10 +434,6 @@ static MemTxResult memory_region_read_accessor(MemoryRegion *mr,
> tmp = mr->ops->read(mr->opaque, addr, size);
> if (mr->subpage) {
> trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
> - } else if (mr == &io_mem_notdirty) {
> - /* Accesses to code which has previously been translated into a TB show
> - * up in the MMIO path, as accesses to the io_mem_notdirty
> - * MemoryRegion. */
> } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -460,10 +456,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
> r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
> if (mr->subpage) {
> trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
> - } else if (mr == &io_mem_notdirty) {
> - /* Accesses to code which has previously been translated into a TB show
> - * up in the MMIO path, as accesses to the io_mem_notdirty
> - * MemoryRegion. */
> } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -484,10 +476,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
>
> if (mr->subpage) {
> trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
> - } else if (mr == &io_mem_notdirty) {
> - /* Accesses to code which has previously been translated into a TB show
> - * up in the MMIO path, as accesses to the io_mem_notdirty
> - * MemoryRegion. */
> } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
> @@ -508,10 +496,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
>
> if (mr->subpage) {
> trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, size);
> - } else if (mr == &io_mem_notdirty) {
> - /* Accesses to code which has previously been translated into a TB show
> - * up in the MMIO path, as accesses to the io_mem_notdirty
> - * MemoryRegion. */
> } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
> hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
> trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
>
Very nice!
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
2019-09-22 3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
2019-09-23 9:18 ` Philippe Mathieu-Daudé
@ 2019-09-23 9:45 ` Paolo Bonzini
2019-09-23 16:00 ` Richard Henderson
1 sibling, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23 9:45 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha, david
On 22/09/19 05:54, Richard Henderson wrote:
> +/*
> + * Forced inlining may be desired to encourage constant propagation
> + * of function parameters. However, it can also make debugging harder,
> + * so disable it for a non-optimizing build.
> + */
> +#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
> +#define QEMU_ALWAYS_INLINE __attribute__((always_inline))
GCC doesn't have __has_attribute, does it? I think you can just assume
the attribute exists and #ifdef __OPTIMIZE__.
Paolo
> +#else
> +#define QEMU_ALWAYS_INLINE
^ permalink raw reply [flat|nested] 53+ messages in thread
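For reference, the simplification Paolo suggests here would drop the
__has_attribute probe entirely. A minimal sketch, assuming (as the rest of
this subthread establishes) that always_inline is available on every
compiler QEMU supports:

/*
 * Sketch only: test __OPTIMIZE__ alone and assume the always_inline
 * attribute exists on all supported compilers (gcc >= 4.8, clang).
 */
#ifdef __OPTIMIZE__
#define QEMU_ALWAYS_INLINE __attribute__((always_inline))
#else
#define QEMU_ALWAYS_INLINE
#endif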
* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
2019-09-22 3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
2019-09-23 8:32 ` David Hildenbrand
2019-09-23 9:27 ` Philippe Mathieu-Daudé
@ 2019-09-23 9:51 ` Paolo Bonzini
2019-09-23 9:54 ` David Hildenbrand
2 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23 9:51 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha, david
On 22/09/19 05:54, Richard Henderson wrote:
> +/* Wrap the unaligned load helpers so that they have a common signature. */
> +static inline uint64_t wrap_ldub(const void *haddr)
> +{
> + return ldub_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_be(const void *haddr)
> +{
> + return lduw_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_lduw_le(const void *haddr)
> +{
> + return lduw_le_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_be(const void *haddr)
> +{
> + return (uint32_t)ldl_be_p(haddr);
> +}
> +
> +static inline uint64_t wrap_ldul_le(const void *haddr)
> +{
> + return (uint32_t)ldl_le_p(haddr);
> +}
Any reason to have these five functions (plus five for stores) instead
of a pair
static uint64_t ld_memop(const void *haddr, MemOp op)
{
}
static uint64_t st_memop(void *haddr, MemOp op, uint64_t val)
{
}
that includes the switches? Everything should be inlined just the same
if you do
if (unlikely(tlb_addr & TLB_BSWAP)) {
st_memop(haddr, op ^ MO_BSWAP, val);
} else {
st_memop(haddr, op, val);
}
and the like.
Paolo
> static inline uint64_t QEMU_ALWAYS_INLINE
> load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> uintptr_t retaddr, MemOp op, bool code_read,
> - FullLoadHelper *full_load)
> + FullLoadHelper *full_load, LoadHelper *direct)
> {
> uintptr_t mmu_idx = get_mmuidx(oi);
> uintptr_t index = tlb_index(env, mmu_idx, addr);
> @@ -1373,33 +1400,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>
> do_aligned_access:
> haddr = (void *)((uintptr_t)addr + entry->addend);
> - switch (op) {
> - case MO_UB:
> - res = ldub_p(haddr);
> - break;
> - case MO_BEUW:
> - res = lduw_be_p(haddr);
> - break;
> - case MO_LEUW:
> - res = lduw_le_p(haddr);
> - break;
> - case MO_BEUL:
> - res = (uint32_t)ldl_be_p(haddr);
> - break;
> - case MO_LEUL:
> - res = (uint32_t)ldl_le_p(haddr);
> - break;
> - case MO_BEQ:
> - res = ldq_be_p(haddr);
> - break;
> - case MO_LEQ:
> - res = ldq_le_p(haddr);
> - break;
> - default:
> - g_assert_not_reached();
> - }
> -
> - return res;
> + return direct(haddr);
^ permalink raw reply [flat|nested] 53+ messages in thread
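Paolo's ld_memop/st_memop suggestion above is only a skeleton; filling the
bodies in from the switches that the patch removes gives something like the
following sketch (illustrative only, not code from the series):

/* Dispatch an aligned host load according to MemOp. */
static uint64_t ld_memop(const void *haddr, MemOp op)
{
    switch (op) {
    case MO_UB:
        return ldub_p(haddr);
    case MO_BEUW:
        return lduw_be_p(haddr);
    case MO_LEUW:
        return lduw_le_p(haddr);
    case MO_BEUL:
        return (uint32_t)ldl_be_p(haddr);
    case MO_LEUL:
        return (uint32_t)ldl_le_p(haddr);
    case MO_BEQ:
        return ldq_be_p(haddr);
    case MO_LEQ:
        return ldq_le_p(haddr);
    default:
        g_assert_not_reached();
    }
}

/* Dispatch an aligned host store according to MemOp. */
static void st_memop(void *haddr, MemOp op, uint64_t val)
{
    switch (op) {
    case MO_UB:
        stb_p(haddr, val);
        break;
    case MO_BEUW:
        stw_be_p(haddr, val);
        break;
    case MO_LEUW:
        stw_le_p(haddr, val);
        break;
    case MO_BEUL:
        stl_be_p(haddr, val);
        break;
    case MO_LEUL:
        stl_le_p(haddr, val);
        break;
    case MO_BEQ:
        stq_be_p(haddr, val);
        break;
    case MO_LEQ:
        stq_le_p(haddr, val);
        break;
    default:
        g_assert_not_reached();
    }
}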
* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
2019-09-23 9:51 ` Paolo Bonzini
@ 2019-09-23 9:54 ` David Hildenbrand
2019-09-23 10:02 ` Paolo Bonzini
0 siblings, 1 reply; 53+ messages in thread
From: David Hildenbrand @ 2019-09-23 9:54 UTC (permalink / raw)
To: Paolo Bonzini, Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha
On 23.09.19 11:51, Paolo Bonzini wrote:
> On 22/09/19 05:54, Richard Henderson wrote:
>> +/* Wrap the unaligned load helpers so that they have a common signature. */
>> +static inline uint64_t wrap_ldub(const void *haddr)
>> +{
>> + return ldub_p(haddr);
>> +}
>> +
>> +static inline uint64_t wrap_lduw_be(const void *haddr)
>> +{
>> + return lduw_be_p(haddr);
>> +}
>> +
>> +static inline uint64_t wrap_lduw_le(const void *haddr)
>> +{
>> + return lduw_le_p(haddr);
>> +}
>> +
>> +static inline uint64_t wrap_ldul_be(const void *haddr)
>> +{
>> + return (uint32_t)ldl_be_p(haddr);
>> +}
>> +
>> +static inline uint64_t wrap_ldul_le(const void *haddr)
>> +{
>> + return (uint32_t)ldl_le_p(haddr);
>> +}
>
> Any reason to have these five functions (plus five for stores) instead
> of a pair
>
> static uint64_t ld_memop(const void *haddr, MemOp op)
> {
> }
>
> static uint64_t st_memop(void *haddr, MemOp op, uint64_t val)
> {
> }
>
> that includes the switches? Everything should be inlined just the same
> if you do
>
> if (unlikely(tlb_addr & TLB_BSWAP)) {
> st_memop(haddr, op ^ MO_BSWAP, val);
> } else {
> st_memop(haddr, op, val);
> }
I asked the same question on v2 and Richard explained that - for
whatever reason - the compiler will not properly propagate the constant
in the "op ^ MO_BSWAP" case.
>
> and the like.
>
> Paolo
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
2019-09-23 9:54 ` David Hildenbrand
@ 2019-09-23 10:02 ` Paolo Bonzini
2019-09-23 15:52 ` Richard Henderson
0 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23 10:02 UTC (permalink / raw)
To: David Hildenbrand, Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha
On 23/09/19 11:54, David Hildenbrand wrote:
> On 23.09.19 11:51, Paolo Bonzini wrote:
>> On 22/09/19 05:54, Richard Henderson wrote:
>>> +/* Wrap the unaligned load helpers so that they have a common signature. */
>>> +static inline uint64_t wrap_ldub(const void *haddr)
>>> +{
>>> + return ldub_p(haddr);
>>> +}
>>> +
>>> +static inline uint64_t wrap_lduw_be(const void *haddr)
>>> +{
>>> + return lduw_be_p(haddr);
>>> +}
>>> +
>>> +static inline uint64_t wrap_lduw_le(const void *haddr)
>>> +{
>>> + return lduw_le_p(haddr);
>>> +}
>>> +
>>> +static inline uint64_t wrap_ldul_be(const void *haddr)
>>> +{
>>> + return (uint32_t)ldl_be_p(haddr);
>>> +}
>>> +
>>> +static inline uint64_t wrap_ldul_le(const void *haddr)
>>> +{
>>> + return (uint32_t)ldl_le_p(haddr);
>>> +}
>>
>> Any reason to have these five functions (plus five for stores) instead
>> of a pair
>>
>> static uint64_t ld_memop(const void *haddr, MemOp op)
>> {
>> }
>>
>> static uint64_t st_memop(void *haddr, MemOp op, uint64_t val)
>> {
>> }
>>
>> that includes the switches? Everything should be inlined just the same
>> if you do
>>
>> if (unlikely(tlb_addr & TLB_BSWAP)) {
>> st_memop(haddr, op ^ MO_BSWAP, val);
>> } else {
>> st_memop(haddr, op, val);
>> }
>
> I asked the same question on v2 and Richard explained that - for
> whatever reason - the compiler will not properly propagate the constant
> in the "op ^ MO_BSWAP" case.
Even if ld_memop and st_memop are __always_inline__?
Paolo
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
2019-09-23 10:02 ` Paolo Bonzini
@ 2019-09-23 15:52 ` Richard Henderson
2019-09-23 18:18 ` Richard Henderson
0 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 15:52 UTC (permalink / raw)
To: Paolo Bonzini, David Hildenbrand, qemu-devel; +Cc: alex.bennee, stefanha
On 9/23/19 3:02 AM, Paolo Bonzini wrote:
> On 23/09/19 11:54, David Hildenbrand wrote:
>> On 23.09.19 11:51, Paolo Bonzini wrote:
>>> that includes the switches? Everything should be inlined just the same
>>> if you do
>>>
>>> if (unlikely(tlb_addr & TLB_BSWAP)) {
>>> st_memop(haddr, op ^ MO_BSWAP, val);
>>> } else {
>>> st_memop(haddr, op, val);
>>> }
>>
>> I asked the same question on v2 and Richard explained that - for
>> whatever reason - the compiler will not properly propagate the constant
>> in the "op ^ MO_BSWAP" case.
>
> Even if ld_memop and st_memop are __always_inline__?
I'm not sure I tried __always_inline__. I can, if you like.
r~
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
2019-09-23 9:45 ` Paolo Bonzini
@ 2019-09-23 16:00 ` Richard Henderson
2019-09-23 16:49 ` Paolo Bonzini
0 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 16:00 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: alex.bennee, stefanha, david
On 9/23/19 2:45 AM, Paolo Bonzini wrote:
> On 22/09/19 05:54, Richard Henderson wrote:
>> +/*
>> + * Forced inlining may be desired to encourage constant propagation
>> + * of function parameters. However, it can also make debugging harder,
>> + * so disable it for a non-optimizing build.
>> + */
>> +#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
>> +#define QEMU_ALWAYS_INLINE __attribute__((always_inline))
>
> GCC doesn't have __has_attribute, does it?
It does, since at least gcc 5. And now I realize that gcc 5 was only a
reorganization of the support, not when __has_attribute was introduced.
r~
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
2019-09-23 8:52 ` David Hildenbrand
@ 2019-09-23 16:05 ` Richard Henderson
2019-09-23 16:50 ` Paolo Bonzini
0 siblings, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 16:05 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 9/23/19 1:52 AM, David Hildenbrand wrote:
>> - tb_invalidate_phys_page_range__locked(pages, p, start, end,
>> - is_cpu_write_access);
>> + tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
>
> I'd prefer "false" to highlight the semantics, but as it's an int ...
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
I did think of that, but then the next patch changes this last argument to
"uintptr_t retaddr", so then it's not really a boolean argument anymore, and
"0" is exactly what we want.
r~
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c
2019-09-23 8:26 ` David Hildenbrand
@ 2019-09-23 16:27 ` Richard Henderson
0 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 16:27 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel; +Cc: pbonzini, alex.bennee, stefanha
On 9/23/19 1:26 AM, David Hildenbrand wrote:
>> +void finalize_target_page_bits(void)
>> +{
>> +#ifdef TARGET_PAGE_BITS_VARY
>> + if (target_page_bits == 0) {
>> + target_page_bits = TARGET_PAGE_BITS_MIN;
>> + }
>> + target_page_bits_decided = true;
>> +#endif
>> +}
> I wonder if it would be nicer to handle this in the header file instead,
> providing dummy functions there.
No can do, because set_preferred_target_page_bits is used by vl.c, which is
compiled once and so does not have access to cpu.h definitions.
r~
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
2019-09-23 16:00 ` Richard Henderson
@ 2019-09-23 16:49 ` Paolo Bonzini
2019-09-23 18:09 ` Richard Henderson
0 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23 16:49 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee, stefanha, david
On 23/09/19 18:00, Richard Henderson wrote:
> On 9/23/19 2:45 AM, Paolo Bonzini wrote:
>> On 22/09/19 05:54, Richard Henderson wrote:
>>> +/*
>>> + * Forced inlining may be desired to encourage constant propagation
>>> + * of function parameters. However, it can also make debugging harder,
>>> + * so disable it for a non-optimizing build.
>>> + */
>>> +#if defined(__OPTIMIZE__) && __has_attribute(always_inline)
>>> +#define QEMU_ALWAYS_INLINE __attribute__((always_inline))
>>
>> GCC doesn't have __has_attribute, does it?
>
> It does, since at least gcc 5. And now I realize that's only a reorganization
> of the support and not when it was introduced.
Hmm, we still support 4.8, and always_inline is much older than that. So
I'm not sure it's useful to test for it.
Paolo
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
2019-09-23 16:05 ` Richard Henderson
@ 2019-09-23 16:50 ` Paolo Bonzini
0 siblings, 0 replies; 53+ messages in thread
From: Paolo Bonzini @ 2019-09-23 16:50 UTC (permalink / raw)
To: Richard Henderson, David Hildenbrand, qemu-devel; +Cc: alex.bennee, stefanha
On 23/09/19 18:05, Richard Henderson wrote:
> On 9/23/19 1:52 AM, David Hildenbrand wrote:
>>> - tb_invalidate_phys_page_range__locked(pages, p, start, end,
>>> - is_cpu_write_access);
>>> + tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
>>
>> I'd prefer "false" to highlight the semantics, but as it's an int ...
>>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>
> I did think of that, but then the next patch changes this last argument to
> "uintptr_t retaddr", so then it's not really a boolean argument anymore, and
> "0" is exactly what we want.
Yeah, 0 is fine here.
Paolo
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization
2019-09-23 16:49 ` Paolo Bonzini
@ 2019-09-23 18:09 ` Richard Henderson
0 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 18:09 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: alex.bennee, stefanha, david
On 9/23/19 9:49 AM, Paolo Bonzini wrote:
> Hmm, still we support 4.8 and always_inline is much older than that. So
> I'm not sure it's useful to test it.
Fair enough.
r~
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback
2019-09-23 15:52 ` Richard Henderson
@ 2019-09-23 18:18 ` Richard Henderson
0 siblings, 0 replies; 53+ messages in thread
From: Richard Henderson @ 2019-09-23 18:18 UTC (permalink / raw)
To: Paolo Bonzini, David Hildenbrand, qemu-devel; +Cc: alex.bennee, stefanha
On 9/23/19 8:52 AM, Richard Henderson wrote:
> On 9/23/19 3:02 AM, Paolo Bonzini wrote:
>> On 23/09/19 11:54, David Hildenbrand wrote:
>>> On 23.09.19 11:51, Paolo Bonzini wrote:
>>>> that includes the switches? Everything should be inlined just the same
>>>> if you do
>>>>
>>>> if (unlikely(tlb_addr & TLB_BSWAP)) {
>>>> st_memop(haddr, op ^ MO_BSWAP, val);
>>>> } else {
>>>> st_memop(haddr, op, val);
>>>> }
>>>
>>> I asked the same question on v2 and Richard explained that - for
>>> whatever reason - the compiler will not properly propagate the constant
>>> in the "op ^ MO_BSWAP" case.
>>
>> Even if ld_memop and st_memop are __always_inline__?
>
> I'm not sure I tried __always_inline__. I can, if you like.
FWIW, that works.
Since two of you have now asked about this, I'll change the patch.
r~
^ permalink raw reply [flat|nested] 53+ messages in thread
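So the agreed direction is to keep the memop switch but mark the helper
with the QEMU_ALWAYS_INLINE macro from patch 08, so that a constant MemOp
still folds through the TLB_BSWAP xor. An illustrative call-site fragment,
using the hypothetical ld_memop sketched earlier in the thread:

/*
 * Sketch: once ld_memop is forced inline, 'op' is a compile-time
 * constant at each call site, so 'op ^ MO_BSWAP' is too, and each
 * branch of the switch collapses to a single host load.
 */
if (unlikely(tlb_addr & TLB_BSWAP)) {
    res = ld_memop(haddr, op ^ MO_BSWAP);
} else {
    res = ld_memop(haddr, op);
}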
end of thread, other threads:[~2019-09-23 18:23 UTC | newest]
Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-22 3:54 [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
2019-09-22 3:54 ` [PATCH v3 01/20] exec: Use TARGET_PAGE_BITS_MIN for TLB flags Richard Henderson
2019-09-23 8:24 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 02/20] exec: Split out variable page size support to exec-vary.c Richard Henderson
2019-09-23 8:26 ` David Hildenbrand
2019-09-23 16:27 ` Richard Henderson
2019-09-22 3:54 ` [PATCH v3 03/20] exec: Use const alias for TARGET_PAGE_BITS_VARY Richard Henderson
2019-09-22 3:54 ` [PATCH v3 04/20] exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG Richard Henderson
2019-09-22 3:54 ` [PATCH v3 05/20] exec: Promote TARGET_PAGE_MASK to target_long Richard Henderson
2019-09-23 8:30 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 06/20] exec: Tidy TARGET_PAGE_ALIGN Richard Henderson
2019-09-23 8:30 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 07/20] exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY Richard Henderson
2019-09-23 8:31 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 08/20] cputlb: Disable __always_inline__ without optimization Richard Henderson
2019-09-23 9:18 ` Philippe Mathieu-Daudé
2019-09-23 9:45 ` Paolo Bonzini
2019-09-23 16:00 ` Richard Henderson
2019-09-23 16:49 ` Paolo Bonzini
2019-09-23 18:09 ` Richard Henderson
2019-09-22 3:54 ` [PATCH v3 09/20] cputlb: Replace switches in load/store_helper with callback Richard Henderson
2019-09-23 8:32 ` David Hildenbrand
2019-09-23 9:27 ` Philippe Mathieu-Daudé
2019-09-23 9:51 ` Paolo Bonzini
2019-09-23 9:54 ` David Hildenbrand
2019-09-23 10:02 ` Paolo Bonzini
2019-09-23 15:52 ` Richard Henderson
2019-09-23 18:18 ` Richard Henderson
2019-09-22 3:54 ` [PATCH v3 10/20] cputlb: Introduce TLB_BSWAP Richard Henderson
2019-09-23 8:33 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 11/20] exec: Adjust notdirty tracing Richard Henderson
2019-09-23 9:17 ` Philippe Mathieu-Daudé
2019-09-22 3:54 ` [PATCH v3 12/20] cputlb: Move ROM handling from I/O path to TLB path Richard Henderson
2019-09-23 8:39 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 13/20] cputlb: Move NOTDIRTY " Richard Henderson
2019-09-23 8:41 ` David Hildenbrand
2019-09-23 9:30 ` Philippe Mathieu-Daudé
2019-09-22 3:54 ` [PATCH v3 14/20] cputlb: Partially inline memory_region_section_get_iotlb Richard Henderson
2019-09-22 3:54 ` [PATCH v3 15/20] cputlb: Merge and move memory_notdirty_write_{prepare, complete} Richard Henderson
2019-09-22 3:54 ` [PATCH v3 16/20] cputlb: Handle TLB_NOTDIRTY in probe_access Richard Henderson
2019-09-22 3:54 ` [PATCH v3 17/20] cputlb: Remove cpu->mem_io_vaddr Richard Henderson
2019-09-23 8:50 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 18/20] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access Richard Henderson
2019-09-23 8:52 ` David Hildenbrand
2019-09-23 16:05 ` Richard Henderson
2019-09-23 16:50 ` Paolo Bonzini
2019-09-22 3:54 ` [PATCH v3 19/20] cputlb: Pass retaddr to tb_invalidate_phys_page_fast Richard Henderson
2019-09-23 8:53 ` David Hildenbrand
2019-09-22 3:54 ` [PATCH v3 20/20] cputlb: Pass retaddr to tb_check_watchpoint Richard Henderson
2019-09-23 8:54 ` David Hildenbrand
2019-09-22 4:02 ` [PATCH v3 00/20] Move rom and notdirty handling to cputlb Richard Henderson
2019-09-22 6:46 ` no-reply
2019-09-23 8:23 ` David Hildenbrand