* [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking
@ 2022-10-06  3:10 Richard Henderson
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

The primary motivation here is the numerous bug reports (e.g. #290)
about not being able to handle very large memory allocations.
I presume all or most of these are due to guest use of the clang
address sanitizer, which allocates a massive shadow vma.

This patch set copies the Linux kernel code for interval trees,
which is what the kernel itself uses for managing vmas.  I then
purge all (real) use of PageDesc from user-only.  This is easy
for user-only because everything tricky happens under mmap_lock().

I have thought only briefly about using interval trees for system
mode too, but the locking situation there is more difficult.  So
for now that code gets moved around but not substantially changed.

The test case from #290 is added to tests/tcg/multiarch/.
Before this patch set, on my moderately beefy laptop, it takes 39s
and has an RSS of 28GB before the qemu process is killed.  After
the patch set, the test case successfully allocates 16TB and
completes in 0.013s.
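
For a flavor of the new API, here is a minimal sketch of a vma record
keyed by the interval tree from patch 1.  The VMAEntry type and its
prot field are invented for this illustration; the actual per-page
state used by the series is different.

    #include "qemu/osdep.h"
    #include "qemu/interval-tree.h"

    typedef struct VMAEntry {
        IntervalTreeNode itree;   /* [start, last] of the mapping */
        int prot;                 /* PAGE_* flags for the range */
    } VMAEntry;

    static IntervalTreeRoot vma_root;

    /* O(log n) lookup of the mapping covering @addr, if any. */
    static VMAEntry *vma_find(uint64_t addr)
    {
        IntervalTreeNode *n = interval_tree_iter_first(&vma_root, addr, addr);
        return n ? container_of(n, VMAEntry, itree) : NULL;
    }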


r~


Richard Henderson (24):
  util: Add interval-tree.c
  accel/tcg: Make page_alloc_target_data allocation constant
  accel/tcg: Remove disabled debug in translate-all.c
  accel/tcg: Split out PageDesc to internal.h
  accel/tcg: Split out tb-maint.c
  accel/tcg: Move assert_no_pages_locked to internal.h
  accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC
  accel/tcg: Remove duplicate store to tb->page_addr[]
  accel/tcg: Introduce tb_{set_}page_addr{0,1}
  accel/tcg: Rename tb_invalidate_phys_page
  accel/tcg: Rename tb_invalidate_phys_page_range and drop end parameter
  accel/tcg: Unify declarations of tb_invalidate_phys_range
  accel/tcg: Use tb_invalidate_phys_page in page_set_flags
  accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET
  accel/tcg: Use interval tree for TBs in user-only mode
  accel/tcg: Use page_reset_target_data in page_set_flags
  accel/tcg: Use tb_invalidate_phys_range in page_set_flags
  accel/tcg: Move TARGET_PAGE_DATA_SIZE impl to user-exec.c
  accel/tcg: Simplify page_get/alloc_target_data
  accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE
  accel/tcg: Move page_{get,set}_flags to user-exec.c
  accel/tcg: Use interval tree for user-only page tracking
  accel/tcg: Move PageDesc tree into tb-maint.c for system
  accel/tcg: Move remainder of page locking to tb-maint.c

 accel/tcg/internal.h            |   40 +
 include/exec/cpu-all.h          |   22 +-
 include/exec/exec-all.h         |   75 +-
 include/exec/ram_addr.h         |    2 -
 include/exec/translate-all.h    |    8 +-
 include/qemu/interval-tree.h    |   99 ++
 target/arm/cpu.h                |    8 +
 target/arm/internals.h          |    4 -
 accel/tcg/cpu-exec.c            |    9 +-
 accel/tcg/tb-maint.c            | 1222 ++++++++++++++++++++++
 accel/tcg/translate-all.c       | 1683 +------------------------------
 accel/tcg/translator.c          |    9 +-
 accel/tcg/user-exec.c           |  662 ++++++++++++
 bsd-user/mmap.c                 |    2 -
 cpu.c                           |    4 +-
 linux-user/mmap.c               |    4 -
 target/arm/mte_helper.c         |    5 -
 tests/tcg/multiarch/test-vma.c  |   22 +
 tests/unit/test-interval-tree.c |  209 ++++
 util/interval-tree.c            |  881 ++++++++++++++++
 accel/tcg/meson.build           |    1 +
 tests/unit/meson.build          |    1 +
 util/meson.build                |    1 +
 23 files changed, 3235 insertions(+), 1738 deletions(-)
 create mode 100644 include/qemu/interval-tree.h
 create mode 100644 accel/tcg/tb-maint.c
 create mode 100644 tests/tcg/multiarch/test-vma.c
 create mode 100644 tests/unit/test-interval-tree.c
 create mode 100644 util/interval-tree.c

-- 
2.34.1




* [PATCH 01/24] util: Add interval-tree.c
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Copy and simplify the Linux kernel's interval_tree_generic.h,
instantiating for uint64_t.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/interval-tree.h    |  99 ++++
 tests/unit/test-interval-tree.c | 209 ++++++++
 util/interval-tree.c            | 881 ++++++++++++++++++++++++++++++++
 tests/unit/meson.build          |   1 +
 util/meson.build                |   1 +
 5 files changed, 1191 insertions(+)
 create mode 100644 include/qemu/interval-tree.h
 create mode 100644 tests/unit/test-interval-tree.c
 create mode 100644 util/interval-tree.c

diff --git a/include/qemu/interval-tree.h b/include/qemu/interval-tree.h
new file mode 100644
index 0000000000..25006debe8
--- /dev/null
+++ b/include/qemu/interval-tree.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Interval trees.
+ *
+ * Derived from include/linux/interval_tree.h and its dependencies.
+ */
+
+#ifndef QEMU_INTERVAL_TREE_H
+#define QEMU_INTERVAL_TREE_H
+
+/*
+ * For now, don't expose Linux Red-Black Trees separately, but retain the
+ * separate type definitions to keep the implementation sane, and allow
+ * the possibility of disentangling them later.
+ */
+typedef struct RBNode
+{
+    /* Encodes parent with color in the lsb. */
+    uintptr_t rb_parent_color;
+    struct RBNode *rb_right;
+    struct RBNode *rb_left;
+} RBNode;
+
+typedef struct RBRoot
+{
+    RBNode *rb_node;
+} RBRoot;
+
+typedef struct RBRootLeftCached {
+    RBRoot rb_root;
+    RBNode *rb_leftmost;
+} RBRootLeftCached;
+
+typedef struct IntervalTreeNode
+{
+    RBNode rb;
+
+    uint64_t start;    /* Start of interval */
+    uint64_t last;     /* Last location _in_ interval */
+    uint64_t subtree_last;
+} IntervalTreeNode;
+
+typedef RBRootLeftCached IntervalTreeRoot;
+
+/**
+ * interval_tree_is_empty
+ * @root: root of the tree.
+ *
+ * Returns true if the tree contains no nodes.
+ */
+static inline bool interval_tree_is_empty(const IntervalTreeRoot *root)
+{
+    return root->rb_root.rb_node == NULL;
+}
+
+/**
+ * interval_tree_insert
+ * @node: node to insert,
+ * @root: root of the tree.
+ *
+ * Insert @node into @root, and rebalance.
+ */
+void interval_tree_insert(IntervalTreeNode *node, IntervalTreeRoot *root);
+
+/**
+ * interval_tree_remove
+ * @node: node to remove,
+ * @root: root of the tree.
+ *
+ * Remove @node from @root, and rebalance.
+ */
+void interval_tree_remove(IntervalTreeNode *node, IntervalTreeRoot *root);
+
+/**
+ * interval_tree_iter_first:
+ * @root: root of the tree,
+ * @start, @last: the inclusive interval [start, last].
+ *
+ * Locate the "first" of a set of nodes within the tree at @root
+ * that overlap the interval, where "first" is sorted by start.
+ * Returns NULL if no overlap found.
+ */
+IntervalTreeNode *interval_tree_iter_first(IntervalTreeRoot *root,
+                                           uint64_t start, uint64_t last);
+
+/**
+ * interval_tree_iter_next:
+ * @node: previous search result
+ * @start, @last: the inclusive interval [start, last].
+ *
+ * Locate the "next" of a set of nodes within the tree that overlap the
+ * interval; @node is the result of a previous call to
+ * interval_tree_iter_{first,next}.  Returns NULL if @node was the last
+ * node in the set.
+ */
+IntervalTreeNode *interval_tree_iter_next(IntervalTreeNode *node,
+                                          uint64_t start, uint64_t last);
+
+#endif /* QEMU_INTERVAL_TREE_H */
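
As a usage note for the API above: the first/next pair composes into a
simple scan over every node overlapping a range.  A sketch, with the
dump_overlaps helper invented for illustration:

    #include <stdio.h>
    #include <inttypes.h>
    #include "qemu/interval-tree.h"

    static void dump_overlaps(IntervalTreeRoot *root,
                              uint64_t start, uint64_t last)
    {
        for (IntervalTreeNode *n = interval_tree_iter_first(root, start, last);
             n != NULL;
             n = interval_tree_iter_next(n, start, last)) {
            printf("[%" PRIu64 ",%" PRIu64 "]\n", n->start, n->last);
        }
    }
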
diff --git a/tests/unit/test-interval-tree.c b/tests/unit/test-interval-tree.c
new file mode 100644
index 0000000000..119817a019
--- /dev/null
+++ b/tests/unit/test-interval-tree.c
@@ -0,0 +1,209 @@
+/*
+ * Test interval trees
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/interval-tree.h"
+
+static IntervalTreeNode nodes[20];
+static IntervalTreeRoot root;
+
+static void rand_interval(IntervalTreeNode *n, uint64_t start, uint64_t last)
+{
+    gint32 s_ofs, l_ofs, l_max;
+
+    if (last - start > INT32_MAX) {
+        l_max = INT32_MAX;
+    } else {
+        l_max = last - start;
+    }
+    s_ofs = g_test_rand_int_range(0, l_max);
+    l_ofs = g_test_rand_int_range(s_ofs, l_max);
+
+    n->start = start + s_ofs;
+    n->last = start + l_ofs;
+}
+
+static void test_empty(void)
+{
+    g_assert(root.rb_root.rb_node == NULL);
+    g_assert(root.rb_leftmost == NULL);
+    g_assert(interval_tree_iter_first(&root, 0, UINT64_MAX) == NULL);
+}
+
+static void test_find_one_point(void)
+{
+    /* Create a tree of a single node, which is the point [1,1]. */
+    nodes[0].start = 1;
+    nodes[0].last = 1;
+
+    interval_tree_insert(&nodes[0], &root);
+
+    g_assert(interval_tree_iter_first(&root, 0, 9) == &nodes[0]);
+    g_assert(interval_tree_iter_next(&nodes[0], 0, 9) == NULL);
+    g_assert(interval_tree_iter_first(&root, 0, 0) == NULL);
+    g_assert(interval_tree_iter_next(&nodes[0], 0, 0) == NULL);
+    g_assert(interval_tree_iter_first(&root, 0, 1) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 1, 1) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 1, 2) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 2, 2) == NULL);
+
+    interval_tree_remove(&nodes[0], &root);
+    g_assert(root.rb_root.rb_node == NULL);
+    g_assert(root.rb_leftmost == NULL);
+}
+
+static void test_find_two_point(void)
+{
+    IntervalTreeNode *find0, *find1;
+
+    /* Create a tree of two nodes, which are both the point [1,1]. */
+    nodes[0].start = 1;
+    nodes[0].last = 1;
+    nodes[1] = nodes[0];
+
+    interval_tree_insert(&nodes[0], &root);
+    interval_tree_insert(&nodes[1], &root);
+
+    find0 = interval_tree_iter_first(&root, 0, 9);
+    g_assert(find0 == &nodes[0] || find0 == &nodes[1]);
+
+    find1 = interval_tree_iter_next(find0, 0, 9);
+    g_assert(find1 == &nodes[0] || find1 == &nodes[1]);
+    g_assert(find0 != find1);
+
+    interval_tree_remove(&nodes[1], &root);
+
+    g_assert(interval_tree_iter_first(&root, 0, 9) == &nodes[0]);
+    g_assert(interval_tree_iter_next(&nodes[0], 0, 9) == NULL);
+
+    interval_tree_remove(&nodes[0], &root);
+}
+
+static void test_find_one_range(void)
+{
+    /* Create a tree of a single node, which is the range [1,8]. */
+    nodes[0].start = 1;
+    nodes[0].last = 8;
+
+    interval_tree_insert(&nodes[0], &root);
+
+    g_assert(interval_tree_iter_first(&root, 0, 9) == &nodes[0]);
+    g_assert(interval_tree_iter_next(&nodes[0], 0, 9) == NULL);
+    g_assert(interval_tree_iter_first(&root, 0, 0) == NULL);
+    g_assert(interval_tree_iter_first(&root, 0, 1) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 1, 1) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 4, 6) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 8, 8) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 9, 9) == NULL);
+
+    interval_tree_remove(&nodes[0], &root);
+}
+
+static void test_find_one_range_many(void)
+{
+    int i;
+
+    /*
+     * Create a tree of many nodes in [0,99] and [200,299],
+     * but only one node with exactly [110,190].
+     */
+    nodes[0].start = 110;
+    nodes[0].last = 190;
+
+    for (i = 1; i < ARRAY_SIZE(nodes) / 2; ++i) {
+        rand_interval(&nodes[i], 0, 99);
+    }
+    for (; i < ARRAY_SIZE(nodes); ++i) {
+        rand_interval(&nodes[i], 200, 299);
+    }
+
+    for (i = 0; i < ARRAY_SIZE(nodes); ++i) {
+        interval_tree_insert(&nodes[i], &root);
+    }
+
+    /* Test that we find exactly the one node. */
+    g_assert(interval_tree_iter_first(&root, 100, 199) == &nodes[0]);
+    g_assert(interval_tree_iter_next(&nodes[0], 100, 199) == NULL);
+    g_assert(interval_tree_iter_first(&root, 100, 109) == NULL);
+    g_assert(interval_tree_iter_first(&root, 100, 110) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 111, 120) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 111, 199) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 190, 199) == &nodes[0]);
+    g_assert(interval_tree_iter_first(&root, 192, 199) == NULL);
+
+    /*
+     * Test that if there are multiple matches, we return the one
+     * with the minimal start.
+     */
+    g_assert(interval_tree_iter_first(&root, 100, 300) == &nodes[0]);
+
+    /* Test that we don't find it after it is removed. */
+    interval_tree_remove(&nodes[0], &root);
+    g_assert(interval_tree_iter_first(&root, 100, 199) == NULL);
+
+    for (i = 1; i < ARRAY_SIZE(nodes); ++i) {
+        interval_tree_remove(&nodes[i], &root);
+    }
+}
+
+static void test_find_many_range(void)
+{
+    IntervalTreeNode *find;
+    int i, n;
+
+    n = g_test_rand_int_range(ARRAY_SIZE(nodes) / 3, ARRAY_SIZE(nodes) / 2);
+
+    /*
+     * Create a fair few nodes in [2000,2999], with the others
+     * distributed around.
+     */
+    for (i = 0; i < n; ++i) {
+        rand_interval(&nodes[i], 2000, 2999);
+    }
+    for (; i < ARRAY_SIZE(nodes) * 2 / 3; ++i) {
+        rand_interval(&nodes[i], 1000, 1899);
+    }
+    for (; i < ARRAY_SIZE(nodes); ++i) {
+        rand_interval(&nodes[i], 3100, 3999);
+    }
+
+    for (i = 0; i < ARRAY_SIZE(nodes); ++i) {
+        interval_tree_insert(&nodes[i], &root);
+    }
+
+    /* Test that we find all of the nodes. */
+    find = interval_tree_iter_first(&root, 2000, 2999);
+    for (i = 0; find != NULL; i++) {
+        find = interval_tree_iter_next(find, 2000, 2999);
+    }
+    g_assert_cmpint(i, ==, n);
+
+    g_assert(interval_tree_iter_first(&root,    0,  999) == NULL);
+    g_assert(interval_tree_iter_first(&root, 1900, 1999) == NULL);
+    g_assert(interval_tree_iter_first(&root, 3000, 3099) == NULL);
+    g_assert(interval_tree_iter_first(&root, 4000, UINT64_MAX) == NULL);
+
+    for (i = 0; i < ARRAY_SIZE(nodes); ++i) {
+        interval_tree_remove(&nodes[i], &root);
+    }
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/interval-tree/empty", test_empty);
+    g_test_add_func("/interval-tree/find-one-point", test_find_one_point);
+    g_test_add_func("/interval-tree/find-two-point", test_find_two_point);
+    g_test_add_func("/interval-tree/find-one-range", test_find_one_range);
+    g_test_add_func("/interval-tree/find-one-range-many",
+                    test_find_one_range_many);
+    g_test_add_func("/interval-tree/find-many-range", test_find_many_range);
+
+    return g_test_run();
+}
diff --git a/util/interval-tree.c b/util/interval-tree.c
new file mode 100644
index 0000000000..9578c05830
--- /dev/null
+++ b/util/interval-tree.c
@@ -0,0 +1,881 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "qemu/osdep.h"
+#include "qemu/interval-tree.h"
+#include "qemu/atomic.h"
+
+/*
+ * Red Black Trees.
+ *
+ * For now, don't expose Linux Red-Black Trees separately, but retain the
+ * separate type definitions to keep the implementation sane, and allow
+ * the possibility of separating them later.
+ *
+ * Derived from include/linux/rbtree_augmented.h and its dependencies.
+ */
+
+/*
+ * red-black trees properties:  https://en.wikipedia.org/wiki/Rbtree
+ *
+ *  1) A node is either red or black
+ *  2) The root is black
+ *  3) All leaves (NULL) are black
+ *  4) Both children of every red node are black
+ *  5) Every simple path from root to leaves contains the same number
+ *     of black nodes.
+ *
+ *  4 and 5 give the O(log n) guarantee, since 4 implies you cannot have two
+ *  consecutive red nodes in a path and every red node is therefore followed by
+ *  a black. So if B is the number of black nodes on every simple path (as per
+ *  5), then the longest possible path due to 4 is 2B.
+ *
+ *  We shall indicate color with case, where black nodes are uppercase and red
+ *  nodes will be lowercase. Unknown color nodes shall be drawn as red within
+ *  parentheses and have some accompanying text comment.
+ *
+ * Notes on lockless lookups:
+ *
+ * All stores to the tree structure (rb_left and rb_right) must be done using
+ * WRITE_ONCE [qatomic_set for QEMU]. And we must not inadvertently cause
+ * (temporary) loops in the tree structure as seen in program order.
+ *
+ * These two requirements will allow lockless iteration of the tree -- not
+ * correct iteration mind you, tree rotations are not atomic so a lookup might
+ * miss entire subtrees.
+ *
+ * But they do guarantee that any such traversal will only see valid elements
+ * and that it will indeed complete -- does not get stuck in a loop.
+ *
+ * It also guarantees that if the lookup returns an element it is the 'correct'
+ * one. But not returning an element does _NOT_ mean it's not present.
+ *
+ * NOTE:
+ *
+ * Stores to __rb_parent_color are not important for simple lookups so those
+ * are left undone as of now. Nor did I check for loops involving parent
+ * pointers.
+ */
+
+typedef enum RBColor
+{
+    RB_RED,
+    RB_BLACK,
+} RBColor;
+
+typedef struct RBAugmentCallbacks {
+    void (*propagate)(RBNode *node, RBNode *stop);
+    void (*copy)(RBNode *old, RBNode *new);
+    void (*rotate)(RBNode *old, RBNode *new);
+} RBAugmentCallbacks;
+
+static inline RBNode *rb_parent(const RBNode *n)
+{
+    return (RBNode *)(n->rb_parent_color & ~1);
+}
+
+static inline RBNode *rb_red_parent(const RBNode *n)
+{
+    return (RBNode *)n->rb_parent_color;
+}
+
+static inline RBColor pc_color(uintptr_t pc)
+{
+    return (RBColor)(pc & 1);
+}
+
+static inline bool pc_is_red(uintptr_t pc)
+{
+    return pc_color(pc) == RB_RED;
+}
+
+static inline bool pc_is_black(uintptr_t pc)
+{
+    return !pc_is_red(pc);
+}
+
+static inline RBColor rb_color(const RBNode *n)
+{
+    return pc_color(n->rb_parent_color);
+}
+
+static inline bool rb_is_red(const RBNode *n)
+{
+    return pc_is_red(n->rb_parent_color);
+}
+
+static inline bool rb_is_black(const RBNode *n)
+{
+    return pc_is_black(n->rb_parent_color);
+}
+
+static inline void rb_set_black(RBNode *n)
+{
+    n->rb_parent_color |= RB_BLACK;
+}
+
+static inline void rb_set_parent_color(RBNode *n, RBNode *p, RBColor color)
+{
+    n->rb_parent_color = (uintptr_t)p | color;
+}
+
+static inline void rb_set_parent(RBNode *n, RBNode *p)
+{
+    rb_set_parent_color(n, p, rb_color(n));
+}
+
+static inline void rb_link_node(RBNode *node, RBNode *parent, RBNode **rb_link)
+{
+    node->rb_parent_color = (uintptr_t)parent;
+    node->rb_left = node->rb_right = NULL;
+
+    qatomic_set(rb_link, node);
+}
+
+static RBNode *rb_next(RBNode *node)
+{
+    RBNode *parent;
+
+    /* OMIT: if empty node, return null. */
+
+    /*
+     * If we have a right-hand child, go down and then left as far as we can.
+     */
+    if (node->rb_right) {
+        node = node->rb_right;
+        while (node->rb_left) {
+            node = node->rb_left;
+        }
+        return node;
+    }
+
+    /*
+     * No right-hand children. Everything down and left is smaller than us,
+     * so any 'next' node must be in the general direction of our parent.
+     * Go up the tree; any time the ancestor is a right-hand child of its
+     * parent, keep going up. First time it's a left-hand child of its
+     * parent, said parent is our 'next' node.
+     */
+    while ((parent = rb_parent(node)) && node == parent->rb_right) {
+        node = parent;
+    }
+
+    return parent;
+}
+
+static inline void rb_change_child(RBNode *old, RBNode *new,
+                                   RBNode *parent, RBRoot *root)
+{
+    if (!parent) {
+        qatomic_set(&root->rb_node, new);
+    } else if (parent->rb_left == old) {
+        qatomic_set(&parent->rb_left, new);
+    } else {
+        qatomic_set(&parent->rb_right, new);
+    }
+}
+
+static inline void rb_rotate_set_parents(RBNode *old, RBNode *new,
+                                         RBRoot *root, RBColor color)
+{
+    RBNode *parent = rb_parent(old);
+
+    new->rb_parent_color = old->rb_parent_color;
+    rb_set_parent_color(old, new, color);
+    rb_change_child(old, new, parent, root);
+}
+
+static void rb_insert_augmented(RBNode *node, RBRoot *root,
+                                const RBAugmentCallbacks *augment)
+{
+    RBNode *parent = rb_red_parent(node), *gparent, *tmp;
+
+    while (true) {
+        /*
+         * Loop invariant: node is red.
+         */
+        if (unlikely(!parent)) {
+            /*
+             * The inserted node is root. Either this is the first node, or
+             * we recursed at Case 1 below and are no longer violating 4).
+             */
+            rb_set_parent_color(node, NULL, RB_BLACK);
+            break;
+        }
+
+        /*
+         * If there is a black parent, we are done.  Otherwise, take some
+         * corrective action as, per 4), we don't want a red root or two
+         * consecutive red nodes.
+         */
+        if (rb_is_black(parent)) {
+            break;
+        }
+
+        gparent = rb_red_parent(parent);
+
+        tmp = gparent->rb_right;
+        if (parent != tmp) {    /* parent == gparent->rb_left */
+            if (tmp && rb_is_red(tmp)) {
+                /*
+                 * Case 1 - node's uncle is red (color flips).
+                 *
+                 *       G            g
+                 *      / \          / \
+                 *     p   u  -->   P   U
+                 *    /            /
+                 *   n            n
+                 *
+                 * However, since g's parent might be red, and 4) does not
+                 * allow this, we need to recurse at g.
+                 */
+                rb_set_parent_color(tmp, gparent, RB_BLACK);
+                rb_set_parent_color(parent, gparent, RB_BLACK);
+                node = gparent;
+                parent = rb_parent(node);
+                rb_set_parent_color(node, parent, RB_RED);
+                continue;
+            }
+
+            tmp = parent->rb_right;
+            if (node == tmp) {
+                /*
+                 * Case 2 - node's uncle is black and node is
+                 * the parent's right child (left rotate at parent).
+                 *
+                 *      G             G
+                 *     / \           / \
+                 *    p   U  -->    n   U
+                 *     \           /
+                 *      n         p
+                 *
+                 * This still leaves us in violation of 4), the
+                 * continuation into Case 3 will fix that.
+                 */
+                tmp = node->rb_left;
+                qatomic_set(&parent->rb_right, tmp);
+                qatomic_set(&node->rb_left, parent);
+                if (tmp) {
+                    rb_set_parent_color(tmp, parent, RB_BLACK);
+                }
+                rb_set_parent_color(parent, node, RB_RED);
+                augment->rotate(parent, node);
+                parent = node;
+                tmp = node->rb_right;
+            }
+
+            /*
+             * Case 3 - node's uncle is black and node is
+             * the parent's left child (right rotate at gparent).
+             *
+             *        G           P
+             *       / \         / \
+             *      p   U  -->  n   g
+             *     /                 \
+             *    n                   U
+             */
+            qatomic_set(&gparent->rb_left, tmp); /* == parent->rb_right */
+            qatomic_set(&parent->rb_right, gparent);
+            if (tmp) {
+                rb_set_parent_color(tmp, gparent, RB_BLACK);
+            }
+            rb_rotate_set_parents(gparent, parent, root, RB_RED);
+            augment->rotate(gparent, parent);
+            break;
+        } else {
+            tmp = gparent->rb_left;
+            if (tmp && rb_is_red(tmp)) {
+                /* Case 1 - color flips */
+                rb_set_parent_color(tmp, gparent, RB_BLACK);
+                rb_set_parent_color(parent, gparent, RB_BLACK);
+                node = gparent;
+                parent = rb_parent(node);
+                rb_set_parent_color(node, parent, RB_RED);
+                continue;
+            }
+
+            tmp = parent->rb_left;
+            if (node == tmp) {
+                /* Case 2 - right rotate at parent */
+                tmp = node->rb_right;
+                qatomic_set(&parent->rb_left, tmp);
+                qatomic_set(&node->rb_right, parent);
+                if (tmp) {
+                    rb_set_parent_color(tmp, parent, RB_BLACK);
+                }
+                rb_set_parent_color(parent, node, RB_RED);
+                augment->rotate(parent, node);
+                parent = node;
+                tmp = node->rb_left;
+            }
+
+            /* Case 3 - left rotate at gparent */
+            qatomic_set(&gparent->rb_right, tmp); /* == parent->rb_left */
+            qatomic_set(&parent->rb_left, gparent);
+            if (tmp) {
+                rb_set_parent_color(tmp, gparent, RB_BLACK);
+            }
+            rb_rotate_set_parents(gparent, parent, root, RB_RED);
+            augment->rotate(gparent, parent);
+            break;
+        }
+    }
+}
+
+static void rb_insert_augmented_cached(RBNode *node,
+                                       RBRootLeftCached *root, bool newleft,
+                                       const RBAugmentCallbacks *augment)
+{
+    if (newleft) {
+        root->rb_leftmost = node;
+    }
+    rb_insert_augmented(node, &root->rb_root, augment);
+}
+
+static void rb_erase_color(RBNode *parent, RBRoot *root,
+                           const RBAugmentCallbacks *augment)
+{
+    RBNode *node = NULL, *sibling, *tmp1, *tmp2;
+
+    while (true) {
+        /*
+         * Loop invariants:
+         * - node is black (or NULL on first iteration)
+         * - node is not the root (parent is not NULL)
+         * - All leaf paths going through parent and node have a
+         *   black node count that is 1 lower than other leaf paths.
+         */
+        sibling = parent->rb_right;
+        if (node != sibling) {  /* node == parent->rb_left */
+            if (rb_is_red(sibling)) {
+                /*
+                 * Case 1 - left rotate at parent
+                 *
+                 *     P               S
+                 *    / \             / \ 
+                 *   N   s    -->    p   Sr
+                 *      / \         / \ 
+                 *     Sl  Sr      N   Sl
+                 */
+                tmp1 = sibling->rb_left;
+                qatomic_set(&parent->rb_right, tmp1);
+                qatomic_set(&sibling->rb_left, parent);
+                rb_set_parent_color(tmp1, parent, RB_BLACK);
+                rb_rotate_set_parents(parent, sibling, root, RB_RED);
+                augment->rotate(parent, sibling);
+                sibling = tmp1;
+            }
+            tmp1 = sibling->rb_right;
+            if (!tmp1 || rb_is_black(tmp1)) {
+                tmp2 = sibling->rb_left;
+                if (!tmp2 || rb_is_black(tmp2)) {
+                    /*
+                     * Case 2 - sibling color flip
+                     * (p could be either color here)
+                     *
+                     *    (p)           (p)
+                     *    / \           / \ 
+                     *   N   S    -->  N   s
+                     *      / \           / \ 
+                     *     Sl  Sr        Sl  Sr
+                     *
+                     * This leaves us violating 5) which
+                     * can be fixed by flipping p to black
+                     * if it was red, or by recursing at p.
+                     * p is red when coming from Case 1.
+                     */
+                    rb_set_parent_color(sibling, parent, RB_RED);
+                    if (rb_is_red(parent)) {
+                        rb_set_black(parent);
+                    } else {
+                        node = parent;
+                        parent = rb_parent(node);
+                        if (parent) {
+                            continue;
+                        }
+                    }
+                    break;
+                }
+                /*
+                 * Case 3 - right rotate at sibling
+                 * (p could be either color here)
+                 *
+                 *   (p)           (p)
+                 *   / \           / \
+                 *  N   S    -->  N   sl
+                 *     / \             \
+                 *    sl  Sr            S
+                 *                       \
+                 *                        Sr
+                 *
+                 * Note: p might be red, and then both
+                 * p and sl are red after rotation (which
+                 * breaks property 4). This is fixed in
+                 * Case 4 (in rb_rotate_set_parents(),
+                 *         which sets sl the color of p
+                 *         and sets p RB_BLACK)
+                 *
+                 *   (p)            (sl)
+                 *   / \            /  \
+                 *  N   sl   -->   P    S
+                 *       \        /      \
+                 *        S      N        Sr
+                 *         \
+                 *          Sr
+                 */
+                tmp1 = tmp2->rb_right;
+                qatomic_set(&sibling->rb_left, tmp1);
+                qatomic_set(&tmp2->rb_right, sibling);
+                qatomic_set(&parent->rb_right, tmp2);
+                if (tmp1) {
+                    rb_set_parent_color(tmp1, sibling, RB_BLACK);
+                }
+                augment->rotate(sibling, tmp2);
+                tmp1 = sibling;
+                sibling = tmp2;
+            }
+            /*
+             * Case 4 - left rotate at parent + color flips
+             * (p and sl could be either color here.
+             *  After rotation, p becomes black, s acquires
+             *  p's color, and sl keeps its color)
+             *
+             *      (p)             (s)
+             *      / \             / \
+             *     N   S     -->   P   Sr
+             *        / \         / \
+             *      (sl) sr      N  (sl)
+             */
+            tmp2 = sibling->rb_left;
+            qatomic_set(&parent->rb_right, tmp2);
+            qatomic_set(&sibling->rb_left, parent);
+            rb_set_parent_color(tmp1, sibling, RB_BLACK);
+            if (tmp2) {
+                rb_set_parent(tmp2, parent);
+            }
+            rb_rotate_set_parents(parent, sibling, root, RB_BLACK);
+            augment->rotate(parent, sibling);
+            break;
+        } else {
+            sibling = parent->rb_left;
+            if (rb_is_red(sibling)) {
+                /* Case 1 - right rotate at parent */
+                tmp1 = sibling->rb_right;
+                qatomic_set(&parent->rb_left, tmp1);
+                qatomic_set(&sibling->rb_right, parent);
+                rb_set_parent_color(tmp1, parent, RB_BLACK);
+                rb_rotate_set_parents(parent, sibling, root, RB_RED);
+                augment->rotate(parent, sibling);
+                sibling = tmp1;
+            }
+            tmp1 = sibling->rb_left;
+            if (!tmp1 || rb_is_black(tmp1)) {
+                tmp2 = sibling->rb_right;
+                if (!tmp2 || rb_is_black(tmp2)) {
+                    /* Case 2 - sibling color flip */
+                    rb_set_parent_color(sibling, parent, RB_RED);
+                    if (rb_is_red(parent)) {
+                        rb_set_black(parent);
+                    } else {
+                        node = parent;
+                        parent = rb_parent(node);
+                        if (parent) {
+                            continue;
+                        }
+                    }
+                    break;
+                }
+                /* Case 3 - left rotate at sibling */
+                tmp1 = tmp2->rb_left;
+                qatomic_set(&sibling->rb_right, tmp1);
+                qatomic_set(&tmp2->rb_left, sibling);
+                qatomic_set(&parent->rb_left, tmp2);
+                if (tmp1) {
+                    rb_set_parent_color(tmp1, sibling, RB_BLACK);
+                }
+                augment->rotate(sibling, tmp2);
+                tmp1 = sibling;
+                sibling = tmp2;
+            }
+            /* Case 4 - right rotate at parent + color flips */
+            tmp2 = sibling->rb_right;
+            qatomic_set(&parent->rb_left, tmp2);
+            qatomic_set(&sibling->rb_right, parent);
+            rb_set_parent_color(tmp1, sibling, RB_BLACK);
+            if (tmp2) {
+                rb_set_parent(tmp2, parent);
+            }
+            rb_rotate_set_parents(parent, sibling, root, RB_BLACK);
+            augment->rotate(parent, sibling);
+            break;
+        }
+    }
+}
+
+static void rb_erase_augmented(RBNode *node, RBRoot *root,
+                               const RBAugmentCallbacks *augment)
+{
+    RBNode *child = node->rb_right;
+    RBNode *tmp = node->rb_left;
+    RBNode *parent, *rebalance;
+    uintptr_t pc;
+
+    if (!tmp) {
+        /*
+         * Case 1: node to erase has no more than 1 child (easy!)
+         *
+         * Note that if there is one child it must be red due to 5)
+         * and node must be black due to 4). We adjust colors locally
+         * so as to bypass rb_erase_color() later on.
+         */
+        pc = node->rb_parent_color;
+        parent = rb_parent(node);
+        rb_change_child(node, child, parent, root);
+        if (child) {
+            child->rb_parent_color = pc;
+            rebalance = NULL;
+        } else {
+            rebalance = pc_is_black(pc) ? parent : NULL;
+        }
+        tmp = parent;
+    } else if (!child) {
+        /* Still case 1, but this time the child is node->rb_left */
+        pc = node->rb_parent_color;
+        parent = rb_parent(node);
+        tmp->rb_parent_color = pc;
+        rb_change_child(node, tmp, parent, root);
+        rebalance = NULL;
+        tmp = parent;
+    } else {
+        RBNode *successor = child, *child2;
+        tmp = child->rb_left;
+        if (!tmp) {
+            /*
+             * Case 2: node's successor is its right child
+             *
+             *    (n)          (s)
+             *    / \          / \
+             *  (x) (s)  ->  (x) (c)
+             *        \
+             *        (c)
+             */
+            parent = successor;
+            child2 = successor->rb_right;
+
+            augment->copy(node, successor);
+        } else {
+            /*
+             * Case 3: node's successor is leftmost under
+             * node's right child subtree
+             *
+             *    (n)          (s)
+             *    / \          / \
+             *  (x) (y)  ->  (x) (y)
+             *      /            /
+             *    (p)          (p)
+             *    /            /
+             *  (s)          (c)
+             *    \
+             *    (c)
+             */
+            do {
+                parent = successor;
+                successor = tmp;
+                tmp = tmp->rb_left;
+            } while (tmp);
+            child2 = successor->rb_right;
+            qatomic_set(&parent->rb_left, child2);
+            qatomic_set(&successor->rb_right, child);
+            rb_set_parent(child, successor);
+
+            augment->copy(node, successor);
+            augment->propagate(parent, successor);
+        }
+
+        tmp = node->rb_left;
+        qatomic_set(&successor->rb_left, tmp);
+        rb_set_parent(tmp, successor);
+
+        pc = node->rb_parent_color;
+        tmp = rb_parent(node);
+        rb_change_child(node, successor, tmp, root);
+
+        if (child2) {
+            rb_set_parent_color(child2, parent, RB_BLACK);
+            rebalance = NULL;
+        } else {
+            rebalance = rb_is_black(successor) ? parent : NULL;
+        }
+        successor->rb_parent_color = pc;
+        tmp = successor;
+    }
+
+    augment->propagate(tmp, NULL);
+
+    if (rebalance) {
+        rb_erase_color(rebalance, root, augment);
+    }
+}
+
+static void rb_erase_augmented_cached(RBNode *node, RBRootLeftCached *root,
+                                      const RBAugmentCallbacks *augment)
+{
+    if (root->rb_leftmost == node) {
+        root->rb_leftmost = rb_next(node);
+    }
+    rb_erase_augmented(node, &root->rb_root, augment);
+}
+
+
+/*
+ * Interval trees.
+ *
+ * Derived from lib/interval_tree.c and its dependencies,
+ * especially include/linux/interval_tree_generic.h.
+ */
+
+#define rb_to_itree(N)  container_of(N, IntervalTreeNode, rb)
+
+static bool interval_tree_compute_max(IntervalTreeNode *node, bool exit)
+{
+    IntervalTreeNode *child;
+    uint64_t max = node->last;
+
+    if (node->rb.rb_left) {
+        child = rb_to_itree(node->rb.rb_left);
+        if (child->subtree_last > max) {
+            max = child->subtree_last;
+        }
+    }
+    if (node->rb.rb_right) {
+        child = rb_to_itree(node->rb.rb_right);
+        if (child->subtree_last > max) {
+            max = child->subtree_last;
+        }
+    }
+    if (exit && node->subtree_last == max) {
+        return true;
+    }
+    node->subtree_last = max;
+    return false;
+}
+
+static void interval_tree_propagate(RBNode *rb, RBNode *stop)
+{
+    while (rb != stop) {
+        IntervalTreeNode *node = rb_to_itree(rb);
+        if (interval_tree_compute_max(node, true)) {
+            break;
+        }
+        rb = rb_parent(&node->rb);
+    }
+}
+
+static void interval_tree_copy(RBNode *rb_old, RBNode *rb_new)
+{
+    IntervalTreeNode *old = rb_to_itree(rb_old);
+    IntervalTreeNode *new = rb_to_itree(rb_new);
+
+    new->subtree_last = old->subtree_last;
+}
+
+static void interval_tree_rotate(RBNode *rb_old, RBNode *rb_new)
+{
+    IntervalTreeNode *old = rb_to_itree(rb_old);
+    IntervalTreeNode *new = rb_to_itree(rb_new);
+
+    new->subtree_last = old->subtree_last;
+    interval_tree_compute_max(old, false);
+}
+
+static const RBAugmentCallbacks interval_tree_augment = {
+    .propagate = interval_tree_propagate,
+    .copy = interval_tree_copy,
+    .rotate = interval_tree_rotate,
+};
+
+/* Insert / remove interval nodes from the tree */
+void interval_tree_insert(IntervalTreeNode *node, IntervalTreeRoot *root)
+{
+    RBNode **link = &root->rb_root.rb_node, *rb_parent = NULL;
+    uint64_t start = node->start, last = node->last;
+    IntervalTreeNode *parent;
+    bool leftmost = true;
+
+    while (*link) {
+        rb_parent = *link;
+        parent = rb_to_itree(rb_parent);
+
+        if (parent->subtree_last < last) {
+            parent->subtree_last = last;
+        }
+        if (start < parent->start) {
+            link = &parent->rb.rb_left;
+        } else {
+            link = &parent->rb.rb_right;
+            leftmost = false;
+        }
+    }
+
+    node->subtree_last = last;
+    rb_link_node(&node->rb, rb_parent, link);
+    rb_insert_augmented_cached(&node->rb, root, leftmost,
+                               &interval_tree_augment);
+}
+
+void interval_tree_remove(IntervalTreeNode *node, IntervalTreeRoot *root)
+{
+    rb_erase_augmented_cached(&node->rb, root, &interval_tree_augment);
+}
+
+/*
+ * Iterate over intervals intersecting [start;last]
+ *
+ * Note that a node's interval intersects [start;last] iff:
+ *   Cond1: node->start <= last
+ * and
+ *   Cond2: start <= node->last
+ */
+
+static IntervalTreeNode *interval_tree_subtree_search(IntervalTreeNode *node,
+                                                      uint64_t start,
+                                                      uint64_t last)
+{
+    while (true) {
+        /*
+         * Loop invariant: start <= node->subtree_last
+         * (Cond2 is satisfied by one of the subtree nodes)
+         */
+        if (node->rb.rb_left) {
+            IntervalTreeNode *left = rb_to_itree(node->rb.rb_left);
+
+            if (start <= left->subtree_last) {
+                /*
+                 * Some nodes in left subtree satisfy Cond2.
+                 * Iterate to find the leftmost such node N.
+                 * If it also satisfies Cond1, that's the
+                 * match we are looking for. Otherwise, there
+                 * is no matching interval as nodes to the
+                 * right of N can't satisfy Cond1 either.
+                 */
+                node = left;
+                continue;
+            }
+        }
+        if (node->start <= last) {         /* Cond1 */
+            if (start <= node->last) {     /* Cond2 */
+                return node; /* node is leftmost match */
+            }
+            if (node->rb.rb_right) {
+                node = rb_to_itree(node->rb.rb_right);
+                if (start <= node->subtree_last) {
+                    continue;
+                }
+            }
+        }
+        return NULL; /* no match */
+    }
+}
+
+IntervalTreeNode *interval_tree_iter_first(IntervalTreeRoot *root,
+                                           uint64_t start, uint64_t last)
+{
+    IntervalTreeNode *node, *leftmost;
+
+    if (!root->rb_root.rb_node) {
+        return NULL;
+    }
+
+    /*
+     * Fastpath range intersection/overlap between A: [a0, a1] and
+     * B: [b0, b1] is given by:
+     *
+     *         a0 <= b1 && b0 <= a1
+     *
+     *  ... where A holds the query range and B holds the smallest
+     * 'start' and largest 'last' in the tree.  For the latter, we
+     * rely on the root node, which by the augmented interval tree
+     * property holds the largest value in its last-in-subtree.
+     * This allows mitigating some of the tree walk overhead for
+     * non-intersecting ranges, maintained and consulted in O(1).
+     */
+    node = rb_to_itree(root->rb_root.rb_node);
+    if (node->subtree_last < start) {
+        return NULL;
+    }
+
+    leftmost = rb_to_itree(root->rb_leftmost);
+    if (leftmost->start > last) {
+        return NULL;
+    }
+
+    return interval_tree_subtree_search(node, start, last);
+}
+
+IntervalTreeNode *interval_tree_iter_next(IntervalTreeNode *node,
+                                          uint64_t start, uint64_t last)
+{
+    RBNode *rb = node->rb.rb_right, *prev;
+
+    while (true) {
+        /*
+         * Loop invariants:
+         *   Cond1: node->start <= last
+         *   rb == node->rb.rb_right
+         *
+         * First, search right subtree if suitable
+         */
+        if (rb) {
+            IntervalTreeNode *right = rb_to_itree(rb);
+
+            if (start <= right->subtree_last) {
+                return interval_tree_subtree_search(right, start, last);
+            }
+        }
+
+        /* Move up the tree until we come from a node's left child */
+        do {
+            rb = rb_parent(&node->rb);
+            if (!rb) {
+                return NULL;
+            }
+            prev = &node->rb;
+            node = rb_to_itree(rb);
+            rb = node->rb.rb_right;
+        } while (prev == rb);
+
+        /* Check if the node intersects [start;last] */
+        if (last < node->start) {  /* !Cond1 */
+            return NULL;
+        }
+        if (start <= node->last) { /* Cond2 */
+            return node;
+        }
+    }
+}
+
+#if 1
+static void debug_interval_tree_int(IntervalTreeNode *node,
+                                    const char *dir, int level)
+{
+    printf("%4d %*s %s [%" PRId64 ",%" PRId64 "] subtree_last:%" PRId64 "\n",
+           level, level + 1, dir, rb_is_red(&node->rb) ? "r" : "b",
+           node->start, node->last, node->subtree_last);
+
+    if (node->rb.rb_left) {
+        debug_interval_tree_int(rb_to_itree(node->rb.rb_left), "<", level + 1);
+    }
+    if (node->rb.rb_right) {
+        debug_interval_tree_int(rb_to_itree(node->rb.rb_right), ">", level + 1);
+    }
+}
+
+void debug_interval_tree(IntervalTreeNode *node);
+void debug_interval_tree(IntervalTreeNode *node)
+{
+    if (node) {
+        debug_interval_tree_int(node, "*", 0);
+    } else {
+        printf("null\n");
+    }
+}
+#endif
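
The propagate/copy/rotate callbacks above exist to maintain a single
invariant: a node's subtree_last is the maximum 'last' anywhere in its
subtree, which is what lets the search prune whole subtrees.  A
recursive checker makes this concrete; it is a sketch for illustration
and not part of the patch:

    static uint64_t check_subtree_last(IntervalTreeNode *node)
    {
        uint64_t max = node->last;

        if (node->rb.rb_left) {
            max = MAX(max, check_subtree_last(rb_to_itree(node->rb.rb_left)));
        }
        if (node->rb.rb_right) {
            max = MAX(max, check_subtree_last(rb_to_itree(node->rb.rb_right)));
        }
        g_assert(node->subtree_last == max);
        return max;
    }
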
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index b497a41378..ffa444f432 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -47,6 +47,7 @@ tests = {
   'ptimer-test': ['ptimer-test-stubs.c', meson.project_source_root() / 'hw/core/ptimer.c'],
   'test-qapi-util': [],
   'test-smp-parse': [qom, meson.project_source_root() / 'hw/core/machine-smp.c'],
+  'test-interval-tree': [],
 }
 
 if have_system or have_tools
diff --git a/util/meson.build b/util/meson.build
index 5e282130df..46a0d017a6 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -55,6 +55,7 @@ util_ss.add(files('guest-random.c'))
 util_ss.add(files('yank.c'))
 util_ss.add(files('int128.c'))
 util_ss.add(files('memalign.c'))
+util_ss.add(files('interval-tree.c'))
 
 if have_user
   util_ss.add(files('selfmap.c'))
-- 
2.34.1




* [PATCH 02/24] accel/tcg: Make page_alloc_target_data allocation constant
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Use a constant target data allocation size for all pages.
This will be necessary to reduce the overhead of page tracking.
Since TARGET_PAGE_DATA_SIZE is now required, we can use this
to omit data tracking for targets that don't require it.
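
As a worked example of the new constant, assuming a 4KiB target page
and the arm definitions this patch moves into cpu.h:

    #define LOG2_TAG_GRANULE 4      /* one tag per 16-byte granule */
    #define TARGET_PAGE_DATA_SIZE \
        (TARGET_PAGE_SIZE >> (LOG2_TAG_GRANULE + 1))
    /*
     * 4096 / 16 = 256 granules per page; two 4-bit tags pack per
     * byte, so 4096 >> 5 = 128 bytes of tag storage per page.
     */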

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h    | 9 ++++-----
 target/arm/cpu.h          | 8 ++++++++
 target/arm/internals.h    | 4 ----
 accel/tcg/translate-all.c | 8 ++++++--
 target/arm/mte_helper.c   | 3 +--
 5 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 16b7df41bf..854adc4ac2 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -281,19 +281,18 @@ void page_reset_target_data(target_ulong start, target_ulong end);
 int page_check_range(target_ulong start, target_ulong len, int flags);
 
 /**
- * page_alloc_target_data(address, size)
+ * page_alloc_target_data(address)
  * @address: guest virtual address
- * @size: size of data to allocate
  *
- * Allocate @size bytes of out-of-band data to associate with the
- * guest page at @address.  If the page is not mapped, NULL will
+ * Allocate TARGET_PAGE_DATA_SIZE bytes of out-of-band data to associate
+ * with the guest page at @address.  If the page is not mapped, NULL will
  * be returned.  If there is existing data associated with @address,
  * no new memory will be allocated.
  *
  * The memory will be freed when the guest page is deallocated,
  * e.g. with the munmap system call.
  */
-void *page_alloc_target_data(target_ulong address, size_t size);
+void *page_alloc_target_data(target_ulong address);
 
 /**
  * page_get_target_data(address)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 429ed42eec..2f44f7afc7 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3423,6 +3423,14 @@ static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x)
 #define PAGE_MTE            PAGE_TARGET_2
 #define PAGE_TARGET_STICKY  PAGE_MTE
 
+/* We associate one allocation tag per 16 bytes, the minimum.  */
+#define LOG2_TAG_GRANULE 4
+#define TAG_GRANULE      (1 << LOG2_TAG_GRANULE)
+
+#ifdef CONFIG_USER_ONLY
+#define TARGET_PAGE_DATA_SIZE (TARGET_PAGE_SIZE >> (LOG2_TAG_GRANULE + 1))
+#endif
+
 #ifdef TARGET_TAGGED_ADDRESSES
 /**
  * cpu_untagged_addr:
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 307a596505..d94efbaf5b 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1165,10 +1165,6 @@ void arm_log_exception(CPUState *cs);
  */
 #define GMID_EL1_BS  6
 
-/* We associate one allocation tag per 16 bytes, the minimum.  */
-#define LOG2_TAG_GRANULE 4
-#define TAG_GRANULE      (1 << LOG2_TAG_GRANULE)
-
 /*
  * SVE predicates are 1/8 the size of SVE vectors, and cannot use
  * the same simd_desc() encoding due to restrictions on size.
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4ed75a13e1..64a2601f9f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2271,6 +2271,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
 
 void page_reset_target_data(target_ulong start, target_ulong end)
 {
+#ifdef TARGET_PAGE_DATA_SIZE
     target_ulong addr, len;
 
     /*
@@ -2293,15 +2294,17 @@ void page_reset_target_data(target_ulong start, target_ulong end)
         g_free(p->target_data);
         p->target_data = NULL;
     }
+#endif
 }
 
+#ifdef TARGET_PAGE_DATA_SIZE
 void *page_get_target_data(target_ulong address)
 {
     PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
     return p ? p->target_data : NULL;
 }
 
-void *page_alloc_target_data(target_ulong address, size_t size)
+void *page_alloc_target_data(target_ulong address)
 {
     PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
     void *ret = NULL;
@@ -2309,11 +2312,12 @@ void *page_alloc_target_data(target_ulong address, size_t size)
     if (p->flags & PAGE_VALID) {
         ret = p->target_data;
         if (!ret) {
-            p->target_data = ret = g_malloc0(size);
+            p->target_data = ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
         }
     }
     return ret;
 }
+#endif /* TARGET_PAGE_DATA_SIZE */
 
 int page_check_range(target_ulong start, target_ulong len, int flags)
 {
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index fdd23ab3f8..62d36c127f 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -96,8 +96,7 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
 
     tags = page_get_target_data(clean_ptr);
     if (tags == NULL) {
-        size_t alloc_size = TARGET_PAGE_SIZE >> (LOG2_TAG_GRANULE + 1);
-        tags = page_alloc_target_data(clean_ptr, alloc_size);
+        tags = page_alloc_target_data(clean_ptr);
         assert(tags != NULL);
     }
 
-- 
2.34.1




* [PATCH 03/24] accel/tcg: Remove disabled debug in translate-all.c
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

These items just printf, and could be replaced with proper
tracepoints if we really cared.
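
Should anyone care later, the conversion would look roughly like this
(hypothetical trace-events entry and call site, not part of this
patch):

    # accel/tcg/trace-events
    tb_flush(size_t code_size, size_t nb_tbs) "code_size=%zu nb_tbs=%zu"

    /* at the former DEBUG_TB_FLUSH_GATE site in do_tb_flush() */
    trace_tb_flush(tcg_code_size(), tcg_nb_tbs());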

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 109 --------------------------------------
 1 file changed, 109 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 64a2601f9f..42385fa032 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -63,33 +63,7 @@
 #include "tb-context.h"
 #include "internal.h"
 
-/* #define DEBUG_TB_INVALIDATE */
-/* #define DEBUG_TB_FLUSH */
 /* make various TB consistency checks */
-/* #define DEBUG_TB_CHECK */
-
-#ifdef DEBUG_TB_INVALIDATE
-#define DEBUG_TB_INVALIDATE_GATE 1
-#else
-#define DEBUG_TB_INVALIDATE_GATE 0
-#endif
-
-#ifdef DEBUG_TB_FLUSH
-#define DEBUG_TB_FLUSH_GATE 1
-#else
-#define DEBUG_TB_FLUSH_GATE 0
-#endif
-
-#if !defined(CONFIG_USER_ONLY)
-/* TB consistency checks only implemented for usermode emulation.  */
-#undef DEBUG_TB_CHECK
-#endif
-
-#ifdef DEBUG_TB_CHECK
-#define DEBUG_TB_CHECK_GATE 1
-#else
-#define DEBUG_TB_CHECK_GATE 0
-#endif
 
 /* Access to the various translations structures need to be serialised via locks
  * for consistency.
@@ -940,15 +914,6 @@ static void page_flush_tb(void)
     }
 }
 
-static gboolean tb_host_size_iter(gpointer key, gpointer value, gpointer data)
-{
-    const TranslationBlock *tb = value;
-    size_t *size = data;
-
-    *size += tb->tc.size;
-    return false;
-}
-
 /* flush all the translation blocks */
 static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 {
@@ -963,15 +928,6 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     }
     did_flush = true;
 
-    if (DEBUG_TB_FLUSH_GATE) {
-        size_t nb_tbs = tcg_nb_tbs();
-        size_t host_size = 0;
-
-        tcg_tb_foreach(tb_host_size_iter, &host_size);
-        printf("qemu: flush code_size=%zu nb_tbs=%zu avg_tb_size=%zu\n",
-               tcg_code_size(), nb_tbs, nb_tbs > 0 ? host_size / nb_tbs : 0);
-    }
-
     CPU_FOREACH(cpu) {
         tcg_flush_jmp_cache(cpu);
     }
@@ -1005,57 +961,6 @@ void tb_flush(CPUState *cpu)
     }
 }
 
-/*
- * Formerly ifdef DEBUG_TB_CHECK. These debug functions are user-mode-only,
- * so in order to prevent bit rot we compile them unconditionally in user-mode,
- * and let the optimizer get rid of them by wrapping their user-only callers
- * with if (DEBUG_TB_CHECK_GATE).
- */
-#ifdef CONFIG_USER_ONLY
-
-static void do_tb_invalidate_check(void *p, uint32_t hash, void *userp)
-{
-    TranslationBlock *tb = p;
-    target_ulong addr = *(target_ulong *)userp;
-
-    if (!(addr + TARGET_PAGE_SIZE <= tb_pc(tb) ||
-          addr >= tb_pc(tb) + tb->size)) {
-        printf("ERROR invalidate: address=" TARGET_FMT_lx
-               " PC=%08lx size=%04x\n", addr, (long)tb_pc(tb), tb->size);
-    }
-}
-
-/* verify that all the pages have correct rights for code
- *
- * Called with mmap_lock held.
- */
-static void tb_invalidate_check(target_ulong address)
-{
-    address &= TARGET_PAGE_MASK;
-    qht_iter(&tb_ctx.htable, do_tb_invalidate_check, &address);
-}
-
-static void do_tb_page_check(void *p, uint32_t hash, void *userp)
-{
-    TranslationBlock *tb = p;
-    int flags1, flags2;
-
-    flags1 = page_get_flags(tb_pc(tb));
-    flags2 = page_get_flags(tb_pc(tb) + tb->size - 1);
-    if ((flags1 & PAGE_WRITE) || (flags2 & PAGE_WRITE)) {
-        printf("ERROR page flags: PC=%08lx size=%04x f1=%x f2=%x\n",
-               (long)tb_pc(tb), tb->size, flags1, flags2);
-    }
-}
-
-/* verify that all the pages have correct rights for code */
-static void tb_page_check(void)
-{
-    qht_iter(&tb_ctx.htable, do_tb_page_check, NULL);
-}
-
-#endif /* CONFIG_USER_ONLY */
-
 /*
  * user-mode: call with mmap_lock held
  * !user-mode: call with @pd->lock held
@@ -1339,12 +1244,6 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
         page_unlock(p2);
     }
     page_unlock(p);
-
-#ifdef CONFIG_USER_ONLY
-    if (DEBUG_TB_CHECK_GATE) {
-        tb_page_check();
-    }
-#endif
     return tb;
 }
 
@@ -2400,9 +2299,6 @@ void page_protect(tb_page_addr_t page_addr)
         }
         mprotect(g2h_untagged(page_addr), qemu_host_page_size,
                  (prot & PAGE_BITS) & ~PAGE_WRITE);
-        if (DEBUG_TB_INVALIDATE_GATE) {
-            printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
-        }
     }
 }
 
@@ -2458,11 +2354,6 @@ int page_unprotect(target_ulong address, uintptr_t pc)
                 /* and since the content will be modified, we must invalidate
                    the corresponding translated code. */
                 current_tb_invalidated |= tb_invalidate_phys_page(addr, pc);
-#ifdef CONFIG_USER_ONLY
-                if (DEBUG_TB_CHECK_GATE) {
-                    tb_invalidate_check(addr);
-                }
-#endif
             }
             mprotect((void *)g2h_untagged(host_start), qemu_host_page_size,
                      prot & PAGE_BITS);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 04/24] accel/tcg: Split out PageDesc to internal.h
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (2 preceding siblings ...)
  2022-10-06  3:10 ` [PATCH 03/24] accel/tcg: Remove disabled debug in translate-all.c Richard Henderson
@ 2022-10-06  3:10 ` Richard Henderson
  2022-10-25  8:47   ` Alex Bennée
  2022-10-06  3:10 ` [PATCH 05/24] accel/tcg: Split out tb-maint.c Richard Henderson
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
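
Note (illustrative, not part of the patch): with page_find_alloc() made
extern and page_find() moved into internal.h, other accel/tcg files can
use the lookup directly.  A minimal sketch of the idiom, matching how
existing callers index by physical page:

    /* Find the PageDesc covering a code address, without allocating;
     * NULL means no TB has ever been associated with this page. */
    PageDesc *pd = page_find(addr >> TARGET_PAGE_BITS);
    if (pd == NULL) {
        return;
    }
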
 accel/tcg/internal.h      | 31 +++++++++++++++++++++++++++++++
 accel/tcg/translate-all.c | 31 +------------------------------
 2 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index dc800fd485..62da49ed52 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -11,6 +11,37 @@
 
 #include "exec/exec-all.h"
 
+/*
+ * Access to the various translation structures needs to be serialised
+ * via locks for consistency.  In user-mode emulation, access to the
+ * memory-related structures is protected by mmap_lock.
+ * In !user-mode we use per-page locks.
+ */
+#ifdef CONFIG_SOFTMMU
+#define assert_memory_lock()
+#else
+#define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
+#endif
+
+typedef struct PageDesc {
+    /* list of TBs intersecting this ram page */
+    uintptr_t first_tb;
+#ifdef CONFIG_USER_ONLY
+    unsigned long flags;
+    void *target_data;
+#endif
+#ifdef CONFIG_SOFTMMU
+    QemuSpin lock;
+#endif
+} PageDesc;
+
+PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc);
+
+static inline PageDesc *page_find(tb_page_addr_t index)
+{
+    return page_find_alloc(index, false);
+}
+
 TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
                               target_ulong cs_base, uint32_t flags,
                               int cflags);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 42385fa032..86848c6743 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -65,30 +65,6 @@
 
 /* make various TB consistency checks */
 
-/* Access to the various translations structures need to be serialised via locks
- * for consistency.
- * In user-mode emulation access to the memory related structures are protected
- * with mmap_lock.
- * In !user-mode we use per-page locks.
- */
-#ifdef CONFIG_SOFTMMU
-#define assert_memory_lock()
-#else
-#define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
-#endif
-
-typedef struct PageDesc {
-    /* list of TBs intersecting this ram page */
-    uintptr_t first_tb;
-#ifdef CONFIG_USER_ONLY
-    unsigned long flags;
-    void *target_data;
-#endif
-#ifdef CONFIG_SOFTMMU
-    QemuSpin lock;
-#endif
-} PageDesc;
-
 /**
  * struct page_entry - page descriptor entry
  * @pd:     pointer to the &struct PageDesc of the page this entry represents
@@ -445,7 +421,7 @@ void page_init(void)
 #endif
 }
 
-static PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
+PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
 {
     PageDesc *pd;
     void **lp;
@@ -511,11 +487,6 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
     return pd + (index & (V_L2_SIZE - 1));
 }
 
-static inline PageDesc *page_find(tb_page_addr_t index)
-{
-    return page_find_alloc(index, false);
-}
-
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
                            PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 05/24] accel/tcg: Split out tb-maint.c
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (3 preceding siblings ...)
  2022-10-06  3:10 ` [PATCH 04/24] accel/tcg: Split out PageDesc to internal.h Richard Henderson
@ 2022-10-06  3:10 ` Richard Henderson
  2022-10-25  8:49   ` Alex Bennée
  2022-10-06  3:10 ` [PATCH 06/24] accel/tcg: Move assert_no_pages_locked to internal.h Richard Henderson
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Move all of the TranslationBlock flushing and page linking
code from translate-all.c to tb-maint.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
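
Note (illustrative, not part of the patch): the PAGE_FOR_EACH_TB and
TB_FOR_EACH_JMP iterators moved into internal.h walk lists of tagged
pointers, where the low bit of each link records which of a TB's two
slots the link came from.  A condensed sketch of the encoding, as used
by tb_page_add() and TB_FOR_EACH_TAGGED:

    /* Store: fold the slot number n (0 or 1) into the link word. */
    uintptr_t link = (uintptr_t)tb | n;

    /* Load: recover the slot and the untagged pointer. */
    unsigned slot = link & 1;
    TranslationBlock *next = (TranslationBlock *)(link & ~(uintptr_t)1);
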
 accel/tcg/internal.h      |  55 +++
 accel/tcg/tb-maint.c      | 735 ++++++++++++++++++++++++++++++++++++
 accel/tcg/translate-all.c | 766 +-------------------------------------
 accel/tcg/meson.build     |   1 +
 4 files changed, 802 insertions(+), 755 deletions(-)
 create mode 100644 accel/tcg/tb-maint.c

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 62da49ed52..a77b110b78 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -35,6 +35,27 @@ typedef struct PageDesc {
 #endif
 } PageDesc;
 
+/* Size of the L2 (and L3, etc) page tables.  */
+#define V_L2_BITS 10
+#define V_L2_SIZE (1 << V_L2_BITS)
+
+/*
+ * L1 Mapping properties
+ */
+extern int v_l1_size;
+extern int v_l1_shift;
+extern int v_l2_levels;
+
+/*
+ * The bottom level has pointers to PageDesc, and is indexed by
+ * anything from 4 to (V_L2_BITS + 3) bits, depending on target page size.
+ */
+#define V_L1_MIN_BITS 4
+#define V_L1_MAX_BITS (V_L2_BITS + 3)
+#define V_L1_MAX_SIZE (1 << V_L1_MAX_BITS)
+
+extern void *l1_map[V_L1_MAX_SIZE];
+
 PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc);
 
 static inline PageDesc *page_find(tb_page_addr_t index)
@@ -42,12 +63,46 @@ static inline PageDesc *page_find(tb_page_addr_t index)
     return page_find_alloc(index, false);
 }
 
+/* list iterators for lists of tagged pointers in TranslationBlock */
+#define TB_FOR_EACH_TAGGED(head, tb, n, field)                          \
+    for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);        \
+         tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
+             tb = (TranslationBlock *)((uintptr_t)tb & ~1))
+
+#define PAGE_FOR_EACH_TB(pagedesc, tb, n)                       \
+    TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
+
+#define TB_FOR_EACH_JMP(head_tb, tb, n)                                 \
+    TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
+
+/* In user-mode page locks aren't used; mmap_lock is enough */
+#ifdef CONFIG_USER_ONLY
+#define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
+static inline void page_lock(PageDesc *pd) { }
+static inline void page_unlock(PageDesc *pd) { }
+#else
+#ifdef CONFIG_DEBUG_TCG
+void do_assert_page_locked(const PageDesc *pd, const char *file, int line);
+#define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
+#else
+#define assert_page_locked(pd)
+#endif
+void page_lock(PageDesc *pd);
+void page_unlock(PageDesc *pd);
+#endif
+
 TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
                               target_ulong cs_base, uint32_t flags,
                               int cflags);
 G_NORETURN void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr);
 void page_init(void);
 void tb_htable_init(void);
+void tb_reset_jump(TranslationBlock *tb, int n);
+TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
+                               tb_page_addr_t phys_page2);
+bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc);
+int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
+                              uintptr_t searched_pc, bool reset_icount);
 
 /* Return the current PC from CPU, which may be cached in TB. */
 static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
new file mode 100644
index 0000000000..66c1900ae6
--- /dev/null
+++ b/accel/tcg/tb-maint.c
@@ -0,0 +1,735 @@
+/*
+ * Translation Block Maintenance
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "exec/cputlb.h"
+#include "exec/log.h"
+#include "exec/translate-all.h"
+#include "sysemu/tcg.h"
+#include "tcg/tcg.h"
+#include "tb-hash.h"
+#include "tb-context.h"
+#include "internal.h"
+
+/* FIXME: tb_invalidate_phys_range is declared in different places. */
+#ifdef CONFIG_USER_ONLY
+#include "exec/exec-all.h"
+#else
+#include "exec/ram_addr.h"
+#endif
+
+static bool tb_cmp(const void *ap, const void *bp)
+{
+    const TranslationBlock *a = ap;
+    const TranslationBlock *b = bp;
+
+    return ((TARGET_TB_PCREL || tb_pc(a) == tb_pc(b)) &&
+            a->cs_base == b->cs_base &&
+            a->flags == b->flags &&
+            (tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
+            a->trace_vcpu_dstate == b->trace_vcpu_dstate &&
+            a->page_addr[0] == b->page_addr[0] &&
+            a->page_addr[1] == b->page_addr[1]);
+}
+
+void tb_htable_init(void)
+{
+    unsigned int mode = QHT_MODE_AUTO_RESIZE;
+
+    qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
+}
+
+/* Set to NULL all the 'first_tb' fields in all PageDescs. */
+static void page_flush_tb_1(int level, void **lp)
+{
+    int i;
+
+    if (*lp == NULL) {
+        return;
+    }
+    if (level == 0) {
+        PageDesc *pd = *lp;
+
+        for (i = 0; i < V_L2_SIZE; ++i) {
+            page_lock(&pd[i]);
+            pd[i].first_tb = (uintptr_t)NULL;
+            page_unlock(&pd[i]);
+        }
+    } else {
+        void **pp = *lp;
+
+        for (i = 0; i < V_L2_SIZE; ++i) {
+            page_flush_tb_1(level - 1, pp + i);
+        }
+    }
+}
+
+static void page_flush_tb(void)
+{
+    int i, l1_sz = v_l1_size;
+
+    for (i = 0; i < l1_sz; i++) {
+        page_flush_tb_1(v_l2_levels, l1_map + i);
+    }
+}
+
+/* flush all the translation blocks */
+static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
+{
+    bool did_flush = false;
+
+    mmap_lock();
+    /* If it has already been done on request of another CPU, just retry. */
+    if (tb_ctx.tb_flush_count != tb_flush_count.host_int) {
+        goto done;
+    }
+    did_flush = true;
+
+    CPU_FOREACH(cpu) {
+        tcg_flush_jmp_cache(cpu);
+    }
+
+    qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
+    page_flush_tb();
+
+    tcg_region_reset_all();
+    /* XXX: flush processor icache at this point if cache flush is expensive */
+    qatomic_mb_set(&tb_ctx.tb_flush_count, tb_ctx.tb_flush_count + 1);
+
+done:
+    mmap_unlock();
+    if (did_flush) {
+        qemu_plugin_flush_cb();
+    }
+}
+
+void tb_flush(CPUState *cpu)
+{
+    if (tcg_enabled()) {
+        unsigned tb_flush_count = qatomic_mb_read(&tb_ctx.tb_flush_count);
+
+        if (cpu_in_exclusive_context(cpu)) {
+            do_tb_flush(cpu, RUN_ON_CPU_HOST_INT(tb_flush_count));
+        } else {
+            async_safe_run_on_cpu(cpu, do_tb_flush,
+                                  RUN_ON_CPU_HOST_INT(tb_flush_count));
+        }
+    }
+}
+
+/*
+ * user-mode: call with mmap_lock held
+ * !user-mode: call with @pd->lock held
+ */
+static inline void tb_page_remove(PageDesc *pd, TranslationBlock *tb)
+{
+    TranslationBlock *tb1;
+    uintptr_t *pprev;
+    unsigned int n1;
+
+    assert_page_locked(pd);
+    pprev = &pd->first_tb;
+    PAGE_FOR_EACH_TB(pd, tb1, n1) {
+        if (tb1 == tb) {
+            *pprev = tb1->page_next[n1];
+            return;
+        }
+        pprev = &tb1->page_next[n1];
+    }
+    g_assert_not_reached();
+}
+
+/* remove @orig from its @n_orig-th jump list */
+static inline void tb_remove_from_jmp_list(TranslationBlock *orig, int n_orig)
+{
+    uintptr_t ptr, ptr_locked;
+    TranslationBlock *dest;
+    TranslationBlock *tb;
+    uintptr_t *pprev;
+    int n;
+
+    /* mark the LSB of jmp_dest[] so that no further jumps can be inserted */
+    ptr = qatomic_or_fetch(&orig->jmp_dest[n_orig], 1);
+    dest = (TranslationBlock *)(ptr & ~1);
+    if (dest == NULL) {
+        return;
+    }
+
+    qemu_spin_lock(&dest->jmp_lock);
+    /*
+     * While acquiring the lock, the jump might have been removed if the
+     * destination TB was invalidated; check again.
+     */
+    ptr_locked = qatomic_read(&orig->jmp_dest[n_orig]);
+    if (ptr_locked != ptr) {
+        qemu_spin_unlock(&dest->jmp_lock);
+        /*
+         * The only possibility is that the jump was unlinked via
+         * tb_jmp_unlink(dest). Seeing another destination here would be a bug,
+         * because we set the LSB above.
+         */
+        g_assert(ptr_locked == 1 && dest->cflags & CF_INVALID);
+        return;
+    }
+    /*
+     * We first acquired the lock, and since the destination pointer matches,
+     * we know for sure that @orig is in the jmp list.
+     */
+    pprev = &dest->jmp_list_head;
+    TB_FOR_EACH_JMP(dest, tb, n) {
+        if (tb == orig && n == n_orig) {
+            *pprev = tb->jmp_list_next[n];
+            /* no need to set orig->jmp_dest[n]; setting the LSB was enough */
+            qemu_spin_unlock(&dest->jmp_lock);
+            return;
+        }
+        pprev = &tb->jmp_list_next[n];
+    }
+    g_assert_not_reached();
+}
+
+/*
+ * Reset the jump entry 'n' of a TB so that it is not chained to another TB.
+ */
+void tb_reset_jump(TranslationBlock *tb, int n)
+{
+    uintptr_t addr = (uintptr_t)(tb->tc.ptr + tb->jmp_reset_offset[n]);
+    tb_set_jmp_target(tb, n, addr);
+}
+
+/* remove any jumps to the TB */
+static inline void tb_jmp_unlink(TranslationBlock *dest)
+{
+    TranslationBlock *tb;
+    int n;
+
+    qemu_spin_lock(&dest->jmp_lock);
+
+    TB_FOR_EACH_JMP(dest, tb, n) {
+        tb_reset_jump(tb, n);
+        qatomic_and(&tb->jmp_dest[n], (uintptr_t)NULL | 1);
+        /* No need to clear the list entry; setting the dest ptr is enough */
+    }
+    dest->jmp_list_head = (uintptr_t)NULL;
+
+    qemu_spin_unlock(&dest->jmp_lock);
+}
+
+static void tb_jmp_cache_inval_tb(TranslationBlock *tb)
+{
+    CPUState *cpu;
+
+    if (TARGET_TB_PCREL) {
+        /* A TB may be at any virtual address */
+        CPU_FOREACH(cpu) {
+            tcg_flush_jmp_cache(cpu);
+        }
+    } else {
+        uint32_t h = tb_jmp_cache_hash_func(tb_pc(tb));
+
+        CPU_FOREACH(cpu) {
+            CPUJumpCache *jc = cpu->tb_jmp_cache;
+
+            if (qatomic_read(&jc->array[h].tb) == tb) {
+                qatomic_set(&jc->array[h].tb, NULL);
+            }
+        }
+    }
+}
+
+/*
+ * In user-mode, call with mmap_lock held.
+ * In !user-mode, if @rm_from_page_list is set, call with the TB's pages'
+ * locks held.
+ */
+static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
+{
+    PageDesc *p;
+    uint32_t h;
+    tb_page_addr_t phys_pc;
+    uint32_t orig_cflags = tb_cflags(tb);
+
+    assert_memory_lock();
+
+    /* make sure no further incoming jumps will be chained to this TB */
+    qemu_spin_lock(&tb->jmp_lock);
+    qatomic_set(&tb->cflags, tb->cflags | CF_INVALID);
+    qemu_spin_unlock(&tb->jmp_lock);
+
+    /* remove the TB from the hash list */
+    phys_pc = tb->page_addr[0];
+    h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
+                     tb->flags, orig_cflags, tb->trace_vcpu_dstate);
+    if (!qht_remove(&tb_ctx.htable, tb, h)) {
+        return;
+    }
+
+    /* remove the TB from the page list */
+    if (rm_from_page_list) {
+        p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
+        tb_page_remove(p, tb);
+        if (tb->page_addr[1] != -1) {
+            p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
+            tb_page_remove(p, tb);
+        }
+    }
+
+    /* remove the TB from the per-CPU jump caches */
+    tb_jmp_cache_inval_tb(tb);
+
+    /* suppress this TB from the two jump lists */
+    tb_remove_from_jmp_list(tb, 0);
+    tb_remove_from_jmp_list(tb, 1);
+
+    /* suppress any remaining jumps to this TB */
+    tb_jmp_unlink(tb);
+
+    qatomic_set(&tb_ctx.tb_phys_invalidate_count,
+                tb_ctx.tb_phys_invalidate_count + 1);
+}
+
+static void tb_phys_invalidate__locked(TranslationBlock *tb)
+{
+    qemu_thread_jit_write();
+    do_tb_phys_invalidate(tb, true);
+    qemu_thread_jit_execute();
+}
+
+static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
+                           PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc)
+{
+    PageDesc *p1, *p2;
+    tb_page_addr_t page1;
+    tb_page_addr_t page2;
+
+    assert_memory_lock();
+    g_assert(phys1 != -1);
+
+    page1 = phys1 >> TARGET_PAGE_BITS;
+    page2 = phys2 >> TARGET_PAGE_BITS;
+
+    p1 = page_find_alloc(page1, alloc);
+    if (ret_p1) {
+        *ret_p1 = p1;
+    }
+    if (likely(phys2 == -1)) {
+        page_lock(p1);
+        return;
+    } else if (page1 == page2) {
+        page_lock(p1);
+        if (ret_p2) {
+            *ret_p2 = p1;
+        }
+        return;
+    }
+    p2 = page_find_alloc(page2, alloc);
+    if (ret_p2) {
+        *ret_p2 = p2;
+    }
+    if (page1 < page2) {
+        page_lock(p1);
+        page_lock(p2);
+    } else {
+        page_lock(p2);
+        page_lock(p1);
+    }
+}
+
+#ifdef CONFIG_USER_ONLY
+static inline void page_lock_tb(const TranslationBlock *tb) { }
+static inline void page_unlock_tb(const TranslationBlock *tb) { }
+#else
+/* lock the page(s) of a TB in the correct acquisition order */
+static void page_lock_tb(const TranslationBlock *tb)
+{
+    page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], false);
+}
+
+static void page_unlock_tb(const TranslationBlock *tb)
+{
+    PageDesc *p1 = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
+
+    page_unlock(p1);
+    if (unlikely(tb->page_addr[1] != -1)) {
+        PageDesc *p2 = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
+
+        if (p2 != p1) {
+            page_unlock(p2);
+        }
+    }
+}
+#endif
+
+/*
+ * Invalidate one TB.
+ * Called with mmap_lock held in user-mode.
+ */
+void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
+{
+    if (page_addr == -1 && tb->page_addr[0] != -1) {
+        page_lock_tb(tb);
+        do_tb_phys_invalidate(tb, true);
+        page_unlock_tb(tb);
+    } else {
+        do_tb_phys_invalidate(tb, false);
+    }
+}
+
+/*
+ * Add the tb in the target page and protect it if necessary.
+ * Called with mmap_lock held for user-mode emulation.
+ * Called with @p->lock held in !user-mode.
+ */
+static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
+                               unsigned int n, tb_page_addr_t page_addr)
+{
+#ifndef CONFIG_USER_ONLY
+    bool page_already_protected;
+#endif
+
+    assert_page_locked(p);
+
+    tb->page_addr[n] = page_addr;
+    tb->page_next[n] = p->first_tb;
+#ifndef CONFIG_USER_ONLY
+    page_already_protected = p->first_tb != (uintptr_t)NULL;
+#endif
+    p->first_tb = (uintptr_t)tb | n;
+
+#if defined(CONFIG_USER_ONLY)
+    /* translator_loop() must have made all TB pages non-writable */
+    assert(!(p->flags & PAGE_WRITE));
+#else
+    /*
+     * If some code is already present, then the pages are already
+     * protected. So we handle the case where only the first TB is
+     * allocated in a physical page.
+     */
+    if (!page_already_protected) {
+        tlb_protect_code(page_addr);
+    }
+#endif
+}
+
+/*
+ * Add a new TB and link it to the physical page tables. phys_page2 is
+ * (-1) to indicate that only one page contains the TB.
+ *
+ * Called with mmap_lock held for user-mode emulation.
+ *
+ * Returns @tb, or a pointer to an existing TB that matches @tb.
+ * Note that in !user-mode, another thread might have already added a TB
+ * for the same block of guest code that @tb corresponds to. In that case,
+ * the caller should discard the original @tb, and use instead the returned TB.
+ */
+TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
+                               tb_page_addr_t phys_page2)
+{
+    PageDesc *p;
+    PageDesc *p2 = NULL;
+    void *existing_tb = NULL;
+    uint32_t h;
+
+    assert_memory_lock();
+    tcg_debug_assert(!(tb->cflags & CF_INVALID));
+
+    /*
+     * Add the TB to the page list, first acquiring the pages' locks.
+     * We keep the locks held until after inserting the TB in the hash table,
+     * so that if the insertion fails we know for sure that the TBs are still
+     * in the page descriptors.
+     * Note that inserting into the hash table first isn't an option, since
+     * we can only insert TBs that are fully initialized.
+     */
+    page_lock_pair(&p, phys_pc, &p2, phys_page2, true);
+    tb_page_add(p, tb, 0, phys_pc);
+    if (p2) {
+        tb_page_add(p2, tb, 1, phys_page2);
+    } else {
+        tb->page_addr[1] = -1;
+    }
+
+    /* add in the hash table */
+    h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
+                     tb->flags, tb->cflags, tb->trace_vcpu_dstate);
+    qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
+
+    /* remove TB from the page(s) if we couldn't insert it */
+    if (unlikely(existing_tb)) {
+        tb_page_remove(p, tb);
+        if (p2) {
+            tb_page_remove(p2, tb);
+        }
+        tb = existing_tb;
+    }
+
+    if (p2 && p2 != p) {
+        page_unlock(p2);
+    }
+    page_unlock(p);
+    return tb;
+}
+
+/*
+ * @p must be non-NULL.
+ * user-mode: call with mmap_lock held.
+ * !user-mode: call with all @pages locked.
+ */
+static void
+tb_invalidate_phys_page_range__locked(struct page_collection *pages,
+                                      PageDesc *p, tb_page_addr_t start,
+                                      tb_page_addr_t end,
+                                      uintptr_t retaddr)
+{
+    TranslationBlock *tb;
+    tb_page_addr_t tb_start, tb_end;
+    int n;
+#ifdef TARGET_HAS_PRECISE_SMC
+    CPUState *cpu = current_cpu;
+    CPUArchState *env = NULL;
+    bool current_tb_not_found = retaddr != 0;
+    bool current_tb_modified = false;
+    TranslationBlock *current_tb = NULL;
+    target_ulong current_pc = 0;
+    target_ulong current_cs_base = 0;
+    uint32_t current_flags = 0;
+#endif /* TARGET_HAS_PRECISE_SMC */
+
+    assert_page_locked(p);
+
+#if defined(TARGET_HAS_PRECISE_SMC)
+    if (cpu != NULL) {
+        env = cpu->env_ptr;
+    }
+#endif
+
+    /*
+     * We remove all the TBs in the range [start, end[.
+     * XXX: see if in some cases it could be faster to invalidate all the code
+     */
+    PAGE_FOR_EACH_TB(p, tb, n) {
+        assert_page_locked(p);
+        /* NOTE: this is subtle as a TB may span two physical pages */
+        if (n == 0) {
+            /* NOTE: tb_end may be after the end of the page, but
+               it is not a problem */
+            tb_start = tb->page_addr[0];
+            tb_end = tb_start + tb->size;
+        } else {
+            tb_start = tb->page_addr[1];
+            tb_end = tb_start + ((tb->page_addr[0] + tb->size)
+                                 & ~TARGET_PAGE_MASK);
+        }
+        if (!(tb_end <= start || tb_start >= end)) {
+#ifdef TARGET_HAS_PRECISE_SMC
+            if (current_tb_not_found) {
+                current_tb_not_found = false;
+                /* now we have a real cpu fault */
+                current_tb = tcg_tb_lookup(retaddr);
+            }
+            if (current_tb == tb &&
+                (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
+                /*
+                 * If we are modifying the current TB, we must stop
+                 * its execution. We could be more precise by checking
+                 * that the modification is after the current PC, but it
+                 * would require a specialized function to partially
+                 * restore the CPU state.
+                 */
+                current_tb_modified = true;
+                cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
+                cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
+                                     &current_flags);
+            }
+#endif /* TARGET_HAS_PRECISE_SMC */
+            tb_phys_invalidate__locked(tb);
+        }
+    }
+#if !defined(CONFIG_USER_ONLY)
+    /* if no code remaining, no need to continue to use slow writes */
+    if (!p->first_tb) {
+        tlb_unprotect_code(start);
+    }
+#endif
+#ifdef TARGET_HAS_PRECISE_SMC
+    if (current_tb_modified) {
+        page_collection_unlock(pages);
+        /* Force execution of one insn next time.  */
+        cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
+        mmap_unlock();
+        cpu_loop_exit_noexc(cpu);
+    }
+#endif
+}
+
+/*
+ * Invalidate all TBs which intersect with the target physical address range
+ * [start;end[. NOTE: start and end must refer to the *same* physical page.
+ *
+ * Called with mmap_lock held for user-mode emulation
+ */
+void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+    struct page_collection *pages;
+    PageDesc *p;
+
+    assert_memory_lock();
+
+    p = page_find(start >> TARGET_PAGE_BITS);
+    if (p == NULL) {
+        return;
+    }
+    pages = page_collection_lock(start, end);
+    tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
+    page_collection_unlock(pages);
+}
+
+/*
+ * Invalidate all TBs which intersect with the target physical address range
+ * [start;end[. NOTE: start and end may refer to *different* physical pages.
+ *
+ * Called with mmap_lock held for user-mode emulation.
+ */
+#ifdef CONFIG_SOFTMMU
+void tb_invalidate_phys_range(ram_addr_t start, ram_addr_t end)
+#else
+void tb_invalidate_phys_range(target_ulong start, target_ulong end)
+#endif
+{
+    struct page_collection *pages;
+    tb_page_addr_t next;
+
+    assert_memory_lock();
+
+    pages = page_collection_lock(start, end);
+    for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+         start < end;
+         start = next, next += TARGET_PAGE_SIZE) {
+        PageDesc *pd = page_find(start >> TARGET_PAGE_BITS);
+        tb_page_addr_t bound = MIN(next, end);
+
+        if (pd == NULL) {
+            continue;
+        }
+        tb_invalidate_phys_page_range__locked(pages, pd, start, bound, 0);
+    }
+    page_collection_unlock(pages);
+}
+
+#ifdef CONFIG_SOFTMMU
+/*
+ * len must be <= 8 and start must be a multiple of len.
+ * Called via softmmu_template.h when code areas are written to with
+ * iothread mutex not held.
+ *
+ * Call with all @pages in the range [@start, @start + len[ locked.
+ */
+void tb_invalidate_phys_page_fast(struct page_collection *pages,
+                                  tb_page_addr_t start, int len,
+                                  uintptr_t retaddr)
+{
+    PageDesc *p;
+
+    assert_memory_lock();
+
+    p = page_find(start >> TARGET_PAGE_BITS);
+    if (!p) {
+        return;
+    }
+
+    assert_page_locked(p);
+    tb_invalidate_phys_page_range__locked(pages, p, start, start + len,
+                                          retaddr);
+}
+#else
+/*
+ * Called with mmap_lock held. If pc is not 0 then it indicates the
+ * host PC of the faulting store instruction that caused this invalidate.
+ * Returns true if the caller needs to abort execution of the current
+ * TB (because it was modified by this store and the guest CPU has
+ * precise-SMC semantics).
+ */
+bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
+{
+    TranslationBlock *tb;
+    PageDesc *p;
+    int n;
+#ifdef TARGET_HAS_PRECISE_SMC
+    TranslationBlock *current_tb = NULL;
+    CPUState *cpu = current_cpu;
+    CPUArchState *env = NULL;
+    int current_tb_modified = 0;
+    target_ulong current_pc = 0;
+    target_ulong current_cs_base = 0;
+    uint32_t current_flags = 0;
+#endif
+
+    assert_memory_lock();
+
+    addr &= TARGET_PAGE_MASK;
+    p = page_find(addr >> TARGET_PAGE_BITS);
+    if (!p) {
+        return false;
+    }
+
+#ifdef TARGET_HAS_PRECISE_SMC
+    if (p->first_tb && pc != 0) {
+        current_tb = tcg_tb_lookup(pc);
+    }
+    if (cpu != NULL) {
+        env = cpu->env_ptr;
+    }
+#endif
+    assert_page_locked(p);
+    PAGE_FOR_EACH_TB(p, tb, n) {
+#ifdef TARGET_HAS_PRECISE_SMC
+        if (current_tb == tb &&
+            (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
+            /*
+             * If we are modifying the current TB, we must stop its execution.
+             * We could be more precise by checking that the modification is
+             * after the current PC, but it would require a specialized
+             * function to partially restore the CPU state.
+             */
+            current_tb_modified = 1;
+            cpu_restore_state_from_tb(cpu, current_tb, pc, true);
+            cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
+                                 &current_flags);
+        }
+#endif /* TARGET_HAS_PRECISE_SMC */
+        tb_phys_invalidate(tb, addr);
+    }
+    p->first_tb = (uintptr_t)NULL;
+#ifdef TARGET_HAS_PRECISE_SMC
+    if (current_tb_modified) {
+        /* Force execution of one insn next time.  */
+        cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
+        return true;
+    }
+#endif
+
+    return false;
+}
+#endif
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 86848c6743..5e28e9fccd 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -109,18 +109,6 @@ struct page_collection {
     struct page_entry *max;
 };
 
-/* list iterators for lists of tagged pointers in TranslationBlock */
-#define TB_FOR_EACH_TAGGED(head, tb, n, field)                          \
-    for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);        \
-         tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
-             tb = (TranslationBlock *)((uintptr_t)tb & ~1))
-
-#define PAGE_FOR_EACH_TB(pagedesc, tb, n)                       \
-    TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
-
-#define TB_FOR_EACH_JMP(head_tb, tb, n)                                 \
-    TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
-
 /*
  * In system mode we want L1_MAP to be based on ram offsets,
  * while in user mode we want it to be based on virtual addresses.
@@ -138,10 +126,6 @@ struct page_collection {
 # define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
 #endif
 
-/* Size of the L2 (and L3, etc) page tables.  */
-#define V_L2_BITS 10
-#define V_L2_SIZE (1 << V_L2_BITS)
-
 /* Make sure all possible CPU event bits fit in tb->trace_vcpu_dstate */
 QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
                   sizeof_field(TranslationBlock, trace_vcpu_dstate)
@@ -150,18 +134,11 @@ QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
 /*
  * L1 Mapping properties
  */
-static int v_l1_size;
-static int v_l1_shift;
-static int v_l2_levels;
+int v_l1_size;
+int v_l1_shift;
+int v_l2_levels;
 
-/* The bottom level has pointers to PageDesc, and is indexed by
- * anything from 4 to (V_L2_BITS + 3) bits, depending on target page size.
- */
-#define V_L1_MIN_BITS 4
-#define V_L1_MAX_BITS (V_L2_BITS + 3)
-#define V_L1_MAX_SIZE (1 << V_L1_MAX_BITS)
-
-static void *l1_map[V_L1_MAX_SIZE];
+void *l1_map[V_L1_MAX_SIZE];
 
 TBContext tb_ctx;
 
@@ -274,8 +251,8 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
  * When reset_icount is true, current TB will be interrupted and
  * icount should be recalculated.
  */
-static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-                                     uintptr_t searched_pc, bool reset_icount)
+int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
+                              uintptr_t searched_pc, bool reset_icount)
 {
     target_ulong data[TARGET_INSN_START_WORDS];
     uintptr_t host_pc = (uintptr_t)tb->tc.ptr;
@@ -487,26 +464,8 @@ PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
     return pd + (index & (V_L2_SIZE - 1));
 }
 
-static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-                           PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc);
-
 /* In user-mode page locks aren't used; mmap_lock is enough */
 #ifdef CONFIG_USER_ONLY
-
-#define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
-
-static inline void page_lock(PageDesc *pd)
-{ }
-
-static inline void page_unlock(PageDesc *pd)
-{ }
-
-static inline void page_lock_tb(const TranslationBlock *tb)
-{ }
-
-static inline void page_unlock_tb(const TranslationBlock *tb)
-{ }
-
 struct page_collection *
 page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
 {
@@ -555,8 +514,7 @@ static void page_unlock__debug(const PageDesc *pd)
     g_assert(removed);
 }
 
-static void
-do_assert_page_locked(const PageDesc *pd, const char *file, int line)
+void do_assert_page_locked(const PageDesc *pd, const char *file, int line)
 {
     if (unlikely(!page_is_locked(pd))) {
         error_report("assert_page_lock: PageDesc %p not locked @ %s:%d",
@@ -565,8 +523,6 @@ do_assert_page_locked(const PageDesc *pd, const char *file, int line)
     }
 }
 
-#define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
-
 void assert_no_pages_locked(void)
 {
     ht_pages_locked_debug_init();
@@ -575,50 +531,23 @@ void assert_no_pages_locked(void)
 
 #else /* !CONFIG_DEBUG_TCG */
 
-#define assert_page_locked(pd)
-
-static inline void page_lock__debug(const PageDesc *pd)
-{
-}
-
-static inline void page_unlock__debug(const PageDesc *pd)
-{
-}
+static inline void page_lock__debug(const PageDesc *pd) { }
+static inline void page_unlock__debug(const PageDesc *pd) { }
 
 #endif /* CONFIG_DEBUG_TCG */
 
-static inline void page_lock(PageDesc *pd)
+void page_lock(PageDesc *pd)
 {
     page_lock__debug(pd);
     qemu_spin_lock(&pd->lock);
 }
 
-static inline void page_unlock(PageDesc *pd)
+void page_unlock(PageDesc *pd)
 {
     qemu_spin_unlock(&pd->lock);
     page_unlock__debug(pd);
 }
 
-/* lock the page(s) of a TB in the correct acquisition order */
-static inline void page_lock_tb(const TranslationBlock *tb)
-{
-    page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], false);
-}
-
-static inline void page_unlock_tb(const TranslationBlock *tb)
-{
-    PageDesc *p1 = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
-
-    page_unlock(p1);
-    if (unlikely(tb->page_addr[1] != -1)) {
-        PageDesc *p2 = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
-
-        if (p2 != p1) {
-            page_unlock(p2);
-        }
-    }
-}
-
 static inline struct page_entry *
 page_entry_new(PageDesc *pd, tb_page_addr_t index)
 {
@@ -790,434 +719,6 @@ void page_collection_unlock(struct page_collection *set)
 
 #endif /* !CONFIG_USER_ONLY */
 
-static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-                           PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc)
-{
-    PageDesc *p1, *p2;
-    tb_page_addr_t page1;
-    tb_page_addr_t page2;
-
-    assert_memory_lock();
-    g_assert(phys1 != -1);
-
-    page1 = phys1 >> TARGET_PAGE_BITS;
-    page2 = phys2 >> TARGET_PAGE_BITS;
-
-    p1 = page_find_alloc(page1, alloc);
-    if (ret_p1) {
-        *ret_p1 = p1;
-    }
-    if (likely(phys2 == -1)) {
-        page_lock(p1);
-        return;
-    } else if (page1 == page2) {
-        page_lock(p1);
-        if (ret_p2) {
-            *ret_p2 = p1;
-        }
-        return;
-    }
-    p2 = page_find_alloc(page2, alloc);
-    if (ret_p2) {
-        *ret_p2 = p2;
-    }
-    if (page1 < page2) {
-        page_lock(p1);
-        page_lock(p2);
-    } else {
-        page_lock(p2);
-        page_lock(p1);
-    }
-}
-
-static bool tb_cmp(const void *ap, const void *bp)
-{
-    const TranslationBlock *a = ap;
-    const TranslationBlock *b = bp;
-
-    return ((TARGET_TB_PCREL || tb_pc(a) == tb_pc(b)) &&
-            a->cs_base == b->cs_base &&
-            a->flags == b->flags &&
-            (tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
-            a->trace_vcpu_dstate == b->trace_vcpu_dstate &&
-            a->page_addr[0] == b->page_addr[0] &&
-            a->page_addr[1] == b->page_addr[1]);
-}
-
-void tb_htable_init(void)
-{
-    unsigned int mode = QHT_MODE_AUTO_RESIZE;
-
-    qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
-}
-
-/* Set to NULL all the 'first_tb' fields in all PageDescs. */
-static void page_flush_tb_1(int level, void **lp)
-{
-    int i;
-
-    if (*lp == NULL) {
-        return;
-    }
-    if (level == 0) {
-        PageDesc *pd = *lp;
-
-        for (i = 0; i < V_L2_SIZE; ++i) {
-            page_lock(&pd[i]);
-            pd[i].first_tb = (uintptr_t)NULL;
-            page_unlock(&pd[i]);
-        }
-    } else {
-        void **pp = *lp;
-
-        for (i = 0; i < V_L2_SIZE; ++i) {
-            page_flush_tb_1(level - 1, pp + i);
-        }
-    }
-}
-
-static void page_flush_tb(void)
-{
-    int i, l1_sz = v_l1_size;
-
-    for (i = 0; i < l1_sz; i++) {
-        page_flush_tb_1(v_l2_levels, l1_map + i);
-    }
-}
-
-/* flush all the translation blocks */
-static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
-{
-    bool did_flush = false;
-
-    mmap_lock();
-    /* If it is already been done on request of another CPU,
-     * just retry.
-     */
-    if (tb_ctx.tb_flush_count != tb_flush_count.host_int) {
-        goto done;
-    }
-    did_flush = true;
-
-    CPU_FOREACH(cpu) {
-        tcg_flush_jmp_cache(cpu);
-    }
-
-    qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
-    page_flush_tb();
-
-    tcg_region_reset_all();
-    /* XXX: flush processor icache at this point if cache flush is
-       expensive */
-    qatomic_mb_set(&tb_ctx.tb_flush_count, tb_ctx.tb_flush_count + 1);
-
-done:
-    mmap_unlock();
-    if (did_flush) {
-        qemu_plugin_flush_cb();
-    }
-}
-
-void tb_flush(CPUState *cpu)
-{
-    if (tcg_enabled()) {
-        unsigned tb_flush_count = qatomic_mb_read(&tb_ctx.tb_flush_count);
-
-        if (cpu_in_exclusive_context(cpu)) {
-            do_tb_flush(cpu, RUN_ON_CPU_HOST_INT(tb_flush_count));
-        } else {
-            async_safe_run_on_cpu(cpu, do_tb_flush,
-                                  RUN_ON_CPU_HOST_INT(tb_flush_count));
-        }
-    }
-}
-
-/*
- * user-mode: call with mmap_lock held
- * !user-mode: call with @pd->lock held
- */
-static inline void tb_page_remove(PageDesc *pd, TranslationBlock *tb)
-{
-    TranslationBlock *tb1;
-    uintptr_t *pprev;
-    unsigned int n1;
-
-    assert_page_locked(pd);
-    pprev = &pd->first_tb;
-    PAGE_FOR_EACH_TB(pd, tb1, n1) {
-        if (tb1 == tb) {
-            *pprev = tb1->page_next[n1];
-            return;
-        }
-        pprev = &tb1->page_next[n1];
-    }
-    g_assert_not_reached();
-}
-
-/* remove @orig from its @n_orig-th jump list */
-static inline void tb_remove_from_jmp_list(TranslationBlock *orig, int n_orig)
-{
-    uintptr_t ptr, ptr_locked;
-    TranslationBlock *dest;
-    TranslationBlock *tb;
-    uintptr_t *pprev;
-    int n;
-
-    /* mark the LSB of jmp_dest[] so that no further jumps can be inserted */
-    ptr = qatomic_or_fetch(&orig->jmp_dest[n_orig], 1);
-    dest = (TranslationBlock *)(ptr & ~1);
-    if (dest == NULL) {
-        return;
-    }
-
-    qemu_spin_lock(&dest->jmp_lock);
-    /*
-     * While acquiring the lock, the jump might have been removed if the
-     * destination TB was invalidated; check again.
-     */
-    ptr_locked = qatomic_read(&orig->jmp_dest[n_orig]);
-    if (ptr_locked != ptr) {
-        qemu_spin_unlock(&dest->jmp_lock);
-        /*
-         * The only possibility is that the jump was unlinked via
-         * tb_jump_unlink(dest). Seeing here another destination would be a bug,
-         * because we set the LSB above.
-         */
-        g_assert(ptr_locked == 1 && dest->cflags & CF_INVALID);
-        return;
-    }
-    /*
-     * We first acquired the lock, and since the destination pointer matches,
-     * we know for sure that @orig is in the jmp list.
-     */
-    pprev = &dest->jmp_list_head;
-    TB_FOR_EACH_JMP(dest, tb, n) {
-        if (tb == orig && n == n_orig) {
-            *pprev = tb->jmp_list_next[n];
-            /* no need to set orig->jmp_dest[n]; setting the LSB was enough */
-            qemu_spin_unlock(&dest->jmp_lock);
-            return;
-        }
-        pprev = &tb->jmp_list_next[n];
-    }
-    g_assert_not_reached();
-}
-
-/* reset the jump entry 'n' of a TB so that it is not chained to
-   another TB */
-static inline void tb_reset_jump(TranslationBlock *tb, int n)
-{
-    uintptr_t addr = (uintptr_t)(tb->tc.ptr + tb->jmp_reset_offset[n]);
-    tb_set_jmp_target(tb, n, addr);
-}
-
-/* remove any jumps to the TB */
-static inline void tb_jmp_unlink(TranslationBlock *dest)
-{
-    TranslationBlock *tb;
-    int n;
-
-    qemu_spin_lock(&dest->jmp_lock);
-
-    TB_FOR_EACH_JMP(dest, tb, n) {
-        tb_reset_jump(tb, n);
-        qatomic_and(&tb->jmp_dest[n], (uintptr_t)NULL | 1);
-        /* No need to clear the list entry; setting the dest ptr is enough */
-    }
-    dest->jmp_list_head = (uintptr_t)NULL;
-
-    qemu_spin_unlock(&dest->jmp_lock);
-}
-
-static void tb_jmp_cache_inval_tb(TranslationBlock *tb)
-{
-    CPUState *cpu;
-
-    if (TARGET_TB_PCREL) {
-        /* A TB may be at any virtual address */
-        CPU_FOREACH(cpu) {
-            tcg_flush_jmp_cache(cpu);
-        }
-    } else {
-        uint32_t h = tb_jmp_cache_hash_func(tb_pc(tb));
-
-        CPU_FOREACH(cpu) {
-            CPUJumpCache *jc = cpu->tb_jmp_cache;
-
-            if (qatomic_read(&jc->array[h].tb) == tb) {
-                qatomic_set(&jc->array[h].tb, NULL);
-            }
-        }
-    }
-}
-
-/*
- * In user-mode, call with mmap_lock held.
- * In !user-mode, if @rm_from_page_list is set, call with the TB's pages'
- * locks held.
- */
-static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
-{
-    PageDesc *p;
-    uint32_t h;
-    tb_page_addr_t phys_pc;
-    uint32_t orig_cflags = tb_cflags(tb);
-
-    assert_memory_lock();
-
-    /* make sure no further incoming jumps will be chained to this TB */
-    qemu_spin_lock(&tb->jmp_lock);
-    qatomic_set(&tb->cflags, tb->cflags | CF_INVALID);
-    qemu_spin_unlock(&tb->jmp_lock);
-
-    /* remove the TB from the hash list */
-    phys_pc = tb->page_addr[0];
-    h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
-                     tb->flags, orig_cflags, tb->trace_vcpu_dstate);
-    if (!qht_remove(&tb_ctx.htable, tb, h)) {
-        return;
-    }
-
-    /* remove the TB from the page list */
-    if (rm_from_page_list) {
-        p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
-        tb_page_remove(p, tb);
-        if (tb->page_addr[1] != -1) {
-            p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
-            tb_page_remove(p, tb);
-        }
-    }
-
-    /* remove the TB from the hash list */
-    tb_jmp_cache_inval_tb(tb);
-
-    /* suppress this TB from the two jump lists */
-    tb_remove_from_jmp_list(tb, 0);
-    tb_remove_from_jmp_list(tb, 1);
-
-    /* suppress any remaining jumps to this TB */
-    tb_jmp_unlink(tb);
-
-    qatomic_set(&tb_ctx.tb_phys_invalidate_count,
-                tb_ctx.tb_phys_invalidate_count + 1);
-}
-
-static void tb_phys_invalidate__locked(TranslationBlock *tb)
-{
-    qemu_thread_jit_write();
-    do_tb_phys_invalidate(tb, true);
-    qemu_thread_jit_execute();
-}
-
-/* invalidate one TB
- *
- * Called with mmap_lock held in user-mode.
- */
-void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
-{
-    if (page_addr == -1 && tb->page_addr[0] != -1) {
-        page_lock_tb(tb);
-        do_tb_phys_invalidate(tb, true);
-        page_unlock_tb(tb);
-    } else {
-        do_tb_phys_invalidate(tb, false);
-    }
-}
-
-/* add the tb in the target page and protect it if necessary
- *
- * Called with mmap_lock held for user-mode emulation.
- * Called with @p->lock held in !user-mode.
- */
-static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
-                               unsigned int n, tb_page_addr_t page_addr)
-{
-#ifndef CONFIG_USER_ONLY
-    bool page_already_protected;
-#endif
-
-    assert_page_locked(p);
-
-    tb->page_addr[n] = page_addr;
-    tb->page_next[n] = p->first_tb;
-#ifndef CONFIG_USER_ONLY
-    page_already_protected = p->first_tb != (uintptr_t)NULL;
-#endif
-    p->first_tb = (uintptr_t)tb | n;
-
-#if defined(CONFIG_USER_ONLY)
-    /* translator_loop() must have made all TB pages non-writable */
-    assert(!(p->flags & PAGE_WRITE));
-#else
-    /* if some code is already present, then the pages are already
-       protected. So we handle the case where only the first TB is
-       allocated in a physical page */
-    if (!page_already_protected) {
-        tlb_protect_code(page_addr);
-    }
-#endif
-}
-
-/*
- * Add a new TB and link it to the physical page tables. phys_page2 is
- * (-1) to indicate that only one page contains the TB.
- *
- * Called with mmap_lock held for user-mode emulation.
- *
- * Returns a pointer @tb, or a pointer to an existing TB that matches @tb.
- * Note that in !user-mode, another thread might have already added a TB
- * for the same block of guest code that @tb corresponds to. In that case,
- * the caller should discard the original @tb, and use instead the returned TB.
- */
-static TranslationBlock *
-tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
-             tb_page_addr_t phys_page2)
-{
-    PageDesc *p;
-    PageDesc *p2 = NULL;
-    void *existing_tb = NULL;
-    uint32_t h;
-
-    assert_memory_lock();
-    tcg_debug_assert(!(tb->cflags & CF_INVALID));
-
-    /*
-     * Add the TB to the page list, acquiring first the pages's locks.
-     * We keep the locks held until after inserting the TB in the hash table,
-     * so that if the insertion fails we know for sure that the TBs are still
-     * in the page descriptors.
-     * Note that inserting into the hash table first isn't an option, since
-     * we can only insert TBs that are fully initialized.
-     */
-    page_lock_pair(&p, phys_pc, &p2, phys_page2, true);
-    tb_page_add(p, tb, 0, phys_pc);
-    if (p2) {
-        tb_page_add(p2, tb, 1, phys_page2);
-    } else {
-        tb->page_addr[1] = -1;
-    }
-
-    /* add in the hash table */
-    h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
-                     tb->flags, tb->cflags, tb->trace_vcpu_dstate);
-    qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
-
-    /* remove TB from the page(s) if we couldn't insert it */
-    if (unlikely(existing_tb)) {
-        tb_page_remove(p, tb);
-        if (p2) {
-            tb_page_remove(p2, tb);
-        }
-        tb = existing_tb;
-    }
-
-    if (p2 && p2 != p) {
-        page_unlock(p2);
-    }
-    page_unlock(p);
-    return tb;
-}
-
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
                               target_ulong pc, target_ulong cs_base,
@@ -1497,251 +998,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     return tb;
 }
 
-/*
- * @p must be non-NULL.
- * user-mode: call with mmap_lock held.
- * !user-mode: call with all @pages locked.
- */
-static void
-tb_invalidate_phys_page_range__locked(struct page_collection *pages,
-                                      PageDesc *p, tb_page_addr_t start,
-                                      tb_page_addr_t end,
-                                      uintptr_t retaddr)
-{
-    TranslationBlock *tb;
-    tb_page_addr_t tb_start, tb_end;
-    int n;
-#ifdef TARGET_HAS_PRECISE_SMC
-    CPUState *cpu = current_cpu;
-    CPUArchState *env = NULL;
-    bool current_tb_not_found = retaddr != 0;
-    bool current_tb_modified = false;
-    TranslationBlock *current_tb = NULL;
-    target_ulong current_pc = 0;
-    target_ulong current_cs_base = 0;
-    uint32_t current_flags = 0;
-#endif /* TARGET_HAS_PRECISE_SMC */
-
-    assert_page_locked(p);
-
-#if defined(TARGET_HAS_PRECISE_SMC)
-    if (cpu != NULL) {
-        env = cpu->env_ptr;
-    }
-#endif
-
-    /* we remove all the TBs in the range [start, end[ */
-    /* XXX: see if in some cases it could be faster to invalidate all
-       the code */
-    PAGE_FOR_EACH_TB(p, tb, n) {
-        assert_page_locked(p);
-        /* NOTE: this is subtle as a TB may span two physical pages */
-        if (n == 0) {
-            /* NOTE: tb_end may be after the end of the page, but
-               it is not a problem */
-            tb_start = tb->page_addr[0];
-            tb_end = tb_start + tb->size;
-        } else {
-            tb_start = tb->page_addr[1];
-            tb_end = tb_start + ((tb->page_addr[0] + tb->size)
-                                 & ~TARGET_PAGE_MASK);
-        }
-        if (!(tb_end <= start || tb_start >= end)) {
-#ifdef TARGET_HAS_PRECISE_SMC
-            if (current_tb_not_found) {
-                current_tb_not_found = false;
-                /* now we have a real cpu fault */
-                current_tb = tcg_tb_lookup(retaddr);
-            }
-            if (current_tb == tb &&
-                (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
-                /*
-                 * If we are modifying the current TB, we must stop
-                 * its execution. We could be more precise by checking
-                 * that the modification is after the current PC, but it
-                 * would require a specialized function to partially
-                 * restore the CPU state.
-                 */
-                current_tb_modified = true;
-                cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
-                cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
-                                     &current_flags);
-            }
-#endif /* TARGET_HAS_PRECISE_SMC */
-            tb_phys_invalidate__locked(tb);
-        }
-    }
-#if !defined(CONFIG_USER_ONLY)
-    /* if no code remaining, no need to continue to use slow writes */
-    if (!p->first_tb) {
-        tlb_unprotect_code(start);
-    }
-#endif
-#ifdef TARGET_HAS_PRECISE_SMC
-    if (current_tb_modified) {
-        page_collection_unlock(pages);
-        /* Force execution of one insn next time.  */
-        cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
-        mmap_unlock();
-        cpu_loop_exit_noexc(cpu);
-    }
-#endif
-}
-
-/*
- * Invalidate all TBs which intersect with the target physical address range
- * [start;end[. NOTE: start and end must refer to the *same* physical page.
- * 'is_cpu_write_access' should be true if called from a real cpu write
- * access: the virtual CPU will exit the current TB if code is modified inside
- * this TB.
- *
- * Called with mmap_lock held for user-mode emulation
- */
-void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
-{
-    struct page_collection *pages;
-    PageDesc *p;
-
-    assert_memory_lock();
-
-    p = page_find(start >> TARGET_PAGE_BITS);
-    if (p == NULL) {
-        return;
-    }
-    pages = page_collection_lock(start, end);
-    tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
-    page_collection_unlock(pages);
-}
-
-/*
- * Invalidate all TBs which intersect with the target physical address range
- * [start;end[. NOTE: start and end may refer to *different* physical pages.
- * 'is_cpu_write_access' should be true if called from a real cpu write
- * access: the virtual CPU will exit the current TB if code is modified inside
- * this TB.
- *
- * Called with mmap_lock held for user-mode emulation.
- */
-#ifdef CONFIG_SOFTMMU
-void tb_invalidate_phys_range(ram_addr_t start, ram_addr_t end)
-#else
-void tb_invalidate_phys_range(target_ulong start, target_ulong end)
-#endif
-{
-    struct page_collection *pages;
-    tb_page_addr_t next;
-
-    assert_memory_lock();
-
-    pages = page_collection_lock(start, end);
-    for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
-         start < end;
-         start = next, next += TARGET_PAGE_SIZE) {
-        PageDesc *pd = page_find(start >> TARGET_PAGE_BITS);
-        tb_page_addr_t bound = MIN(next, end);
-
-        if (pd == NULL) {
-            continue;
-        }
-        tb_invalidate_phys_page_range__locked(pages, pd, start, bound, 0);
-    }
-    page_collection_unlock(pages);
-}
-
-#ifdef CONFIG_SOFTMMU
-/* len must be <= 8 and start must be a multiple of len.
- * Called via softmmu_template.h when code areas are written to with
- * iothread mutex not held.
- *
- * Call with all @pages in the range [@start, @start + len[ locked.
- */
-void tb_invalidate_phys_page_fast(struct page_collection *pages,
-                                  tb_page_addr_t start, int len,
-                                  uintptr_t retaddr)
-{
-    PageDesc *p;
-
-    assert_memory_lock();
-
-    p = page_find(start >> TARGET_PAGE_BITS);
-    if (!p) {
-        return;
-    }
-
-    assert_page_locked(p);
-    tb_invalidate_phys_page_range__locked(pages, p, start, start + len,
-                                          retaddr);
-}
-#else
-/* Called with mmap_lock held. If pc is not 0 then it indicates the
- * host PC of the faulting store instruction that caused this invalidate.
- * Returns true if the caller needs to abort execution of the current
- * TB (because it was modified by this store and the guest CPU has
- * precise-SMC semantics).
- */
-static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
-{
-    TranslationBlock *tb;
-    PageDesc *p;
-    int n;
-#ifdef TARGET_HAS_PRECISE_SMC
-    TranslationBlock *current_tb = NULL;
-    CPUState *cpu = current_cpu;
-    CPUArchState *env = NULL;
-    int current_tb_modified = 0;
-    target_ulong current_pc = 0;
-    target_ulong current_cs_base = 0;
-    uint32_t current_flags = 0;
-#endif
-
-    assert_memory_lock();
-
-    addr &= TARGET_PAGE_MASK;
-    p = page_find(addr >> TARGET_PAGE_BITS);
-    if (!p) {
-        return false;
-    }
-
-#ifdef TARGET_HAS_PRECISE_SMC
-    if (p->first_tb && pc != 0) {
-        current_tb = tcg_tb_lookup(pc);
-    }
-    if (cpu != NULL) {
-        env = cpu->env_ptr;
-    }
-#endif
-    assert_page_locked(p);
-    PAGE_FOR_EACH_TB(p, tb, n) {
-#ifdef TARGET_HAS_PRECISE_SMC
-        if (current_tb == tb &&
-            (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
-                /* If we are modifying the current TB, we must stop
-                   its execution. We could be more precise by checking
-                   that the modification is after the current PC, but it
-                   would require a specialized function to partially
-                   restore the CPU state */
-
-            current_tb_modified = 1;
-            cpu_restore_state_from_tb(cpu, current_tb, pc, true);
-            cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
-                                 &current_flags);
-        }
-#endif /* TARGET_HAS_PRECISE_SMC */
-        tb_phys_invalidate(tb, addr);
-    }
-    p->first_tb = (uintptr_t)NULL;
-#ifdef TARGET_HAS_PRECISE_SMC
-    if (current_tb_modified) {
-        /* Force execution of one insn next time.  */
-        cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
-        return true;
-    }
-#endif
-
-    return false;
-}
-#endif
-
 /* user-mode: call with mmap_lock held */
 void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
 {
diff --git a/accel/tcg/meson.build b/accel/tcg/meson.build
index 7a0a79d731..75e1dffb4d 100644
--- a/accel/tcg/meson.build
+++ b/accel/tcg/meson.build
@@ -3,6 +3,7 @@ tcg_ss.add(files(
   'tcg-all.c',
   'cpu-exec-common.c',
   'cpu-exec.c',
+  'tb-maint.c',
   'tcg-runtime-gvec.c',
   'tcg-runtime.c',
   'translate-all.c',
-- 
2.34.1




* [PATCH 06/24] accel/tcg: Move assert_no_pages_locked to internal.h
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (4 preceding siblings ...)
  2022-10-06  3:10 ` [PATCH 05/24] accel/tcg: Split out tb-maint.c Richard Henderson
@ 2022-10-06  3:10 ` Richard Henderson
  2022-10-25  8:49   ` Alex Bennée
  2022-10-06  3:10 ` [PATCH 07/24] accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC Richard Henderson
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

There are no users outside of accel/tcg; this function
does not need to be defined in exec-all.h.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/internal.h    | 5 +++++
 include/exec/exec-all.h | 8 --------
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index a77b110b78..1a704ee14f 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -90,6 +90,11 @@ void do_assert_page_locked(const PageDesc *pd, const char *file, int line);
 void page_lock(PageDesc *pd);
 void page_unlock(PageDesc *pd);
 #endif
+#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
+void assert_no_pages_locked(void);
+#else
+static inline void assert_no_pages_locked(void) { }
+#endif
 
 TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
                               target_ulong cs_base, uint32_t flags,
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e5f8b224a5..b5bde1b56a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -642,14 +642,6 @@ extern __thread uintptr_t tci_tb_ptr;
    smaller than 4 bytes, so we don't worry about special-casing this.  */
 #define GETPC_ADJ   2
 
-#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
-void assert_no_pages_locked(void);
-#else
-static inline void assert_no_pages_locked(void)
-{
-}
-#endif
-
 #if !defined(CONFIG_USER_ONLY)
 
 /**
-- 
2.34.1




* [PATCH 07/24] accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (5 preceding siblings ...)
  2022-10-06  3:10 ` [PATCH 06/24] accel/tcg: Move assert_no_pages_locked to internal.h Richard Henderson
@ 2022-10-06  3:10 ` Richard Henderson
  2022-10-25  8:50   ` Alex Bennée
  2022-10-06  3:10 ` [PATCH 08/24] accel/tcg: Remove duplicate store to tb->page_addr[] Richard Henderson
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

The outputs of the calls to cpu_get_tb_cpu_state, namely
current_{pc,cs_base,flags}, are never used; drop the calls.
In tb_invalidate_phys_page, use bool for current_tb_modified.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tb-maint.c | 25 ++-----------------------
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 66c1900ae6..9af5cb49e0 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -502,23 +502,13 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
     int n;
 #ifdef TARGET_HAS_PRECISE_SMC
     CPUState *cpu = current_cpu;
-    CPUArchState *env = NULL;
     bool current_tb_not_found = retaddr != 0;
     bool current_tb_modified = false;
     TranslationBlock *current_tb = NULL;
-    target_ulong current_pc = 0;
-    target_ulong current_cs_base = 0;
-    uint32_t current_flags = 0;
 #endif /* TARGET_HAS_PRECISE_SMC */
 
     assert_page_locked(p);
 
-#if defined(TARGET_HAS_PRECISE_SMC)
-    if (cpu != NULL) {
-        env = cpu->env_ptr;
-    }
-#endif
-
     /*
      * We remove all the TBs in the range [start, end[.
      * XXX: see if in some cases it could be faster to invalidate all the code
@@ -554,8 +544,6 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
                  */
                 current_tb_modified = true;
                 cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
-                cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
-                                     &current_flags);
             }
 #endif /* TARGET_HAS_PRECISE_SMC */
             tb_phys_invalidate__locked(tb);
@@ -679,11 +667,7 @@ bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
 #ifdef TARGET_HAS_PRECISE_SMC
     TranslationBlock *current_tb = NULL;
     CPUState *cpu = current_cpu;
-    CPUArchState *env = NULL;
-    int current_tb_modified = 0;
-    target_ulong current_pc = 0;
-    target_ulong current_cs_base = 0;
-    uint32_t current_flags = 0;
+    bool current_tb_modified = false;
 #endif
 
     assert_memory_lock();
@@ -698,9 +682,6 @@ bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
     if (p->first_tb && pc != 0) {
         current_tb = tcg_tb_lookup(pc);
     }
-    if (cpu != NULL) {
-        env = cpu->env_ptr;
-    }
 #endif
     assert_page_locked(p);
     PAGE_FOR_EACH_TB(p, tb, n) {
@@ -713,10 +694,8 @@ bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
              * after the current PC, but it would require a specialized
              * function to partially restore the CPU state.
              */
-            current_tb_modified = 1;
+            current_tb_modified = true;
             cpu_restore_state_from_tb(cpu, current_tb, pc, true);
-            cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
-                                 &current_flags);
         }
 #endif /* TARGET_HAS_PRECISE_SMC */
         tb_phys_invalidate(tb, addr);
-- 
2.34.1




* [PATCH 08/24] accel/tcg: Remove duplicate store to tb->page_addr[]
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (6 preceding siblings ...)
  2022-10-06  3:10 ` [PATCH 07/24] accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC Richard Henderson
@ 2022-10-06  3:10 ` Richard Henderson
  2022-10-25  9:12   ` Alex Bennée
  2022-10-06  3:10 ` [PATCH 09/24] accel/tcg: Introduce tb_{set_}page_addr{0,1} Richard Henderson
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

When we added the fast path, we initialized page_addr[] early.
These stores in and around tb_page_add() are redundant; remove them.
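
For reference, the early initialization in question happens in
tb_gen_code(), which already sets both entries before tb_link_page()
runs (the two assignments are visible in the next patch's diff):

    tb->page_addr[0] = phys_pc;
    tb->page_addr[1] = -1;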

Fixes: 50627f1b7b1 ("accel/tcg: Add fast path for translator_ld*")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tb-maint.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 9af5cb49e0..7f4e1e1299 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -405,7 +405,6 @@ static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
 
     assert_page_locked(p);
 
-    tb->page_addr[n] = page_addr;
     tb->page_next[n] = p->first_tb;
 #ifndef CONFIG_USER_ONLY
     page_already_protected = p->first_tb != (uintptr_t)NULL;
@@ -461,8 +460,6 @@ TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     tb_page_add(p, tb, 0, phys_pc);
     if (p2) {
         tb_page_add(p2, tb, 1, phys_page2);
-    } else {
-        tb->page_addr[1] = -1;
     }
 
     /* add in the hash table */
-- 
2.34.1




* [PATCH 09/24] accel/tcg: Introduce tb_{set_}page_addr{0,1}
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (7 preceding siblings ...)
  2022-10-06  3:10 ` [PATCH 08/24] accel/tcg: Remove duplicate store to tb->page_addr[] Richard Henderson
@ 2022-10-06  3:10 ` Richard Henderson
  2022-10-25  9:47   ` Alex Bennée
  2022-10-06  3:10 ` [PATCH 10/24] accel/tcg: Rename tb_invalidate_phys_page Richard Henderson
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

This data structure will be replaced for user-only: add accessors.
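
As an illustrative sketch (not part of the patch), a caller that wants
to know whether a TB spans two pages now goes through the accessor, so
the backing representation can change later without touching call
sites; the helper name here is hypothetical:

    static bool tb_spans_two_pages(const TranslationBlock *tb)
    {
        return tb_page_addr1(tb) != -1;
    }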

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/exec-all.h   | 22 ++++++++++++++++++++++
 accel/tcg/cpu-exec.c      |  9 +++++----
 accel/tcg/tb-maint.c      | 29 +++++++++++++++--------------
 accel/tcg/translate-all.c | 16 ++++++++--------
 accel/tcg/translator.c    |  9 +++++----
 5 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b5bde1b56a..5900f4637b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -610,6 +610,28 @@ static inline uint32_t tb_cflags(const TranslationBlock *tb)
     return qatomic_read(&tb->cflags);
 }
 
+static inline tb_page_addr_t tb_page_addr0(const TranslationBlock *tb)
+{
+    return tb->page_addr[0];
+}
+
+static inline tb_page_addr_t tb_page_addr1(const TranslationBlock *tb)
+{
+    return tb->page_addr[1];
+}
+
+static inline void tb_set_page_addr0(TranslationBlock *tb,
+                                     tb_page_addr_t addr)
+{
+    tb->page_addr[0] = addr;
+}
+
+static inline void tb_set_page_addr1(TranslationBlock *tb,
+                                     tb_page_addr_t addr)
+{
+    tb->page_addr[1] = addr;
+}
+
 /* current cflags for hashing/comparison */
 uint32_t curr_cflags(CPUState *cpu);
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index f9e5cc9ba0..70eb8a58d9 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -187,13 +187,14 @@ static bool tb_lookup_cmp(const void *p, const void *d)
     const struct tb_desc *desc = d;
 
     if ((TARGET_TB_PCREL || tb_pc(tb) == desc->pc) &&
-        tb->page_addr[0] == desc->page_addr0 &&
+        tb_page_addr0(tb) == desc->page_addr0 &&
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
         tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
         tb_cflags(tb) == desc->cflags) {
         /* check next page if needed */
-        if (tb->page_addr[1] == -1) {
+        tb_page_addr_t tb_phys_page1 = tb_page_addr1(tb);
+        if (tb_phys_page1 == -1) {
             return true;
         } else {
             tb_page_addr_t phys_page1;
@@ -210,7 +211,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
              */
             virt_page1 = TARGET_PAGE_ALIGN(desc->pc);
             phys_page1 = get_page_addr_code(desc->env, virt_page1);
-            if (tb->page_addr[1] == phys_page1) {
+            if (tb_phys_page1 == phys_page1) {
                 return true;
             }
         }
@@ -1016,7 +1017,7 @@ int cpu_exec(CPUState *cpu)
              * direct jump to a TB spanning two pages because the mapping
              * for the second page can change.
              */
-            if (tb->page_addr[1] != -1) {
+            if (tb_page_addr1(tb) != -1) {
                 last_tb = NULL;
             }
 #endif
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 7f4e1e1299..15ec2f741d 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -44,8 +44,8 @@ static bool tb_cmp(const void *ap, const void *bp)
             a->flags == b->flags &&
             (tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
             a->trace_vcpu_dstate == b->trace_vcpu_dstate &&
-            a->page_addr[0] == b->page_addr[0] &&
-            a->page_addr[1] == b->page_addr[1]);
+            tb_page_addr0(a) == tb_page_addr0(b) &&
+            tb_page_addr1(a) == tb_page_addr1(b));
 }
 
 void tb_htable_init(void)
@@ -273,7 +273,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
     qemu_spin_unlock(&tb->jmp_lock);
 
     /* remove the TB from the hash list */
-    phys_pc = tb->page_addr[0];
+    phys_pc = tb_page_addr0(tb);
     h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
                      tb->flags, orig_cflags, tb->trace_vcpu_dstate);
     if (!qht_remove(&tb_ctx.htable, tb, h)) {
@@ -282,10 +282,11 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
 
     /* remove the TB from the page list */
     if (rm_from_page_list) {
-        p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
+        p = page_find(phys_pc >> TARGET_PAGE_BITS);
         tb_page_remove(p, tb);
-        if (tb->page_addr[1] != -1) {
-            p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
+        phys_pc = tb_page_addr1(tb);
+        if (phys_pc != -1) {
+            p = page_find(phys_pc >> TARGET_PAGE_BITS);
             tb_page_remove(p, tb);
         }
     }
@@ -358,16 +359,16 @@ static inline void page_unlock_tb(const TranslationBlock *tb) { }
 /* lock the page(s) of a TB in the correct acquisition order */
 static void page_lock_tb(const TranslationBlock *tb)
 {
-    page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], false);
+    page_lock_pair(NULL, tb_page_addr0(tb), NULL, tb_page_addr1(tb), false);
 }
 
 static void page_unlock_tb(const TranslationBlock *tb)
 {
-    PageDesc *p1 = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
+    PageDesc *p1 = page_find(tb_page_addr0(tb) >> TARGET_PAGE_BITS);
 
     page_unlock(p1);
-    if (unlikely(tb->page_addr[1] != -1)) {
-        PageDesc *p2 = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
+    if (unlikely(tb_page_addr1(tb) != -1)) {
+        PageDesc *p2 = page_find(tb_page_addr1(tb) >> TARGET_PAGE_BITS);
 
         if (p2 != p1) {
             page_unlock(p2);
@@ -382,7 +383,7 @@ static void page_unlock_tb(const TranslationBlock *tb)
  */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
-    if (page_addr == -1 && tb->page_addr[0] != -1) {
+    if (page_addr == -1 && tb_page_addr0(tb) != -1) {
         page_lock_tb(tb);
         do_tb_phys_invalidate(tb, true);
         page_unlock_tb(tb);
@@ -516,11 +517,11 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
         if (n == 0) {
             /* NOTE: tb_end may be after the end of the page, but
                it is not a problem */
-            tb_start = tb->page_addr[0];
+            tb_start = tb_page_addr0(tb);
             tb_end = tb_start + tb->size;
         } else {
-            tb_start = tb->page_addr[1];
-            tb_end = tb_start + ((tb->page_addr[0] + tb->size)
+            tb_start = tb_page_addr1(tb);
+            tb_end = tb_start + ((tb_page_addr0(tb) + tb->size)
                                  & ~TARGET_PAGE_MASK);
         }
         if (!(tb_end <= start || tb_start >= end)) {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5e28e9fccd..bef4c56cff 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -698,9 +698,9 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
         }
         assert_page_locked(pd);
         PAGE_FOR_EACH_TB(pd, tb, n) {
-            if (page_trylock_add(set, tb->page_addr[0]) ||
-                (tb->page_addr[1] != -1 &&
-                 page_trylock_add(set, tb->page_addr[1]))) {
+            if (page_trylock_add(set, tb_page_addr0(tb)) ||
+                (tb_page_addr1(tb) != -1 &&
+                 page_trylock_add(set, tb_page_addr1(tb)))) {
                 /* drop all locks, and reacquire in order */
                 g_tree_foreach(set->tree, page_entry_unlock, NULL);
                 goto retry;
@@ -771,8 +771,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->flags = flags;
     tb->cflags = cflags;
     tb->trace_vcpu_dstate = *cpu->trace_dstate;
-    tb->page_addr[0] = phys_pc;
-    tb->page_addr[1] = -1;
+    tb_set_page_addr0(tb, phys_pc);
+    tb_set_page_addr1(tb, -1);
     tcg_ctx->tb_cflags = cflags;
  tb_overflow:
 
@@ -970,7 +970,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      * a temporary one-insn TB, and we have nothing left to do. Return early
      * before attempting to link to other TBs or add to the lookup table.
      */
-    if (tb->page_addr[0] == -1) {
+    if (tb_page_addr0(tb) == -1) {
         return tb;
     }
 
@@ -985,7 +985,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      * No explicit memory barrier is required -- tb_link_page() makes the
      * TB visible in a consistent state.
      */
-    existing_tb = tb_link_page(tb, tb->page_addr[0], tb->page_addr[1]);
+    existing_tb = tb_link_page(tb, tb_page_addr0(tb), tb_page_addr1(tb));
     /* if the TB already exists, discard what we just translated */
     if (unlikely(existing_tb != tb)) {
         uintptr_t orig_aligned = (uintptr_t)gen_code_buf;
@@ -1140,7 +1140,7 @@ static gboolean tb_tree_stats_iter(gpointer key, gpointer value, gpointer data)
     if (tb->size > tst->max_target_size) {
         tst->max_target_size = tb->size;
     }
-    if (tb->page_addr[1] != -1) {
+    if (tb_page_addr1(tb) != -1) {
         tst->cross_page++;
     }
     if (tb->jmp_reset_offset[0] != TB_JMP_RESET_OFFSET_INVALID) {
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 8e78fd7a9c..061519691f 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -157,7 +157,7 @@ static void *translator_access(CPUArchState *env, DisasContextBase *db,
     tb = db->tb;
 
     /* Use slow path if first page is MMIO. */
-    if (unlikely(tb->page_addr[0] == -1)) {
+    if (unlikely(tb_page_addr0(tb) == -1)) {
         return NULL;
     }
 
@@ -169,13 +169,14 @@ static void *translator_access(CPUArchState *env, DisasContextBase *db,
         host = db->host_addr[1];
         base = TARGET_PAGE_ALIGN(db->pc_first);
         if (host == NULL) {
-            tb->page_addr[1] =
+            tb_page_addr_t phys_page =
                 get_page_addr_code_hostp(env, base, &db->host_addr[1]);
+            /* We cannot handle MMIO as second page. */
+            assert(phys_page != -1);
+            tb_set_page_addr1(tb, phys_page);
 #ifdef CONFIG_USER_ONLY
             page_protect(end);
 #endif
-            /* We cannot handle MMIO as second page. */
-            assert(tb->page_addr[1] != -1);
             host = db->host_addr[1];
         }
 
-- 
2.34.1




* [PATCH 10/24] accel/tcg: Rename tb_invalidate_phys_page
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (8 preceding siblings ...)
  2022-10-06  3:10 ` [PATCH 09/24] accel/tcg: Introduce tb_{set_}page_addr{0,1} Richard Henderson
@ 2022-10-06  3:10 ` Richard Henderson
  2022-10-25  9:49   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 11/24] accel/tcg: Rename tb_invalidate_phys_page_range and drop end parameter Richard Henderson
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Rename to tb_invalidate_phys_page_unwind to emphasize that it
also detects invalidation of the currently executing TB, and to
free up the old name for other usage.
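
A sketch of how the return value is typically consumed after a write
fault (the surrounding control flow here is illustrative, not part of
this patch):

    if (tb_invalidate_phys_page_unwind(addr, pc)) {
        /*
         * The faulting store modified the TB we were executing;
         * unwind and restart rather than resume stale code.
         */
        cpu_loop_exit_noexc(current_cpu);
    }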

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/internal.h      | 2 +-
 accel/tcg/tb-maint.c      | 2 +-
 accel/tcg/translate-all.c | 5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1a704ee14f..1227bb69bd 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -105,7 +105,7 @@ void tb_htable_init(void);
 void tb_reset_jump(TranslationBlock *tb, int n);
 TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
                                tb_page_addr_t phys_page2);
-bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc);
+bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
 int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
                               uintptr_t searched_pc, bool reset_icount);
 
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 15ec2f741d..92170cbbc1 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -657,7 +657,7 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
  * TB (because it was modified by this store and the guest CPU has
  * precise-SMC semantics).
  */
-bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
+bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc)
 {
     TranslationBlock *tb;
     PageDesc *p;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index bef4c56cff..aa8d213514 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1382,7 +1382,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
         if (!(p->flags & PAGE_WRITE) &&
             (flags & PAGE_WRITE) &&
             p->first_tb) {
-            tb_invalidate_phys_page(addr, 0);
+            tb_invalidate_phys_page_unwind(addr, 0);
         }
         if (reset_target_data) {
             g_free(p->target_data);
@@ -1580,7 +1580,8 @@ int page_unprotect(target_ulong address, uintptr_t pc)
 
                 /* and since the content will be modified, we must invalidate
                    the corresponding translated code. */
-                current_tb_invalidated |= tb_invalidate_phys_page(addr, pc);
+                current_tb_invalidated |=
+                    tb_invalidate_phys_page_unwind(addr, pc);
             }
             mprotect((void *)g2h_untagged(host_start), qemu_host_page_size,
                      prot & PAGE_BITS);
-- 
2.34.1




* [PATCH 11/24] accel/tcg: Rename tb_invalidate_phys_page_range and drop end parameter
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (9 preceding siblings ...)
  2022-10-06  3:10 ` [PATCH 10/24] accel/tcg: Rename tb_invalidate_phys_page Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-25 13:21   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 12/24] accel/tcg: Unify declarations of tb_invalidate_phys_range Richard Henderson
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

This function is never called with a real range, only for a
single page.  Drop the second parameter and rename to
tb_invalidate_phys_page.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/translate-all.h |  2 +-
 accel/tcg/tb-maint.c         | 15 ++++++++-------
 cpu.c                        |  4 ++--
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/exec/translate-all.h b/include/exec/translate-all.h
index 9f646389af..3e9cb91565 100644
--- a/include/exec/translate-all.h
+++ b/include/exec/translate-all.h
@@ -29,7 +29,7 @@ void page_collection_unlock(struct page_collection *set);
 void tb_invalidate_phys_page_fast(struct page_collection *pages,
                                   tb_page_addr_t start, int len,
                                   uintptr_t retaddr);
-void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
+void tb_invalidate_phys_page(tb_page_addr_t addr);
 void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
 
 #ifdef CONFIG_USER_ONLY
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 92170cbbc1..bac43774c0 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -565,25 +565,26 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
 }
 
 /*
- * Invalidate all TBs which intersect with the target physical address range
- * [start;end[. NOTE: start and end must refer to the *same* physical page.
- * 'is_cpu_write_access' should be true if called from a real cpu write
- * access: the virtual CPU will exit the current TB if code is modified inside
- * this TB.
+ * Invalidate all TBs which intersect with the target physical
+ * address page @addr.
  *
  * Called with mmap_lock held for user-mode emulation
  */
-void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
+void tb_invalidate_phys_page(tb_page_addr_t addr)
 {
     struct page_collection *pages;
+    tb_page_addr_t start, end;
     PageDesc *p;
 
     assert_memory_lock();
 
-    p = page_find(start >> TARGET_PAGE_BITS);
+    p = page_find(addr >> TARGET_PAGE_BITS);
     if (p == NULL) {
         return;
     }
+
+    start = addr & TARGET_PAGE_MASK;
+    end = start + TARGET_PAGE_SIZE;
     pages = page_collection_lock(start, end);
     tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
     page_collection_unlock(pages);
diff --git a/cpu.c b/cpu.c
index 14365e36f3..2a09b05205 100644
--- a/cpu.c
+++ b/cpu.c
@@ -277,7 +277,7 @@ void list_cpus(const char *optarg)
 void tb_invalidate_phys_addr(target_ulong addr)
 {
     mmap_lock();
-    tb_invalidate_phys_page_range(addr, addr + 1);
+    tb_invalidate_phys_page(addr);
     mmap_unlock();
 }
 #else
@@ -298,7 +298,7 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs)
         return;
     }
     ram_addr = memory_region_get_ram_addr(mr) + addr;
-    tb_invalidate_phys_page_range(ram_addr, ram_addr + 1);
+    tb_invalidate_phys_page(ram_addr);
 }
 #endif
 
-- 
2.34.1




* [PATCH 12/24] accel/tcg: Unify declarations of tb_invalidate_phys_range
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (10 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 11/24] accel/tcg: Rename tb_invalidate_phys_page_range and drop end parameter Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-25 13:24   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 13/24] accel/tcg: Use tb_invalidate_phys_page in page_set_flags Richard Henderson
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

We missed this function when we introduced tb_page_addr_t.
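
For context, tb_page_addr_t already resolves to the appropriate type
per configuration (paraphrasing the exec-all.h definition of this
period), which is why a single prototype now suffices:

    #ifdef CONFIG_USER_ONLY
    typedef abi_ulong tb_page_addr_t;
    #else
    typedef ram_addr_t tb_page_addr_t;
    #endif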

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/exec-all.h |  2 +-
 include/exec/ram_addr.h |  2 --
 accel/tcg/tb-maint.c    | 13 ++-----------
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5900f4637b..5ae484e34d 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -638,12 +638,12 @@ uint32_t curr_cflags(CPUState *cpu);
 /* TranslationBlock invalidate API */
 #if defined(CONFIG_USER_ONLY)
 void tb_invalidate_phys_addr(target_ulong addr);
-void tb_invalidate_phys_range(target_ulong start, target_ulong end);
 #else
 void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs);
 #endif
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end);
 void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
 
 /* GETPC is the true target of the return instruction that we'll execute.  */
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f3e0c78161..1500680458 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -147,8 +147,6 @@ static inline void qemu_ram_block_writeback(RAMBlock *block)
 #define DIRTY_CLIENTS_ALL     ((1 << DIRTY_MEMORY_NUM) - 1)
 #define DIRTY_CLIENTS_NOCODE  (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE))
 
-void tb_invalidate_phys_range(ram_addr_t start, ram_addr_t end);
-
 static inline bool cpu_physical_memory_get_dirty(ram_addr_t start,
                                                  ram_addr_t length,
                                                  unsigned client)
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index bac43774c0..c8e921089d 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "exec/cputlb.h"
 #include "exec/log.h"
+#include "exec/exec-all.h"
 #include "exec/translate-all.h"
 #include "sysemu/tcg.h"
 #include "tcg/tcg.h"
@@ -27,12 +28,6 @@
 #include "tb-context.h"
 #include "internal.h"
 
-/* FIXME: tb_invalidate_phys_range is declared in different places. */
-#ifdef CONFIG_USER_ONLY
-#include "exec/exec-all.h"
-#else
-#include "exec/ram_addr.h"
-#endif
 
 static bool tb_cmp(const void *ap, const void *bp)
 {
@@ -599,11 +594,7 @@ void tb_invalidate_phys_page(tb_page_addr_t addr)
  *
  * Called with mmap_lock held for user-mode emulation.
  */
-#ifdef CONFIG_SOFTMMU
-void tb_invalidate_phys_range(ram_addr_t start, ram_addr_t end)
-#else
-void tb_invalidate_phys_range(target_ulong start, target_ulong end)
-#endif
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
 {
     struct page_collection *pages;
     tb_page_addr_t next;
-- 
2.34.1




* [PATCH 13/24] accel/tcg: Use tb_invalidate_phys_page in page_set_flags
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (11 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 12/24] accel/tcg: Unify declarations of tb_invalidate_phys_range Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-06  3:11 ` [PATCH 14/24] accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET Richard Henderson
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

We do not require detection of (and unwinding from) modification
of the currently executing TB here, so use the more appropriate
function.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index aa8d213514..8d5233fa9e 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1382,7 +1382,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
         if (!(p->flags & PAGE_WRITE) &&
             (flags & PAGE_WRITE) &&
             p->first_tb) {
-            tb_invalidate_phys_page_unwind(addr, 0);
+            tb_invalidate_phys_page(addr);
         }
         if (reset_target_data) {
             g_free(p->target_data);
-- 
2.34.1




* [PATCH 14/24] accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (12 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 13/24] accel/tcg: Use tb_invalidate_phys_page in page_set_flags Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-25 15:42   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 15/24] accel/tcg: Use interval tree for TBs in user-only mode Richard Henderson
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

When PAGE_RESET is set, we are replacing pages with new
content, which means that we need to invalidate existing
cached data, such as TranslationBlocks.  Perform the
reset invalidate while we're doing other invalidates,
which allows us to remove the separate invalidates from
the user-only mmap/munmap/mprotect routines.

In addition, restrict invalidation to PAGE_EXEC pages.
Since cdf713085131, we have validated PAGE_EXEC is present
before translation, which means we can assume that if the
bit is not present, there are no translations to invalidate.
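
Restated as a standalone predicate (a hypothetical helper, equivalent
to the condition in the diff below): invalidation is needed only when
the page was executable and something about it is changing in a way
that can invalidate cached translations:

    static bool needs_code_invalidate(int old_flags, int new_flags,
                                      bool reset)
    {
        return (old_flags & PAGE_EXEC)      /* page may hold translations */
            && (reset                       /* contents being replaced */
                || !(new_flags & PAGE_EXEC) /* losing exec permission */
                || (new_flags & ~old_flags & PAGE_WRITE)); /* newly writable */
    }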

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 19 +++++++++++--------
 bsd-user/mmap.c           |  2 --
 linux-user/mmap.c         |  4 ----
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8d5233fa9e..478301f227 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1352,7 +1352,7 @@ int page_get_flags(target_ulong address)
 void page_set_flags(target_ulong start, target_ulong end, int flags)
 {
     target_ulong addr, len;
-    bool reset_target_data;
+    bool reset;
 
     /* This function should never be called with addresses outside the
        guest address space.  If this assert fires, it probably indicates
@@ -1369,7 +1369,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
     if (flags & PAGE_WRITE) {
         flags |= PAGE_WRITE_ORG;
     }
-    reset_target_data = !(flags & PAGE_VALID) || (flags & PAGE_RESET);
+    reset = !(flags & PAGE_VALID) || (flags & PAGE_RESET);
     flags &= ~PAGE_RESET;
 
     for (addr = start, len = end - start;
@@ -1377,14 +1377,17 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
          len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
         PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, true);
 
-        /* If the write protection bit is set, then we invalidate
-           the code inside.  */
-        if (!(p->flags & PAGE_WRITE) &&
-            (flags & PAGE_WRITE) &&
-            p->first_tb) {
+        /*
+         * If the page was executable, but is reset, or is no longer
+         * executable, or has become writable, then invalidate any code.
+         */
+        if ((p->flags & PAGE_EXEC)
+            && (reset ||
+                !(flags & PAGE_EXEC) ||
+                (flags & ~p->flags & PAGE_WRITE))) {
             tb_invalidate_phys_page(addr);
         }
-        if (reset_target_data) {
+        if (reset) {
             g_free(p->target_data);
             p->target_data = NULL;
             p->flags = flags;
diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index e54e26de17..d6c5a344c9 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -663,7 +663,6 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
     page_dump(stdout);
     printf("\n");
 #endif
-    tb_invalidate_phys_range(start, start + len);
     mmap_unlock();
     return start;
 fail:
@@ -769,7 +768,6 @@ int target_munmap(abi_ulong start, abi_ulong len)
 
     if (ret == 0) {
         page_set_flags(start, start + len, 0);
-        tb_invalidate_phys_range(start, start + len);
     }
     mmap_unlock();
     return ret;
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 28f3bc85ed..10f5079331 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -182,7 +182,6 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
     }
 
     page_set_flags(start, start + len, page_flags);
-    tb_invalidate_phys_range(start, start + len);
     ret = 0;
 
 error:
@@ -662,7 +661,6 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
             qemu_log_unlock(f);
         }
     }
-    tb_invalidate_phys_range(start, start + len);
     mmap_unlock();
     return start;
 fail:
@@ -766,7 +764,6 @@ int target_munmap(abi_ulong start, abi_ulong len)
 
     if (ret == 0) {
         page_set_flags(start, start + len, 0);
-        tb_invalidate_phys_range(start, start + len);
     }
     mmap_unlock();
     return ret;
@@ -856,7 +853,6 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
         page_set_flags(new_addr, new_addr + new_size,
                        prot | PAGE_VALID | PAGE_RESET);
     }
-    tb_invalidate_phys_range(new_addr, new_addr + new_size);
     mmap_unlock();
     return new_addr;
 }
-- 
2.34.1




* [PATCH 15/24] accel/tcg: Use interval tree for TBs in user-only mode
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (13 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 14/24] accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-25 15:58   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 16/24] accel/tcg: Use page_reset_target_data in page_set_flags Richard Henderson
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Begin weaning user-only away from PageDesc.

Since, for user-only, all TB (and page) manipulation is done with
a single mutex, and there is no virtual/physical discontinuity to
split a TB across discontinuous pages, place all of the TBs into
a single IntervalTree. This makes it trivial to find all of the
TBs intersecting a range.

Retain the existing PageDesc + linked list implementation for
system mode.  Move the portion of the implementation that overlaps
the new user-only code behind the common ifdef.
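
A minimal sketch of the resulting lookup pattern, using the
interval-tree API added in the first patch of this series (the loop
body is illustrative):

    IntervalTreeNode *n;

    for (n = interval_tree_iter_first(&tb_root, start, end - 1);
         n != NULL;
         n = interval_tree_iter_next(n, start, end - 1)) {
        TranslationBlock *tb = container_of(n, TranslationBlock, itree);
        /* tb intersects [start, end); invalidate, unlink, etc. */
    }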

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/internal.h      |  16 +-
 include/exec/exec-all.h   |  43 ++++-
 accel/tcg/tb-maint.c      | 388 ++++++++++++++++++++++----------------
 accel/tcg/translate-all.c |   4 +-
 4 files changed, 280 insertions(+), 171 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1227bb69bd..1bd5a02911 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -24,14 +24,13 @@
 #endif
 
 typedef struct PageDesc {
-    /* list of TBs intersecting this ram page */
-    uintptr_t first_tb;
 #ifdef CONFIG_USER_ONLY
     unsigned long flags;
     void *target_data;
-#endif
-#ifdef CONFIG_SOFTMMU
+#else
     QemuSpin lock;
+    /* list of TBs intersecting this ram page */
+    uintptr_t first_tb;
 #endif
 } PageDesc;
 
@@ -69,9 +68,6 @@ static inline PageDesc *page_find(tb_page_addr_t index)
          tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
              tb = (TranslationBlock *)((uintptr_t)tb & ~1))
 
-#define PAGE_FOR_EACH_TB(pagedesc, tb, n)                       \
-    TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
-
 #define TB_FOR_EACH_JMP(head_tb, tb, n)                                 \
     TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
 
@@ -89,6 +85,12 @@ void do_assert_page_locked(const PageDesc *pd, const char *file, int line);
 #endif
 void page_lock(PageDesc *pd);
 void page_unlock(PageDesc *pd);
+
+/* TODO: For now, still shared with translate-all.c for system mode. */
+typedef int PageForEachNext;
+#define PAGE_FOR_EACH_TB(start, end, pagedesc, tb, n) \
+    TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
+
 #endif
 #if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
 void assert_no_pages_locked(void);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5ae484e34d..793ef5ba4f 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -24,6 +24,7 @@
 #ifdef CONFIG_TCG
 #include "exec/cpu_ldst.h"
 #endif
+#include "qemu/interval-tree.h"
 
 /* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
@@ -552,11 +553,20 @@ struct TranslationBlock {
 
     struct tb_tc tc;
 
-    /* first and second physical page containing code. The lower bit
-       of the pointer tells the index in page_next[].
-       The list is protected by the TB's page('s) lock(s) */
+    /*
+     * Track tb_page_addr_t intervals that intersect this TB.
+     * For user-only, the virtual addresses are always contiguous,
+     * and we use a unified interval tree.  For system, we use a
+     * linked list headed in each PageDesc.  Within the list, the lsb
+     * of the previous pointer tells the index of page_next[], and the
+     * list is protected by the PageDesc lock(s).
+     */
+#ifdef CONFIG_USER_ONLY
+    IntervalTreeNode itree;
+#else
     uintptr_t page_next[2];
     tb_page_addr_t page_addr[2];
+#endif
 
     /* jmp_lock placed here to fill a 4-byte hole. Its documentation is below */
     QemuSpin jmp_lock;
@@ -612,24 +622,51 @@ static inline uint32_t tb_cflags(const TranslationBlock *tb)
 
 static inline tb_page_addr_t tb_page_addr0(const TranslationBlock *tb)
 {
+#ifdef CONFIG_USER_ONLY
+    return tb->itree.start;
+#else
     return tb->page_addr[0];
+#endif
 }
 
 static inline tb_page_addr_t tb_page_addr1(const TranslationBlock *tb)
 {
+#ifdef CONFIG_USER_ONLY
+    tb_page_addr_t next = tb->itree.last & TARGET_PAGE_MASK;
+    return next == (tb->itree.start & TARGET_PAGE_MASK) ? -1 : next;
+#else
     return tb->page_addr[1];
+#endif
 }
 
 static inline void tb_set_page_addr0(TranslationBlock *tb,
                                      tb_page_addr_t addr)
 {
+#ifdef CONFIG_USER_ONLY
+    tb->itree.start = addr;
+    /*
+     * To begin, we record an interval of one byte.  When the translation
+     * loop encounters a second page, the interval will be extended to
+     * include the first byte of the second page, which is sufficient to
+     * allow tb_page_addr1() above to work properly.  The final corrected
+     * interval will be set by tb_page_add() from tb->size before the
+     * node is added to the interval tree.
+     */
+    tb->itree.last = addr;
+#else
     tb->page_addr[0] = addr;
+#endif
 }
 
 static inline void tb_set_page_addr1(TranslationBlock *tb,
                                      tb_page_addr_t addr)
 {
+#ifdef CONFIG_USER_ONLY
+    /* Extend the interval to the first byte of the second page.  See above. */
+    tb->itree.last = addr;
+#else
     tb->page_addr[1] = addr;
+#endif
 }
 
 /* current cflags for hashing/comparison */
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index c8e921089d..14e8e47a6a 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/interval-tree.h"
 #include "exec/cputlb.h"
 #include "exec/log.h"
 #include "exec/exec-all.h"
@@ -50,6 +51,75 @@ void tb_htable_init(void)
     qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
 }
 
+#ifdef CONFIG_USER_ONLY
+/*
+ * For user-only, since we are protecting all of memory with a single lock,
+ * and because the two pages of a TranslationBlock are always contiguous,
+ * use a single data structure to record all TranslationBlocks.
+ */
+static IntervalTreeRoot tb_root;
+
+static void page_flush_tb(void)
+{
+    assert_memory_lock();
+    memset(&tb_root, 0, sizeof(tb_root));
+}
+
+/* Call with mmap_lock held. */
+static void tb_page_add(TranslationBlock *tb, PageDesc *p1, PageDesc *p2)
+{
+    /* translator_loop() must have made all TB pages non-writable */
+    assert(!(p1->flags & PAGE_WRITE));
+    if (p2) {
+        assert(!(p2->flags & PAGE_WRITE));
+    }
+
+    assert_memory_lock();
+
+    tb->itree.last = tb->itree.start + tb->size - 1;
+    interval_tree_insert(&tb->itree, &tb_root);
+}
+
+/* Call with mmap_lock held. */
+static void tb_page_remove(TranslationBlock *tb)
+{
+    assert_memory_lock();
+    interval_tree_remove(&tb->itree, &tb_root);
+}
+
+/* TODO: For now, still shared with translate-all.c for system mode. */
+#define PAGE_FOR_EACH_TB(start, end, pagedesc, T, N)    \
+    for (T = foreach_tb_first(start, end),              \
+         N = foreach_tb_next(T, start, end);            \
+         T != NULL;                                     \
+         T = N, N = foreach_tb_next(N, start, end))
+
+typedef TranslationBlock *PageForEachNext;
+
+static PageForEachNext foreach_tb_first(tb_page_addr_t start,
+                                        tb_page_addr_t end)
+{
+    IntervalTreeNode *n = interval_tree_iter_first(&tb_root, start, end - 1);
+    return n ? container_of(n, TranslationBlock, itree) : NULL;
+}
+
+static PageForEachNext foreach_tb_next(PageForEachNext tb,
+                                       tb_page_addr_t start,
+                                       tb_page_addr_t end)
+{
+    IntervalTreeNode *n;
+
+    if (tb) {
+        n = interval_tree_iter_next(&tb->itree, start, end - 1);
+        if (n) {
+            return container_of(n, TranslationBlock, itree);
+        }
+    }
+    return NULL;
+}
+
+#else
+
 /* Set to NULL all the 'first_tb' fields in all PageDescs. */
 static void page_flush_tb_1(int level, void **lp)
 {
@@ -84,6 +154,70 @@ static void page_flush_tb(void)
     }
 }
 
+/*
+ * Add the tb in the target page and protect it if necessary.
+ * Called with @p->lock held.
+ */
+static void tb_page_add(TranslationBlock *tb, PageDesc *p1, PageDesc *p2)
+{
+    /*
+     * If some code is already present, then the pages are already
+     * protected. So we handle the case where only the first TB is
+     * allocated in a physical page.
+     */
+    assert_page_locked(p1);
+    if (p1->first_tb) {
+        tb->page_next[0] = p1->first_tb;
+    } else {
+        tlb_protect_code(tb->page_addr[0] & TARGET_PAGE_MASK);
+        tb->page_next[0] = 0;
+    }
+    p1->first_tb = (uintptr_t)tb | 0;
+
+    if (unlikely(p2)) {
+        assert_page_locked(p2);
+        if (p2->first_tb) {
+            tb->page_next[1] = p2->first_tb;
+        } else {
+            tlb_protect_code(tb->page_addr[1] & TARGET_PAGE_MASK);
+            tb->page_next[1] = 0;
+        }
+        p2->first_tb = (uintptr_t)tb | 1;
+    }
+}
+
+static void tb_page_remove1(TranslationBlock *tb, PageDesc *pd)
+{
+    TranslationBlock *i;
+    PageForEachNext n;
+    uintptr_t *pprev;
+
+    assert_page_locked(pd);
+    pprev = &pd->first_tb;
+    PAGE_FOR_EACH_TB(unused, unused, pd, i, n) {
+        if (i == tb) {
+            *pprev = i->page_next[n];
+            return;
+        }
+        pprev = &i->page_next[n];
+    }
+    g_assert_not_reached();
+}
+
+static void tb_page_remove(TranslationBlock *tb)
+{
+    PageDesc *pd;
+
+    pd = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
+    tb_page_remove1(tb, pd);
+    if (unlikely(tb->page_addr[1] != -1)) {
+        pd = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
+        tb_page_remove1(tb, pd);
+    }
+}
+
+#endif
+
 /* flush all the translation blocks */
 static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 {
@@ -128,28 +262,6 @@ void tb_flush(CPUState *cpu)
     }
 }
 
-/*
- * user-mode: call with mmap_lock held
- * !user-mode: call with @pd->lock held
- */
-static inline void tb_page_remove(PageDesc *pd, TranslationBlock *tb)
-{
-    TranslationBlock *tb1;
-    uintptr_t *pprev;
-    unsigned int n1;
-
-    assert_page_locked(pd);
-    pprev = &pd->first_tb;
-    PAGE_FOR_EACH_TB(pd, tb1, n1) {
-        if (tb1 == tb) {
-            *pprev = tb1->page_next[n1];
-            return;
-        }
-        pprev = &tb1->page_next[n1];
-    }
-    g_assert_not_reached();
-}
-
 /* remove @orig from its @n_orig-th jump list */
 static inline void tb_remove_from_jmp_list(TranslationBlock *orig, int n_orig)
 {
@@ -255,7 +367,6 @@ static void tb_jmp_cache_inval_tb(TranslationBlock *tb)
  */
 static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
 {
-    PageDesc *p;
     uint32_t h;
     tb_page_addr_t phys_pc;
     uint32_t orig_cflags = tb_cflags(tb);
@@ -277,13 +388,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
 
     /* remove the TB from the page list */
     if (rm_from_page_list) {
-        p = page_find(phys_pc >> TARGET_PAGE_BITS);
-        tb_page_remove(p, tb);
-        phys_pc = tb_page_addr1(tb);
-        if (phys_pc != -1) {
-            p = page_find(phys_pc >> TARGET_PAGE_BITS);
-            tb_page_remove(p, tb);
-        }
+        tb_page_remove(tb);
     }
 
     /* remove the TB from the hash list */
@@ -387,41 +492,6 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     }
 }
 
-/*
- * Add the tb in the target page and protect it if necessary.
- * Called with mmap_lock held for user-mode emulation.
- * Called with @p->lock held in !user-mode.
- */
-static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
-                               unsigned int n, tb_page_addr_t page_addr)
-{
-#ifndef CONFIG_USER_ONLY
-    bool page_already_protected;
-#endif
-
-    assert_page_locked(p);
-
-    tb->page_next[n] = p->first_tb;
-#ifndef CONFIG_USER_ONLY
-    page_already_protected = p->first_tb != (uintptr_t)NULL;
-#endif
-    p->first_tb = (uintptr_t)tb | n;
-
-#if defined(CONFIG_USER_ONLY)
-    /* translator_loop() must have made all TB pages non-writable */
-    assert(!(p->flags & PAGE_WRITE));
-#else
-    /*
-     * If some code is already present, then the pages are already
-     * protected. So we handle the case where only the first TB is
-     * allocated in a physical page.
-     */
-    if (!page_already_protected) {
-        tlb_protect_code(page_addr);
-    }
-#endif
-}
-
 /*
  * Add a new TB and link it to the physical page tables. phys_page2 is
  * (-1) to indicate that only one page contains the TB.
@@ -453,10 +523,7 @@ TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
      * we can only insert TBs that are fully initialized.
      */
     page_lock_pair(&p, phys_pc, &p2, phys_page2, true);
-    tb_page_add(p, tb, 0, phys_pc);
-    if (p2) {
-        tb_page_add(p2, tb, 1, phys_page2);
-    }
+    tb_page_add(tb, p, p2);
 
     /* add in the hash table */
     h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
@@ -465,10 +532,7 @@ TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 
     /* remove TB from the page(s) if we couldn't insert it */
     if (unlikely(existing_tb)) {
-        tb_page_remove(p, tb);
-        if (p2) {
-            tb_page_remove(p2, tb);
-        }
+        tb_page_remove(tb);
         tb = existing_tb;
     }
 
@@ -479,6 +543,87 @@ TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     return tb;
 }
 
+#ifdef CONFIG_USER_ONLY
+/*
+ * Invalidate all TBs which intersect with the target address range.
+ * Called with mmap_lock held for user-mode emulation.
+ * NOTE: this function must not be called while a TB is running.
+ */
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+    TranslationBlock *tb;
+    PageForEachNext n;
+
+    assert_memory_lock();
+
+    PAGE_FOR_EACH_TB(start, end, unused, tb, n) {
+        tb_phys_invalidate__locked(tb);
+    }
+}
+
+/*
+ * Invalidate all TBs which intersect with the target address page @addr.
+ * Called with mmap_lock held for user-mode emulation
+ * NOTE: this function must not be called while a TB is running.
+ */
+void tb_invalidate_phys_page(tb_page_addr_t addr)
+{
+    tb_page_addr_t start, end;
+
+    start = addr & TARGET_PAGE_MASK;
+    end = start + TARGET_PAGE_SIZE;
+    tb_invalidate_phys_range(start, end);
+}
+
+/*
+ * Called with mmap_lock held. If pc is not 0 then it indicates the
+ * host PC of the faulting store instruction that caused this invalidate.
+ * Returns true if the caller needs to abort execution of the current
+ * TB (because it was modified by this store and the guest CPU has
+ * precise-SMC semantics).
+ */
+bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc)
+{
+    assert(pc != 0);
+#ifdef TARGET_HAS_PRECISE_SMC
+    assert_memory_lock();
+    {
+        TranslationBlock *current_tb = tcg_tb_lookup(pc);
+        bool current_tb_modified = false;
+        TranslationBlock *tb;
+        PageForEachNext n;
+
+        addr &= TARGET_PAGE_MASK;
+
+        PAGE_FOR_EACH_TB(addr, addr + TARGET_PAGE_SIZE, unused, tb, n) {
+            if (current_tb == tb &&
+                (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
+                /*
+                 * If we are modifying the current TB, we must stop its
+                 * execution. We could be more precise by checking that
+                 * the modification is after the current PC, but it would
+                 * require a specialized function to partially restore
+                 * the CPU state.
+                 */
+                current_tb_modified = true;
+                cpu_restore_state_from_tb(current_cpu, current_tb, pc, true);
+            }
+            tb_phys_invalidate__locked(tb);
+        }
+
+        if (current_tb_modified) {
+            /* Force execution of one insn next time.  */
+            CPUState *cpu = current_cpu;
+            cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(current_cpu);
+            return true;
+        }
+    }
+#else
+    tb_invalidate_phys_page(addr);
+#endif /* TARGET_HAS_PRECISE_SMC */
+    return false;
+}
+#else
 /*
  * @p must be non-NULL.
  * user-mode: call with mmap_lock held.
@@ -492,22 +637,17 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
 {
     TranslationBlock *tb;
     tb_page_addr_t tb_start, tb_end;
-    int n;
+    PageForEachNext n;
 #ifdef TARGET_HAS_PRECISE_SMC
-    CPUState *cpu = current_cpu;
-    bool current_tb_not_found = retaddr != 0;
     bool current_tb_modified = false;
-    TranslationBlock *current_tb = NULL;
+    TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
 #endif /* TARGET_HAS_PRECISE_SMC */
 
-    assert_page_locked(p);
-
     /*
      * We remove all the TBs in the range [start, end[.
      * XXX: see if in some cases it could be faster to invalidate all the code
      */
-    PAGE_FOR_EACH_TB(p, tb, n) {
-        assert_page_locked(p);
+    PAGE_FOR_EACH_TB(start, end, p, tb, n) {
         /* NOTE: this is subtle as a TB may span two physical pages */
         if (n == 0) {
             /* NOTE: tb_end may be after the end of the page, but
@@ -521,11 +661,6 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
         }
         if (!(tb_end <= start || tb_start >= end)) {
 #ifdef TARGET_HAS_PRECISE_SMC
-            if (current_tb_not_found) {
-                current_tb_not_found = false;
-                /* now we have a real cpu fault */
-                current_tb = tcg_tb_lookup(retaddr);
-            }
             if (current_tb == tb &&
                 (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
                 /*
@@ -536,25 +671,26 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
                  * restore the CPU state.
                  */
                 current_tb_modified = true;
-                cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
+                cpu_restore_state_from_tb(current_cpu, current_tb,
+                                          retaddr, true);
             }
 #endif /* TARGET_HAS_PRECISE_SMC */
             tb_phys_invalidate__locked(tb);
         }
     }
-#if !defined(CONFIG_USER_ONLY)
+
     /* if no code remaining, no need to continue to use slow writes */
     if (!p->first_tb) {
         tlb_unprotect_code(start);
     }
-#endif
+
 #ifdef TARGET_HAS_PRECISE_SMC
     if (current_tb_modified) {
         page_collection_unlock(pages);
         /* Force execution of one insn next time.  */
-        cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
+        current_cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(current_cpu);
         mmap_unlock();
-        cpu_loop_exit_noexc(cpu);
+        cpu_loop_exit_noexc(current_cpu);
     }
 #endif
 }
@@ -571,8 +707,6 @@ void tb_invalidate_phys_page(tb_page_addr_t addr)
     tb_page_addr_t start, end;
     PageDesc *p;
 
-    assert_memory_lock();
-
     p = page_find(addr >> TARGET_PAGE_BITS);
     if (p == NULL) {
         return;
@@ -599,8 +733,6 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
     struct page_collection *pages;
     tb_page_addr_t next;
 
-    assert_memory_lock();
-
     pages = page_collection_lock(start, end);
     for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
          start < end;
@@ -611,12 +743,12 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
         if (pd == NULL) {
             continue;
         }
+        assert_page_locked(pd);
         tb_invalidate_phys_page_range__locked(pages, pd, start, bound, 0);
     }
     page_collection_unlock(pages);
 }
 
-#ifdef CONFIG_SOFTMMU
 /*
  * len must be <= 8 and start must be a multiple of len.
  * Called via softmmu_template.h when code areas are written to with
@@ -630,8 +762,6 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
 {
     PageDesc *p;
 
-    assert_memory_lock();
-
     p = page_find(start >> TARGET_PAGE_BITS);
     if (!p) {
         return;
@@ -641,64 +771,4 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
     tb_invalidate_phys_page_range__locked(pages, p, start, start + len,
                                           retaddr);
 }
-#else
-/*
- * Called with mmap_lock held. If pc is not 0 then it indicates the
- * host PC of the faulting store instruction that caused this invalidate.
- * Returns true if the caller needs to abort execution of the current
- * TB (because it was modified by this store and the guest CPU has
- * precise-SMC semantics).
- */
-bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc)
-{
-    TranslationBlock *tb;
-    PageDesc *p;
-    int n;
-#ifdef TARGET_HAS_PRECISE_SMC
-    TranslationBlock *current_tb = NULL;
-    CPUState *cpu = current_cpu;
-    bool current_tb_modified = false;
-#endif
-
-    assert_memory_lock();
-
-    addr &= TARGET_PAGE_MASK;
-    p = page_find(addr >> TARGET_PAGE_BITS);
-    if (!p) {
-        return false;
-    }
-
-#ifdef TARGET_HAS_PRECISE_SMC
-    if (p->first_tb && pc != 0) {
-        current_tb = tcg_tb_lookup(pc);
-    }
-#endif
-    assert_page_locked(p);
-    PAGE_FOR_EACH_TB(p, tb, n) {
-#ifdef TARGET_HAS_PRECISE_SMC
-        if (current_tb == tb &&
-            (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
-            /*
-             * If we are modifying the current TB, we must stop its execution.
-             * We could be more precise by checking that the modification is
-             * after the current PC, but it would require a specialized
-             * function to partially restore the CPU state.
-             */
-            current_tb_modified = true;
-            cpu_restore_state_from_tb(cpu, current_tb, pc, true);
-        }
-#endif /* TARGET_HAS_PRECISE_SMC */
-        tb_phys_invalidate(tb, addr);
-    }
-    p->first_tb = (uintptr_t)NULL;
-#ifdef TARGET_HAS_PRECISE_SMC
-    if (current_tb_modified) {
-        /* Force execution of one insn next time.  */
-        cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
-        return true;
-    }
-#endif
-
-    return false;
-}
-#endif
+#endif /* CONFIG_USER_ONLY */
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 478301f227..e002981a9f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -686,7 +686,7 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
 
     for (index = start; index <= end; index++) {
         TranslationBlock *tb;
-        int n;
+        PageForEachNext n;
 
         pd = page_find(index);
         if (pd == NULL) {
@@ -697,7 +697,7 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
             goto retry;
         }
         assert_page_locked(pd);
-        PAGE_FOR_EACH_TB(pd, tb, n) {
+        PAGE_FOR_EACH_TB(unused, unused, pd, tb, n) {
             if (page_trylock_add(set, tb_page_addr0(tb)) ||
                 (tb_page_addr1(tb) != -1 &&
                  page_trylock_add(set, tb_page_addr1(tb)))) {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 16/24] accel/tcg: Use page_reset_target_data in page_set_flags
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (14 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 15/24] accel/tcg: Use interval tree for TBs in user-only mode Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-25 16:12   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 17/24] accel/tcg: Use tb_invalidate_phys_range " Richard Henderson
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Use the existing function for clearing target data.
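
For readers skimming the hunk: the two branches collapse into a single
expression because a reset discards every old bit, sticky or not.  A
stand-alone illustration (the PAGE_STICKY value here is made up, not
QEMU's):

    #include <assert.h>

    #define PAGE_STICKY 0x30    /* illustrative value only */

    static int new_flags(int old, int flags, int reset)
    {
        /* reset: drop all old bits; otherwise keep only sticky ones */
        return (reset ? 0 : old & PAGE_STICKY) | flags;
    }

    int main(void)
    {
        assert(new_flags(0x37, 0x05, 1) == 0x05);           /* reset */
        assert(new_flags(0x37, 0x05, 0) == (0x30 | 0x05));  /* mprotect */
        return 0;
    }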

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e002981a9f..0006290694 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1370,6 +1370,9 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
         flags |= PAGE_WRITE_ORG;
     }
     reset = !(flags & PAGE_VALID) || (flags & PAGE_RESET);
+    if (reset) {
+        page_reset_target_data(start, end);
+    }
     flags &= ~PAGE_RESET;
 
     for (addr = start, len = end - start;
@@ -1387,14 +1390,8 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
                 (flags & ~p->flags & PAGE_WRITE))) {
             tb_invalidate_phys_page(addr);
         }
-        if (reset) {
-            g_free(p->target_data);
-            p->target_data = NULL;
-            p->flags = flags;
-        } else {
-            /* Using mprotect on a page does not change sticky bits. */
-            p->flags = (p->flags & PAGE_STICKY) | flags;
-        }
+        /* Using mprotect on a page does not change sticky bits. */
+        p->flags = (reset ? 0 : p->flags & PAGE_STICKY) | flags;
     }
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 17/24] accel/tcg: Use tb_invalidate_phys_range in page_set_flags
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (15 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 16/24] accel/tcg: Use page_reset_target_data in page_set_flags Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-25 16:14   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 18/24] accel/tcg: Move TARGET_PAGE_DATA_SIZE impl to user-exec.c Richard Henderson
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Flush translation blocks in bulk, rather than page-by-page.
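
A stand-alone sketch of the pattern, with invented names: the
invalidation moves out of the per-page loop and happens once for the
whole range.

    #include <stdbool.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096ul

    static void invalidate_range(unsigned long start, unsigned long end)
    {
        printf("invalidate [%#lx, %#lx)\n", start, end);  /* one flush */
    }

    static void set_range_flags(unsigned long start, unsigned long end,
                                bool page_goes_stale)
    {
        bool inval_tb = false;

        for (unsigned long a = start; a < end; a += PAGE_SIZE) {
            if (page_goes_stale) {  /* stands in for the PAGE_EXEC test */
                inval_tb = true;    /* defer rather than flush per page */
            }
        }
        if (inval_tb) {
            invalidate_range(start, end);
        }
    }

    int main(void)
    {
        set_range_flags(0x10000, 0x20000, true);
        return 0;
    }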

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 0006290694..04401ceac7 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1352,7 +1352,7 @@ int page_get_flags(target_ulong address)
 void page_set_flags(target_ulong start, target_ulong end, int flags)
 {
     target_ulong addr, len;
-    bool reset;
+    bool reset, inval_tb = false;
 
     /* This function should never be called with addresses outside the
        guest address space.  If this assert fires, it probably indicates
@@ -1388,11 +1388,15 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
             && (reset ||
                 !(flags & PAGE_EXEC) ||
                 (flags & ~p->flags & PAGE_WRITE))) {
-            tb_invalidate_phys_page(addr);
+            inval_tb = true;
         }
         /* Using mprotect on a page does not change sticky bits. */
         p->flags = (reset ? 0 : p->flags & PAGE_STICKY) | flags;
     }
+
+    if (inval_tb) {
+        tb_invalidate_phys_range(start, end);
+    }
 }
 
 void page_reset_target_data(target_ulong start, target_ulong end)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 18/24] accel/tcg: Move TARGET_PAGE_DATA_SIZE impl to user-exec.c
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (16 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 17/24] accel/tcg: Use tb_invalidate_phys_range " Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-25 16:15   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 19/24] accel/tcg: Simplify page_get/alloc_target_data Richard Henderson
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Since "target data" is always user-only, move it out of
translate-all.c to user-exec.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 50 ---------------------------------------
 accel/tcg/user-exec.c     | 50 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 04401ceac7..dbd4eff0cf 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1399,56 +1399,6 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
     }
 }
 
-void page_reset_target_data(target_ulong start, target_ulong end)
-{
-#ifdef TARGET_PAGE_DATA_SIZE
-    target_ulong addr, len;
-
-    /*
-     * This function should never be called with addresses outside the
-     * guest address space.  If this assert fires, it probably indicates
-     * a missing call to h2g_valid.
-     */
-    assert(end - 1 <= GUEST_ADDR_MAX);
-    assert(start < end);
-    assert_memory_lock();
-
-    start = start & TARGET_PAGE_MASK;
-    end = TARGET_PAGE_ALIGN(end);
-
-    for (addr = start, len = end - start;
-         len != 0;
-         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-        PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
-
-        g_free(p->target_data);
-        p->target_data = NULL;
-    }
-#endif
-}
-
-#ifdef TARGET_PAGE_DATA_SIZE
-void *page_get_target_data(target_ulong address)
-{
-    PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
-    return p ? p->target_data : NULL;
-}
-
-void *page_alloc_target_data(target_ulong address)
-{
-    PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
-    void *ret = NULL;
-
-    if (p->flags & PAGE_VALID) {
-        ret = p->target_data;
-        if (!ret) {
-            p->target_data = ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
-        }
-    }
-    return ret;
-}
-#endif /* TARGET_PAGE_DATA_SIZE */
-
 int page_check_range(target_ulong start, target_ulong len, int flags)
 {
     PageDesc *p;
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 521aa8b61e..927b91900f 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -210,6 +210,56 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
     return addr;
 }
 
+void page_reset_target_data(target_ulong start, target_ulong end)
+{
+#ifdef TARGET_PAGE_DATA_SIZE
+    target_ulong addr, len;
+
+    /*
+     * This function should never be called with addresses outside the
+     * guest address space.  If this assert fires, it probably indicates
+     * a missing call to h2g_valid.
+     */
+    assert(end - 1 <= GUEST_ADDR_MAX);
+    assert(start < end);
+    assert_memory_lock();
+
+    start = start & TARGET_PAGE_MASK;
+    end = TARGET_PAGE_ALIGN(end);
+
+    for (addr = start, len = end - start;
+         len != 0;
+         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
+        PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+
+        g_free(p->target_data);
+        p->target_data = NULL;
+    }
+#endif
+}
+
+#ifdef TARGET_PAGE_DATA_SIZE
+void *page_get_target_data(target_ulong address)
+{
+    PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
+    return p ? p->target_data : NULL;
+}
+
+void *page_alloc_target_data(target_ulong address)
+{
+    PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
+    void *ret = NULL;
+
+    if (p->flags & PAGE_VALID) {
+        ret = p->target_data;
+        if (!ret) {
+            p->target_data = ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
+        }
+    }
+    return ret;
+}
+#endif
+
 /* The softmmu versions of these helpers are in cputlb.c.  */
 
 /*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 19/24] accel/tcg: Simplify page_get/alloc_target_data
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (17 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 18/24] accel/tcg: Move TARGET_PAGE_DATA_SIZE impl to user-exec.c Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-25 16:19   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 20/24] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE Richard Henderson
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Since the only user, Arm MTE, always requires allocation,
merge the get and alloc functions to always produce a
non-null result.  Also assume that the caller has already
checked page validity.
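
A stand-alone model of the merged accessor (glib only; the per-page
lookup is reduced to a single slot): it always returns storage,
allocating zeroed memory on first use, which is what lets the MTE
caller drop its NULL check.

    #include <glib.h>

    #define DATA_SIZE 128     /* stand-in for TARGET_PAGE_DATA_SIZE */

    static void *slot;        /* stand-in for p->target_data */

    static void *get_target_data(void) __attribute__((returns_nonnull));

    static void *get_target_data(void)
    {
        if (!slot) {
            slot = g_malloc0(DATA_SIZE);  /* zeroed on first allocation */
        }
        return slot;
    }

    int main(void)
    {
        char *tags = get_target_data();   /* no NULL check needed */
        tags[0] = 1;
        return 0;
    }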

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu-all.h  | 21 ++++++---------------
 accel/tcg/user-exec.c   | 16 ++++------------
 target/arm/mte_helper.c |  4 ----
 3 files changed, 10 insertions(+), 31 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 854adc4ac2..2eb1176538 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -281,27 +281,18 @@ void page_reset_target_data(target_ulong start, target_ulong end);
 int page_check_range(target_ulong start, target_ulong len, int flags);
 
 /**
- * page_alloc_target_data(address)
+ * page_get_target_data(address)
  * @address: guest virtual address
  *
- * Allocate TARGET_PAGE_DATA_SIZE bytes of out-of-band data to associate
- * with the guest page at @address.  If the page is not mapped, NULL will
- * be returned.  If there is existing data associated with @address,
- * no new memory will be allocated.
+ * Return TARGET_PAGE_DATA_SIZE bytes of out-of-band data to associate
+ * with the guest page at @address, allocating it if necessary.  The
+ * caller should already have verified that the address is valid.
  *
  * The memory will be freed when the guest page is deallocated,
  * e.g. with the munmap system call.
  */
-void *page_alloc_target_data(target_ulong address);
-
-/**
- * page_get_target_data(address)
- * @address: guest virtual address
- *
- * Return any out-of-bound memory assocated with the guest page
- * at @address, as per page_alloc_target_data.
- */
-void *page_get_target_data(target_ulong address);
+void *page_get_target_data(target_ulong address)
+    __attribute__((returns_nonnull));
 #endif
 
 CPUArchState *cpu_copy(CPUArchState *env);
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 927b91900f..fb7d6ee9e9 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -242,19 +242,11 @@ void page_reset_target_data(target_ulong start, target_ulong end)
 void *page_get_target_data(target_ulong address)
 {
     PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
-    return p ? p->target_data : NULL;
-}
+    void *ret = p->target_data;
 
-void *page_alloc_target_data(target_ulong address)
-{
-    PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
-    void *ret = NULL;
-
-    if (p->flags & PAGE_VALID) {
-        ret = p->target_data;
-        if (!ret) {
-            p->target_data = ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
-        }
+    if (!ret) {
+        ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
+        p->target_data = ret;
     }
     return ret;
 }
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index 62d36c127f..d8eefabcd3 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -95,10 +95,6 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
     }
 
     tags = page_get_target_data(clean_ptr);
-    if (tags == NULL) {
-        tags = page_alloc_target_data(clean_ptr);
-        assert(tags != NULL);
-    }
 
     index = extract32(ptr, LOG2_TAG_GRANULE + 1,
                       TARGET_PAGE_BITS - LOG2_TAG_GRANULE - 1);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 20/24] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (18 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 19/24] accel/tcg: Simplify page_get/alloc_target_data Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-25 19:30   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 21/24] accel/tcg: Move page_{get,set}_flags to user-exec.c Richard Henderson
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Continue weaning user-only away from PageDesc.

Use an interval tree to record target data.
Chunk the data to minimize allocation overhead.
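
The chunking arithmetic is easiest to see in isolation.  A stand-alone
version, assuming a 4 KiB target page (with unsigned arithmetic the
PAGE_MASK * TPD_PAGES multiplication wraps to the same bits as the
signed TARGET_PAGE_MASK form in the hunk):

    #include <stdio.h>

    #define PAGE_BITS  12
    #define PAGE_SIZE  (1ul << PAGE_BITS)
    #define PAGE_MASK  (~(PAGE_SIZE - 1))
    #define TPD_PAGES  64
    #define TBD_MASK   (PAGE_MASK * TPD_PAGES)  /* 64-page regions */

    int main(void)
    {
        unsigned long address = 0x12345678;
        unsigned long page    = address & PAGE_MASK;
        unsigned long region  = address & TBD_MASK;
        unsigned long index   = (page - region) >> PAGE_BITS;

        /* page 0x12345000, region 0x12340000, index 5 */
        printf("page %#lx region %#lx index %lu\n", page, region, index);
        return 0;
    }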

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/internal.h  |   1 -
 accel/tcg/user-exec.c | 110 ++++++++++++++++++++++++++++++++----------
 2 files changed, 85 insertions(+), 26 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 1bd5a02911..8731dc52e2 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -26,7 +26,6 @@
 typedef struct PageDesc {
 #ifdef CONFIG_USER_ONLY
     unsigned long flags;
-    void *target_data;
 #else
     QemuSpin lock;
     /* list of TBs intersecting this ram page */
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index fb7d6ee9e9..bce3d5f335 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -210,47 +210,107 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
     return addr;
 }
 
+#ifdef TARGET_PAGE_DATA_SIZE
+/*
+ * Allocate chunks of target data together.  For the only current user,
+ * if we allocate one hunk per page, we have overhead of 40/128, or ~31%.
+ * Therefore, allocate memory for 64 pages at a time for overhead < 1%.
+ */
+#define TPD_PAGES  64
+#define TBD_MASK   (TARGET_PAGE_MASK * TPD_PAGES)
+
+typedef struct TargetPageDataNode {
+    IntervalTreeNode itree;
+    char data[TPD_PAGES][TARGET_PAGE_DATA_SIZE] __attribute__((aligned));
+} TargetPageDataNode;
+
+static IntervalTreeRoot targetdata_root;
+
 void page_reset_target_data(target_ulong start, target_ulong end)
 {
-#ifdef TARGET_PAGE_DATA_SIZE
-    target_ulong addr, len;
+    IntervalTreeNode *n, *next;
+    target_ulong last;
 
-    /*
-     * This function should never be called with addresses outside the
-     * guest address space.  If this assert fires, it probably indicates
-     * a missing call to h2g_valid.
-     */
-    assert(end - 1 <= GUEST_ADDR_MAX);
-    assert(start < end);
     assert_memory_lock();
 
     start = start & TARGET_PAGE_MASK;
-    end = TARGET_PAGE_ALIGN(end);
+    last = TARGET_PAGE_ALIGN(end) - 1;
 
-    for (addr = start, len = end - start;
-         len != 0;
-         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-        PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+    for (n = interval_tree_iter_first(&targetdata_root, start, last),
+         next = n ? interval_tree_iter_next(n, start, last) : NULL;
+         n != NULL;
+         n = next,
+         next = next ? interval_tree_iter_next(n, start, last) : NULL) {
+        target_ulong n_start, n_last, p_ofs, p_len;
+        TargetPageDataNode *t;
 
-        g_free(p->target_data);
-        p->target_data = NULL;
+        if (n->start >= start && n->last <= last) {
+            interval_tree_remove(n, &targetdata_root);
+            g_free(n);
+            continue;
+        }
+
+        if (n->start < start) {
+            n_start = start;
+            p_ofs = (start - n->start) >> TARGET_PAGE_BITS;
+        } else {
+            n_start = n->start;
+            p_ofs = 0;
+        }
+        n_last = MIN(last, n->last);
+        p_len = (n_last + 1 - n_start) >> TARGET_PAGE_BITS;
+
+        t = container_of(n, TargetPageDataNode, itree);
+        memset(t->data[p_ofs], 0, p_len * TARGET_PAGE_DATA_SIZE);
     }
-#endif
 }
 
-#ifdef TARGET_PAGE_DATA_SIZE
 void *page_get_target_data(target_ulong address)
 {
-    PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
-    void *ret = p->target_data;
+    IntervalTreeNode *n;
+    TargetPageDataNode *t;
+    target_ulong page, region;
+    bool locked;
 
-    if (!ret) {
-        ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
-        p->target_data = ret;
+    page = address & TARGET_PAGE_MASK;
+    region = address & TBD_MASK;
+
+    n = interval_tree_iter_first(&targetdata_root, page, page);
+    if (n) {
+        goto found;
     }
-    return ret;
+
+    /*
+     * See util/interval-tree.c re lockless lookups: no false positives but
+     * there are false negatives.  If we find nothing, retry with the mmap
+     * lock acquired.  We also need the lock for the allocation + insert.
+     */
+    locked = have_mmap_lock();
+    if (!locked) {
+        mmap_lock();
+        n = interval_tree_iter_first(&targetdata_root, page, page);
+        if (n) {
+            mmap_unlock();
+            goto found;
+        }
+    }
+
+    t = g_new0(TargetPageDataNode, 1);
+    n = &t->itree;
+    n->start = region;
+    n->last = region | ~TBD_MASK;
+    interval_tree_insert(n, &targetdata_root);
+    if (!locked) {
+        mmap_unlock();
+    }
+
+ found:
+    t = container_of(n, TargetPageDataNode, itree);
+    return t->data[(page - region) >> TARGET_PAGE_BITS];
 }
-#endif
+#else
+void page_reset_target_data(target_ulong start, target_ulong end) { }
+#endif /* TARGET_PAGE_DATA_SIZE */
 
 /* The softmmu versions of these helpers are in cputlb.c.  */
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 21/24] accel/tcg: Move page_{get,set}_flags to user-exec.c
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (19 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 20/24] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-26 12:35   ` [PATCH 21/24] accel/tcg: Move page_{get, set}_flags " Alex Bennée
  2022-10-06  3:11 ` [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking Richard Henderson
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

This page tracking implementation is specific to user-only,
since the system softmmu version is in cputlb.c.  Move it
out of translate-all.c to user-exec.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/internal.h      |  17 ++
 accel/tcg/translate-all.c | 350 --------------------------------------
 accel/tcg/user-exec.c     | 346 +++++++++++++++++++++++++++++++++++++
 3 files changed, 363 insertions(+), 350 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 8731dc52e2..250f0daac9 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -33,6 +33,23 @@ typedef struct PageDesc {
 #endif
 } PageDesc;
 
+/*
+ * In system mode we want L1_MAP to be based on ram offsets,
+ * while in user mode we want it to be based on virtual addresses.
+ *
+ * TODO: For user mode, see the caveat re host vs guest virtual
+ * address spaces near GUEST_ADDR_MAX.
+ */
+#if !defined(CONFIG_USER_ONLY)
+#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
+# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
+#else
+# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
+#endif
+#else
+# define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
+#endif
+
 /* Size of the L2 (and L3, etc) page tables.  */
 #define V_L2_BITS 10
 #define V_L2_SIZE (1 << V_L2_BITS)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index dbd4eff0cf..65e557592a 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -109,23 +109,6 @@ struct page_collection {
     struct page_entry *max;
 };
 
-/*
- * In system mode we want L1_MAP to be based on ram offsets,
- * while in user mode we want it to be based on virtual addresses.
- *
- * TODO: For user mode, see the caveat re host vs guest virtual
- * address spaces near GUEST_ADDR_MAX.
- */
-#if !defined(CONFIG_USER_ONLY)
-#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
-# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
-#else
-# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
-#endif
-#else
-# define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
-#endif
-
 /* Make sure all possible CPU event bits fit in tb->trace_vcpu_dstate */
 QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
                   sizeof_field(TranslationBlock, trace_vcpu_dstate)
@@ -1214,339 +1197,6 @@ void cpu_interrupt(CPUState *cpu, int mask)
     qatomic_set(&cpu_neg(cpu)->icount_decr.u16.high, -1);
 }
 
-/*
- * Walks guest process memory "regions" one by one
- * and calls callback function 'fn' for each region.
- */
-struct walk_memory_regions_data {
-    walk_memory_regions_fn fn;
-    void *priv;
-    target_ulong start;
-    int prot;
-};
-
-static int walk_memory_regions_end(struct walk_memory_regions_data *data,
-                                   target_ulong end, int new_prot)
-{
-    if (data->start != -1u) {
-        int rc = data->fn(data->priv, data->start, end, data->prot);
-        if (rc != 0) {
-            return rc;
-        }
-    }
-
-    data->start = (new_prot ? end : -1u);
-    data->prot = new_prot;
-
-    return 0;
-}
-
-static int walk_memory_regions_1(struct walk_memory_regions_data *data,
-                                 target_ulong base, int level, void **lp)
-{
-    target_ulong pa;
-    int i, rc;
-
-    if (*lp == NULL) {
-        return walk_memory_regions_end(data, base, 0);
-    }
-
-    if (level == 0) {
-        PageDesc *pd = *lp;
-
-        for (i = 0; i < V_L2_SIZE; ++i) {
-            int prot = pd[i].flags;
-
-            pa = base | (i << TARGET_PAGE_BITS);
-            if (prot != data->prot) {
-                rc = walk_memory_regions_end(data, pa, prot);
-                if (rc != 0) {
-                    return rc;
-                }
-            }
-        }
-    } else {
-        void **pp = *lp;
-
-        for (i = 0; i < V_L2_SIZE; ++i) {
-            pa = base | ((target_ulong)i <<
-                (TARGET_PAGE_BITS + V_L2_BITS * level));
-            rc = walk_memory_regions_1(data, pa, level - 1, pp + i);
-            if (rc != 0) {
-                return rc;
-            }
-        }
-    }
-
-    return 0;
-}
-
-int walk_memory_regions(void *priv, walk_memory_regions_fn fn)
-{
-    struct walk_memory_regions_data data;
-    uintptr_t i, l1_sz = v_l1_size;
-
-    data.fn = fn;
-    data.priv = priv;
-    data.start = -1u;
-    data.prot = 0;
-
-    for (i = 0; i < l1_sz; i++) {
-        target_ulong base = i << (v_l1_shift + TARGET_PAGE_BITS);
-        int rc = walk_memory_regions_1(&data, base, v_l2_levels, l1_map + i);
-        if (rc != 0) {
-            return rc;
-        }
-    }
-
-    return walk_memory_regions_end(&data, 0, 0);
-}
-
-static int dump_region(void *priv, target_ulong start,
-    target_ulong end, unsigned long prot)
-{
-    FILE *f = (FILE *)priv;
-
-    (void) fprintf(f, TARGET_FMT_lx"-"TARGET_FMT_lx
-        " "TARGET_FMT_lx" %c%c%c\n",
-        start, end, end - start,
-        ((prot & PAGE_READ) ? 'r' : '-'),
-        ((prot & PAGE_WRITE) ? 'w' : '-'),
-        ((prot & PAGE_EXEC) ? 'x' : '-'));
-
-    return 0;
-}
-
-/* dump memory mappings */
-void page_dump(FILE *f)
-{
-    const int length = sizeof(target_ulong) * 2;
-    (void) fprintf(f, "%-*s %-*s %-*s %s\n",
-            length, "start", length, "end", length, "size", "prot");
-    walk_memory_regions(f, dump_region);
-}
-
-int page_get_flags(target_ulong address)
-{
-    PageDesc *p;
-
-    p = page_find(address >> TARGET_PAGE_BITS);
-    if (!p) {
-        return 0;
-    }
-    return p->flags;
-}
-
-/*
- * Allow the target to decide if PAGE_TARGET_[12] may be reset.
- * By default, they are not kept.
- */
-#ifndef PAGE_TARGET_STICKY
-#define PAGE_TARGET_STICKY  0
-#endif
-#define PAGE_STICKY  (PAGE_ANON | PAGE_PASSTHROUGH | PAGE_TARGET_STICKY)
-
-/* Modify the flags of a page and invalidate the code if necessary.
-   The flag PAGE_WRITE_ORG is positioned automatically depending
-   on PAGE_WRITE.  The mmap_lock should already be held.  */
-void page_set_flags(target_ulong start, target_ulong end, int flags)
-{
-    target_ulong addr, len;
-    bool reset, inval_tb = false;
-
-    /* This function should never be called with addresses outside the
-       guest address space.  If this assert fires, it probably indicates
-       a missing call to h2g_valid.  */
-    assert(end - 1 <= GUEST_ADDR_MAX);
-    assert(start < end);
-    /* Only set PAGE_ANON with new mappings. */
-    assert(!(flags & PAGE_ANON) || (flags & PAGE_RESET));
-    assert_memory_lock();
-
-    start = start & TARGET_PAGE_MASK;
-    end = TARGET_PAGE_ALIGN(end);
-
-    if (flags & PAGE_WRITE) {
-        flags |= PAGE_WRITE_ORG;
-    }
-    reset = !(flags & PAGE_VALID) || (flags & PAGE_RESET);
-    if (reset) {
-        page_reset_target_data(start, end);
-    }
-    flags &= ~PAGE_RESET;
-
-    for (addr = start, len = end - start;
-         len != 0;
-         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-        PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, true);
-
-        /*
-         * If the page was executable, but is reset, or is no longer
-         * executable, or has become writable, then invalidate any code.
-         */
-        if ((p->flags & PAGE_EXEC)
-            && (reset ||
-                !(flags & PAGE_EXEC) ||
-                (flags & ~p->flags & PAGE_WRITE))) {
-            inval_tb = true;
-        }
-        /* Using mprotect on a page does not change sticky bits. */
-        p->flags = (reset ? 0 : p->flags & PAGE_STICKY) | flags;
-    }
-
-    if (inval_tb) {
-        tb_invalidate_phys_range(start, end);
-    }
-}
-
-int page_check_range(target_ulong start, target_ulong len, int flags)
-{
-    PageDesc *p;
-    target_ulong end;
-    target_ulong addr;
-
-    /* This function should never be called with addresses outside the
-       guest address space.  If this assert fires, it probably indicates
-       a missing call to h2g_valid.  */
-    if (TARGET_ABI_BITS > L1_MAP_ADDR_SPACE_BITS) {
-        assert(start < ((target_ulong)1 << L1_MAP_ADDR_SPACE_BITS));
-    }
-
-    if (len == 0) {
-        return 0;
-    }
-    if (start + len - 1 < start) {
-        /* We've wrapped around.  */
-        return -1;
-    }
-
-    /* must do before we loose bits in the next step */
-    end = TARGET_PAGE_ALIGN(start + len);
-    start = start & TARGET_PAGE_MASK;
-
-    for (addr = start, len = end - start;
-         len != 0;
-         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-        p = page_find(addr >> TARGET_PAGE_BITS);
-        if (!p) {
-            return -1;
-        }
-        if (!(p->flags & PAGE_VALID)) {
-            return -1;
-        }
-
-        if ((flags & PAGE_READ) && !(p->flags & PAGE_READ)) {
-            return -1;
-        }
-        if (flags & PAGE_WRITE) {
-            if (!(p->flags & PAGE_WRITE_ORG)) {
-                return -1;
-            }
-            /* unprotect the page if it was put read-only because it
-               contains translated code */
-            if (!(p->flags & PAGE_WRITE)) {
-                if (!page_unprotect(addr, 0)) {
-                    return -1;
-                }
-            }
-        }
-    }
-    return 0;
-}
-
-void page_protect(tb_page_addr_t page_addr)
-{
-    target_ulong addr;
-    PageDesc *p;
-    int prot;
-
-    p = page_find(page_addr >> TARGET_PAGE_BITS);
-    if (p && (p->flags & PAGE_WRITE)) {
-        /*
-         * Force the host page as non writable (writes will have a page fault +
-         * mprotect overhead).
-         */
-        page_addr &= qemu_host_page_mask;
-        prot = 0;
-        for (addr = page_addr; addr < page_addr + qemu_host_page_size;
-             addr += TARGET_PAGE_SIZE) {
-
-            p = page_find(addr >> TARGET_PAGE_BITS);
-            if (!p) {
-                continue;
-            }
-            prot |= p->flags;
-            p->flags &= ~PAGE_WRITE;
-        }
-        mprotect(g2h_untagged(page_addr), qemu_host_page_size,
-                 (prot & PAGE_BITS) & ~PAGE_WRITE);
-    }
-}
-
-/* called from signal handler: invalidate the code and unprotect the
- * page. Return 0 if the fault was not handled, 1 if it was handled,
- * and 2 if it was handled but the caller must cause the TB to be
- * immediately exited. (We can only return 2 if the 'pc' argument is
- * non-zero.)
- */
-int page_unprotect(target_ulong address, uintptr_t pc)
-{
-    unsigned int prot;
-    bool current_tb_invalidated;
-    PageDesc *p;
-    target_ulong host_start, host_end, addr;
-
-    /* Technically this isn't safe inside a signal handler.  However we
-       know this only ever happens in a synchronous SEGV handler, so in
-       practice it seems to be ok.  */
-    mmap_lock();
-
-    p = page_find(address >> TARGET_PAGE_BITS);
-    if (!p) {
-        mmap_unlock();
-        return 0;
-    }
-
-    /* if the page was really writable, then we change its
-       protection back to writable */
-    if (p->flags & PAGE_WRITE_ORG) {
-        current_tb_invalidated = false;
-        if (p->flags & PAGE_WRITE) {
-            /* If the page is actually marked WRITE then assume this is because
-             * this thread raced with another one which got here first and
-             * set the page to PAGE_WRITE and did the TB invalidate for us.
-             */
-#ifdef TARGET_HAS_PRECISE_SMC
-            TranslationBlock *current_tb = tcg_tb_lookup(pc);
-            if (current_tb) {
-                current_tb_invalidated = tb_cflags(current_tb) & CF_INVALID;
-            }
-#endif
-        } else {
-            host_start = address & qemu_host_page_mask;
-            host_end = host_start + qemu_host_page_size;
-
-            prot = 0;
-            for (addr = host_start; addr < host_end; addr += TARGET_PAGE_SIZE) {
-                p = page_find(addr >> TARGET_PAGE_BITS);
-                p->flags |= PAGE_WRITE;
-                prot |= p->flags;
-
-                /* and since the content will be modified, we must invalidate
-                   the corresponding translated code. */
-                current_tb_invalidated |=
-                    tb_invalidate_phys_page_unwind(addr, pc);
-            }
-            mprotect((void *)g2h_untagged(host_start), qemu_host_page_size,
-                     prot & PAGE_BITS);
-        }
-        mmap_unlock();
-        /* If current TB was invalidated return to main loop */
-        return current_tb_invalidated ? 2 : 1;
-    }
-    mmap_unlock();
-    return 0;
-}
 #endif /* CONFIG_USER_ONLY */
 
 /*
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index bce3d5f335..b6050e2bfe 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -135,6 +135,352 @@ bool handle_sigsegv_accerr_write(CPUState *cpu, sigset_t *old_set,
     }
 }
 
+/*
+ * Walks guest process memory "regions" one by one
+ * and calls callback function 'fn' for each region.
+ */
+struct walk_memory_regions_data {
+    walk_memory_regions_fn fn;
+    void *priv;
+    target_ulong start;
+    int prot;
+};
+
+static int walk_memory_regions_end(struct walk_memory_regions_data *data,
+                                   target_ulong end, int new_prot)
+{
+    if (data->start != -1u) {
+        int rc = data->fn(data->priv, data->start, end, data->prot);
+        if (rc != 0) {
+            return rc;
+        }
+    }
+
+    data->start = (new_prot ? end : -1u);
+    data->prot = new_prot;
+
+    return 0;
+}
+
+static int walk_memory_regions_1(struct walk_memory_regions_data *data,
+                                 target_ulong base, int level, void **lp)
+{
+    target_ulong pa;
+    int i, rc;
+
+    if (*lp == NULL) {
+        return walk_memory_regions_end(data, base, 0);
+    }
+
+    if (level == 0) {
+        PageDesc *pd = *lp;
+
+        for (i = 0; i < V_L2_SIZE; ++i) {
+            int prot = pd[i].flags;
+
+            pa = base | (i << TARGET_PAGE_BITS);
+            if (prot != data->prot) {
+                rc = walk_memory_regions_end(data, pa, prot);
+                if (rc != 0) {
+                    return rc;
+                }
+            }
+        }
+    } else {
+        void **pp = *lp;
+
+        for (i = 0; i < V_L2_SIZE; ++i) {
+            pa = base | ((target_ulong)i <<
+                (TARGET_PAGE_BITS + V_L2_BITS * level));
+            rc = walk_memory_regions_1(data, pa, level - 1, pp + i);
+            if (rc != 0) {
+                return rc;
+            }
+        }
+    }
+
+    return 0;
+}
+
+int walk_memory_regions(void *priv, walk_memory_regions_fn fn)
+{
+    struct walk_memory_regions_data data;
+    uintptr_t i, l1_sz = v_l1_size;
+
+    data.fn = fn;
+    data.priv = priv;
+    data.start = -1u;
+    data.prot = 0;
+
+    for (i = 0; i < l1_sz; i++) {
+        target_ulong base = i << (v_l1_shift + TARGET_PAGE_BITS);
+        int rc = walk_memory_regions_1(&data, base, v_l2_levels, l1_map + i);
+        if (rc != 0) {
+            return rc;
+        }
+    }
+
+    return walk_memory_regions_end(&data, 0, 0);
+}
+
+static int dump_region(void *priv, target_ulong start,
+    target_ulong end, unsigned long prot)
+{
+    FILE *f = (FILE *)priv;
+
+    (void) fprintf(f, TARGET_FMT_lx"-"TARGET_FMT_lx
+        " "TARGET_FMT_lx" %c%c%c\n",
+        start, end, end - start,
+        ((prot & PAGE_READ) ? 'r' : '-'),
+        ((prot & PAGE_WRITE) ? 'w' : '-'),
+        ((prot & PAGE_EXEC) ? 'x' : '-'));
+
+    return 0;
+}
+
+/* dump memory mappings */
+void page_dump(FILE *f)
+{
+    const int length = sizeof(target_ulong) * 2;
+    (void) fprintf(f, "%-*s %-*s %-*s %s\n",
+            length, "start", length, "end", length, "size", "prot");
+    walk_memory_regions(f, dump_region);
+}
+
+int page_get_flags(target_ulong address)
+{
+    PageDesc *p;
+
+    p = page_find(address >> TARGET_PAGE_BITS);
+    if (!p) {
+        return 0;
+    }
+    return p->flags;
+}
+
+/*
+ * Allow the target to decide if PAGE_TARGET_[12] may be reset.
+ * By default, they are not kept.
+ */
+#ifndef PAGE_TARGET_STICKY
+#define PAGE_TARGET_STICKY  0
+#endif
+#define PAGE_STICKY  (PAGE_ANON | PAGE_PASSTHROUGH | PAGE_TARGET_STICKY)
+
+/*
+ * Modify the flags of a page and invalidate the code if necessary.
+ * The flag PAGE_WRITE_ORG is positioned automatically depending
+ * on PAGE_WRITE.  The mmap_lock should already be held.
+ */
+void page_set_flags(target_ulong start, target_ulong end, int flags)
+{
+    target_ulong addr, len;
+    bool reset, inval_tb = false;
+
+    /* This function should never be called with addresses outside the
+       guest address space.  If this assert fires, it probably indicates
+       a missing call to h2g_valid.  */
+    assert(end - 1 <= GUEST_ADDR_MAX);
+    assert(start < end);
+    /* Only set PAGE_ANON with new mappings. */
+    assert(!(flags & PAGE_ANON) || (flags & PAGE_RESET));
+    assert_memory_lock();
+
+    start = start & TARGET_PAGE_MASK;
+    end = TARGET_PAGE_ALIGN(end);
+
+    if (flags & PAGE_WRITE) {
+        flags |= PAGE_WRITE_ORG;
+    }
+    reset = !(flags & PAGE_VALID) || (flags & PAGE_RESET);
+    if (reset) {
+        page_reset_target_data(start, end);
+    }
+    flags &= ~PAGE_RESET;
+
+    for (addr = start, len = end - start;
+         len != 0;
+         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
+        PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, true);
+
+        /*
+         * If the page was executable, but is reset, or is no longer
+         * executable, or has become writable, then invalidate any code.
+         */
+        if ((p->flags & PAGE_EXEC)
+            && (reset ||
+                !(flags & PAGE_EXEC) ||
+                (flags & ~p->flags & PAGE_WRITE))) {
+            inval_tb = true;
+        }
+        /* Using mprotect on a page does not change sticky bits. */
+        p->flags = (reset ? 0 : p->flags & PAGE_STICKY) | flags;
+    }
+
+    if (inval_tb) {
+        tb_invalidate_phys_range(start, end);
+    }
+}
+
+int page_check_range(target_ulong start, target_ulong len, int flags)
+{
+    PageDesc *p;
+    target_ulong end;
+    target_ulong addr;
+
+    /*
+     * This function should never be called with addresses outside the
+     * guest address space.  If this assert fires, it probably indicates
+     * a missing call to h2g_valid.
+     */
+    if (TARGET_ABI_BITS > L1_MAP_ADDR_SPACE_BITS) {
+        assert(start < ((target_ulong)1 << L1_MAP_ADDR_SPACE_BITS));
+    }
+
+    if (len == 0) {
+        return 0;
+    }
+    if (start + len - 1 < start) {
+        /* We've wrapped around.  */
+        return -1;
+    }
+
+    /* must do before we lose bits in the next step */
+    end = TARGET_PAGE_ALIGN(start + len);
+    start = start & TARGET_PAGE_MASK;
+
+    for (addr = start, len = end - start;
+         len != 0;
+         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
+        p = page_find(addr >> TARGET_PAGE_BITS);
+        if (!p) {
+            return -1;
+        }
+        if (!(p->flags & PAGE_VALID)) {
+            return -1;
+        }
+
+        if ((flags & PAGE_READ) && !(p->flags & PAGE_READ)) {
+            return -1;
+        }
+        if (flags & PAGE_WRITE) {
+            if (!(p->flags & PAGE_WRITE_ORG)) {
+                return -1;
+            }
+            /* unprotect the page if it was put read-only because it
+               contains translated code */
+            if (!(p->flags & PAGE_WRITE)) {
+                if (!page_unprotect(addr, 0)) {
+                    return -1;
+                }
+            }
+        }
+    }
+    return 0;
+}
+
+void page_protect(tb_page_addr_t page_addr)
+{
+    target_ulong addr;
+    PageDesc *p;
+    int prot;
+
+    p = page_find(page_addr >> TARGET_PAGE_BITS);
+    if (p && (p->flags & PAGE_WRITE)) {
+        /*
+         * Force the host page as non writable (writes will have a page fault +
+         * mprotect overhead).
+         */
+        page_addr &= qemu_host_page_mask;
+        prot = 0;
+        for (addr = page_addr; addr < page_addr + qemu_host_page_size;
+             addr += TARGET_PAGE_SIZE) {
+
+            p = page_find(addr >> TARGET_PAGE_BITS);
+            if (!p) {
+                continue;
+            }
+            prot |= p->flags;
+            p->flags &= ~PAGE_WRITE;
+        }
+        mprotect(g2h_untagged(page_addr), qemu_host_page_size,
+                 (prot & PAGE_BITS) & ~PAGE_WRITE);
+    }
+}
+
+/*
+ * Called from signal handler: invalidate the code and unprotect the
+ * page. Return 0 if the fault was not handled, 1 if it was handled,
+ * and 2 if it was handled but the caller must cause the TB to be
+ * immediately exited. (We can only return 2 if the 'pc' argument is
+ * non-zero.)
+ */
+int page_unprotect(target_ulong address, uintptr_t pc)
+{
+    unsigned int prot;
+    bool current_tb_invalidated;
+    PageDesc *p;
+    target_ulong host_start, host_end, addr;
+
+    /*
+     * Technically this isn't safe inside a signal handler.  However we
+     * know this only ever happens in a synchronous SEGV handler, so in
+     * practice it seems to be ok.
+     */
+    mmap_lock();
+
+    p = page_find(address >> TARGET_PAGE_BITS);
+    if (!p) {
+        mmap_unlock();
+        return 0;
+    }
+
+    /*
+     * If the page was really writable, then we change its
+     * protection back to writable.
+     */
+    if (p->flags & PAGE_WRITE_ORG) {
+        current_tb_invalidated = false;
+        if (p->flags & PAGE_WRITE) {
+            /*
+             * If the page is actually marked WRITE then assume this is because
+             * this thread raced with another one which got here first and
+             * set the page to PAGE_WRITE and did the TB invalidate for us.
+             */
+#ifdef TARGET_HAS_PRECISE_SMC
+            TranslationBlock *current_tb = tcg_tb_lookup(pc);
+            if (current_tb) {
+                current_tb_invalidated = tb_cflags(current_tb) & CF_INVALID;
+            }
+#endif
+        } else {
+            host_start = address & qemu_host_page_mask;
+            host_end = host_start + qemu_host_page_size;
+
+            prot = 0;
+            for (addr = host_start; addr < host_end; addr += TARGET_PAGE_SIZE) {
+                p = page_find(addr >> TARGET_PAGE_BITS);
+                p->flags |= PAGE_WRITE;
+                prot |= p->flags;
+
+                /*
+                 * Since the content will be modified, we must invalidate
+                 * the corresponding translated code.
+                 */
+                current_tb_invalidated |=
+                    tb_invalidate_phys_page_unwind(addr, pc);
+            }
+            mprotect((void *)g2h_untagged(host_start), qemu_host_page_size,
+                     prot & PAGE_BITS);
+        }
+        mmap_unlock();
+        /* If current TB was invalidated return to main loop */
+        return current_tb_invalidated ? 2 : 1;
+    }
+    mmap_unlock();
+    return 0;
+}
+
 static int probe_access_internal(CPUArchState *env, target_ulong addr,
                                  int fault_size, MMUAccessType access_type,
                                  bool nonfault, uintptr_t ra)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (20 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 21/24] accel/tcg: Move page_{get,set}_flags to user-exec.c Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-26 13:36   ` Alex Bennée
  2022-10-06  3:11 ` [PATCH 23/24] accel/tcg: Move PageDesc tree into tb-maint.c for system Richard Henderson
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Finish weaning user-only away from PageDesc.

Using an interval tree to track page permissions means that
we can represent very large regions efficiently.
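
The efficiency claim is worth spelling out: a PageFlagsNode describes
an inclusive byte range, so a terabytes-scale reservation such as the
one in issue #290 costs a single node, where the old scheme needed one
PageDesc per page.  A stand-alone sketch with illustrative values:

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {        /* shape mirrors PageFlagsNode below */
        uint64_t start;     /* first byte of the mapping */
        uint64_t last;      /* last byte, inclusive */
        int flags;
    } FlagsNode;

    int main(void)
    {
        /* One node for a 16 TiB region; addresses and flags invented. */
        FlagsNode huge = { 0x100000000000, 0x1fffffffffff, 0x3 /* rw */ };
        uint64_t pages = (huge.last - huge.start + 1) >> 12;

        printf("%llu pages, one node\n", (unsigned long long)pages);
        return 0;
    }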

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/internal.h           |   4 +-
 accel/tcg/tb-maint.c           |  20 +-
 accel/tcg/user-exec.c          | 614 ++++++++++++++++++++++-----------
 tests/tcg/multiarch/test-vma.c |  22 ++
 4 files changed, 451 insertions(+), 209 deletions(-)
 create mode 100644 tests/tcg/multiarch/test-vma.c

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 250f0daac9..c7e157d1cd 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -24,9 +24,7 @@
 #endif
 
 typedef struct PageDesc {
-#ifdef CONFIG_USER_ONLY
-    unsigned long flags;
-#else
+#ifndef CONFIG_USER_ONLY
     QemuSpin lock;
     /* list of TBs intersecting this ram page */
     uintptr_t first_tb;
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 14e8e47a6a..694440cb4a 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -68,15 +68,23 @@ static void page_flush_tb(void)
 /* Call with mmap_lock held. */
 static void tb_page_add(TranslationBlock *tb, PageDesc *p1, PageDesc *p2)
 {
-    /* translator_loop() must have made all TB pages non-writable */
-    assert(!(p1->flags & PAGE_WRITE));
-    if (p2) {
-        assert(!(p2->flags & PAGE_WRITE));
-    }
+    target_ulong addr;
+    int flags;
 
     assert_memory_lock();
-
     tb->itree.last = tb->itree.start + tb->size - 1;
+
+    /* translator_loop() must have made all TB pages non-writable */
+    addr = tb_page_addr0(tb);
+    flags = page_get_flags(addr);
+    assert(!(flags & PAGE_WRITE));
+
+    addr = tb_page_addr1(tb);
+    if (addr != -1) {
+        flags = page_get_flags(addr);
+        assert(!(flags & PAGE_WRITE));
+    }
+
     interval_tree_insert(&tb->itree, &tb_root);
 }
 
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index b6050e2bfe..3b47a9e26a 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -135,106 +135,61 @@ bool handle_sigsegv_accerr_write(CPUState *cpu, sigset_t *old_set,
     }
 }
 
-/*
- * Walks guest process memory "regions" one by one
- * and calls callback function 'fn' for each region.
- */
-struct walk_memory_regions_data {
-    walk_memory_regions_fn fn;
-    void *priv;
-    target_ulong start;
-    int prot;
-};
+typedef struct PageFlagsNode {
+    IntervalTreeNode itree;
+    int flags;
+} PageFlagsNode;
 
-static int walk_memory_regions_end(struct walk_memory_regions_data *data,
-                                   target_ulong end, int new_prot)
+static IntervalTreeRoot pageflags_root;
+
+static PageFlagsNode *pageflags_find(target_ulong start, target_long last)
 {
-    if (data->start != -1u) {
-        int rc = data->fn(data->priv, data->start, end, data->prot);
-        if (rc != 0) {
-            return rc;
-        }
-    }
+    IntervalTreeNode *n;
 
-    data->start = (new_prot ? end : -1u);
-    data->prot = new_prot;
-
-    return 0;
+    n = interval_tree_iter_first(&pageflags_root, start, last);
+    return n ? container_of(n, PageFlagsNode, itree) : NULL;
 }
 
-static int walk_memory_regions_1(struct walk_memory_regions_data *data,
-                                 target_ulong base, int level, void **lp)
+static PageFlagsNode *pageflags_next(PageFlagsNode *p, target_ulong start,
+                                     target_long last)
 {
-    target_ulong pa;
-    int i, rc;
+    IntervalTreeNode *n;
 
-    if (*lp == NULL) {
-        return walk_memory_regions_end(data, base, 0);
-    }
-
-    if (level == 0) {
-        PageDesc *pd = *lp;
-
-        for (i = 0; i < V_L2_SIZE; ++i) {
-            int prot = pd[i].flags;
-
-            pa = base | (i << TARGET_PAGE_BITS);
-            if (prot != data->prot) {
-                rc = walk_memory_regions_end(data, pa, prot);
-                if (rc != 0) {
-                    return rc;
-                }
-            }
-        }
-    } else {
-        void **pp = *lp;
-
-        for (i = 0; i < V_L2_SIZE; ++i) {
-            pa = base | ((target_ulong)i <<
-                (TARGET_PAGE_BITS + V_L2_BITS * level));
-            rc = walk_memory_regions_1(data, pa, level - 1, pp + i);
-            if (rc != 0) {
-                return rc;
-            }
-        }
-    }
-
-    return 0;
+    n = interval_tree_iter_next(&p->itree, start, last);
+    return n ? container_of(n, PageFlagsNode, itree) : NULL;
 }
 
 int walk_memory_regions(void *priv, walk_memory_regions_fn fn)
 {
-    struct walk_memory_regions_data data;
-    uintptr_t i, l1_sz = v_l1_size;
+    IntervalTreeNode *n;
+    int rc = 0;
 
-    data.fn = fn;
-    data.priv = priv;
-    data.start = -1u;
-    data.prot = 0;
+    mmap_lock();
+    for (n = interval_tree_iter_first(&pageflags_root, 0, -1);
+         n != NULL;
+         n = interval_tree_iter_next(n, 0, -1)) {
+        PageFlagsNode *p = container_of(n, PageFlagsNode, itree);
 
-    for (i = 0; i < l1_sz; i++) {
-        target_ulong base = i << (v_l1_shift + TARGET_PAGE_BITS);
-        int rc = walk_memory_regions_1(&data, base, v_l2_levels, l1_map + i);
+        rc = fn(priv, n->start, n->last + 1, p->flags);
         if (rc != 0) {
-            return rc;
+            break;
         }
     }
+    mmap_unlock();
 
-    return walk_memory_regions_end(&data, 0, 0);
+    return rc;
 }
 
 static int dump_region(void *priv, target_ulong start,
-    target_ulong end, unsigned long prot)
+                       target_ulong end, unsigned long prot)
 {
     FILE *f = (FILE *)priv;
 
-    (void) fprintf(f, TARGET_FMT_lx"-"TARGET_FMT_lx
-        " "TARGET_FMT_lx" %c%c%c\n",
-        start, end, end - start,
-        ((prot & PAGE_READ) ? 'r' : '-'),
-        ((prot & PAGE_WRITE) ? 'w' : '-'),
-        ((prot & PAGE_EXEC) ? 'x' : '-'));
-
+    fprintf(f, TARGET_FMT_lx"-"TARGET_FMT_lx" "TARGET_FMT_lx" %c%c%c\n",
+            start, end, end - start,
+            ((prot & PAGE_READ) ? 'r' : '-'),
+            ((prot & PAGE_WRITE) ? 'w' : '-'),
+            ((prot & PAGE_EXEC) ? 'x' : '-'));
     return 0;
 }
 
@@ -242,22 +197,134 @@ static int dump_region(void *priv, target_ulong start,
 void page_dump(FILE *f)
 {
     const int length = sizeof(target_ulong) * 2;
-    (void) fprintf(f, "%-*s %-*s %-*s %s\n",
+
+    fprintf(f, "%-*s %-*s %-*s %s\n",
             length, "start", length, "end", length, "size", "prot");
     walk_memory_regions(f, dump_region);
 }
 
 int page_get_flags(target_ulong address)
 {
-    PageDesc *p;
+    PageFlagsNode *p = pageflags_find(address, address);
 
-    p = page_find(address >> TARGET_PAGE_BITS);
+    /*
+     * See util/interval-tree.c re lockless lookups: no false positives but
+     * there are false negatives.  If we find nothing, retry with the mmap
+     * lock acquired.
+     */
     if (!p) {
-        return 0;
+        if (have_mmap_lock()) {
+            return 0;
+        }
+        mmap_lock();
+        p = pageflags_find(address, address);
+        mmap_unlock();
+        if (!p) {
+            return 0;
+        }
     }
     return p->flags;
 }
 
+/* A subroutine of page_set_flags: insert a new node for [start,last]. */
+static void pageflags_create(target_ulong start, target_ulong last, int flags)
+{
+    PageFlagsNode *p = g_new(PageFlagsNode, 1);
+
+    p->itree.start = start;
+    p->itree.last = last;
+    p->flags = flags;
+    interval_tree_insert(&p->itree, &pageflags_root);
+}
+
+/* A subroutine of page_set_flags: remove everything in [start,last]. */
+static bool pageflags_unset(target_ulong start, target_ulong last)
+{
+    bool inval_tb = false;
+
+    while (true) {
+        PageFlagsNode *p = pageflags_find(start, last);
+        target_ulong p_last;
+
+        if (!p) {
+            break;
+        }
+
+        if (p->flags & PAGE_EXEC) {
+            inval_tb = true;
+        }
+
+        interval_tree_remove(&p->itree, &pageflags_root);
+        p_last = p->itree.last;
+
+        if (p->itree.start < start) {
+            /* Truncate the node from the end, or split out the middle. */
+            p->itree.last = start - 1;
+            interval_tree_insert(&p->itree, &pageflags_root);
+            if (last < p_last) {
+                pageflags_create(last + 1, p_last, p->flags);
+                break;
+            }
+        } else if (p_last <= last) {
+            /* Range completely covers node -- remove it. */
+            g_free(p);
+        } else {
+            /* Truncate the node from the start. */
+            p->itree.start = last + 1;
+            interval_tree_insert(&p->itree, &pageflags_root);
+            break;
+        }
+    }
+
+    return inval_tb;
+}
+
+/*
+ * A subroutine of page_set_flags: nothing overlaps [start,last],
+ * but check adjacent mappings and maybe merge into a single range.
+ */
+static void pageflags_create_merge(target_ulong start, target_ulong last,
+                                   int flags)
+{
+    PageFlagsNode *next = NULL, *prev = NULL;
+
+    if (start > 0) {
+        prev = pageflags_find(start - 1, start - 1);
+        if (prev) {
+            if (prev->flags == flags) {
+                interval_tree_remove(&prev->itree, &pageflags_root);
+            } else {
+                prev = NULL;
+            }
+        }
+    }
+    if (last + 1 != 0) {
+        next = pageflags_find(last + 1, last + 1);
+        if (next) {
+            if (next->flags == flags) {
+                interval_tree_remove(&next->itree, &pageflags_root);
+            } else {
+                next = NULL;
+            }
+        }
+    }
+
+    if (prev) {
+        if (next) {
+            prev->itree.last = next->itree.last;
+            g_free(next);
+        } else {
+            prev->itree.last = last;
+        }
+        interval_tree_insert(&prev->itree, &pageflags_root);
+    } else if (next) {
+        next->itree.start = start;
+        interval_tree_insert(&next->itree, &pageflags_root);
+    } else {
+        pageflags_create(start, last, flags);
+    }
+}
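
The merge step is what keeps the tree compact in practice: adjacent ranges with identical flags collapse into one node, so the node count tracks the number of distinct protection regions rather than the number of mappings. A hypothetical sequence:

    /* Sketch, not in the patch: three adjacent same-flag creations. */
    pageflags_create_merge(0x10000, 0x10fff, PAGE_VALID | PAGE_READ);
    pageflags_create_merge(0x11000, 0x11fff, PAGE_VALID | PAGE_READ); /* extends prev */
    pageflags_create_merge(0x12000, 0x12fff, PAGE_VALID | PAGE_READ); /* extends again */
    /* The tree now holds a single node [0x10000,0x12fff]. */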
+
 /*
  * Allow the target to decide if PAGE_TARGET_[12] may be reset.
  * By default, they are not kept.
@@ -267,6 +334,146 @@ int page_get_flags(target_ulong address)
 #endif
 #define PAGE_STICKY  (PAGE_ANON | PAGE_PASSTHROUGH | PAGE_TARGET_STICKY)
 
+/* A subroutine of page_set_flags: add flags to [start,last]. */
+static bool pageflags_set_clear(target_ulong start, target_ulong last,
+                                int set_flags, int clear_flags)
+{
+    PageFlagsNode *p;
+    target_ulong p_start, p_last;
+    int p_flags, merge_flags;
+    bool inval_tb = false;
+
+ restart:
+    p = pageflags_find(start, last);
+    if (!p) {
+        if (set_flags) {
+            pageflags_create_merge(start, last, set_flags);
+        }
+        goto done;
+    }
+
+    p_start = p->itree.start;
+    p_last = p->itree.last;
+    p_flags = p->flags;
+    /* Using mprotect on a page does not change sticky bits. */
+    merge_flags = (p_flags & ~clear_flags) | set_flags;
+
+    /*
+     * Need to flush if an overlapping executable region
+     * removes exec, or adds write.
+     */
+    if ((p_flags & PAGE_EXEC)
+        && (!(merge_flags & PAGE_EXEC)
+            || (merge_flags & ~p_flags & PAGE_WRITE))) {
+        inval_tb = true;
+    }
+
+    /*
+     * If there is an exact range match, update and return without
+     * attempting to merge with adjacent regions.
+     */
+    if (start == p_start && last == p_last) {
+        if (merge_flags) {
+            p->flags = merge_flags;
+        } else {
+            interval_tree_remove(&p->itree, &pageflags_root);
+            g_free(p);
+        }
+        goto done;
+    }
+
+    /*
+     * If sticky bits affect the original mapping, then we must be more
+     * careful about the existing intervals and the separate flags.
+     */
+    if (set_flags != merge_flags) {
+        if (p_start < start) {
+            interval_tree_remove(&p->itree, &pageflags_root);
+            p->itree.last = start - 1;
+            interval_tree_insert(&p->itree, &pageflags_root);
+
+            if (last < p_last) {
+                if (merge_flags) {
+                    pageflags_create(start, last, merge_flags);
+                }
+                pageflags_create(last + 1, p_last, p_flags);
+            } else {
+                if (merge_flags) {
+                    pageflags_create(start, p_last, merge_flags);
+                }
+                if (p_last < last) {
+                    start = p_last + 1;
+                    goto restart;
+                }
+            }
+        } else {
+            if (start < p_start && set_flags) {
+                pageflags_create(start, p_start - 1, set_flags);
+            }
+            if (last < p_last) {
+                interval_tree_remove(&p->itree, &pageflags_root);
+                p->itree.start = last + 1;
+                interval_tree_insert(&p->itree, &pageflags_root);
+                if (merge_flags) {
+                    pageflags_create(start, last, merge_flags);
+                }
+            } else {
+                if (merge_flags) {
+                    p->flags = merge_flags;
+                } else {
+                    interval_tree_remove(&p->itree, &pageflags_root);
+                    g_free(p);
+                }
+                if (p_last < last) {
+                    start = p_last + 1;
+                    goto restart;
+                }
+            }
+        }
+        goto done;
+    }
+
+    /* If flags are not changing for this range, incorporate it. */
+    if (set_flags == p_flags) {
+        if (start < p_start) {
+            interval_tree_remove(&p->itree, &pageflags_root);
+            p->itree.start = start;
+            interval_tree_insert(&p->itree, &pageflags_root);
+        }
+        if (p_last < last) {
+            start = p_last + 1;
+            goto restart;
+        }
+        goto done;
+    }
+
+    /* Maybe split out head and/or tail ranges with the original flags. */
+    interval_tree_remove(&p->itree, &pageflags_root);
+    if (p_start < start) {
+        p->itree.last = start - 1;
+        interval_tree_insert(&p->itree, &pageflags_root);
+
+        if (p_last < last) {
+            goto restart;
+        }
+        if (last < p_last) {
+            pageflags_create(last + 1, p_last, p_flags);
+        }
+    } else if (last < p_last) {
+        p->itree.start = last + 1;
+        interval_tree_insert(&p->itree, &pageflags_root);
+    } else {
+        g_free(p);
+        goto restart;
+    }
+    if (set_flags) {
+        pageflags_create(start, last, set_flags);
+    }
+
+ done:
+    return inval_tb;
+}
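
The set_flags != merge_flags path exists because of the sticky bits: mprotect() must preserve PAGE_ANON and friends on pages that already carry them, while pages with no prior node receive only the new flags, so the sub-ranges can end up with different flags and must stay in separate nodes. A worked example with hypothetical addresses:

    /*
     * Sketch, not in the patch: node [0x1000,0x1fff] already holds
     * PAGE_VALID|PAGE_READ|PAGE_ANON from an earlier anonymous mmap.
     * An mprotect(R+W) over [0x1000,0x2fff] then arrives here as:
     */
    pageflags_set_clear(0x1000, 0x2fff,
                        PAGE_VALID | PAGE_READ | PAGE_WRITE | PAGE_WRITE_ORG,
                        ~PAGE_STICKY);
    /*
     * [0x1000,0x1fff] becomes R|W plus the preserved PAGE_ANON; the
     * restart pass then creates [0x2000,0x2fff] with plain R|W.  The
     * differing flags keep the two ranges as separate nodes.
     */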
+
 /*
  * Modify the flags of a page and invalidate the code if necessary.
  * The flag PAGE_WRITE_ORG is positioned automatically depending
@@ -274,49 +481,41 @@ int page_get_flags(target_ulong address)
  */
 void page_set_flags(target_ulong start, target_ulong end, int flags)
 {
-    target_ulong addr, len;
-    bool reset, inval_tb = false;
+    target_ulong last;
+    bool reset = false;
+    bool inval_tb = false;
 
     /* This function should never be called with addresses outside the
        guest address space.  If this assert fires, it probably indicates
        a missing call to h2g_valid.  */
-    assert(end - 1 <= GUEST_ADDR_MAX);
     assert(start < end);
+    assert(end - 1 <= GUEST_ADDR_MAX);
     /* Only set PAGE_ANON with new mappings. */
     assert(!(flags & PAGE_ANON) || (flags & PAGE_RESET));
     assert_memory_lock();
 
     start = start & TARGET_PAGE_MASK;
     end = TARGET_PAGE_ALIGN(end);
+    last = end - 1;
 
-    if (flags & PAGE_WRITE) {
-        flags |= PAGE_WRITE_ORG;
-    }
-    reset = !(flags & PAGE_VALID) || (flags & PAGE_RESET);
-    if (reset) {
-        page_reset_target_data(start, end);
-    }
-    flags &= ~PAGE_RESET;
-
-    for (addr = start, len = end - start;
-         len != 0;
-         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-        PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, true);
-
-        /*
-         * If the page was executable, but is reset, or is no longer
-         * executable, or has become writable, then invalidate any code.
-         */
-        if ((p->flags & PAGE_EXEC)
-            && (reset ||
-                !(flags & PAGE_EXEC) ||
-                (flags & ~p->flags & PAGE_WRITE))) {
-            inval_tb = true;
+    if (!(flags & PAGE_VALID)) {
+        flags = 0;
+    } else {
+        reset = flags & PAGE_RESET;
+        flags &= ~PAGE_RESET;
+        if (flags & PAGE_WRITE) {
+            flags |= PAGE_WRITE_ORG;
         }
-        /* Using mprotect on a page does not change sticky bits. */
-        p->flags = (reset ? 0 : p->flags & PAGE_STICKY) | flags;
     }
 
+    if (!flags || reset) {
+        page_reset_target_data(start, end);
+        inval_tb |= pageflags_unset(start, last);
+    }
+    if (flags) {
+        inval_tb |= pageflags_set_clear(start, last, flags,
+                                        ~(reset ? 0 : PAGE_STICKY));
+    }
     if (inval_tb) {
         tb_invalidate_phys_range(start, end);
     }
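
Every call now costs at most one unset pass plus one set/clear pass over the tree, instead of one PageDesc update per page. For illustration, a hypothetical call roughly matching what linux-user's mmap path issues for a fresh two-page anonymous mapping:

    /* Sketch, not part of the patch: new anonymous mapping at 0x20000. */
    page_set_flags(0x20000, 0x22000,
                   PAGE_VALID | PAGE_RESET | PAGE_ANON |
                   PAGE_READ | PAGE_WRITE);
    /*
     * PAGE_RESET drops any stale range first (pageflags_unset), then
     * one node [0x20000,0x21fff] is inserted with the new flags, with
     * PAGE_WRITE_ORG added automatically.
     */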
@@ -324,87 +523,89 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
 
 int page_check_range(target_ulong start, target_ulong len, int flags)
 {
-    PageDesc *p;
-    target_ulong end;
-    target_ulong addr;
-
-    /*
-     * This function should never be called with addresses outside the
-     * guest address space.  If this assert fires, it probably indicates
-     * a missing call to h2g_valid.
-     */
-    if (TARGET_ABI_BITS > L1_MAP_ADDR_SPACE_BITS) {
-        assert(start < ((target_ulong)1 << L1_MAP_ADDR_SPACE_BITS));
-    }
+    target_ulong last;
 
     if (len == 0) {
-        return 0;
-    }
-    if (start + len - 1 < start) {
-        /* We've wrapped around.  */
-        return -1;
+        return 0;  /* trivial length */
     }
 
-    /* must do before we loose bits in the next step */
-    end = TARGET_PAGE_ALIGN(start + len);
-    start = start & TARGET_PAGE_MASK;
+    last = start + len - 1;
+    if (last < start) {
+        return -1; /* wrap around */
+    }
+
+    while (true) {
+        PageFlagsNode *p = pageflags_find(start, last);
+        int missing;
 
-    for (addr = start, len = end - start;
-         len != 0;
-         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-        p = page_find(addr >> TARGET_PAGE_BITS);
         if (!p) {
-            return -1;
+            return -1; /* entire region invalid */
         }
-        if (!(p->flags & PAGE_VALID)) {
-            return -1;
+        if (start < p->itree.start) {
+            return -1; /* initial bytes invalid */
         }
 
-        if ((flags & PAGE_READ) && !(p->flags & PAGE_READ)) {
-            return -1;
+        missing = flags & ~p->flags;
+        if (missing & PAGE_READ) {
+            return -1; /* page not readable */
         }
-        if (flags & PAGE_WRITE) {
+        if (missing & PAGE_WRITE) {
             if (!(p->flags & PAGE_WRITE_ORG)) {
+                return -1; /* page not writable */
+            }
+            /* Asking about writable, but has been protected: undo. */
+            if (!page_unprotect(start, 0)) {
                 return -1;
             }
-            /* unprotect the page if it was put read-only because it
-               contains translated code */
-            if (!(p->flags & PAGE_WRITE)) {
-                if (!page_unprotect(addr, 0)) {
-                    return -1;
-                }
+            /* TODO: page_unprotect should take a range, not a single page. */
+            if (last - start < TARGET_PAGE_SIZE) {
+                return 0; /* ok */
             }
+            start += TARGET_PAGE_SIZE;
+            continue;
         }
+
+        if (last <= p->itree.last) {
+            return 0; /* ok */
+        }
+        start = p->itree.last + 1;
     }
-    return 0;
 }
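
A typical caller validates an entire guest buffer before the host touches it, along these lines (hypothetical, cf. the access_ok() helpers in linux-user):

    /* Sketch: fail a syscall early if the guest buffer is not writable. */
    if (page_check_range(guest_addr, size, PAGE_WRITE) < 0) {
        return -TARGET_EFAULT;
    }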
 
-void page_protect(tb_page_addr_t page_addr)
+void page_protect(tb_page_addr_t address)
 {
-    target_ulong addr;
-    PageDesc *p;
+    PageFlagsNode *p;
+    target_ulong start, last;
     int prot;
 
-    p = page_find(page_addr >> TARGET_PAGE_BITS);
-    if (p && (p->flags & PAGE_WRITE)) {
-        /*
-         * Force the host page as non writable (writes will have a page fault +
-         * mprotect overhead).
-         */
-        page_addr &= qemu_host_page_mask;
-        prot = 0;
-        for (addr = page_addr; addr < page_addr + qemu_host_page_size;
-             addr += TARGET_PAGE_SIZE) {
+    assert_memory_lock();
 
-            p = page_find(addr >> TARGET_PAGE_BITS);
-            if (!p) {
-                continue;
-            }
+    if (qemu_host_page_size <= TARGET_PAGE_SIZE) {
+        start = address & TARGET_PAGE_MASK;
+        last = start + TARGET_PAGE_SIZE - 1;
+    } else {
+        start = address & qemu_host_page_mask;
+        last = start + qemu_host_page_size - 1;
+    }
+
+    p = pageflags_find(start, last);
+    if (!p) {
+        return;
+    }
+    prot = p->flags;
+
+    if (unlikely(p->itree.last < last)) {
+        /* More than one protection region covers the one host page. */
+        assert(TARGET_PAGE_SIZE < qemu_host_page_size);
+        while ((p = pageflags_next(p, start, last)) != NULL) {
             prot |= p->flags;
-            p->flags &= ~PAGE_WRITE;
         }
-        mprotect(g2h_untagged(page_addr), qemu_host_page_size,
-                 (prot & PAGE_BITS) & ~PAGE_WRITE);
+    }
+
+    if (prot & PAGE_WRITE) {
+        pageflags_set_clear(start, last, 0, PAGE_WRITE);
+        mprotect(g2h_untagged(start), qemu_host_page_size,
+                 prot & (PAGE_READ | PAGE_EXEC) ? PROT_READ : PROT_NONE);
     }
 }
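
When qemu_host_page_size exceeds TARGET_PAGE_SIZE, a single host mprotect() unavoidably covers several guest pages, so the host protection must be derived from the union of every node intersecting the host page. A worked example with hypothetical sizes:

    /*
     * Sketch: 64 KiB host pages, 4 KiB guest pages.  Guest range
     * [0x10000,0x1efff] is R-X and [0x1f000,0x1ffff] is RW-.
     * page_protect(0x10000) then computes:
     *
     *     start = 0x10000, last = 0x1ffff      (one host page)
     *     prot  = (R|X) | (R|W)                (union of both nodes)
     *
     * PAGE_WRITE is set in the union, so it is cleared from both nodes
     * and the host page is mprotect()ed PROT_READ; a later guest write
     * to 0x1f000 faults into page_unprotect() as intended.
     */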
 
@@ -417,10 +618,8 @@ void page_protect(tb_page_addr_t page_addr)
  */
 int page_unprotect(target_ulong address, uintptr_t pc)
 {
-    unsigned int prot;
+    PageFlagsNode *p;
     bool current_tb_invalidated;
-    PageDesc *p;
-    target_ulong host_start, host_end, addr;
 
     /*
      * Technically this isn't safe inside a signal handler.  However we
@@ -429,40 +628,54 @@ int page_unprotect(target_ulong address, uintptr_t pc)
      */
     mmap_lock();
 
-    p = page_find(address >> TARGET_PAGE_BITS);
-    if (!p) {
+    p = pageflags_find(address, address);
+
+    /* If this address was not really writable, nothing to do. */
+    if (!p || !(p->flags & PAGE_WRITE_ORG)) {
         mmap_unlock();
         return 0;
     }
 
-    /*
-     * If the page was really writable, then we change its
-     * protection back to writable.
-     */
-    if (p->flags & PAGE_WRITE_ORG) {
-        current_tb_invalidated = false;
-        if (p->flags & PAGE_WRITE) {
-            /*
-             * If the page is actually marked WRITE then assume this is because
-             * this thread raced with another one which got here first and
-             * set the page to PAGE_WRITE and did the TB invalidate for us.
-             */
+    current_tb_invalidated = false;
+    if (p->flags & PAGE_WRITE) {
+        /*
+         * If the page is actually marked WRITE then assume this is because
+         * this thread raced with another one which got here first and
+         * set the page to PAGE_WRITE and did the TB invalidate for us.
+         */
 #ifdef TARGET_HAS_PRECISE_SMC
-            TranslationBlock *current_tb = tcg_tb_lookup(pc);
-            if (current_tb) {
-                current_tb_invalidated = tb_cflags(current_tb) & CF_INVALID;
-            }
+        TranslationBlock *current_tb = tcg_tb_lookup(pc);
+        if (current_tb) {
+            current_tb_invalidated = tb_cflags(current_tb) & CF_INVALID;
+        }
 #endif
+    } else {
+        target_ulong start, len, i;
+        int prot;
+
+        if (qemu_host_page_size <= TARGET_PAGE_SIZE) {
+            start = address & TARGET_PAGE_MASK;
+            len = TARGET_PAGE_SIZE;
+            prot = p->flags | PAGE_WRITE;
+            pageflags_set_clear(start, start + len - 1, PAGE_WRITE, 0);
+            current_tb_invalidated = tb_invalidate_phys_page_unwind(start, pc);
         } else {
-            host_start = address & qemu_host_page_mask;
-            host_end = host_start + qemu_host_page_size;
-
+            start = address & qemu_host_page_mask;
+            len = qemu_host_page_size;
             prot = 0;
-            for (addr = host_start; addr < host_end; addr += TARGET_PAGE_SIZE) {
-                p = page_find(addr >> TARGET_PAGE_BITS);
-                p->flags |= PAGE_WRITE;
-                prot |= p->flags;
 
+            for (i = 0; i < len; i += TARGET_PAGE_SIZE) {
+                target_ulong addr = start + i;
+
+                p = pageflags_find(addr, addr);
+                if (p) {
+                    prot |= p->flags;
+                    if (p->flags & PAGE_WRITE_ORG) {
+                        prot |= PAGE_WRITE;
+                        pageflags_set_clear(addr, addr + TARGET_PAGE_SIZE - 1,
+                                            PAGE_WRITE, 0);
+                    }
+                }
                 /*
                  * Since the content will be modified, we must invalidate
                  * the corresponding translated code.
@@ -470,15 +683,16 @@ int page_unprotect(target_ulong address, uintptr_t pc)
                 current_tb_invalidated |=
                     tb_invalidate_phys_page_unwind(addr, pc);
             }
-            mprotect((void *)g2h_untagged(host_start), qemu_host_page_size,
-                     prot & PAGE_BITS);
         }
-        mmap_unlock();
-        /* If current TB was invalidated return to main loop */
-        return current_tb_invalidated ? 2 : 1;
+        if (prot & PAGE_EXEC) {
+            prot = (prot & ~PAGE_EXEC) | PAGE_READ;
+        }
+        mprotect((void *)g2h_untagged(start), len, prot & PAGE_BITS);
     }
     mmap_unlock();
-    return 0;
+
+    /* If current TB was invalidated return to main loop */
+    return current_tb_invalidated ? 2 : 1;
 }
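
The return value encodes three outcomes for the SIGSEGV path. A sketch of the caller's side of the contract (deliver_guest_sigsegv() is a hypothetical stand-in; cf. handle_sigsegv_accerr_write()):

    switch (page_unprotect(guest_addr, host_pc)) {
    case 0:
        deliver_guest_sigsegv();   /* fault unrelated to TB protection */
        break;
    case 1:
        return;                    /* page re-enabled; retry the write */
    case 2:
        cpu_loop_exit_noexc(cpu);  /* current TB invalidated; back to main loop */
    }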
 
 static int probe_access_internal(CPUArchState *env, target_ulong addr,
diff --git a/tests/tcg/multiarch/test-vma.c b/tests/tcg/multiarch/test-vma.c
new file mode 100644
index 0000000000..2893d60334
--- /dev/null
+++ b/tests/tcg/multiarch/test-vma.c
@@ -0,0 +1,22 @@
+/*
+ * Test very large vma allocations.
+ * The qemu out-of-memory condition was within the mmap syscall itself.
+ * If the syscall actually returns with MAP_FAILED, the test succeeded.
+ */
+#include <sys/mman.h>
+
+int main()
+{
+    int n = sizeof(size_t) == 4 ? 32 : 45;
+
+    for (int i = 28; i < n; i++) {
+        size_t l = (size_t)1 << i;
+        void *p = mmap(0, l, PROT_NONE,
+                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
+        if (p == MAP_FAILED) {
+            break;
+        }
+        munmap(p, l);
+    }
+    return 0;
+}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 23/24] accel/tcg: Move PageDesc tree into tb-maint.c for system
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (21 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-06  3:11 ` [PATCH 24/24] accel/tcg: Move remainder of page locking to tb-maint.c Richard Henderson
  2022-10-24 23:05 ` [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
  24 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

Now that PageDesc is not used for user-only, and for system
it is only used for tb maintenance, move the implementation
into tb-maint.c, appropriately ifdefed.

We have not yet eliminated all references to PageDesc for
user-only, so retain a typedef to the structure without definition.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/internal.h      |  49 +++-----------
 accel/tcg/tb-maint.c      | 130 ++++++++++++++++++++++++++++++++++++--
 accel/tcg/translate-all.c |  95 ----------------------------
 3 files changed, 134 insertions(+), 140 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index c7e157d1cd..c6c9e02cfd 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -23,51 +23,13 @@
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
-typedef struct PageDesc {
+typedef struct PageDesc PageDesc;
 #ifndef CONFIG_USER_ONLY
+struct PageDesc {
     QemuSpin lock;
     /* list of TBs intersecting this ram page */
     uintptr_t first_tb;
-#endif
-} PageDesc;
-
-/*
- * In system mode we want L1_MAP to be based on ram offsets,
- * while in user mode we want it to be based on virtual addresses.
- *
- * TODO: For user mode, see the caveat re host vs guest virtual
- * address spaces near GUEST_ADDR_MAX.
- */
-#if !defined(CONFIG_USER_ONLY)
-#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
-# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
-#else
-# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
-#endif
-#else
-# define L1_MAP_ADDR_SPACE_BITS  MIN(HOST_LONG_BITS, TARGET_ABI_BITS)
-#endif
-
-/* Size of the L2 (and L3, etc) page tables.  */
-#define V_L2_BITS 10
-#define V_L2_SIZE (1 << V_L2_BITS)
-
-/*
- * L1 Mapping properties
- */
-extern int v_l1_size;
-extern int v_l1_shift;
-extern int v_l2_levels;
-
-/*
- * The bottom level has pointers to PageDesc, and is indexed by
- * anything from 4 to (V_L2_BITS + 3) bits, depending on target page size.
- */
-#define V_L1_MIN_BITS 4
-#define V_L1_MAX_BITS (V_L2_BITS + 3)
-#define V_L1_MAX_SIZE (1 << V_L1_MAX_BITS)
-
-extern void *l1_map[V_L1_MAX_SIZE];
+};
 
 PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc);
 
@@ -76,6 +38,11 @@ static inline PageDesc *page_find(tb_page_addr_t index)
     return page_find_alloc(index, false);
 }
 
+void page_table_config_init(void);
+#else
+static inline void page_table_config_init(void) { }
+#endif
+
 /* list iterators for lists of tagged pointers in TranslationBlock */
 #define TB_FOR_EACH_TAGGED(head, tb, n, field)                          \
     for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);        \
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 694440cb4a..31d0a74aa9 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -127,6 +127,121 @@ static PageForEachNext foreach_tb_next(PageForEachNext tb,
 }
 
 #else
+/*
+ * In system mode we want L1_MAP to be based on ram offsets.
+ */
+#if HOST_LONG_BITS < TARGET_PHYS_ADDR_SPACE_BITS
+# define L1_MAP_ADDR_SPACE_BITS  HOST_LONG_BITS
+#else
+# define L1_MAP_ADDR_SPACE_BITS  TARGET_PHYS_ADDR_SPACE_BITS
+#endif
+
+/* Size of the L2 (and L3, etc) page tables.  */
+#define V_L2_BITS 10
+#define V_L2_SIZE (1 << V_L2_BITS)
+
+/*
+ * L1 Mapping properties
+ */
+static int v_l1_size;
+static int v_l1_shift;
+static int v_l2_levels;
+
+/*
+ * The bottom level has pointers to PageDesc, and is indexed by
+ * anything from 4 to (V_L2_BITS + 3) bits, depending on target page size.
+ */
+#define V_L1_MIN_BITS 4
+#define V_L1_MAX_BITS (V_L2_BITS + 3)
+#define V_L1_MAX_SIZE (1 << V_L1_MAX_BITS)
+
+static void *l1_map[V_L1_MAX_SIZE];
+
+void page_table_config_init(void)
+{
+    uint32_t v_l1_bits;
+
+    assert(TARGET_PAGE_BITS);
+    /* The bits remaining after N lower levels of page tables.  */
+    v_l1_bits = (L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS) % V_L2_BITS;
+    if (v_l1_bits < V_L1_MIN_BITS) {
+        v_l1_bits += V_L2_BITS;
+    }
+
+    v_l1_size = 1 << v_l1_bits;
+    v_l1_shift = L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS - v_l1_bits;
+    v_l2_levels = v_l1_shift / V_L2_BITS - 1;
+
+    assert(v_l1_bits <= V_L1_MAX_BITS);
+    assert(v_l1_shift % V_L2_BITS == 0);
+    assert(v_l2_levels >= 0);
+}
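
A worked instance of the arithmetic, for a hypothetical configuration with TARGET_PAGE_BITS = 12 and L1_MAP_ADDR_SPACE_BITS = 36, i.e. 24 bits of page index to split:

    /*
     * v_l1_bits   = (36 - 12) % 10 = 4      (>= V_L1_MIN_BITS, kept)
     * v_l1_size   = 1 << 4 = 16 entries
     * v_l1_shift  = 36 - 12 - 4 = 20
     * v_l2_levels = 20 / 10 - 1 = 1
     *
     * i.e. a 16-entry L1 table, one intermediate 1024-entry level, and
     * a 1024-entry leaf level of PageDesc: 4 + 10 + 10 = 24 bits.
     */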
+
+PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
+{
+    PageDesc *pd;
+    void **lp;
+    int i;
+
+    /* Level 1.  Always allocated.  */
+    lp = l1_map + ((index >> v_l1_shift) & (v_l1_size - 1));
+
+    /* Level 2..N-1.  */
+    for (i = v_l2_levels; i > 0; i--) {
+        void **p = qatomic_rcu_read(lp);
+
+        if (p == NULL) {
+            void *existing;
+
+            if (!alloc) {
+                return NULL;
+            }
+            p = g_new0(void *, V_L2_SIZE);
+            existing = qatomic_cmpxchg(lp, NULL, p);
+            if (unlikely(existing)) {
+                g_free(p);
+                p = existing;
+            }
+        }
+
+        lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
+    }
+
+    pd = qatomic_rcu_read(lp);
+    if (pd == NULL) {
+        void *existing;
+
+        if (!alloc) {
+            return NULL;
+        }
+        pd = g_new0(PageDesc, V_L2_SIZE);
+#ifndef CONFIG_USER_ONLY
+        {
+            int i;
+
+            for (i = 0; i < V_L2_SIZE; i++) {
+                qemu_spin_init(&pd[i].lock);
+            }
+        }
+#endif
+        existing = qatomic_cmpxchg(lp, NULL, pd);
+        if (unlikely(existing)) {
+#ifndef CONFIG_USER_ONLY
+            {
+                int i;
+
+                for (i = 0; i < V_L2_SIZE; i++) {
+                    qemu_spin_destroy(&pd[i].lock);
+                }
+            }
+#endif
+            g_free(pd);
+            pd = existing;
+        }
+    }
+
+    return pd + (index & (V_L2_SIZE - 1));
+}
 
 /* Set to NULL all the 'first_tb' fields in all PageDescs. */
 static void page_flush_tb_1(int level, void **lp)
@@ -420,6 +535,17 @@ static void tb_phys_invalidate__locked(TranslationBlock *tb)
     qemu_thread_jit_execute();
 }
 
+#ifdef CONFIG_USER_ONLY
+static inline void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
+                                  PageDesc **ret_p2, tb_page_addr_t phys2,
+                                  bool alloc)
+{
+    *ret_p1 = NULL;
+    *ret_p2 = NULL;
+}
+static inline void page_lock_tb(const TranslationBlock *tb) { }
+static inline void page_unlock_tb(const TranslationBlock *tb) { }
+#else
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
                            PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc)
 {
@@ -460,10 +586,6 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
     }
 }
 
-#ifdef CONFIG_USER_ONLY
-static inline void page_lock_tb(const TranslationBlock *tb) { }
-static inline void page_unlock_tb(const TranslationBlock *tb) { }
-#else
 /* lock the page(s) of a TB in the correct acquisition order */
 static void page_lock_tb(const TranslationBlock *tb)
 {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 65e557592a..0537723b33 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -114,37 +114,8 @@ QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
                   sizeof_field(TranslationBlock, trace_vcpu_dstate)
                   * BITS_PER_BYTE);
 
-/*
- * L1 Mapping properties
- */
-int v_l1_size;
-int v_l1_shift;
-int v_l2_levels;
-
-void *l1_map[V_L1_MAX_SIZE];
-
 TBContext tb_ctx;
 
-static void page_table_config_init(void)
-{
-    uint32_t v_l1_bits;
-
-    assert(TARGET_PAGE_BITS);
-    /* The bits remaining after N lower levels of page tables.  */
-    v_l1_bits = (L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS) % V_L2_BITS;
-    if (v_l1_bits < V_L1_MIN_BITS) {
-        v_l1_bits += V_L2_BITS;
-    }
-
-    v_l1_size = 1 << v_l1_bits;
-    v_l1_shift = L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS - v_l1_bits;
-    v_l2_levels = v_l1_shift / V_L2_BITS - 1;
-
-    assert(v_l1_bits <= V_L1_MAX_BITS);
-    assert(v_l1_shift % V_L2_BITS == 0);
-    assert(v_l2_levels >= 0);
-}
-
 /* Encode VAL as a signed leb128 sequence at P.
    Return P incremented past the encoded value.  */
 static uint8_t *encode_sleb128(uint8_t *p, target_long val)
@@ -381,72 +352,6 @@ void page_init(void)
 #endif
 }
 
-PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
-{
-    PageDesc *pd;
-    void **lp;
-    int i;
-
-    /* Level 1.  Always allocated.  */
-    lp = l1_map + ((index >> v_l1_shift) & (v_l1_size - 1));
-
-    /* Level 2..N-1.  */
-    for (i = v_l2_levels; i > 0; i--) {
-        void **p = qatomic_rcu_read(lp);
-
-        if (p == NULL) {
-            void *existing;
-
-            if (!alloc) {
-                return NULL;
-            }
-            p = g_new0(void *, V_L2_SIZE);
-            existing = qatomic_cmpxchg(lp, NULL, p);
-            if (unlikely(existing)) {
-                g_free(p);
-                p = existing;
-            }
-        }
-
-        lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
-    }
-
-    pd = qatomic_rcu_read(lp);
-    if (pd == NULL) {
-        void *existing;
-
-        if (!alloc) {
-            return NULL;
-        }
-        pd = g_new0(PageDesc, V_L2_SIZE);
-#ifndef CONFIG_USER_ONLY
-        {
-            int i;
-
-            for (i = 0; i < V_L2_SIZE; i++) {
-                qemu_spin_init(&pd[i].lock);
-            }
-        }
-#endif
-        existing = qatomic_cmpxchg(lp, NULL, pd);
-        if (unlikely(existing)) {
-#ifndef CONFIG_USER_ONLY
-            {
-                int i;
-
-                for (i = 0; i < V_L2_SIZE; i++) {
-                    qemu_spin_destroy(&pd[i].lock);
-                }
-            }
-#endif
-            g_free(pd);
-            pd = existing;
-        }
-    }
-
-    return pd + (index & (V_L2_SIZE - 1));
-}
-
 /* In user-mode page locks aren't used; mmap_lock is enough */
 #ifdef CONFIG_USER_ONLY
 struct page_collection *
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 24/24] accel/tcg: Move remainder of page locking to tb-maint.c
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (22 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 23/24] accel/tcg: Move PageDesc tree into tb-maint.c for system Richard Henderson
@ 2022-10-06  3:11 ` Richard Henderson
  2022-10-26 13:48   ` Alex Bennée
  2022-10-24 23:05 ` [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
  24 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-06  3:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, laurent, pbonzini, imp, f4bug

The only things that still touch PageDesc in translate-all.c
are some locking routines related to tb-maint.c which have not
yet been moved.  Do so now.

Move some code up in tb-maint.c as well, to untangle the maze
of ifdefs, and allow a sensible final ordering.

Move some declarations from exec/translate-all.h to internal.h,
as they are only used within accel/tcg/.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/internal.h         |  68 ++-----
 include/exec/translate-all.h |   6 -
 accel/tcg/tb-maint.c         | 352 +++++++++++++++++++++++++++++++++--
 accel/tcg/translate-all.c    | 301 ------------------------------
 4 files changed, 352 insertions(+), 375 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index c6c9e02cfd..6ce1437c58 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -23,62 +23,28 @@
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
-typedef struct PageDesc PageDesc;
-#ifndef CONFIG_USER_ONLY
-struct PageDesc {
-    QemuSpin lock;
-    /* list of TBs intersecting this ram page */
-    uintptr_t first_tb;
-};
-
-PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc);
-
-static inline PageDesc *page_find(tb_page_addr_t index)
-{
-    return page_find_alloc(index, false);
-}
-
-void page_table_config_init(void);
-#else
-static inline void page_table_config_init(void) { }
-#endif
-
-/* list iterators for lists of tagged pointers in TranslationBlock */
-#define TB_FOR_EACH_TAGGED(head, tb, n, field)                          \
-    for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);        \
-         tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
-             tb = (TranslationBlock *)((uintptr_t)tb & ~1))
-
-#define TB_FOR_EACH_JMP(head_tb, tb, n)                                 \
-    TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
-
-/* In user-mode page locks aren't used; mmap_lock is enough */
-#ifdef CONFIG_USER_ONLY
-#define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
-static inline void page_lock(PageDesc *pd) { }
-static inline void page_unlock(PageDesc *pd) { }
-#else
-#ifdef CONFIG_DEBUG_TCG
-void do_assert_page_locked(const PageDesc *pd, const char *file, int line);
-#define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
-#else
-#define assert_page_locked(pd)
-#endif
-void page_lock(PageDesc *pd);
-void page_unlock(PageDesc *pd);
-
-/* TODO: For now, still shared with translate-all.c for system mode. */
-typedef int PageForEachNext;
-#define PAGE_FOR_EACH_TB(start, end, pagedesc, tb, n) \
-    TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
-
-#endif
-#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
+#if defined(CONFIG_SOFTMMU) && defined(CONFIG_DEBUG_TCG)
 void assert_no_pages_locked(void);
 #else
 static inline void assert_no_pages_locked(void) { }
 #endif
 
+#ifdef CONFIG_USER_ONLY
+static inline void page_table_config_init(void) { }
+#else
+void page_table_config_init(void);
+#endif
+
+#ifdef CONFIG_SOFTMMU
+struct page_collection;
+void tb_invalidate_phys_page_fast(struct page_collection *pages,
+                                  tb_page_addr_t start, int len,
+                                  uintptr_t retaddr);
+struct page_collection *page_collection_lock(tb_page_addr_t start,
+                                             tb_page_addr_t end);
+void page_collection_unlock(struct page_collection *set);
+#endif /* CONFIG_SOFTMMU */
+
 TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
                               target_ulong cs_base, uint32_t flags,
                               int cflags);
diff --git a/include/exec/translate-all.h b/include/exec/translate-all.h
index 3e9cb91565..88602ae8d8 100644
--- a/include/exec/translate-all.h
+++ b/include/exec/translate-all.h
@@ -23,12 +23,6 @@
 
 
 /* translate-all.c */
-struct page_collection *page_collection_lock(tb_page_addr_t start,
-                                             tb_page_addr_t end);
-void page_collection_unlock(struct page_collection *set);
-void tb_invalidate_phys_page_fast(struct page_collection *pages,
-                                  tb_page_addr_t start, int len,
-                                  uintptr_t retaddr);
 void tb_invalidate_phys_page(tb_page_addr_t addr);
 void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
 
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 31d0a74aa9..8fe2d322db 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -30,6 +30,15 @@
 #include "internal.h"
 
 
+/* List iterators for lists of tagged pointers in TranslationBlock. */
+#define TB_FOR_EACH_TAGGED(head, tb, n, field)                          \
+    for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);        \
+         tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
+             tb = (TranslationBlock *)((uintptr_t)tb & ~1))
+
+#define TB_FOR_EACH_JMP(head_tb, tb, n)                                 \
+    TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
+
 static bool tb_cmp(const void *ap, const void *bp)
 {
     const TranslationBlock *a = ap;
@@ -51,7 +60,28 @@ void tb_htable_init(void)
     qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
 }
 
+typedef struct PageDesc PageDesc;
+
 #ifdef CONFIG_USER_ONLY
+
+/*
+ * In user-mode page locks aren't used; mmap_lock is enough.
+ */
+#define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
+
+static inline void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
+                                  PageDesc **ret_p2, tb_page_addr_t phys2,
+                                  bool alloc)
+{
+    *ret_p1 = NULL;
+    *ret_p2 = NULL;
+}
+
+static inline void page_lock(PageDesc *pd) { }
+static inline void page_unlock(PageDesc *pd) { }
+static inline void page_lock_tb(const TranslationBlock *tb) { }
+static inline void page_unlock_tb(const TranslationBlock *tb) { }
+
 /*
  * For user-only, since we are protecting all of memory with a single lock,
  * and because the two pages of a TranslationBlock are always contiguous,
@@ -157,6 +187,12 @@ static int v_l2_levels;
 
 static void *l1_map[V_L1_MAX_SIZE];
 
+struct PageDesc {
+    QemuSpin lock;
+    /* list of TBs intersecting this ram page */
+    uintptr_t first_tb;
+};
+
 void page_table_config_init(void)
 {
     uint32_t v_l1_bits;
@@ -177,7 +213,7 @@ void page_table_config_init(void)
     assert(v_l2_levels >= 0);
 }
 
-PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
+static PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
 {
     PageDesc *pd;
     void **lp;
@@ -243,6 +279,303 @@ PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
     return pd + (index & (V_L2_SIZE - 1));
 }
 
+static inline PageDesc *page_find(tb_page_addr_t index)
+{
+    return page_find_alloc(index, false);
+}
+
+/**
+ * struct page_entry - page descriptor entry
+ * @pd:     pointer to the &struct PageDesc of the page this entry represents
+ * @index:  page index of the page
+ * @locked: whether the page is locked
+ *
+ * This struct helps us keep track of the locked state of a page, without
+ * bloating &struct PageDesc.
+ *
+ * A page lock protects accesses to all fields of &struct PageDesc.
+ *
+ * See also: &struct page_collection.
+ */
+struct page_entry {
+    PageDesc *pd;
+    tb_page_addr_t index;
+    bool locked;
+};
+
+/**
+ * struct page_collection - tracks a set of pages (i.e. &struct page_entry's)
+ * @tree:   Binary search tree (BST) of the pages, with key == page index
+ * @max:    Pointer to the page in @tree with the highest page index
+ *
+ * To avoid deadlock we lock pages in ascending order of page index.
+ * When operating on a set of pages, we need to keep track of them so that
+ * we can lock them in order and also unlock them later. For this we collect
+ * pages (i.e. &struct page_entry's) in a binary search @tree. Given that the
+ * @tree implementation we use does not provide an O(1) operation to obtain the
+ * highest-ranked element, we use @max to keep track of the inserted page
+ * with the highest index. This is valuable because if a page is not in
+ * the tree and its index is higher than @max's, then we can lock it
+ * without breaking the locking order rule.
+ *
+ * Note on naming: 'struct page_set' would be shorter, but we already have a few
+ * page_set_*() helpers, so page_collection is used instead to avoid confusion.
+ *
+ * See also: page_collection_lock().
+ */
+struct page_collection {
+    GTree *tree;
+    struct page_entry *max;
+};
+
+typedef int PageForEachNext;
+#define PAGE_FOR_EACH_TB(start, end, pagedesc, tb, n) \
+    TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
+
+#ifdef CONFIG_DEBUG_TCG
+
+static __thread GHashTable *ht_pages_locked_debug;
+
+static void ht_pages_locked_debug_init(void)
+{
+    if (ht_pages_locked_debug) {
+        return;
+    }
+    ht_pages_locked_debug = g_hash_table_new(NULL, NULL);
+}
+
+static bool page_is_locked(const PageDesc *pd)
+{
+    PageDesc *found;
+
+    ht_pages_locked_debug_init();
+    found = g_hash_table_lookup(ht_pages_locked_debug, pd);
+    return !!found;
+}
+
+static void page_lock__debug(PageDesc *pd)
+{
+    ht_pages_locked_debug_init();
+    g_assert(!page_is_locked(pd));
+    g_hash_table_insert(ht_pages_locked_debug, pd, pd);
+}
+
+static void page_unlock__debug(const PageDesc *pd)
+{
+    bool removed;
+
+    ht_pages_locked_debug_init();
+    g_assert(page_is_locked(pd));
+    removed = g_hash_table_remove(ht_pages_locked_debug, pd);
+    g_assert(removed);
+}
+
+static void do_assert_page_locked(const PageDesc *pd,
+                                  const char *file, int line)
+{
+    if (unlikely(!page_is_locked(pd))) {
+        error_report("assert_page_lock: PageDesc %p not locked @ %s:%d",
+                     pd, file, line);
+        abort();
+    }
+}
+#define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
+
+void assert_no_pages_locked(void)
+{
+    ht_pages_locked_debug_init();
+    g_assert(g_hash_table_size(ht_pages_locked_debug) == 0);
+}
+
+#else /* !CONFIG_DEBUG_TCG */
+
+static inline void page_lock__debug(const PageDesc *pd) { }
+static inline void page_unlock__debug(const PageDesc *pd) { }
+static inline void assert_page_locked(const PageDesc *pd) { }
+
+#endif /* CONFIG_DEBUG_TCG */
+
+static void page_lock(PageDesc *pd)
+{
+    page_lock__debug(pd);
+    qemu_spin_lock(&pd->lock);
+}
+
+static void page_unlock(PageDesc *pd)
+{
+    qemu_spin_unlock(&pd->lock);
+    page_unlock__debug(pd);
+}
+
+static inline struct page_entry *
+page_entry_new(PageDesc *pd, tb_page_addr_t index)
+{
+    struct page_entry *pe = g_malloc(sizeof(*pe));
+
+    pe->index = index;
+    pe->pd = pd;
+    pe->locked = false;
+    return pe;
+}
+
+static void page_entry_destroy(gpointer p)
+{
+    struct page_entry *pe = p;
+
+    g_assert(pe->locked);
+    page_unlock(pe->pd);
+    g_free(pe);
+}
+
+/* returns false on success */
+static bool page_entry_trylock(struct page_entry *pe)
+{
+    bool busy;
+
+    busy = qemu_spin_trylock(&pe->pd->lock);
+    if (!busy) {
+        g_assert(!pe->locked);
+        pe->locked = true;
+        page_lock__debug(pe->pd);
+    }
+    return busy;
+}
+
+static void do_page_entry_lock(struct page_entry *pe)
+{
+    page_lock(pe->pd);
+    g_assert(!pe->locked);
+    pe->locked = true;
+}
+
+static gboolean page_entry_lock(gpointer key, gpointer value, gpointer data)
+{
+    struct page_entry *pe = value;
+
+    do_page_entry_lock(pe);
+    return FALSE;
+}
+
+static gboolean page_entry_unlock(gpointer key, gpointer value, gpointer data)
+{
+    struct page_entry *pe = value;
+
+    if (pe->locked) {
+        pe->locked = false;
+        page_unlock(pe->pd);
+    }
+    return FALSE;
+}
+
+/*
+ * Trylock a page, and if successful, add the page to a collection.
+ * Returns true ("busy") if the page could not be locked; false otherwise.
+ */
+static bool page_trylock_add(struct page_collection *set, tb_page_addr_t addr)
+{
+    tb_page_addr_t index = addr >> TARGET_PAGE_BITS;
+    struct page_entry *pe;
+    PageDesc *pd;
+
+    pe = g_tree_lookup(set->tree, &index);
+    if (pe) {
+        return false;
+    }
+
+    pd = page_find(index);
+    if (pd == NULL) {
+        return false;
+    }
+
+    pe = page_entry_new(pd, index);
+    g_tree_insert(set->tree, &pe->index, pe);
+
+    /*
+     * If this is either (1) the first insertion or (2) a page whose index
+     * is higher than any other so far, just lock the page and move on.
+     */
+    if (set->max == NULL || pe->index > set->max->index) {
+        set->max = pe;
+        do_page_entry_lock(pe);
+        return false;
+    }
+    /*
+     * Try to acquire out-of-order lock; if busy, return busy so that we acquire
+     * locks in order.
+     */
+    return page_entry_trylock(pe);
+}
+
+static gint tb_page_addr_cmp(gconstpointer ap, gconstpointer bp, gpointer udata)
+{
+    tb_page_addr_t a = *(const tb_page_addr_t *)ap;
+    tb_page_addr_t b = *(const tb_page_addr_t *)bp;
+
+    if (a == b) {
+        return 0;
+    } else if (a < b) {
+        return -1;
+    }
+    return 1;
+}
+
+/*
+ * Lock a range of pages ([@start,@end[) as well as the pages of all
+ * intersecting TBs.
+ * Locking order: acquire locks in ascending order of page index.
+ */
+struct page_collection *
+page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
+{
+    struct page_collection *set = g_malloc(sizeof(*set));
+    tb_page_addr_t index;
+    PageDesc *pd;
+
+    start >>= TARGET_PAGE_BITS;
+    end   >>= TARGET_PAGE_BITS;
+    g_assert(start <= end);
+
+    set->tree = g_tree_new_full(tb_page_addr_cmp, NULL, NULL,
+                                page_entry_destroy);
+    set->max = NULL;
+    assert_no_pages_locked();
+
+ retry:
+    g_tree_foreach(set->tree, page_entry_lock, NULL);
+
+    for (index = start; index <= end; index++) {
+        TranslationBlock *tb;
+        PageForEachNext n;
+
+        pd = page_find(index);
+        if (pd == NULL) {
+            continue;
+        }
+        if (page_trylock_add(set, index << TARGET_PAGE_BITS)) {
+            g_tree_foreach(set->tree, page_entry_unlock, NULL);
+            goto retry;
+        }
+        assert_page_locked(pd);
+        PAGE_FOR_EACH_TB(unused, unused, pd, tb, n) {
+            if (page_trylock_add(set, tb_page_addr0(tb)) ||
+                (tb_page_addr1(tb) != -1 &&
+                 page_trylock_add(set, tb_page_addr1(tb)))) {
+                /* drop all locks, and reacquire in order */
+                g_tree_foreach(set->tree, page_entry_unlock, NULL);
+                goto retry;
+            }
+        }
+    }
+    return set;
+}
+
+void page_collection_unlock(struct page_collection *set)
+{
+    /* entries are unlocked and freed via page_entry_destroy */
+    g_tree_destroy(set->tree);
+    g_free(set);
+}
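
A sketch of the intended call pattern, mirroring how the cputlb write path drives these helpers:

    /* Lock every page the write may intersect, invalidate, release. */
    struct page_collection *pages
        = page_collection_lock(ram_addr, ram_addr + size);
    tb_invalidate_phys_page_fast(pages, ram_addr, size, retaddr);
    page_collection_unlock(pages);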
+
 /* Set to NULL all the 'first_tb' fields in all PageDescs. */
 static void page_flush_tb_1(int level, void **lp)
 {
@@ -338,7 +671,6 @@ static void tb_page_remove(TranslationBlock *tb)
         tb_page_remove1(tb, pd);
     }
 }
-
 #endif
 
 /* flush all the translation blocks */
@@ -536,15 +868,6 @@ static void tb_phys_invalidate__locked(TranslationBlock *tb)
 }
 
 #ifdef CONFIG_USER_ONLY
-static inline void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-                                  PageDesc **ret_p2, tb_page_addr_t phys2,
-                                  bool alloc)
-{
-    *ret_p1 = NULL;
-    *ret_p2 = NULL;
-}
-static inline void page_lock_tb(const TranslationBlock *tb) { }
-static inline void page_unlock_tb(const TranslationBlock *tb) { }
 #else
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
                            PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc)
@@ -756,8 +1079,7 @@ bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc)
 #else
 /*
  * @p must be non-NULL.
- * user-mode: call with mmap_lock held.
- * !user-mode: call with all @pages locked.
+ * Call with all @pages locked.
  */
 static void
 tb_invalidate_phys_page_range__locked(struct page_collection *pages,
@@ -828,8 +1150,6 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
 /*
  * Invalidate all TBs which intersect with the target physical
  * address page @addr.
- *
- * Called with mmap_lock held for user-mode emulation
  */
 void tb_invalidate_phys_page(tb_page_addr_t addr)
 {
@@ -855,8 +1175,6 @@ void tb_invalidate_phys_page(tb_page_addr_t addr)
  * 'is_cpu_write_access' should be true if called from a real cpu write
  * access: the virtual CPU will exit the current TB if code is modified inside
  * this TB.
- *
- * Called with mmap_lock held for user-mode emulation.
  */
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
 {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 0537723b33..a5de92142b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -63,52 +63,6 @@
 #include "tb-context.h"
 #include "internal.h"
 
-/* make various TB consistency checks */
-
-/**
- * struct page_entry - page descriptor entry
- * @pd:     pointer to the &struct PageDesc of the page this entry represents
- * @index:  page index of the page
- * @locked: whether the page is locked
- *
- * This struct helps us keep track of the locked state of a page, without
- * bloating &struct PageDesc.
- *
- * A page lock protects accesses to all fields of &struct PageDesc.
- *
- * See also: &struct page_collection.
- */
-struct page_entry {
-    PageDesc *pd;
-    tb_page_addr_t index;
-    bool locked;
-};
-
-/**
- * struct page_collection - tracks a set of pages (i.e. &struct page_entry's)
- * @tree:   Binary search tree (BST) of the pages, with key == page index
- * @max:    Pointer to the page in @tree with the highest page index
- *
- * To avoid deadlock we lock pages in ascending order of page index.
- * When operating on a set of pages, we need to keep track of them so that
- * we can lock them in order and also unlock them later. For this we collect
- * pages (i.e. &struct page_entry's) in a binary search @tree. Given that the
- * @tree implementation we use does not provide an O(1) operation to obtain the
- * highest-ranked element, we use @max to keep track of the inserted page
- * with the highest index. This is valuable because if a page is not in
- * the tree and its index is higher than @max's, then we can lock it
- * without breaking the locking order rule.
- *
- * Note on naming: 'struct page_set' would be shorter, but we already have a few
- * page_set_*() helpers, so page_collection is used instead to avoid confusion.
- *
- * See also: page_collection_lock().
- */
-struct page_collection {
-    GTree *tree;
-    struct page_entry *max;
-};
-
 /* Make sure all possible CPU event bits fit in tb->trace_vcpu_dstate */
 QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
                   sizeof_field(TranslationBlock, trace_vcpu_dstate)
@@ -352,261 +306,6 @@ void page_init(void)
 #endif
 }
 
-/* In user-mode page locks aren't used; mmap_lock is enough */
-#ifdef CONFIG_USER_ONLY
-struct page_collection *
-page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
-{
-    return NULL;
-}
-
-void page_collection_unlock(struct page_collection *set)
-{ }
-#else /* !CONFIG_USER_ONLY */
-
-#ifdef CONFIG_DEBUG_TCG
-
-static __thread GHashTable *ht_pages_locked_debug;
-
-static void ht_pages_locked_debug_init(void)
-{
-    if (ht_pages_locked_debug) {
-        return;
-    }
-    ht_pages_locked_debug = g_hash_table_new(NULL, NULL);
-}
-
-static bool page_is_locked(const PageDesc *pd)
-{
-    PageDesc *found;
-
-    ht_pages_locked_debug_init();
-    found = g_hash_table_lookup(ht_pages_locked_debug, pd);
-    return !!found;
-}
-
-static void page_lock__debug(PageDesc *pd)
-{
-    ht_pages_locked_debug_init();
-    g_assert(!page_is_locked(pd));
-    g_hash_table_insert(ht_pages_locked_debug, pd, pd);
-}
-
-static void page_unlock__debug(const PageDesc *pd)
-{
-    bool removed;
-
-    ht_pages_locked_debug_init();
-    g_assert(page_is_locked(pd));
-    removed = g_hash_table_remove(ht_pages_locked_debug, pd);
-    g_assert(removed);
-}
-
-void do_assert_page_locked(const PageDesc *pd, const char *file, int line)
-{
-    if (unlikely(!page_is_locked(pd))) {
-        error_report("assert_page_lock: PageDesc %p not locked @ %s:%d",
-                     pd, file, line);
-        abort();
-    }
-}
-
-void assert_no_pages_locked(void)
-{
-    ht_pages_locked_debug_init();
-    g_assert(g_hash_table_size(ht_pages_locked_debug) == 0);
-}
-
-#else /* !CONFIG_DEBUG_TCG */
-
-static inline void page_lock__debug(const PageDesc *pd) { }
-static inline void page_unlock__debug(const PageDesc *pd) { }
-
-#endif /* CONFIG_DEBUG_TCG */
-
-void page_lock(PageDesc *pd)
-{
-    page_lock__debug(pd);
-    qemu_spin_lock(&pd->lock);
-}
-
-void page_unlock(PageDesc *pd)
-{
-    qemu_spin_unlock(&pd->lock);
-    page_unlock__debug(pd);
-}
-
-static inline struct page_entry *
-page_entry_new(PageDesc *pd, tb_page_addr_t index)
-{
-    struct page_entry *pe = g_malloc(sizeof(*pe));
-
-    pe->index = index;
-    pe->pd = pd;
-    pe->locked = false;
-    return pe;
-}
-
-static void page_entry_destroy(gpointer p)
-{
-    struct page_entry *pe = p;
-
-    g_assert(pe->locked);
-    page_unlock(pe->pd);
-    g_free(pe);
-}
-
-/* returns false on success */
-static bool page_entry_trylock(struct page_entry *pe)
-{
-    bool busy;
-
-    busy = qemu_spin_trylock(&pe->pd->lock);
-    if (!busy) {
-        g_assert(!pe->locked);
-        pe->locked = true;
-        page_lock__debug(pe->pd);
-    }
-    return busy;
-}
-
-static void do_page_entry_lock(struct page_entry *pe)
-{
-    page_lock(pe->pd);
-    g_assert(!pe->locked);
-    pe->locked = true;
-}
-
-static gboolean page_entry_lock(gpointer key, gpointer value, gpointer data)
-{
-    struct page_entry *pe = value;
-
-    do_page_entry_lock(pe);
-    return FALSE;
-}
-
-static gboolean page_entry_unlock(gpointer key, gpointer value, gpointer data)
-{
-    struct page_entry *pe = value;
-
-    if (pe->locked) {
-        pe->locked = false;
-        page_unlock(pe->pd);
-    }
-    return FALSE;
-}
-
-/*
- * Trylock a page, and if successful, add the page to a collection.
- * Returns true ("busy") if the page could not be locked; false otherwise.
- */
-static bool page_trylock_add(struct page_collection *set, tb_page_addr_t addr)
-{
-    tb_page_addr_t index = addr >> TARGET_PAGE_BITS;
-    struct page_entry *pe;
-    PageDesc *pd;
-
-    pe = g_tree_lookup(set->tree, &index);
-    if (pe) {
-        return false;
-    }
-
-    pd = page_find(index);
-    if (pd == NULL) {
-        return false;
-    }
-
-    pe = page_entry_new(pd, index);
-    g_tree_insert(set->tree, &pe->index, pe);
-
-    /*
-     * If this is either (1) the first insertion or (2) a page whose index
-     * is higher than any other so far, just lock the page and move on.
-     */
-    if (set->max == NULL || pe->index > set->max->index) {
-        set->max = pe;
-        do_page_entry_lock(pe);
-        return false;
-    }
-    /*
-     * Try to acquire out-of-order lock; if busy, return busy so that we acquire
-     * locks in order.
-     */
-    return page_entry_trylock(pe);
-}
-
-static gint tb_page_addr_cmp(gconstpointer ap, gconstpointer bp, gpointer udata)
-{
-    tb_page_addr_t a = *(const tb_page_addr_t *)ap;
-    tb_page_addr_t b = *(const tb_page_addr_t *)bp;
-
-    if (a == b) {
-        return 0;
-    } else if (a < b) {
-        return -1;
-    }
-    return 1;
-}
-
-/*
- * Lock a range of pages ([@start,@end[) as well as the pages of all
- * intersecting TBs.
- * Locking order: acquire locks in ascending order of page index.
- */
-struct page_collection *
-page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
-{
-    struct page_collection *set = g_malloc(sizeof(*set));
-    tb_page_addr_t index;
-    PageDesc *pd;
-
-    start >>= TARGET_PAGE_BITS;
-    end   >>= TARGET_PAGE_BITS;
-    g_assert(start <= end);
-
-    set->tree = g_tree_new_full(tb_page_addr_cmp, NULL, NULL,
-                                page_entry_destroy);
-    set->max = NULL;
-    assert_no_pages_locked();
-
- retry:
-    g_tree_foreach(set->tree, page_entry_lock, NULL);
-
-    for (index = start; index <= end; index++) {
-        TranslationBlock *tb;
-        PageForEachNext n;
-
-        pd = page_find(index);
-        if (pd == NULL) {
-            continue;
-        }
-        if (page_trylock_add(set, index << TARGET_PAGE_BITS)) {
-            g_tree_foreach(set->tree, page_entry_unlock, NULL);
-            goto retry;
-        }
-        assert_page_locked(pd);
-        PAGE_FOR_EACH_TB(unused, unused, pd, tb, n) {
-            if (page_trylock_add(set, tb_page_addr0(tb)) ||
-                (tb_page_addr1(tb) != -1 &&
-                 page_trylock_add(set, tb_page_addr1(tb)))) {
-                /* drop all locks, and reacquire in order */
-                g_tree_foreach(set->tree, page_entry_unlock, NULL);
-                goto retry;
-            }
-        }
-    }
-    return set;
-}
-
-void page_collection_unlock(struct page_collection *set)
-{
-    /* entries are unlocked and freed via page_entry_destroy */
-    g_tree_destroy(set->tree);
-    g_free(set);
-}
-
-#endif /* !CONFIG_USER_ONLY */
-
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
                               target_ulong pc, target_ulong cs_base,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking
  2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
                   ` (23 preceding siblings ...)
  2022-10-06  3:11 ` [PATCH 24/24] accel/tcg: Move remainder of page locking to tb-maint.c Richard Henderson
@ 2022-10-24 23:05 ` Richard Henderson
  24 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2022-10-24 23:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.bennee, laurent, pbonzini, imp, Philippe Mathieu-Daudé

Ping.

On 10/6/22 13:10, Richard Henderson wrote:
<snip>




* Re: [PATCH 01/24] util: Add interval-tree.c
  2022-10-06  3:10 ` [PATCH 01/24] util: Add interval-tree.c Richard Henderson
@ 2022-10-25  8:40   ` Alex Bennée
  2022-10-25 10:36     ` Richard Henderson
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  8:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Copy and simplify the Linux kernel's interval_tree_generic.h,
> instantiating for uint64_t.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
<snip>
> diff --git a/util/interval-tree.c b/util/interval-tree.c
> new file mode 100644
> index 0000000000..9578c05830
> --- /dev/null
> +++ b/util/interval-tree.c
> @@ -0,0 +1,881 @@
<snip>
> +
> +#if 1
> +static void debug_interval_tree_int(IntervalTreeNode *node,
> +                                    const char *dir, int level)
> +{
> +    printf("%4d %*s %s [%" PRId64 ",%" PRId64 "] subtree_last:%" PRId64 "\n",
> +           level, level + 1, dir, rb_is_red(&node->rb) ? "r" : "b",
> +           node->start, node->last, node->subtree_last);
> +
> +    if (node->rb.rb_left) {
> +        debug_interval_tree_int(rb_to_itree(node->rb.rb_left), "<", level + 1);
> +    }
> +    if (node->rb.rb_right) {
> +        debug_interval_tree_int(rb_to_itree(node->rb.rb_right), ">", level + 1);
> +    }
> +}
> +
> +void debug_interval_tree(IntervalTreeNode *node);
> +void debug_interval_tree(IntervalTreeNode *node)
> +{
> +    if (node) {
> +        debug_interval_tree_int(node, "*", 0);
> +    } else {
> +        printf("null\n");
> +    }
> +}
> +#endif

This seems superfluous, especially as we have unit tests.

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 02/24] accel/tcg: Make page_alloc_target_data allocation constant
  2022-10-06  3:10 ` [PATCH 02/24] accel/tcg: Make page_alloc_target_data allocation constant Richard Henderson
@ 2022-10-25  8:45   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  8:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Use a constant target data allocation size for all pages.
> This will be necessary to reduce overhead of page tracking.
> Since TARGET_PAGE_DATA_SIZE is now required, we can use this
> to omit data tracking for targets that don't require it.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 03/24] accel/tcg: Remove disabled debug in translate-all.c
  2022-10-06  3:10 ` [PATCH 03/24] accel/tcg: Remove disabled debug in translate-all.c Richard Henderson
@ 2022-10-25  8:46   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  8:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> These items printf, and could be replaced with proper
> tracepoints if we really cared.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 04/24] accel/tcg: Split out PageDesc to internal.h
  2022-10-06  3:10 ` [PATCH 04/24] accel/tcg: Split out PageDesc to internal.h Richard Henderson
@ 2022-10-25  8:47   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  8:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 05/24] accel/tcg: Split out tb-maint.c
  2022-10-06  3:10 ` [PATCH 05/24] accel/tcg: Split out tb-maint.c Richard Henderson
@ 2022-10-25  8:49   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  8:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Move all of the TranslationBlock flushing and page linking
> code from translate-all.c to tb-maint.c.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 06/24] accel/tcg: Move assert_no_pages_locked to internal.h
  2022-10-06  3:10 ` [PATCH 06/24] accel/tcg: Move assert_no_pages_locked to internal.h Richard Henderson
@ 2022-10-25  8:49   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  8:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> There are no users outside of accel/tcg; this function
> does not need to be defined in exec-all.h.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 07/24] accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC
  2022-10-06  3:10 ` [PATCH 07/24] accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC Richard Henderson
@ 2022-10-25  8:50   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  8:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> The results of the calls to cpu_get_tb_cpu_state,
> current_{pc,cs_base,flags}, are not used.
> In tb_invalidate_phys_page, use bool for current_tb_modified.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 08/24] accel/tcg: Remove duplicate store to tb->page_addr[]
  2022-10-06  3:10 ` [PATCH 08/24] accel/tcg: Remove duplicate store to tb->page_addr[] Richard Henderson
@ 2022-10-25  9:12   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  9:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> When we added the fast path, we initialized page_addr[] early.
> These stores in and around tb_page_add() are redundant; remove them.
>
> Fixes: 50627f1b7b1 ("accel/tcg: Add fast path for translator_ld*")
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 09/24] accel/tcg: Introduce tb_{set_}page_addr{0,1}
  2022-10-06  3:10 ` [PATCH 09/24] accel/tcg: Introduce tb_{set_}page_addr{0,1} Richard Henderson
@ 2022-10-25  9:47   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  9:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> This data structure will be replaced for user-only: add accessors.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 10/24] accel/tcg: Rename tb_invalidate_phys_page
  2022-10-06  3:10 ` [PATCH 10/24] accel/tcg: Rename tb_invalidate_phys_page Richard Henderson
@ 2022-10-25  9:49   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25  9:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Rename to tb_invalidate_phys_page_unwind to emphasize that
> we also detect invalidating the current TB, and also to free
> up that name for other usage.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 01/24] util: Add interval-tree.c
  2022-10-25  8:40   ` Alex Bennée
@ 2022-10-25 10:36     ` Richard Henderson
  0 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2022-10-25 10:36 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug

On 10/25/22 18:40, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Copy and simplify the Linux kernel's interval_tree_generic.h,
>> instantiating for uint64_t.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
> <snip>
>> diff --git a/util/interval-tree.c b/util/interval-tree.c
>> new file mode 100644
>> index 0000000000..9578c05830
>> --- /dev/null
>> +++ b/util/interval-tree.c
>> @@ -0,0 +1,881 @@
> <snip>
>> +
>> +#if 1
>> +static void debug_interval_tree_int(IntervalTreeNode *node,
>> +                                    const char *dir, int level)
>> +{
>> +    printf("%4d %*s %s [%" PRId64 ",%" PRId64 "] subtree_last:%" PRId64 "\n",
>> +           level, level + 1, dir, rb_is_red(&node->rb) ? "r" : "b",
>> +           node->start, node->last, node->subtree_last);
>> +
>> +    if (node->rb.rb_left) {
>> +        debug_interval_tree_int(rb_to_itree(node->rb.rb_left), "<", level + 1);
>> +    }
>> +    if (node->rb.rb_right) {
>> +        debug_interval_tree_int(rb_to_itree(node->rb.rb_right), ">", level + 1);
>> +    }
>> +}
>> +
>> +void debug_interval_tree(IntervalTreeNode *node);
>> +void debug_interval_tree(IntervalTreeNode *node)
>> +{
>> +    if (node) {
>> +        debug_interval_tree_int(node, "*", 0);
>> +    } else {
>> +        printf("null\n");
>> +    }
>> +}
>> +#endif
> 
> This seems superfluous, especially as we have unit tests.

Oh, I used this directly from gdb.  I meant to disable it again.
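
For reference, here is the sort of invocation it enables, as a
hypothetical gdb session; targetdata_root stands in for whichever tree
is being inspected, and the cast works because IntervalTreeNode embeds
its RBNode as the first member:

  (gdb) call debug_interval_tree((IntervalTreeNode *)targetdata_root.rb_root.rb_node)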

r~




* Re: [PATCH 11/24] accel/tcg: Rename tb_invalidate_phys_page_range and drop end parameter
  2022-10-06  3:11 ` [PATCH 11/24] accel/tcg: Rename tb_invalidate_phys_page_range and drop end parameter Richard Henderson
@ 2022-10-25 13:21   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25 13:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> This function is never called with a real range,
> only for a single page.  Drop the second parameter
> and rename to tb_invalidate_phys_page.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 12/24] accel/tcg: Unify declarations of tb_invalidate_phys_range
  2022-10-06  3:11 ` [PATCH 12/24] accel/tcg: Unify declarations of tb_invalidate_phys_range Richard Henderson
@ 2022-10-25 13:24   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25 13:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> We missed this function when we introduced tb_page_addr_t.

That was a loooong time ago.

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 14/24] accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET
  2022-10-06  3:11 ` [PATCH 14/24] accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET Richard Henderson
@ 2022-10-25 15:42   ` Alex Bennée
  2022-10-25 20:55     ` Richard Henderson
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Bennée @ 2022-10-25 15:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> When PAGE_RESET is set, we are replacing pages with new
> content, which means that we need to invalidate existing
> cached data, such as TranslationBlocks.  Perform the
> reset invalidate while we're doing other invalidates,
> which allows us to remove the separate invalidates from
> the user-only mmap/munmap/mprotect routines.
>
> In addition, restrict invalidation to PAGE_EXEC pages.
> Since cdf713085131, we have validated PAGE_EXEC is present
> before translation, which means we can assume that if the
> bit is not present, there are no translations to invalidate.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/translate-all.c | 19 +++++++++++--------
>  bsd-user/mmap.c           |  2 --
>  linux-user/mmap.c         |  4 ----
>  3 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 8d5233fa9e..478301f227 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1352,7 +1352,7 @@ int page_get_flags(target_ulong address)
>  void page_set_flags(target_ulong start, target_ulong end, int flags)
>  {
>      target_ulong addr, len;
> -    bool reset_target_data;
> +    bool reset;
>  
>      /* This function should never be called with addresses outside the
>         guest address space.  If this assert fires, it probably indicates
> @@ -1369,7 +1369,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
>      if (flags & PAGE_WRITE) {
>          flags |= PAGE_WRITE_ORG;
>      }
> -    reset_target_data = !(flags & PAGE_VALID) || (flags & PAGE_RESET);
> +    reset = !(flags & PAGE_VALID) || (flags & PAGE_RESET);
>      flags &= ~PAGE_RESET;

Not a problem with this patch, but I was a little confused by PAGE_VALID
because it's the one "special" flag not documented in cpu-all.h:

  /* same as PROT_xxx */
  #define PAGE_READ      0x0001
  #define PAGE_WRITE     0x0002
  #define PAGE_EXEC      0x0004
  #define PAGE_BITS      (PAGE_READ | PAGE_WRITE | PAGE_EXEC)

The above are self-explanatory as they mirror the mmap flags. But what
does PAGE_VALID really mean? We set it in the elfloader and whenever we
mmap, which raises the question: when is a page not VALID? The only place
that ever seems to clear the flag is the PPC mmu_helper code, in
response to particular TLB operations. Should that code instead be
doing page_set_flags(PAGE_RESET)?

  #define PAGE_VALID     0x0008
  /*
   * Original state of the write flag (used when tracking self-modifying code)
   */
  #define PAGE_WRITE_ORG 0x0010
  /*
   * Invalidate the TLB entry immediately, helpful for s390x
   * Low-Address-Protection. Used with PAGE_WRITE in tlb_set_page_with_attrs()
   */
  #define PAGE_WRITE_INV 0x0020
  /* For use with page_set_flags: page is being replaced; target_data cleared. */
  #define PAGE_RESET     0x0040
  /* For linux-user, indicates that the page is MAP_ANON. */
  #define PAGE_ANON      0x0080

  #if defined(CONFIG_BSD) && defined(CONFIG_USER_ONLY)
  /* FIXME: Code that sets/uses this is broken and needs to go away.  */
  #define PAGE_RESERVED  0x0100
  #endif
  /* Target-specific bits that will be used via page_get_flags().  */
  #define PAGE_TARGET_1  0x0200
  #define PAGE_TARGET_2  0x0400

  /*
   * For linux-user, indicates that the page is mapped with the same semantics
   * in both guest and host.
   */
  #define PAGE_PASSTHROUGH 0x0800

Anyway that confusion aside:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 15/24] accel/tcg: Use interval tree for TBs in user-only mode
  2022-10-06  3:11 ` [PATCH 15/24] accel/tcg: Use interval tree for TBs in user-only mode Richard Henderson
@ 2022-10-25 15:58   ` Alex Bennée
  2022-10-25 21:19     ` Richard Henderson
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Bennée @ 2022-10-25 15:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Begin weaning user-only away from PageDesc.
>
> Since, for user-only, all TB (and page) manipulation is done with
> a single mutex, and there is no virtual/physical discontinuity to
> split a TB across discontinuous pages, place all of the TBs into
> a single IntervalTree. This makes it trivial to find all of the
> TBs intersecting a range.
>
> Retain the existing PageDesc + linked list implementation for
> system mode.  Move the portion of the implementation that overlaps
> the new user-only code behind the common ifdef.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/internal.h      |  16 +-
>  include/exec/exec-all.h   |  43 ++++-
>  accel/tcg/tb-maint.c      | 388 ++++++++++++++++++++++----------------
>  accel/tcg/translate-all.c |   4 +-
>  4 files changed, 280 insertions(+), 171 deletions(-)
>
<snip>
> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
> index c8e921089d..14e8e47a6a 100644
> --- a/accel/tcg/tb-maint.c
> +++ b/accel/tcg/tb-maint.c
> @@ -18,6 +18,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/interval-tree.h"
>  #include "exec/cputlb.h"
>  #include "exec/log.h"
>  #include "exec/exec-all.h"
> @@ -50,6 +51,75 @@ void tb_htable_init(void)
>      qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
>  }

I wonder, for the sake of avoiding recompilation of units later on and
having a clean separation between user and system mode, whether it would
be worth putting this stuff in a tb-maint-user.c?

>  
> +#ifdef CONFIG_USER_ONLY
> +/*
> + * For user-only, since we are protecting all of memory with a single lock,
> + * and because the two pages of a TranslationBlock are always contiguous,
> + * use a single data structure to record all TranslationBlocks.
> + */
<snip>
> +
> +/*
> + * Called with mmap_lock held. If pc is not 0 then it indicates the
> + * host PC of the faulting store instruction that caused this invalidate.
> + * Returns true if the caller needs to abort execution of the current
> + * TB (because it was modified by this store and the guest CPU has
> + * precise-SMC semantics).
> + */
> +bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc)
> +{
> +    assert(pc != 0);
> +#ifdef TARGET_HAS_PRECISE_SMC
> +    assert_memory_lock();

Out of interest, is this just because x86 has such a strong memory model
that you can get away with this sort of patching without explicit flushes?
I'm curious why this is the only arch we jump through these hoops for?

> +    {
> +        TranslationBlock *current_tb = tcg_tb_lookup(pc);
> +        bool current_tb_modified = false;
> +        TranslationBlock *tb;
> +        PageForEachNext n;
> +
> +        addr &= TARGET_PAGE_MASK;
> +
> +        PAGE_FOR_EACH_TB(addr, addr + TARGET_PAGE_SIZE, unused, tb, n) {
> +            if (current_tb == tb &&
> +                (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
> +                /*
> +                 * If we are modifying the current TB, we must stop its
> +                 * execution. We could be more precise by checking that
> +                 * the modification is after the current PC, but it would
> +                 * require a specialized function to partially restore
> +                 * the CPU state.
> +                 */
> +                current_tb_modified = true;
> +                cpu_restore_state_from_tb(current_cpu, current_tb, pc, true);
> +            }
> +            tb_phys_invalidate__locked(tb);
> +        }
> +
> +        if (current_tb_modified) {
> +            /* Force execution of one insn next time.  */
> +            CPUState *cpu = current_cpu;
> +            cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(current_cpu);
> +            return true;
> +        }
> +    }
> +#else
> +    tb_invalidate_phys_page(addr);
> +#endif /* TARGET_HAS_PRECISE_SMC */
> +    return false;
> +}
> +#else
>  /*
>   * @p must be non-NULL.
>   * user-mode: call with mmap_lock held.
> @@ -492,22 +637,17 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>  {
>      TranslationBlock *tb;
>      tb_page_addr_t tb_start, tb_end;
> -    int n;
> +    PageForEachNext n;
>  #ifdef TARGET_HAS_PRECISE_SMC
> -    CPUState *cpu = current_cpu;
> -    bool current_tb_not_found = retaddr != 0;
>      bool current_tb_modified = false;
> -    TranslationBlock *current_tb = NULL;
> +    TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
>  #endif /* TARGET_HAS_PRECISE_SMC */
>  
> -    assert_page_locked(p);
> -
>      /*
>       * We remove all the TBs in the range [start, end[.
>       * XXX: see if in some cases it could be faster to invalidate all the code
>       */

I'm guessing this comment is quite stale now, given we try quite hard to
avoid doing lots of code gen over and over again. The only case I can
think of is memory-clear routines running over pages that previously held
code; there might be some heuristics we could use to detect that, but we
don't currently.

<snip>

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 16/24] accel/tcg: Use page_reset_target_data in page_set_flags
  2022-10-06  3:11 ` [PATCH 16/24] accel/tcg: Use page_reset_target_data in page_set_flags Richard Henderson
@ 2022-10-25 16:12   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25 16:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Use the existing function for clearing target data.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 17/24] accel/tcg: Use tb_invalidate_phys_range in page_set_flags
  2022-10-06  3:11 ` [PATCH 17/24] accel/tcg: Use tb_invalidate_phys_range " Richard Henderson
@ 2022-10-25 16:14   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25 16:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Flush translation blocks in bulk, rather than page-by-page.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 18/24] accel/tcg: Move TARGET_PAGE_DATA_SIZE impl to user-exec.c
  2022-10-06  3:11 ` [PATCH 18/24] accel/tcg: Move TARGET_PAGE_DATA_SIZE impl to user-exec.c Richard Henderson
@ 2022-10-25 16:15   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25 16:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Since "target data" is always user-only, move it out of
> translate-all.c to user-exec.c.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 19/24] accel/tcg: Simplify page_get/alloc_target_data
  2022-10-06  3:11 ` [PATCH 19/24] accel/tcg: Simplify page_get/alloc_target_data Richard Henderson
@ 2022-10-25 16:19   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-25 16:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Since the only user, Arm MTE, always requires allocation,
> merge the get and alloc functions to always produce a
> non-null result.  Also assume that the user has already
> checked page validity.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 20/24] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE
  2022-10-06  3:11 ` [PATCH 20/24] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE Richard Henderson
@ 2022-10-25 19:30   ` Alex Bennée
  2022-10-25 21:29     ` Richard Henderson
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Bennée @ 2022-10-25 19:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Continue weaning user-only away from PageDesc.
>
> Use an interval tree to record target data.
> Chunk the data, to minimize allocation overhead.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/internal.h  |   1 -
>  accel/tcg/user-exec.c | 110 ++++++++++++++++++++++++++++++++----------
>  2 files changed, 85 insertions(+), 26 deletions(-)
>
> diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
> index 1bd5a02911..8731dc52e2 100644
> --- a/accel/tcg/internal.h
> +++ b/accel/tcg/internal.h
> @@ -26,7 +26,6 @@
>  typedef struct PageDesc {
>  #ifdef CONFIG_USER_ONLY
>      unsigned long flags;
> -    void *target_data;
>  #else
>      QemuSpin lock;
>      /* list of TBs intersecting this ram page */
> diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
> index fb7d6ee9e9..bce3d5f335 100644
> --- a/accel/tcg/user-exec.c
> +++ b/accel/tcg/user-exec.c
> @@ -210,47 +210,107 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
>      return addr;
>  }
>  
> +#ifdef TARGET_PAGE_DATA_SIZE
> +/*
> + * Allocate chunks of target data together.  For the only current user,
> + * if we allocate one hunk per page, we have overhead of 40/128, or about 30%.
> + * Therefore, allocate memory for 64 pages at a time for overhead < 1%.
> + */
> +#define TPD_PAGES  64
> +#define TBD_MASK   (TARGET_PAGE_MASK * TPD_PAGES)
> +
> +typedef struct TargetPageDataNode {
> +    IntervalTreeNode itree;
> +    char data[TPD_PAGES][TARGET_PAGE_DATA_SIZE] __attribute__((aligned));
> +} TargetPageDataNode;
> +
> +static IntervalTreeRoot targetdata_root;
> +
>  void page_reset_target_data(target_ulong start, target_ulong end)
>  {
> -#ifdef TARGET_PAGE_DATA_SIZE
> -    target_ulong addr, len;
> +    IntervalTreeNode *n, *next;
> +    target_ulong last;
>  
> -    /*
> -     * This function should never be called with addresses outside the
> -     * guest address space.  If this assert fires, it probably indicates
> -     * a missing call to h2g_valid.
> -     */
> -    assert(end - 1 <= GUEST_ADDR_MAX);
> -    assert(start < end);
>      assert_memory_lock();
>  
>      start = start & TARGET_PAGE_MASK;
> -    end = TARGET_PAGE_ALIGN(end);
> +    last = TARGET_PAGE_ALIGN(end) - 1;
>  
> -    for (addr = start, len = end - start;
> -         len != 0;
> -         len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
> -        PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
> +    for (n = interval_tree_iter_first(&targetdata_root, start, last),
> +         next = n ? interval_tree_iter_next(n, start, last) : NULL;
> +         n != NULL;
> +         n = next,
> +         next = next ? interval_tree_iter_next(n, start, last) : NULL) {
> +        target_ulong n_start, n_last, p_ofs, p_len;
> +        TargetPageDataNode *t;
>  
> -        g_free(p->target_data);
> -        p->target_data = NULL;
> +        if (n->start >= start && n->last <= last) {
> +            interval_tree_remove(n, &targetdata_root);
> +            g_free(n);
> +            continue;
> +        }
> +
> +        if (n->start < start) {
> +            n_start = start;
> +            p_ofs = (start - n->start) >> TARGET_PAGE_BITS;
> +        } else {
> +            n_start = n->start;
> +            p_ofs = 0;
> +        }
> +        n_last = MIN(last, n->last);
> +        p_len = (n_last + 1 - n_start) >> TARGET_PAGE_BITS;
> +
> +        t = container_of(n, TargetPageDataNode, itree);
> +        memset(t->data[p_ofs], 0, p_len * TARGET_PAGE_DATA_SIZE);
>      }
> -#endif
>  }
>  
> -#ifdef TARGET_PAGE_DATA_SIZE
>  void *page_get_target_data(target_ulong address)
>  {
> -    PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
> -    void *ret = p->target_data;
> +    IntervalTreeNode *n;
> +    TargetPageDataNode *t;
> +    target_ulong page, region;
> +    bool locked;
>  
> -    if (!ret) {
> -        ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
> -        p->target_data = ret;
> +    page = address & TARGET_PAGE_MASK;
> +    region = address & TBD_MASK;
> +
> +    n = interval_tree_iter_first(&targetdata_root, page, page);
> +    if (n) {
> +        goto found;
>      }
> -    return ret;
> +
> +    /*
> +     * See util/interval-tree.c re lockless lookups: no false positives but
> +     * there are false negatives.  If we find nothing, retry with the mmap
> +     * lock acquired.  We also need the lock for the allocation + insert.
> +     */
> +    locked = have_mmap_lock();

Are we expecting to already hold the lock here?

> +    if (!locked) {
> +        mmap_lock();
> +        n = interval_tree_iter_first(&targetdata_root, page, page);

So we only search if we haven't got a lock already.

> +        if (n) {
> +            mmap_unlock();
> +            goto found;
> +        }
> +    }
> +
> +    t = g_new0(TargetPageDataNode, 1);
> +    n = &t->itree;
> +    n->start = region;
> +    n->last = region | ~TBD_MASK;
> +    interval_tree_insert(n, &targetdata_root);
> +    if (!locked) {
> +        mmap_unlock();
> +    }

To be honest, making the mmap_lock safe to use recursively and wrapping
the locked portion in a LOCK_GUARD would make me happier that we didn't
cock up the unwinding.
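
Something along these lines, as a sketch only; WITH_MMAP_LOCK_GUARD is
a made-up helper here that would take mmap_lock on entry, drop it on
scope exit, and tolerate recursive acquisition:

    WITH_MMAP_LOCK_GUARD() {
        n = interval_tree_iter_first(&targetdata_root, page, page);
        if (!n) {
            t = g_new0(TargetPageDataNode, 1);
            n = &t->itree;
            n->start = region;
            n->last = region | ~TBD_MASK;
            interval_tree_insert(n, &targetdata_root);
        }
    }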

> +
> + found:
> +    t = container_of(n, TargetPageDataNode, itree);
> +    return t->data[(page - region) >> TARGET_PAGE_BITS];
>  }
> -#endif
> +#else
> +void page_reset_target_data(target_ulong start, target_ulong end) { }
> +#endif /* TARGET_PAGE_DATA_SIZE */
>  
>  /* The softmmu versions of these helpers are in cputlb.c.  */


-- 
Alex Bennée



* Re: [PATCH 14/24] accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET
  2022-10-25 15:42   ` Alex Bennée
@ 2022-10-25 20:55     ` Richard Henderson
  0 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2022-10-25 20:55 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug

On 10/26/22 01:42, Alex Bennée wrote:
> Not a problem with this patch, but I was a little confused by PAGE_VALID
> because it's the one "special" flag not documented in cpu-all.h:
> 
>    /* same as PROT_xxx */
>    #define PAGE_READ      0x0001
>    #define PAGE_WRITE     0x0002
>    #define PAGE_EXEC      0x0004
>    #define PAGE_BITS      (PAGE_READ | PAGE_WRITE | PAGE_EXEC)
> 
> The above are self-explanatory as they mirror the mmap flags. But what
> does PAGE_VALID really mean?

It exists so that unmapped pages have flags == 0 and
mmap(..., PROT_NONE, ...) pages have flags != 0.

Perhaps a better name would have been "allocated".
You are perhaps 25 years too late to bikeshed that name.  :-)
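
A sketch of how the three states then look through page_get_flags(),
illustrative only and not code from the series:

  static const char *classify(target_ulong addr)
  {
      int flags = page_get_flags(addr);

      if (flags == 0) {
          return "unmapped";      /* never mapped, or munmap'd */
      }
      if (!(flags & PAGE_BITS)) {
          return "PROT_NONE";     /* PAGE_VALID set, but no R/W/X bits */
      }
      return "accessible";        /* PAGE_VALID plus some of R/W/X */
  }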

> The only place
> that ever seems to clear the flag is the PPC mmu_helper code, in
> response to particular TLB operations. Should that code instead be
> doing page_set_flags(PAGE_RESET)?

Heh.  ppc seems to have re-used the symbol for its own internal data structure.
Well, considering that it's got to pass on "prot" to tlb_set_page at some point, re-using 
those names isn't a horrible idea.

Anyway, you can see target_munmap clear PAGE_VALID here:

         page_set_flags(start, start + len, 0);

PAGE_RESET exists to distinguish mmap replacing an existing mapping (old backing storage 
replaced) from mprotect (old backing storage retained, new permissions applied).
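
Roughly, the two call sites differ like so (paraphrasing
linux-user/mmap.c from memory rather than quoting it exactly):

  /* target_mmap(): contents replaced, so target data must be reset. */
  page_set_flags(start, start + len, prot | PAGE_VALID | PAGE_RESET);

  /* target_mprotect(): same storage, only the permissions change. */
  page_set_flags(start, start + len, prot | PAGE_VALID);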


r~



* Re: [PATCH 15/24] accel/tcg: Use interval tree for TBs in user-only mode
  2022-10-25 15:58   ` Alex Bennée
@ 2022-10-25 21:19     ` Richard Henderson
  0 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2022-10-25 21:19 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug

On 10/26/22 01:58, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Begin weaning user-only away from PageDesc.
>>
>> Since, for user-only, all TB (and page) manipulation is done with
>> a single mutex, and there is no virtual/physical discontinuity to
>> split a TB across discontinuous pages, place all of the TBs into
>> a single IntervalTree. This makes it trivial to find all of the
>> TBs intersecting a range.
>>
>> Retain the existing PageDesc + linked list implementation for
>> system mode.  Move the portion of the implementation that overlaps
>> the new user-only code behind the common ifdef.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   accel/tcg/internal.h      |  16 +-
>>   include/exec/exec-all.h   |  43 ++++-
>>   accel/tcg/tb-maint.c      | 388 ++++++++++++++++++++++----------------
>>   accel/tcg/translate-all.c |   4 +-
>>   4 files changed, 280 insertions(+), 171 deletions(-)
>>
> <snip>
>> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
>> index c8e921089d..14e8e47a6a 100644
>> --- a/accel/tcg/tb-maint.c
>> +++ b/accel/tcg/tb-maint.c
>> @@ -18,6 +18,7 @@
>>    */
>>   
>>   #include "qemu/osdep.h"
>> +#include "qemu/interval-tree.h"
>>   #include "exec/cputlb.h"
>>   #include "exec/log.h"
>>   #include "exec/exec-all.h"
>> @@ -50,6 +51,75 @@ void tb_htable_init(void)
>>       qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
>>   }
> 
> I wonder, for the sake of avoiding recompilation of units later on and
> having a clean separation between user and system mode, whether it would
> be worth putting this stuff in a tb-maint-user.c?

Something like that might be possible toward the end of the series, but there's too much 
tangle in the middle.

>> +bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc)
>> +{
>> +    assert(pc != 0);
>> +#ifdef TARGET_HAS_PRECISE_SMC
>> +    assert_memory_lock();
> 
> Out of interest, is this just because x86 has such a strong memory model
> that you can get away with this sort of patching without explicit flushes?

Yes.

> I'm curious why this is the only arch we jump through these hoops for?

It's probably the only one for which we have extensive test cases for self-modifying code 
(aka msdos).  I can imagine that other old targets might technically require it, e.g. 
m68k, or some of the micro-controllers.

I'm actually not sure why we don't enable it everywhere -- it doesn't seem like it's 
adding lots of overhead.  But until someone reports a bug, or it gets in the way of 
multi-target, I'm happy to not enable it anywhere else.

>>       /*
>>        * We remove all the TBs in the range [start, end[.
>>        * XXX: see if in some cases it could be faster to invalidate all the code
>>        */
> 
> I'm guessing this comment is quite stale now given we try quite hard to
> avoid doing lots of code gen over and over again. The only case I can
> think of is memory clear routines after we've had code which there might
> be some heuristics we could use to detect but don't currently.

I think there's still room for exploration here, especially detecting page clearing as you 
say.


r~



* Re: [PATCH 20/24] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE
  2022-10-25 19:30   ` Alex Bennée
@ 2022-10-25 21:29     ` Richard Henderson
  0 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2022-10-25 21:29 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug

On 10/26/22 05:30, Alex Bennée wrote:
>>   void *page_get_target_data(target_ulong address)
>>   {
>> -    PageDesc *p = page_find(address >> TARGET_PAGE_BITS);
>> -    void *ret = p->target_data;
>> +    IntervalTreeNode *n;
>> +    TargetPageDataNode *t;
>> +    target_ulong page, region;
>> +    bool locked;
>>   
>> -    if (!ret) {
>> -        ret = g_malloc0(TARGET_PAGE_DATA_SIZE);
>> -        p->target_data = ret;
>> +    page = address & TARGET_PAGE_MASK;
>> +    region = address & TBD_MASK;
>> +
>> +    n = interval_tree_iter_first(&targetdata_root, page, page);
>> +    if (n) {
>> +        goto found;
>>       }
>> -    return ret;
>> +
>> +    /*
>> +     * See util/interval-tree.c re lockless lookups: no false positives but
>> +     * there are false negatives.  If we find nothing, retry with the mmap
>> +     * lock acquired.  We also need the lock for the allocation + insert.
>> +     */
>> +    locked = have_mmap_lock();
> 
> Are we expecting to already hold the lock here?

No.  This is called in the middle of a normal load/store instruction, in response to MTE 
being enabled.  We really want to avoid taking a lock if we can.
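
Concretely, the caller is allocation_tag_mem() in
target/arm/mte_helper.c; condensed from memory, so the detail is
approximate:

    /* inside allocation_tag_mem(), user-only path */
    tags = page_get_target_data(clean_ptr);

so any lock taken here sits right on the load/store fast path.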

> 
>> +    if (!locked) {
>> +        mmap_lock();
>> +        n = interval_tree_iter_first(&targetdata_root, page, page);
> 
> So we only search if we haven't got a lock already.

We only search *again* if we haven't got a lock already.  If we had the lock, then the 
first search above produced golden results.  And we must have the lock, acquired just now...

> 
>> +        if (n) {
>> +            mmap_unlock();
>> +            goto found;
>> +        }
>> +    }
>> +
>> +    t = g_new0(TargetPageDataNode, 1);
>> +    n = &t->itree;
>> +    n->start = region;
>> +    n->last = region | ~TBD_MASK;
>> +    interval_tree_insert(n, &targetdata_root);

... for the modification to the data structure.

>> +    if (!locked) {
>> +        mmap_unlock();
>> +    }
> 
> To be honest, making the mmap_lock safe to use recursively and wrapping
> the locked portion in a LOCK_GUARD would make me happier that we didn't
> cock up the unwinding.

Fair.  I was trying to avoid a second interval tree walk if we *did* happen to have the 
lock, but as I said myself above, that's extremely unlikely given the only user.


r~



* Re: [PATCH 21/24] accel/tcg: Move page_{get, set}_flags to user-exec.c
  2022-10-06  3:11 ` [PATCH 21/24] accel/tcg: Move page_{get,set}_flags to user-exec.c Richard Henderson
@ 2022-10-26 12:35   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-26 12:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> This page tracking implementation is specific to user-only,
> since the system softmmu version is in cputlb.c.  Move it
> out of translate-all.c to user-exec.c.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking
  2022-10-06  3:11 ` [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking Richard Henderson
@ 2022-10-26 13:36   ` Alex Bennée
  2022-10-27 10:20     ` Richard Henderson
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Bennée @ 2022-10-26 13:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> Finish weaning user-only away from PageDesc.
>
> Using an interval tree to track page permissions means that
> we can represent very large regions efficiently.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/internal.h           |   4 +-
>  accel/tcg/tb-maint.c           |  20 +-
>  accel/tcg/user-exec.c          | 614 ++++++++++++++++++++++-----------
>  tests/tcg/multiarch/test-vma.c |  22 ++
>  4 files changed, 451 insertions(+), 209 deletions(-)
>  create mode 100644 tests/tcg/multiarch/test-vma.c
>
> diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
> index 250f0daac9..c7e157d1cd 100644
> --- a/accel/tcg/internal.h
> +++ b/accel/tcg/internal.h
> @@ -24,9 +24,7 @@
>  #endif
>  
>  typedef struct PageDesc {
> -#ifdef CONFIG_USER_ONLY
> -    unsigned long flags;
> -#else
> +#ifndef CONFIG_USER_ONLY
>      QemuSpin lock;
>      /* list of TBs intersecting this ram page */
>      uintptr_t first_tb;
> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
> index 14e8e47a6a..694440cb4a 100644
> --- a/accel/tcg/tb-maint.c
> +++ b/accel/tcg/tb-maint.c
> @@ -68,15 +68,23 @@ static void page_flush_tb(void)
<snip>
>  
>  int page_get_flags(target_ulong address)
>  {
> -    PageDesc *p;
> +    PageFlagsNode *p = pageflags_find(address, address);
>  
> -    p = page_find(address >> TARGET_PAGE_BITS);
> +    /*
> +     * See util/interval-tree.c re lockless lookups: no false positives but
> +     * there are false negatives.  If we find nothing, retry with the mmap
> +     * lock acquired.
> +     */
>      if (!p) {
> -        return 0;
> +        if (have_mmap_lock()) {
> +            return 0;
> +        }
> +        mmap_lock();
> +        p = pageflags_find(address, address);
> +        mmap_unlock();
> +        if (!p) {
> +            return 0;
> +        }
>      }
>      return p->flags;

To avoid the brain-twisting of following locks and multiple return legs, how about this:

  int page_get_flags(target_ulong address)
  {
      PageFlagsNode *p = pageflags_find(address, address);

      /*
       * See util/interval-tree.c re lockless lookups: no false positives but
       * there are false negatives.  If we had the lock and found
       * nothing we are done, otherwise retry with the mmap lock acquired.
       */
      if (have_mmap_lock()) {
          return p ? p->flags : 0;
      }

      mmap_lock();
      p = pageflags_find(address, address);
      mmap_unlock();

      return p ? p->flags : 0;
  }


> diff --git a/tests/tcg/multiarch/test-vma.c b/tests/tcg/multiarch/test-vma.c
> new file mode 100644
> index 0000000000..2893d60334
> --- /dev/null
> +++ b/tests/tcg/multiarch/test-vma.c
> @@ -0,0 +1,22 @@
> +/*
> + * Test very large vma allocations.
> + * The qemu out-of-memory condition was within the mmap syscall itself.
> + * If the syscall actually returns with MAP_FAILED, the test succeeded.
> + */
> +#include <sys/mman.h>
> +
> +int main()
> +{
> +    int n = sizeof(size_t) == 4 ? 32 : 45;
> +
> +    for (int i = 28; i < n; i++) {
> +        size_t l = (size_t)1 << i;
> +        void *p = mmap(0, l, PROT_NONE,
> +                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
> +        if (p == MAP_FAILED) {
> +            break;
> +        }
> +        munmap(p, l);
> +    }
> +    return 0;
> +}

So is the failure mode here that we actually seg or bus out?

-- 
Alex Bennée



* Re: [PATCH 24/24] accel/tcg: Move remainder of page locking to tb-maint.c
  2022-10-06  3:11 ` [PATCH 24/24] accel/tcg: Move remainder of page locking to tb-maint.c Richard Henderson
@ 2022-10-26 13:48   ` Alex Bennée
  0 siblings, 0 replies; 55+ messages in thread
From: Alex Bennée @ 2022-10-26 13:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> The only thing that still touches PageDesc in translate-all.c
> are some locking routines related to tb-maint.c which have not
> yet been moved.  Do so now.
>
> Move some code up in tb-maint.c as well, to untangle the maze
> of ifdefs, and allow a sensible final ordering.
>
> Move some declarations from exec/translate-all.h to internal.h,
> as they are only used within accel/tcg/.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking
  2022-10-26 13:36   ` Alex Bennée
@ 2022-10-27 10:20     ` Richard Henderson
  2022-10-27 15:59       ` Alex Bennée
  0 siblings, 1 reply; 55+ messages in thread
From: Richard Henderson @ 2022-10-27 10:20 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug

On 10/26/22 23:36, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Finish weaning user-only away from PageDesc.
>>
>> Using an interval tree to track page permissions means that
>> we can represent very large regions efficiently.
>>
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   accel/tcg/internal.h           |   4 +-
>>   accel/tcg/tb-maint.c           |  20 +-
>>   accel/tcg/user-exec.c          | 614 ++++++++++++++++++++++-----------
>>   tests/tcg/multiarch/test-vma.c |  22 ++
>>   4 files changed, 451 insertions(+), 209 deletions(-)
>>   create mode 100644 tests/tcg/multiarch/test-vma.c
>>
>> diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
>> index 250f0daac9..c7e157d1cd 100644
>> --- a/accel/tcg/internal.h
>> +++ b/accel/tcg/internal.h
>> @@ -24,9 +24,7 @@
>>   #endif
>>   
>>   typedef struct PageDesc {
>> -#ifdef CONFIG_USER_ONLY
>> -    unsigned long flags;
>> -#else
>> +#ifndef CONFIG_USER_ONLY
>>       QemuSpin lock;
>>       /* list of TBs intersecting this ram page */
>>       uintptr_t first_tb;
>> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
>> index 14e8e47a6a..694440cb4a 100644
>> --- a/accel/tcg/tb-maint.c
>> +++ b/accel/tcg/tb-maint.c
>> @@ -68,15 +68,23 @@ static void page_flush_tb(void)
> <snip>
>>   
>>   int page_get_flags(target_ulong address)
>>   {
>> -    PageDesc *p;
>> +    PageFlagsNode *p = pageflags_find(address, address);
>>   
>> -    p = page_find(address >> TARGET_PAGE_BITS);
>> +    /*
>> +     * See util/interval-tree.c re lockless lookups: no false positives but
>> +     * there are false negatives.  If we find nothing, retry with the mmap
>> +     * lock acquired.
>> +     */
>>       if (!p) {
>> -        return 0;
>> +        if (have_mmap_lock()) {
>> +            return 0;
>> +        }
>> +        mmap_lock();
>> +        p = pageflags_find(address, address);
>> +        mmap_unlock();
>> +        if (!p) {
>> +            return 0;
>> +        }
>>       }
>>       return p->flags;
> 
> To avoid the brain-twisting of following locks and multiple return legs, how about this:
> 
>    int page_get_flags(target_ulong address)
>    {
>        PageFlagsNode *p = pageflags_find(address, address);
> 
>        /*
>         * See util/interval-tree.c re lockless lookups: no false positives but
>         * there are false negatives.  If we had the lock and found
>         * nothing we are done, otherwise retry with the mmap lock acquired.
>         */
>        if (have_mmap_lock()) {
>            return p ? p->flags : 0;
>        }
> 
>        mmap_lock();
>        p = pageflags_find(address, address);
>        mmap_unlock();
> 
>        return p ? p->flags : 0;
>    }

I'm unwilling to put an expensive test like a function call (have_mmap_lock) before an 
inexpensive test like pointer != NULL.

I don't see what's so brain-twisting about the code as is.  The lock tightly surrounds a 
single statement, with a couple of pointer tests.

>> +/*
>> + * Test very large vma allocations.
>> + * The qemu out-of-memory condition was within the mmap syscall itself.
>> + * If the syscall actually returns with MAP_FAILED, the test succeeded.
>> + */
>> +#include <sys/mman.h>
>> +
>> +int main()
>> +{
>> +    int n = sizeof(size_t) == 4 ? 32 : 45;
>> +
>> +    for (int i = 28; i < n; i++) {
>> +        size_t l = (size_t)1 << i;
>> +        void *p = mmap(0, l, PROT_NONE,
>> +                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
>> +        if (p == MAP_FAILED) {
>> +            break;
>> +        }
>> +        munmap(p, l);
>> +    }
>> +    return 0;
>> +}
> 
> So is the failure mode here that we actually seg or bus out?

SEGV or KILL (via oom) depending on the state of the system. If the host is *really* 
beefy, it may even complete but with an unreasonable timeout.

r~



* Re: [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking
  2022-10-27 10:20     ` Richard Henderson
@ 2022-10-27 15:59       ` Alex Bennée
  2022-10-27 21:38         ` Richard Henderson
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Bennée @ 2022-10-27 15:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug


Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/26/22 23:36, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> Finish weaning user-only away from PageDesc.
>>>
>>> Using an interval tree to track page permissions means that
>>> we can represent very large regions efficiently.
>>>
>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/290
>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/967
>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1214
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>>   accel/tcg/internal.h           |   4 +-
>>>   accel/tcg/tb-maint.c           |  20 +-
>>>   accel/tcg/user-exec.c          | 614 ++++++++++++++++++++++-----------
>>>   tests/tcg/multiarch/test-vma.c |  22 ++
>>>   4 files changed, 451 insertions(+), 209 deletions(-)
>>>   create mode 100644 tests/tcg/multiarch/test-vma.c
>>>
>>> diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
>>> index 250f0daac9..c7e157d1cd 100644
>>> --- a/accel/tcg/internal.h
>>> +++ b/accel/tcg/internal.h
>>> @@ -24,9 +24,7 @@
>>>   #endif
>>>     typedef struct PageDesc {
>>> -#ifdef CONFIG_USER_ONLY
>>> -    unsigned long flags;
>>> -#else
>>> +#ifndef CONFIG_USER_ONLY
>>>       QemuSpin lock;
>>>       /* list of TBs intersecting this ram page */
>>>       uintptr_t first_tb;
>>> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
>>> index 14e8e47a6a..694440cb4a 100644
>>> --- a/accel/tcg/tb-maint.c
>>> +++ b/accel/tcg/tb-maint.c
>>> @@ -68,15 +68,23 @@ static void page_flush_tb(void)
>> <snip>
>>>     int page_get_flags(target_ulong address)
>>>   {
>>> -    PageDesc *p;
>>> +    PageFlagsNode *p = pageflags_find(address, address);
>>>   -    p = page_find(address >> TARGET_PAGE_BITS);
>>> +    /*
>>> +     * See util/interval-tree.c re lockless lookups: no false positives but
>>> +     * there are false negatives.  If we find nothing, retry with the mmap
>>> +     * lock acquired.
>>> +     */
>>>       if (!p) {
>>> -        return 0;
>>> +        if (have_mmap_lock()) {
>>> +            return 0;
>>> +        }
>>> +        mmap_lock();
>>> +        p = pageflags_find(address, address);
>>> +        mmap_unlock();
>>> +        if (!p) {
>>> +            return 0;
>>> +        }
>>>       }
>>>       return p->flags;
>> To avoid the brain-twisting of following locks and multiple return legs,
>> how about this:
>>    int page_get_flags(target_ulong address)
>>    {
>>        PageFlagsNode *p = pageflags_find(address, address);
>>        /*
>>         * See util/interval-tree.c re lockless lookups: no false positives but
>>         * there are false negatives.  If we had the lock and found
>>         * nothing we are done, otherwise retry with the mmap lock acquired.
>>         */
>>        if (have_mmap_lock()) {
>>            return p ? p->flags : 0;
>>        }
>>        mmap_lock();
>>        p = pageflags_find(address, address);
>>        mmap_unlock();
>>        return p ? p->flags : 0;
>>    }
>
> I'm unwilling to put an expensive test like a function call
> (have_mmap_lock) before an inexpensive test like pointer != NULL.

Is it really that much more expensive?

> I don't see what's so brain twisting about the code as is.  The lock
> tightly surrounds a single statement, with a couple of pointer tests.

Sure, I guess I'm just trying to avoid having so many returns out of
the code at various levels of nesting. The page_get_target_data code is
harder to follow. What about:

int page_get_flags(target_ulong address)
{
    PageFlagsNode *p = pageflags_find(address, address);

    /*
     * See util/interval-tree.c re lockless lookups: no false positives but
     * there are false negatives.  If we had the lock and found
     * nothing we are done, otherwise retry with the mmap lock acquired.
     */
    if (p) {
        return p->flags;
    } else if (have_mmap_lock()) {
        return 0;
    }

    mmap_lock();
    p = pageflags_find(address, address);
    mmap_unlock();

    return p ? p->flags : 0;
}

and:

static IntervalTreeNode *new_target_data_locked(target_ulong region)
{
    IntervalTreeNode *n;
    TargetPageDataNode *t;

    t = g_new0(TargetPageDataNode, 1);
    n = &t->itree;
    n->start = region;
    n->last = region | ~TBD_MASK;
    interval_tree_insert(n, &targetdata_root);

    return n;
}

static inline void *get_target_data(IntervalTreeNode *n,
                                    target_ulong region, target_ulong page)
{
    TargetPageDataNode *t;
    t = container_of(n, TargetPageDataNode, itree);
    return t->data[(page - region) >> TARGET_PAGE_BITS];
}

void *page_get_target_data(target_ulong address)
{
    IntervalTreeNode *n;
    target_ulong page, region;

    page = address & TARGET_PAGE_MASK;
    region = address & TBD_MASK;

    n = interval_tree_iter_first(&targetdata_root, page, page);
    if (n) {
        return get_target_data(n, region, page);
    }

    /*
     * See util/interval-tree.c re lockless lookups: no false positives but
     * there are false negatives.  If we already hold the mmap lock, we
     * need a new node now; otherwise retry under the lock and allocate a
     * new node only if the lookup still finds nothing.
     */
    if (have_mmap_lock()) {
        n = new_target_data_locked(region);
    } else {
        mmap_lock();
        n = interval_tree_iter_first(&targetdata_root, page, page);

        if (!n) {
            n = new_target_data_locked(region);
        }
        mmap_unlock();
    }

    return get_target_data(n, region, page);
}
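
For what it's worth, here is a rough sketch of how a consumer might sit on
top of this accessor; the granule size and flat one-byte-per-granule layout
below are made up for illustration (the real user is the Arm MTE helper,
which packs its tags differently):

/*
 * Illustrative only: look up (and lazily allocate) the per-page
 * metadata byte for a guest address.
 */
#define GRANULE_SHIFT 4   /* one metadata byte per 16 guest bytes */

static uint8_t *metadata_for(target_ulong addr)
{
    /* Allocates the backing TargetPageDataNode on first touch. */
    uint8_t *page_data = page_get_target_data(addr);

    return page_data + ((addr & ~TARGET_PAGE_MASK) >> GRANULE_SHIFT);
}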

>
>>> +/*
>>> + * Test very large vma allocations.
>>> + * The qemu out-of-memory condition was within the mmap syscall itself.
>>> + * If the syscall actually returns with MAP_FAILED, the test succeeded.
>>> + */
>>> +#include <sys/mman.h>
>>> +
>>> +int main()
>>> +{
>>> +    int n = sizeof(size_t) == 4 ? 32 : 45;
>>> +
>>> +    for (int i = 28; i < n; i++) {
>>> +        size_t l = (size_t)1 << i;
>>> +        void *p = mmap(0, l, PROT_NONE,
>>> +                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
>>> +        if (p == MAP_FAILED) {
>>> +            break;
>>> +        }
>>> +        munmap(p, l);
>>> +    }
>>> +    return 0;
>>> +}
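
(For scale: on a 64-bit host the loop walks power-of-two sizes from
1 << 28 = 256 MiB up to 1 << 44 = 16 TiB, which is presumably where the
cover letter's 16TB figure comes from; a 32-bit host stops after
1 << 31 = 2 GiB.)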
>> So is the failure mode here that we actually SEGV or SIGBUS out?
>
> SEGV or KILL (via the oom killer), depending on the state of the system.
> If the host is *really* beefy, it may even complete, but only after an
> unreasonably long time.
>
> r~


-- 
Alex Bennée



* Re: [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking
  2022-10-27 15:59       ` Alex Bennée
@ 2022-10-27 21:38         ` Richard Henderson
  0 siblings, 0 replies; 55+ messages in thread
From: Richard Henderson @ 2022-10-27 21:38 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, laurent, pbonzini, imp, f4bug

On 10/28/22 01:59, Alex Bennée wrote:
>> I'm unwilling to put an expensive test like a function call
>> (have_mmap_lock) before an inexpensive test like pointer != NULL.
> 
> Is it really that much more expensive?

Well, yes.  I mean, the function call isn't really slow, but it isn't single-cycle like a 
comparison against 0.
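
For reference, have_mmap_lock() is a call into another translation unit,
so it cannot be inlined at this call site.  The user-only implementation
is roughly this (a sketch from memory; take the details with a grain of
salt):

/* Per-thread recursion count for the user-only mmap lock. */
static __thread int mmap_lock_count;

bool have_mmap_lock(void)
{
    return mmap_lock_count > 0;
}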

> Sure, I guess I'm just trying to avoid having so many returns out of
> the code at various levels of nesting. The page_get_target_data code is
> harder to follow. What about:
> 
> int page_get_flags(target_ulong address)
> {
>      PageFlagsNode *p = pageflags_find(address, address);
> 
>      /*
>       * See util/interval-tree.c re lockless lookups: no false positives but
>       * there are false negatives.  If we had the lock and found
>       * nothing we are done, otherwise retry with the mmap lock acquired.
>       */
>      if (p) {
>          return p->flags;
>      } else if (have_mmap_lock()) {
>          return 0;
>      }
> 
>      mmap_lock();
>      p = pageflags_find(address, address);
>      mmap_unlock();
> 
>      return p ? p->flags : 0;
> }

Ok, I can use this.  In for v3.

As for page_get_target_data, see v2, where it has been cleaned up.


r~



end of thread

Thread overview: 55+ messages
2022-10-06  3:10 [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
2022-10-06  3:10 ` [PATCH 01/24] util: Add interval-tree.c Richard Henderson
2022-10-25  8:40   ` Alex Bennée
2022-10-25 10:36     ` Richard Henderson
2022-10-06  3:10 ` [PATCH 02/24] accel/tcg: Make page_alloc_target_data allocation constant Richard Henderson
2022-10-25  8:45   ` Alex Bennée
2022-10-06  3:10 ` [PATCH 03/24] accel/tcg: Remove disabled debug in translate-all.c Richard Henderson
2022-10-25  8:46   ` Alex Bennée
2022-10-06  3:10 ` [PATCH 04/24] accel/tcg: Split out PageDesc to internal.h Richard Henderson
2022-10-25  8:47   ` Alex Bennée
2022-10-06  3:10 ` [PATCH 05/24] accel/tcg: Split out tb-maint.c Richard Henderson
2022-10-25  8:49   ` Alex Bennée
2022-10-06  3:10 ` [PATCH 06/24] accel/tcg: Move assert_no_pages_locked to internal.h Richard Henderson
2022-10-25  8:49   ` Alex Bennée
2022-10-06  3:10 ` [PATCH 07/24] accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC Richard Henderson
2022-10-25  8:50   ` Alex Bennée
2022-10-06  3:10 ` [PATCH 08/24] accel/tcg: Remove duplicate store to tb->page_addr[] Richard Henderson
2022-10-25  9:12   ` Alex Bennée
2022-10-06  3:10 ` [PATCH 09/24] accel/tcg: Introduce tb_{set_}page_addr{0,1} Richard Henderson
2022-10-25  9:47   ` Alex Bennée
2022-10-06  3:10 ` [PATCH 10/24] accel/tcg: Rename tb_invalidate_phys_page Richard Henderson
2022-10-25  9:49   ` Alex Bennée
2022-10-06  3:11 ` [PATCH 11/24] accel/tcg: Rename tb_invalidate_phys_page_range and drop end parameter Richard Henderson
2022-10-25 13:21   ` Alex Bennée
2022-10-06  3:11 ` [PATCH 12/24] accel/tcg: Unify declarations of tb_invalidate_phys_range Richard Henderson
2022-10-25 13:24   ` Alex Bennée
2022-10-06  3:11 ` [PATCH 13/24] accel/tcg: Use tb_invalidate_phys_page in page_set_flags Richard Henderson
2022-10-06  3:11 ` [PATCH 14/24] accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET Richard Henderson
2022-10-25 15:42   ` Alex Bennée
2022-10-25 20:55     ` Richard Henderson
2022-10-06  3:11 ` [PATCH 15/24] accel/tcg: Use interval tree for TBs in user-only mode Richard Henderson
2022-10-25 15:58   ` Alex Bennée
2022-10-25 21:19     ` Richard Henderson
2022-10-06  3:11 ` [PATCH 16/24] accel/tcg: Use page_reset_target_data in page_set_flags Richard Henderson
2022-10-25 16:12   ` Alex Bennée
2022-10-06  3:11 ` [PATCH 17/24] accel/tcg: Use tb_invalidate_phys_range " Richard Henderson
2022-10-25 16:14   ` Alex Bennée
2022-10-06  3:11 ` [PATCH 18/24] accel/tcg: Move TARGET_PAGE_DATA_SIZE impl to user-exec.c Richard Henderson
2022-10-25 16:15   ` Alex Bennée
2022-10-06  3:11 ` [PATCH 19/24] accel/tcg: Simplify page_get/alloc_target_data Richard Henderson
2022-10-25 16:19   ` Alex Bennée
2022-10-06  3:11 ` [PATCH 20/24] accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE Richard Henderson
2022-10-25 19:30   ` Alex Bennée
2022-10-25 21:29     ` Richard Henderson
2022-10-06  3:11 ` [PATCH 21/24] accel/tcg: Move page_{get,set}_flags to user-exec.c Richard Henderson
2022-10-26 12:35   ` [PATCH 21/24] accel/tcg: Move page_{get, set}_flags " Alex Bennée
2022-10-06  3:11 ` [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking Richard Henderson
2022-10-26 13:36   ` Alex Bennée
2022-10-27 10:20     ` Richard Henderson
2022-10-27 15:59       ` Alex Bennée
2022-10-27 21:38         ` Richard Henderson
2022-10-06  3:11 ` [PATCH 23/24] accel/tcg: Move PageDesc tree into tb-maint.c for system Richard Henderson
2022-10-06  3:11 ` [PATCH 24/24] accel/tcg: Move remainder of page locking to tb-maint.c Richard Henderson
2022-10-26 13:48   ` Alex Bennée
2022-10-24 23:05 ` [PATCH 00/24] accel/tcg: Rewrite user-only vma tracking Richard Henderson
